APPEND -- IGNORED WITH CONSTRAINTS
The hint is silently ignored (Oracle falls back to a conventional insert) when the table has enabled triggers or enabled foreign key constraints.
Each parallel process will write to its own extent; they do not share extents while writing, so the extent allocation property is important.
Append -- does not use the buffer cache
Append - bypasses UNDO; does a direct-path insert
Append - rows are only visible after commit
Append - bypasses the redo log (only when combined with NOLOGGING, or in NOARCHIVELOG mode; a plain APPEND still generates redo). (Difference between redo and undo: redo records how to replay a change, undo records how to roll it back.)
Note on UNDO -- undo is not generated for the table data,
but it is still generated for the indexes defined on the table
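The notes above can be put together in one minimal sketch (the table names sales and stg_sales are hypothetical):

```sql
-- Direct-path insert: loads above the high water mark, bypasses the buffer
-- cache, and skips undo for the table data (undo is still generated for
-- maintaining any indexes on the table).
INSERT /*+ APPEND */ INTO sales
SELECT * FROM stg_sales;

COMMIT;  -- the rows are not visible, even to this session, until commit

-- Redo is only skipped if the table is NOLOGGING (or the database runs in
-- NOARCHIVELOG mode); APPEND alone does not switch redo off:
ALTER TABLE sales NOLOGGING;
```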
Append - no one else can modify the table while you are doing the insert; the direct-path insert locks the table exclusively, so other sessions' DML waits in serial order (other sessions can still query, they just see no uncommitted data).
Append - if you use APPEND in an incremental load operation, you have to make sure there are no deletes on the table. It writes above the high water mark, and the high water mark does not come down when you delete, so you will be wasting the blocks on which you deleted the data.
Append - writes above the high water mark
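Because APPEND never reuses space below the high water mark, a delete-then-append load pattern leaks blocks. A sketch of the problem and the usual fixes (table and index names hypothetical):

```sql
DELETE FROM sales WHERE load_date < SYSDATE - 30;   -- frees rows, HWM unchanged
COMMIT;

INSERT /*+ APPEND */ INTO sales                      -- writes ABOVE the old HWM;
SELECT * FROM stg_sales;                             -- the deleted blocks stay wasted
COMMIT;

-- Ways to actually lower the high water mark instead:
TRUNCATE TABLE sales;            -- resets the HWM (but removes all rows)
ALTER TABLE sales MOVE;          -- rebuilds the segment, compacting it
ALTER INDEX sales_ix REBUILD;    -- indexes are invalidated by MOVE
```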
Very important -- parallel DML is always a direct path: if you go parallel, you will be appending.
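Parallel DML has to be enabled at the session level before a parallel insert runs as direct path; a sketch (table names hypothetical):

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- A parallel insert is automatically a direct-path (append) insert:
INSERT /*+ PARALLEL(sales, 4) */ INTO sales
SELECT /*+ PARALLEL(stg_sales, 4) */ * FROM stg_sales;

COMMIT;
```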
From AskTom, below:
Insert /*+ APPEND */ - why it would be horrible for Oracle to make that the "default":
a) it isn't necessarily faster in general. It does a direct path load to disk - bypassing the buffer cache. There are many cases - especially with smaller sets - where the direct path load to disk would be far slower than a conventional path load into the cache.
Disadvantage of direct path load over time: it leads to a lot of wasted space, and these blocks get read daily when doing full table scans. You need to check whether the extent allocation is UNIFORM or AUTOALLOCATE; in the case of UNIFORM there is a lot of wastage.
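The UNIFORM vs AUTOALLOCATE choice is a tablespace property; a sketch of the two options (tablespace and datafile names hypothetical):

```sql
-- UNIFORM: every extent is exactly 10M. Each parallel slave's partly-filled
-- final extent can waste most of those 10M.
CREATE TABLESPACE dw_uniform
  DATAFILE '+DATA' SIZE 1G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 10M;

-- AUTOALLOCATE: Oracle picks the extent sizes itself and can trim the last
-- extent of a direct-path load, so far less space is wasted.
CREATE TABLESPACE dw_auto
  DATAFILE '+DATA' SIZE 1G
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE;
```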
b) a direct path load always loads above the high water mark, since it is formatting and writing blocks directly to disk - it cannot reuse any existing space. Think about this - if you direct pathed an insert of a 100 byte row that loaded say just two rows - and you did that 1,000 times, you would be using at least 1,000 blocks (never reuse any existing space) - each with two rows. Now, if you did that using a conventional path insert - you would get about 70/80 rows per block in an 8k block database. You would use about 15 blocks. Which would you prefer?
c) you cannot query a table after direct pathing into it until you commit.
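Point (c) shows up as a specific error if you try it in the same transaction (table names hypothetical):

```sql
INSERT /*+ APPEND */ INTO sales SELECT * FROM stg_sales;

SELECT COUNT(*) FROM sales;
-- ORA-12838: cannot read/modify an object after modifying it in parallel

COMMIT;
SELECT COUNT(*) FROM sales;   -- fine after the commit
```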
d) how many people can direct path into a table at the same time? One - one and only one. It would cause all modifications to serialize. No one else could insert/update/delete or merge into this table until the transaction that direct paths commits.
Direct path inserts should be used with care, in the proper circumstances. A large load - direct path. But most of the time - conventional path.
What is the buffer cache, and why does bypassing it speed up inserts using APPEND?
Data read from disk is stored in an area of the SGA called the buffer cache; the buffer cache sits in RAM, and reading from it is called a logical read. A conventional insert has to find free buffers, modify them in the cache, and let DBWR write them out later, whereas a direct-path insert formats whole blocks in the session's own memory and writes them straight to the datafiles above the high water mark, skipping that buffer management work.
Recent Addition (31-OCT-15)
The main bottleneck in parallel queries is the query coordinator (QC). Suppose you want to sum 500 million rows: even if you run in parallel, each parallel slave computes a partial sum and then gives its result to the query coordinator to further sum.
Now consider you have a grouping on 2 columns. Then the data has to be sorted (grouped), so the query coordinator has a lot of work to do in the case of 500 million rows. In parallel plans you will see P->S, which means parallel to serial.
If you really want to make use of parallel, it is best to go for a procedure and parallelise the procedure using DBMS_PARALLEL_EXECUTE. This is truly parallel in the database.
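A sketch of DBMS_PARALLEL_EXECUTE (the task name, table, and update statement are hypothetical): it chunks the table by rowid and runs the statement once per chunk, each chunk bound to the :start_id / :end_id placeholders.

```sql
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'fix_sales');

  -- Split the table into rowid ranges of roughly 10,000 rows each
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
    task_name   => 'fix_sales',
    table_owner => USER,
    table_name  => 'SALES',
    by_row      => TRUE,
    chunk_size  => 10000);

  -- Run the statement on 8 job processes, one chunk at a time per process
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
    task_name      => 'fix_sales',
    sql_stmt       => 'UPDATE sales SET amount = amount * 1.1
                       WHERE rowid BETWEEN :start_id AND :end_id',
    language_flag  => DBMS_SQL.NATIVE,
    parallel_level => 8);

  DBMS_PARALLEL_EXECUTE.DROP_TASK('fix_sales');
END;
/
```

Because each chunk commits independently, a failed chunk can be retried without redoing the whole load - which is why this scales better than one giant parallel DML statement funnelled through a query coordinator.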
( Assumption -In most of cases you will never be slowed by read speed, 1 billion rows is not a
issue, if you are doing in TB then have to think)
Table Functions (very important for data warehouse implementation; next is DBMS_PARALLEL_EXECUTE)
How to make PL/SQL go parallel
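One way to make PL/SQL itself run in parallel is a parallel-enabled pipelined table function; a minimal sketch (the type, column, and transformation are hypothetical placeholders):

```sql
CREATE OR REPLACE TYPE num_tab AS TABLE OF NUMBER;
/
CREATE OR REPLACE FUNCTION transform_rows(p_cur SYS_REFCURSOR)
  RETURN num_tab
  PIPELINED
  PARALLEL_ENABLE (PARTITION p_cur BY ANY)   -- any slave may take any rows
IS
  l_id NUMBER;
BEGIN
  LOOP
    FETCH p_cur INTO l_id;
    EXIT WHEN p_cur%NOTFOUND;
    PIPE ROW (l_id * 2);                     -- placeholder transformation
  END LOOP;
  RETURN;
END;
/
-- Each parallel slave runs its own copy of the function over its slice of rows:
SELECT *
FROM TABLE(transform_rows(
       CURSOR(SELECT /*+ PARALLEL(sales, 4) */ sale_id FROM sales)));
```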
The APPEND hint is usually combined with PARALLEL and NOLOGGING. Still, with very large data you are likely to see a wait event stating that the query coordinator is too busy to take more input from the parallel slave processes, and they are waiting for the QC to become free before sending data.