Friday 28 February 2014

Append hint to make Inserts faster in datawarehouse

APPEND -- IGNORE CONSTRAINTS
Append -- does not use buffer cache
Append - By passes UNDO , does a direct path insert
Append - Only visible after commit
Append - By passes Redo log (What is difference between Redo and Undo--Undo is for modification to indexes )
Note on UNDO -- undo is not generated for table but it would be generated for the index define on the table
Append - No one else can see any data when you are doing insert. Other are waiting in serial order
Append - If you append in a incremental load operation. Then you have to make sure there are no delete on the table. It write above high water mark. Every time you delete the high water mark does not change. So you will be wasting the blocks on which you deleted the data
Append - Writes above high water mark
Very important -- parallel is always a direct path, if you go parallel, you will be appending.


From ASKTom below

Insert /*+ APPEND */ - why it would be horrible for Oracle to make that the "default".

a) it isn't necessarily faster in general. It does a direct path load to disk - bypassing the buffer cache. There are many cases - especially with smaller sets - where the direct path load to disk would be far slower than a conventional path load into the cache.

Each parallel process will write to its own extent , they will not share extent while writing, So Extent allocation property is important


Disadvantage of Direct path load over time ( Leads to lot of waste  of space and these blocks get read daily when doing full table scan ). You need to see if the exten allocation if uniform or Auto allocate , In case of Uniform there is lot of wastage

b) a direct path load always loads above the high water mark, since it is formatting and writing blocks directly to disk - it cannot reuse any existing space. Think about this - if you direct pathed an insert of a 100 byte row that loaded say just two rows - and you did that 1,000 times, you would be using at least 1,000 blocks (never reuse any existing space) - each with two rows. Now, if you did that using a conventional path insert - you would get about 70/80 rows per block in an 8k block database. You would use about 15 blocks. Which would you prefer?

c) you cannot query a table after direct pathing into it until you commit.

d) how many people can direct path into a table at the same time? One - one and only one. It would cause all modifications to serialize. No one else could insert/update/delete or merge into this table until the transaction that direct paths commits.

Direct path inserts should be used with care, in the proper circumstances. A large load - direct path. But most of the time - conventional path

What is buffer cache and Why bypassing it speeds up the inserts using append

Data is read from the disk and is stored in area of SGA called buffer cache. Reading from buffer cache is called logical read. Buffer cache sits in RAM



Recent Addition ( 31-OCT-15)
The main blocker in parallel queries in the Query coordinator, Suppose you want to sum on 500 million rows, Then even if you run in parallel , Each query coordinator will do the sum 
and then give result to query coordinator to further sum it. 

Now consider you have grouping on 2 columns, Then it has to sort the data (group) so the 
query coordinator has lot of work to do in case of 500 million. You will see in parallel plans
p->s which means parallel to sequential 

If you really want to make use of parallel , then best to go for procedure and parallelise 
procedure using dbms_parallel . This is truly parallel in database 

( Assumption -In most of cases you will never be slowed by read speed, 1 billion rows is not a 
issue, if you are doing in TB then have to think)

Table Functions  (Very important for datawarehouse implementation and Next is dbms_parallel) 


How to make pl/sql go in parallel 



Append hint is usally combined with Parallel and No logging . still with very large data you are likely to wait event stating Query coordinator is very busy to take more input from parallel slave processes and they are waiting for QC to become free to send data 

Oracle RAC and Why it is better than individual VM


RAC stands for Real Application Cluster

In oracle RAC we have multiple instances on multiple nodes accessing the same database (disk). So we will have 2 machine whose RAM will be used as part of SGA and PGA and they will be accessing the database that is hard disk

Oracle RAC has technology called cache fusion which makes sure that cache maintained by the two separate machine is store in global area. This makes sure that we always see the current data irrespective of changes

Why RAC are better than Virtual Machines

Oracle running on VM has advantages when it comes to deployment,

A scalable environment enables you to improve performance and add capacity by adding nodes. On some platforms you can add nodes dynamically while your cluster is running. So with more nodes you have more processing power

Classic Statement -- If a application cannot scale up on single node system. It will not scale up on RAC. That is if your SQL is limited by processors and adding processor in single node system increases it then the same will hold true in RAC. Consider you are limited by I/o adding more node will not help


When you execute a statement in parallel from one node does the other node comes into play because it will also have processors

when we execute query in parallel the session which fires query become the coordinator and it generates lot of slave processes. Slave processes can be in any of the nodes. They use the node interconnect to transfter the data. Sometime the amount of data trasferred by slaves can be so large it can be itself a bottleneck . 

Distinct slowing down query

Does inner join perform better than outer join