Tuesday 1 April 2014

SQL*Loader does not perform a bulk load when using Pentaho

Hi Guys,

We are trying to bulk load 200 million rows from the source into a staging table using the SQL*Loader direct path load (bulk load). When we ran an explain plan on the SQL fired by SQL*Loader, it showed a hint called SYS_DL_CURSOR, which indicated a conventional load.

Speeds worth mentioning -- SQL*Loader loads about 180k rows per second, which is roughly 11 million per minute.
A db link can load about 3.5 million rows per minute directly; it loaded 180 million rows in 60 minutes, with some time spent on the SQL read.

Reason -- We were using an option in Pentaho which loads on the fly: it reads the data from the SQL query and starts loading into Oracle as the rows come in. The problem is that you never have enough rows available for SQL*Loader to go for a bulk load, so it falls back to a conventional load.

Solution -- Use "Automatic load at the end", or any option which first creates a data file on the ETL server; SQL*Loader then loads from this data file, which allows it to go for a direct load. The control file and command further down show a load of this kind.

Note -- The SYS_DL_CURSOR hint does not matter, even if the explain plan shows that it is doing a conventional load. Look into the SQL*Loader log file instead; that is where you find out whether it did a direct load or a conventional load. I had a lot of confusion because the explain plan was showing a conventional load.
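For reference, the path used is stated near the top of the log file (this is from my runs; the exact spacing may differ across Oracle versions):

Path used:      Direct

If it says Conventional instead, the load went through the conventional path.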

Analysis which was flawed (you might even do the same, so I am just noting it down): even this did not work, because SQL*Loader never had enough data available to go for a direct load.


Note -- I have tried manually creating the SQL*Loader control file and triggering it from the Linux server. It still adds the hint.

SQL*Loader control file (.ctl)
load data
  infile '/archive/Load_trial.txt'
  into table ABC
  fields terminated by "," optionally enclosed by '"'
  ( SRC,
    CD,
    SRC_VAL_ID,
    GG_CD,
    NMT_IND,
    DENOM_IND,
    NMT_XCLS_IND,
    DENOM_XCLS_IND,
    DENOM_XCPT_IND,
    RSLT_ANS,
    SR_EPSD_ID,
    XCLS_RSN_CD,
    VAL_TYP_CD,
    CREAT_USER_ID,
    CREAT_DTM DATE 'mm/dd/yy' )

Command to load using SQL*Loader

sqlldr userid=username/password@servicename control=archive/Trial_load1.ctl log=archive/Trial_logg.log DIRECT=TRUE

Alternate solution 1

Create a procedure using a db link which loads data from the source schema to the target schema. The procedure will perform well because you are eliminating the ETL server in the middle, but you will be limited by the speed of the db link connection. The DBAs may not agree, because db links are a security issue: they expose the data over the network. Still, keep the procedure ready as a contingency measure in case the load fails and you need an urgent reload.
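For illustration, here is a minimal sketch of such a procedure. The db link name SRC_LINK and the source table name ABC_SRC are placeholders I am assuming, not the real ones; the APPEND hint asks Oracle for a direct path insert into the local table.

CREATE OR REPLACE PROCEDURE load_abc_from_source AS
BEGIN
  -- Direct path insert into the local staging table, reading over the db link
  INSERT /*+ APPEND */ INTO abc
    (src, cd, src_val_id, gg_cd, nmt_ind, denom_ind, nmt_xcls_ind,
     denom_xcls_ind, denom_xcpt_ind, rslt_ans, sr_epsd_id, xcls_rsn_cd,
     val_typ_cd, creat_user_id, creat_dtm)
  SELECT src, cd, src_val_id, gg_cd, nmt_ind, denom_ind, nmt_xcls_ind,
         denom_xcls_ind, denom_xcpt_ind, rslt_ans, sr_epsd_id, xcls_rsn_cd,
         val_typ_cd, creat_user_id, creat_dtm
    FROM abc_src@src_link;
  -- A direct path insert must be committed before the session can read the table again
  COMMIT;
END load_abc_from_source;
/

Keeping it ready means the urgent reload is just one call: EXEC load_abc_from_source;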

Alternate solution 2

Data Pump. Pull the required tables from the source schema to the target schema using Data Pump, and write a procedure in the target schema to load the data from these pulled source tables. Incredible speed can be achieved: 200 million records can be inserted in less than 1 hour in Oracle.
Data Pump is fast and can easily give you a speed of 20 million rows per minute when creating a dump file for export/import.
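A sketch of the pull itself (the schema, service, and file names here are placeholders I am assuming, and a directory object such as DATA_PUMP_DIR has to exist on both databases):

# On the source database: export the table to a dump file
expdp src_user/password@source_svc tables=SRC_SCHEMA.ABC_SRC \
  directory=DATA_PUMP_DIR dumpfile=abc_src.dmp logfile=abc_src_exp.log

# Copy abc_src.dmp into the target server's DATA_PUMP_DIR, then import it
impdp tgt_user/password@target_svc directory=DATA_PUMP_DIR \
  dumpfile=abc_src.dmp logfile=abc_src_imp.log \
  remap_schema=SRC_SCHEMA:TGT_SCHEMA

Once the pulled copy is sitting in the target database, the loading procedure is the same direct path INSERT /*+ APPEND */ ... SELECT as in the db link sketch above, only reading from the local pulled table instead of going over the network.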








