Wednesday, 28 October 2015

ETL Session monitoring while loading data

Hi All,

These are SQL queries that are handy while running ETL jobs/procedures and monitoring what is happening in your database. They can give us good insight into why our code is running slow.

/* First understand how much UNDO, Tempspace, CPU you have for env */

select tablespace_name, bytes/1024/1024/1024 in_gb from dba_data_files
where tablespace_name like '%UND%';

SELECT tablespace_name,
  SUM(bytes_used)/1024/1024/1024 used_gb,
  SUM(bytes_free)/1024/1024/1024 free_gb
FROM v$temp_space_header
GROUP BY tablespace_name;

show parameter cpu_count
/* important: you need to multiply blocks by the block size to get space used */
show parameter block_size

/* How big is the table you are looking at */

select blocks, bytes/1024/1024/1024 in_gb from dba_segments
where tablespace_name = 'TBS_VBR_REP_DAT'
and segment_name = 'EBM_RESULTS';

select * from dba_SEGMENTS

/* Finding session info, once you are connected */

select * from v$session
where osuser = ' '

select sid,saddr,serial#,state,event,seconds_in_wait time_it_waiting_now,
wait_time/100 time_in_past_waited,username,server,program,type,p1,p2 from v$session
where osuser = ' '

select sid,saddr,serial#,sql_id,state,event,seconds_in_wait time_it_waiting_now,
wait_time/100 time_in_past_waited,username,server,program,type,p1,p2 from v$session
where sid = '90'

/*checking plan for that sql */
select * from table (dbms_xplan.display_cursor('4vs9fmtywqty3'))

/* WHAT IS THE EVENT YOUR SESSION MOST WAITING ON */
select sid,Event,total_waits,time_waited/100,Average_wait,max_wait/100,wait_class from V$SESSION_EVENT
where sid = '90'
order by time_waited desc

/* Temp space your session is currently using */
select * from v$tempseg_usage
where  session_num = '1031'
and session_addr = '00000000DB44C080'
/* 8K is the size of our blocks, so multiply by 8192 (not 8000) to get bytes */
select (sum(blocks)*8192)/1024/1024/1024 in_gb from v$tempseg_usage
where  session_num = '1031'
and session_addr = '00000000DB44C080';

/* used_ublk tells how many undo blocks your session has used; multiply by the block size to get bytes. used_urec is the number of undo records */
/* Before measuring undo, make sure indexes are dropped. Insert generates the least undo, update slightly more, and delete the most */
select
   t1.sid,
   t1.username,
   t2.xidusn,
   t2.used_urec,
   t2.used_ublk,
   t2.log_io,
   t2.phy_io,
   t2.START_TIME
from
   v$session     t1,
   v$transaction t2
where
   t1.saddr = t2.ses_addr
   and sid = 90
  

/* Simple query to directly measure undo generation */
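One way to do this (a sketch only; sid 90 is carried over from the examples above, and the block size is read from v$parameter):

select s.sid,
       t.used_ublk,
       t.used_ublk * to_number(p.value) / 1024 / 1024 undo_mb  /* blocks * block size */
  from v$transaction t,
       v$session     s,
       v$parameter   p
 where t.ses_addr = s.saddr
   and p.name     = 'db_block_size'
   and s.sid      = 90;

Run it before and after your DML (inside the same open transaction); the difference in used_ublk is the undo your statement generated.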

Friday, 23 October 2015

My notes on PL/SQL


I found these three articles very informative; they cover all you need to know before using PL/SQL in your project (if you already know SQL).

The links are by Steven Feuerstein, and they are all you need for PL/SQL if you know SQL well.

A Good link on use of Cursor


A Good link on When not to use For loop


A Good link on Bulk collect and For all


Good Way to write PL/SQL

DECLARE
   c_limit         CONSTANT PLS_INTEGER := 100;
   -- the original snippet referenced department_id_in (a procedure parameter);
   -- a constant with an example value is used here so the block compiles standalone
   c_department_id CONSTANT employees.department_id%TYPE := 50;

   CURSOR employees_cur
   IS
      SELECT employee_id
        FROM employees
       WHERE department_id = c_department_id;

   TYPE employee_ids_t IS TABLE OF employees.employee_id%TYPE;

   l_employee_ids   employee_ids_t;
BEGIN
   OPEN employees_cur;

   LOOP
      FETCH employees_cur
         BULK COLLECT INTO l_employee_ids
         LIMIT c_limit;

      EXIT WHEN l_employee_ids.COUNT = 0;

      -- process the fetched batch here (e.g. with a FORALL statement)
      FOR indx IN 1 .. l_employee_ids.COUNT
      LOOP
         NULL; -- placeholder for per-row processing
      END LOOP;
   END LOOP;

   CLOSE employees_cur;
END;

Monday, 28 September 2015

Selection of Datasource at Runtime



Each Cognos connection to a database is composed of three parts: DataSource -> Connection -> Signon.

- Each DataSource can have 1..N Connections 
- Each Connection can have 1..N Signons 

You should be able to accomplish your goal by having 1 DataSource with 2 Connections (one connection points at your production DB, the second connection points at your day old copy). 

Then you point your single Package at the 1 DataSource. Each time you run a report associated with this Package, the user will be prompted : "Which connection would you like to use?" 



Sunday, 27 September 2015

Using Merge statement for SCD type 2

When we are doing Type 2 we need to:
  1. Insert new rows for brand-new records
  2. Update the existing current row when a value column changes (like a person's state of residence): close it by setting its end date
  3. Insert a new row carrying the changed values, with an open-ended end date of 31-12-9999

If we use the MERGE statement in Oracle we get only two options: insert or update. A new record is easy to insert, but for an existing record we have two steps:
  1. End-date the current record
  2. Insert the new record

Since MERGE executes only once per source record, there is no way of doing both an insert and an update for the same record. So we have to create a dataset which has two records for each SCD Type 2 change. Which means we will have the same record twice, with a row_type_indicator: when 1 then insert, and when 2 then update.
Key        Value
1          A
2          B

Key_code   State
2          Hyd

When we join on the condition Key = Key_code, we get only one row:

Key_code   State
2          Hyd

When we join on the condition Key <= Key_code, we get two rows. This is because 1 is less than 2 and 2 is equal to 2, so the Key_code row matches twice. It looks simple, but it took me more than an hour to figure out.

Key_code   State
2          Hyd
2          Hyd
Below is a good link on the subject


Monday, 7 September 2015

ETL vs ELT

ETL -- Extract, Transform (in an ETL tool like DataStage) and Load (into a database like Oracle).

ELT -- Extract, Load into the database and do the transformation in the database. This works well if you have huge volumes of data. Now even IBM agrees: they have come up with Balanced Optimization, which pushes queries down to the database.
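In ELT style the transformation step becomes a set-based statement that runs inside the database (a sketch; stg_sales and fct_sales are hypothetical tables):

/* raw data is already loaded into stg_sales; transform it inside the database */
INSERT /*+ APPEND */ INTO fct_sales (sale_id, sale_date, amount_usd)
SELECT sale_id,
       TO_DATE(sale_date_txt, 'YYYY-MM-DD'),   /* type conversion in the DB */
       amount * exchange_rate                  /* derivation in the DB */
  FROM stg_sales
 WHERE amount IS NOT NULL;                     /* basic cleansing in the DB */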







Understanding Hive