Friday, 20 December 2013

Clustering Factor for Index in Oracle



Moral of the story --- Though it sound more common sense that table where data is organised such that in a index leaf block all entries point to one block in table should reduce the number of blocks read and speed up the query but it is not the case always.

Special case (Composite indexes)  --Sometime you might have a composite index and to read that composite index oracle needs more time than fetching rows from that table based on index. Index itself is also like a table so in case of composite index ( ie index on (a,b) and you are filtering on b ) where values of b are in different blocks of index. So it leads to lot of i/o.

Cases to look out for improvement by reducing clustering factor

Cases where oracle decides to go for full table scans instead of using index. You can have good improvement by reducing clustering factor.

Consider case of range scan on index that is causing a large physical i/o on the table. Which means for each row of index you are fetching data from different blocks. Usually in such cases we notice full table scans and oracle decides not to use the index
 

Practical application in datwarehouse 

Suppose you have a fact which is partition by customer id such that all data relating to a customer is in a partition and you frequently have queries which supply customer id .

You have shipment date which is a degenerate dimension that is it is not associated with any dimension . and you are filtering frequently for date in where condition so you decide to go for index on that date. Apart from index on date there are B tree indexes on other dimension sequence key in the fact. 

Now your problem is when you apply shipment date - 1 jan 2014 it does not use the index. What can be reason ??

Ans -- Its due to high cost associated with going for this index. The high cost is due to the high clustering factor that is the data is not stored in db blocks on date so data for a date is spread across the table in different blocks this leads to high i/o as you need to read all the blocks where data is and so oracle prefers full scan over using index 

Clustering Factor

Clustering factor of index determines how ordered the index is in comparison to table. Suppose index is stored in one leaf block and that block has entries for rows 1 to 10 with rowid of where they are stored. Ideally if all of those 10 are stored in the same block in table then the index would be faster as it has to fetch only 1 block

Supose all of 10 are in 10 separate blocks then index has to fetch all the 10 blocks


You want clustering factor close to number of nodes

  • The index is scanned in order.
  • The block portion of the ROWID pointed at by the current indexed valued is compared to the previous indexed value (comparing adjacent rows in the index).
  • If the ROWIDs point to different TABLE blocks, the clustering factor is incremented (this is done for the entire index).


Clustering defines how ordered the rows are in the index.  If CLUSTERING_FACTOR approaches the number of blocks in the table, the rows are ordered.  If it approaches the number of rows in the table, the rows are randomly ordered.  In such a case (clustering factor near the number of rows), it is unlikely that index entries in the same leaf block will point to
rows in the same data blocks.

Note that typically only 1 index per table will be heavily clustered (if any).  It would be extremely unlikely for 2 indexes to be very clustered.

If you want an index to be very clustered -- consider using index organized tables.  They force the rows into a specific physical location based on their index entry.

Otherwise, a rebuild of the table is the only way to get it clustered (but you really don't want to get into that habit for what will typically be of marginal overall improvement).

clustering factor is more "informational", it can be used to tell you
a) that your index will or will not be used for large range scan's
b) how organized your table is with respect to some index key values
c) that you might need to use a structure like an IOT or cluster if having the data organized by
that key is important.
So, it is just more information -- sort of like num rows and blocks, they are just numbers.


Below are few scripts that I used to check the clustering factor

  1. select clustering_factor,leaf_blocks  from user_indexes
  1. create index ind_fact_cli_pers on fact_mo_msr_trial(dim_cli_pers_seq_key)
  1. execute dbms_stats.gather_table_stats(user,'fact_mo_msr_trial')
  1. select clustering_factor,leaf_blocks from user_indexes where UPPER(index_name) like '%IND%FACT%'

The clustering factor is 689 which poses a challenge . Data in the table is not organized

Composite indexes and Clustering factor

Moral ---Sometime you might have a composite index and to read that composite index oracle needs more time than fetching rows from that table based on index. Index itself is also like a table so in case of composite index ( ie index on (a,b) and you are filtering on b ) where values of b are in different blocks of index. So it leads to lot of i/o. 

In a composite index the order of columns in that index matters . So you need to be sure which column comes first. Composite indexes are stored sorted in key order.

The basic idea of creating a two column index is that first column reduces the number of records and second column reduces the rows further.

Consider you have a composite index on emp table  on(empno,deptno)
A composite index is stored sorted in key order (by empno and then deptno).

select count(distinct deptno) from t and either of EMPNO or DEPTNO is defined as "not null" -- we may very well use the INDEX via a FAST FULL INDEX SCAN over the table (the index being a "skinny version" of the table in this case.

If you query "select empno, deptno from t where deptno = :x"  we MAY use the index again (as a skinny table). If you query "select ename, deptno from t where deptno = :x" we will NOT use the index as we would have to full scan the entire index AND then access the table anyway.  We would just full scan the table instead.

Now think of index as a table. You want the rows you access via index to close together in the index so that for a range scan of index you don’t end up doing a lot of i/o on the index 

Below is example where it takes  more i/o to read the composite index then to read the actual table

To check where is the problem you should be able to see which step has maximum i/o in plan. ( need to set setting)

 290539     TABLE ACCESS BY INDEX ROWID MFG_DIST_DLR_PROD_HISTORY (cr=1144406 r=731922 w=0
time=532734557 us)
 290539      INDEX RANGE SCAN MDDP_PK (cr=872544 r=622899 w=0 time=183584801 us)(object id 40318)
  19784     TABLE ACCESS BY INDEX ROWID PERIOD_TABLE (cr=290541 r=1 w=0 time=3097982 us)
 290539      INDEX UNIQUE SCAN PERIOD_TABLE_PK (cr=2 r=0 w=0 time=1030705 us)(object id 40381)

Cr--- logical read r -- physical read

Note that to read the composite index it is doing so much i/o which means rows in index are not ordered that is suppose the index is on (a,b) and composite index will always be ordered on first column




No comments:

Post a Comment