Sunday 14 July 2013

Oracle Database Concepts III

Hi Guys,

This is Part 3 of Oracle database concepts .For Part 1 and Part 2 use links below

http://cognossimplified.blogspot.in/2013/07/oracle-database-concepts.html

http://cognossimplified.blogspot.in/2013/07/oracle-database-concepts-ii.html

Partition in Oracle 

They are very important in datawarehouse.Especially for facts as most of the time we are playing with date ranges and partition can speed up the query 

There are two type of partitioning 

  • Single level 
  • Composite 

There are 3 types for each

  • Range
  • Hash 
  • List 

Hash is used when there is no obvious partitioning key available makes sure each partition has same amount of data 

Range to be used when you have a fact which keeps montly data so Jan 2013 , Feb2013 and so on each partition for each of the months.


what is partition pruning


Partition pruning dramatically reduces the amount of data retrieved from disk and shortens processing time, thus improving query performance and optimizing resource utilization. If you partition the index and table on different columns (with a global partitioned index), then partition pruning also eliminates index partitions even 
when the partitions of the underlying table cannot be eliminated.

Depending upon the actual SQL statement, Oracle Database may use static or dynamic pruning. Static pruning occurs at compile-time, with the information about the partitions accessed beforehand. Dynamic pruning occurs at run-time, meaning that the exact partitions to be accessed by a statement are not known beforehand. A sample scenario for static pruning is a SQL statement containing a WHERE condition with a constant literal on the partition key column. An example of dynamic pruning is the use of operators or functions in the WHERE condition.Partition pruning affects the statistics of the objects where pruning occurs and also affects the execution plan of a statement.

Below sql will give which tables in your schema are partitioned 

 select * from USER_PART_TABLES

Very good article on partition pruning 
http://www.orafaq.com/tuningguide/partition%20prune.html



create table part_trial (col1 int , col2 int )

 partition by range (col1)

 ( partition p1 values less than (1)

 )



alter table part_trial add

 partition p2 values less than (3)



While creating table you need to have a table with partition then only you can add one more partition to it. And note in the alter statement look we are not specifying the range keyword



insert into part_trial values (1,1);

insert into part_trial values (1,2);

insert into part_trial values (1,3);

insert into part_trial values (1,4);

insert into part_trial values (1,5);

insert into part_trial values (1,6);





 explain plan for

 select * from part_trial

 where col1 = 1



Notice that it accesses only the single partition which is required



-----------------------------------------------------------------------------------------------------

| Id  | Operation              | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

-----------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT       |            |     6 |   156 |    14   (0)| 00:00:01 |       |       |

|   1 |  PARTITION RANGE SINGLE|            |     6 |   156 |    14   (0)| 00:00:01 |     2 |     2 |

|*  2 |   TABLE ACCESS FULL    | PART_TRIAL |     6 |   156 |    14   (0)| 00:00:01 |     2 |     2 |

-----------------------------------------------------------------------------------------------------



execute dbms_stats.gather_table_stats(user,'trial')



select partition_position, partition_name, num_rows  from user_tab_partitions where  table_name = 'TRIAL'



alter table TRIAL add partition

Below 7 points are taken from ASK tom question on hash partitioning


1) Hash partitions number is always in power of two, yes - 2, 4, 8, 16, 32, 64, ... and NO other numbers in between.
2) Else the data will be skewed across partitions with most of the data in the "middle" partitions and little data on the end partitions.
3)Also, a hash key should be fairly unique - lots and lots and lots of distinct values, else, you won't achieve even distribution of data
4) it is not record driven. 1,000,000 records might be 10mb, 100mb, 1tb - it is about volume.
5)What is the biggest segment you want to have?
What are you trying to accomplish with partitioning - what is your reason for using it - that will drive your partition sizes and your partitioning scheme.
6)understand WHY you are partitioning. then and only then APPLY it
7) when you APPLY it, make sure you understand the ramifications of doing so and consider each and ever index independent of every other index. eg: say this is an employee table and you hash partition into 16 partitions by empno.

Say you have an index on last_name. Say you just locally partition that index on last_name

when you query: select * from t where last_name=:x

you will do 16 index range scans - you will do 16 times the work!

you would have wanted to either

a) not partition that index
b) partition that index by range - A-K go into this partition, L-Z go into this one
c) hash partition it (if you always use equals)
 


Types of dimensions in datawarehouse
Before analysing explain plan indexes please have a look at the type of dimension implemented in your datawarehouse.

Below is a wikipedia link. I cant explain better than that


Type 1 -- No history information

Type 2 -- Has a column star date and End date

Type 3 - Adds one addition column last value , last to last value.Its number of columns will increase with each addition of history information


Difference between a index and rowid

  • Rowid is a part of table it gives the direct address of the row.
  • whereas indexes are separate from table they contain the block address of a row.
  • Oracle needs to read the full block and discard the unnecessary rows .

What are hash joins 

In HASH JOIN method of oracle, HASH TABLE will be built on one of the tables and other will be joined depending on the values in the hash table.

So, what is it? it's just a bunch of key-value pairs. The data is stored as a linked list with head nodes that group the data by the use of something called HashCode to find things faster. Something like this:

a    ->     b     ->    c
Any         Bitter      Class
Array       Bold        Count

A hash table is a table where you can store stuff by the use of a key. It is like an array but stores things differently

what are bind variables(They are generally for OLTP not Datawarehouse)

To understand bind variables, consider an application that generates thousands of SELECT statements against a table; for example:

SELECT fname, lname, pcode FROM cust WHERE id = 674;
SELECT fname, lname, pcode FROM cust WHERE id = 234;
SELECT fname, lname, pcode FROM cust WHERE id = 332;

Each time the query is submitted, Oracle first checks in the shared pool to see whether this statement has been submitted before. If it has, the execution plan that this statement previously used is retrieved, and the SQL is executed. If the statement cannot be found in the shared pool, Oracle has to go through the process of parsing the statement, working out the various execution paths and coming up with an optimal access plan before it can be executed. This process is know as a «hard parse» and for OLTP applications can actually take longer to carry out that the DML instruction itself.

When looking for a matching statement in the shared pool, only statements that exactly match the text of the statements are considered; so, if every SQL statement you submit is unique (in that the predicate changes each time, from id = 674 to id=234 and so on) then you'll never get a match, and every statement you submit will need to be hard parsed. Hard parsing is very CPU intensive, and involves obtaining latches on key shared memory areas, which whilst it might not affect a single program running against a small set of data, can bring a multi-user system to it's knees if hundreds of copies of the program are trying to hard parse statements at the same time. The extra bonus with this problem is that contention caused by hard parsing is pretty much immune to measures such as increasing available memory, numbers of processors and so on, as hard parsing statements is one thing Oracle can't do concurrently with many other operations, and it's a problem that often only comes to light when trying to scale up a development system from a single user working on subset of records to many hundreds of users working on a full data set.

The way to get Oracle to reuse the execution plans for these statements is to use bind variables. Bind variables are «substituion» variables that are used in place of literals (such as 674, 234, 332) and that have the effect of sending exactly the same SQL to Oracle every time the query is executed. For example, in our application, we would just submit

SELECT fname, lname, pcode FROM cust WHERE id = :cust_no;

and this time we would be able to reuse the execution plan every time, reducing the latch activity in the SGA, and therefore the total CPU activity, which has the effect of allowing our application to scale up to many users on a large dataset

Conclusion on Bind Variables

Lastly, it's worth bearing in mind that there are some instances where bind variables are probably not appropriate, usually where instead of your query being executed many times a second (as with OLTP systems) your query in fact actually takes several seconds, or minutes, or hours to execute - a situation you get in decision support and data warehousing. In this instance, the time taken to hard parse your query is only a small proportion of the total query execution time, and the benefit of avoiding a hard parse is probably outweighed by the reduction in important information you're making available to the query optimizer - by substituting the actual predicate with a bind variable, you're removing the ability for the optimiser to compare your value with the data distribution in the column, which might make it opt for a full table scan or an index when this isn't appropriate. Oracle 9i helps deal with this using a feature known as bind variable peeking, which allows Oracle to look at the value behind a bind variable to help choose the best execution plan.

Another potential drawback with bind variables and data warehousing queries is that the use of bind variables disallows the potential for star transformations, taking away this powerful option for efficiently joining fact and dimension tables in a star schema.

TableSpace

Oracle has tablespaces inside which tables , index , materialised views are created

I am not clear how data can be kept in different tablespaces for a partitions

Below are types of tablespaces 

Permanent tablespaces
Permanent tablespaces are used to store user data and user created objects like tables, indexes and materialized views. Sample create statements:

Temp tablespaces
Temp or temporary tablespaces are used to store data with short lifespan (transient data), for example: global temporarily tables or sort results.

Undo tablespaces
Undo tablespaces are used to store "before image" data that can be used to undo transactions. See ROLLBACK.This needs to be understood in more detail as in datawarehousing tools we have commit intervals which are related to this. You have set commit interval such that you are making maximum use of your undo tablespace.


Locking on Tables -- Have you ever noticed ORA-00054 -No wait on resource.

consider you are loading a table through ETL tool and you try to drop it you will get this error. You can also get this error while loading data when you try to drop a index. I will try to update more on locking later


How to create a dummy table

INSERT INTO
  T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  DECODE(MOD(ROWNUM,5),0,NULL,ROWNUM),
  TRUNC(SYSDATE)+TRUNC(ROWNUM/100),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  DECODE(MOD(ROWNUM,5),0,NULL,TRUNC(SYSDATE)+TRUNC(ROWNUM/100)),
  RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A')),
  DECODE(MOD(ROWNUM,5),0,NULL,
   RPAD(CHR(MOD(ROWNUM-1,26)+65)||
    CHR(MOD(ROWNUM,26)+65)||
    CHR(MOD(ROWNUM+1,26)+65)||
    CHR(MOD(ROWNUM+2,26)+65),20,'A'))
FROM
  DUAL
CONNECT BY

  LEVEL<=200000;


CREATE TABLE dummy (USERID NUMBER,
                  USERNAME VARCHAR2(20),
                    USERPWD VARCHAR2(10),
                    EMAILID VARCHAR2(20),
                    FIRSTNAME VARCHAR2(10),
                    LASTNAME VARCHAR2(10),
                    ISACTIVE NUMBER)
                   
                    select count(*) from dummy
                   
                    commit          
                                     
                   
                    INSERT INTO dummy
   SELECT srl,
             name,
             pwd,
             LOWER(SUBSTR(name, 1, 10)) || '@abc.com',
             SUBSTR(name, 1, 10),
            SUBSTR(name, 11, 20),
             1
        FROM (
     SELECT level srl,
            dbms_random.string('U', 20) name,
            dbms_random.string('A', 10) pwd
       FROM DUAL
    CONNECT BY LEVEL <= 10000000)

Disclaimer and Citations 

The content here is taken from various sources found by googling. I have given links wherever possible.For me i dont need in detail information so i have copy pasted the basic information for my use.Also taken are comments from blogs , forum. I have added lot of information according to my understanding of subjects. If anyone finds anything objectionable please leave a comment.

No comments:

Post a Comment