Friday 26 July 2013

Datawarehousing Fundamentals

Hi Guys,

This month July 2013 i am working on optimization of report on datawarehouse. So there are several post on query optimizatoin in July .Different techniques i tried .

We can achieve some amount of optimization by Partitining , indexing and making sure that those indexes are used correctly . Reading explain plans will help.I have some articles on reading explain plans and Oracles basics in July 2013.Now after all this you will achieve some improvement.But still you want better then it comes to design on datawarehouse .What attributes are kept in your dimension.How is your fact table structured.

I found below article by Kimball Group very helpful.I think most datawarehousing guys have read his book (Ralph Kimball) .If not do read.I found below article on basic design helpful

http://www.kimballgroup.com/2009/05/29/the-10-essential-rules-of-dimensional-modeling/

Now this is standard point.but in some cases we see a deviation

 Ensure that every fact table has an associated date dimension table.

Consider you need to store 24 months data for a particular dimension.You have a product , A product group that can be dynamically created by user like suppose he want to group ( toothpaste, biscuits , soap, tea,Milk) into highly sellable items .He creates group Sellable A. So you will have.

Sorry 24 rows till July 2011 i meant there

Now this has been model as shown below



We can combine product and product group together into single dimension . Also we can add one more time dimension to take care of months .Now look at the fact.Its a wide table because of 24 months columns.Now you want to make it a long table if you add a time dimension



Rule #5: Resolve many-to-many relationships in fact tables.
Since a fact table stores the results of a business process event, there’s inherently a many-to-many (M:M) relationship between its foreign keys, such as multiple products being sold in multiple stores on multiple days. These foreign key fields should never be null. Sometimes dimensions can take on multiple values for a single measurement event, such as the multiple diagnoses associated with a health care encounter or multiple customers with a bank account. In these cases, it’s unreasonable to resolve the many-valued dimensions directly in the fact table, as this would violate the natural grain of the measurement event. Thus, we use a many-to-many, dual-keyed bridge table in conjunction with the fact table.


Rule #6: Resolve many-to-one relationships in dimension tables.
Hierarchical, fixed-depth many-to-one (M:1) relationships between attributes are typically denormalized or collapsed into a flattened dimension table. If you’ve spent most of your career designing entity-relationship models for transaction processing systems, you’ll need to resist your instinctive tendency to normalize or snowflake a M:1 relationship into smaller subdimensions; dimension denormalization is the name of the game in dimensional modeling.
It is relatively common to have multiple M:1 relationships represented in a single dimension table. One-to-one relationships, like a unique product description associated with a product code, are also handled in a dimension table. Occasionally many-to-one relationships are resolved in the fact table, such as the case when the detailed dimension table has millions of rows and its roll-up attributes are frequently changing. However, using the fact table to resolve M:1 relationships should be done sparingly.





No comments:

Post a Comment