Data quality in datawarehouse is usually ignored .
Medical Records. All
of us are affected by the quality of data in medical records. We want the
diagnosis and the procedure to be correct and readable, both for our medical
safety and for billing and insurance purposes. A major headache in the medical
insurance industry is the correct identification of the provider physician,
clinic, or hospital. This is another variant of the customer matching
application I discussed previously
You want to find the
same customer when the customer buys a second or a third product from you.
Customer matching is a major issue in banking and healthcare, where separate
customer (or patient) encounters are often listed separately. The average bank
has great difficulty listing all of the separate accounts of a given
individual, although the reverse process of listing all the individuals in a
specific account causes no trouble
One way to implement this is by use of Quality screens
- Column Screens
- Structure screens
- Business rule screens
Column Screens
- Value must not be null , Should be of fixed length, Should not be within list of excluded values, Should confirm spell check
Structure screen
They gurantee that
entire data has come to DW, Some data has not been missed. Like each patient
information should have source of data .
Each data source present in patient is there in the data source master
table. Making sure entire structure has some and some data is not misssed
Business Rules screens
Business rules can
be applied to data to confirm that data is consistent with business rules. Like
ordered quantity cannot be less than minimum quantity. Or put a busineess rule
for checking that number of orders per month is not increased by more than 20%
. The idea is to check whether we have inserted duplicates in system may be due
to source system changes and that is causing all wrong data to show up
Example - Checking
the number of patient plans . They should not exceed by huge number than the
number of patients. If this occurs it means we might be duplicating data
somewhere. They are like sanity checks after each load
How are Errors from Column Screens Treated
Usually best idea is
to write them to log files and at the end tracking them into a fact table . But this is questionable
because we are missing some data best idea is to tag the data saying its bad
Creation of Audit tables to track each record of Fact
table ( For this the error logic has to be done real time rather than saving in
log files )
We can keep column
like completleness , validation, Version number saying which version of Screen
this record belongs to .
Points to think are
- What happens when updation are handled for the same record
well explained .Keep updating Cognos TM1 online training hyderabad
ReplyDelete