Hi All,
Hive for data analysis, due to its structural similarity with SQL
Pig for data loading, due to its similarity with procedural languages
Sqoop - bulk movement of data between relational databases and Hadoop (example below)
Flume - aggregating streaming data into Hadoop
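As a quick illustration of Sqoop's bulk movement, here is a minimal import command; the connection string, user, table name, and target directory are hypothetical:

  sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /data/orders_dump

This pulls the orders table out of the relational database and lands it as files under /data/orders_dump in HDFS.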
Steps for loading data into Hadoop using Pig
From a data warehousing perspective, below are the tasks which can be easily done.
Important note – Pig has a script-like structure which is not very friendly for querying data, while Hive offers a familiar SQL interface, much like regular relational SQL. Pig scripts resemble a procedural language and make it easy for users to code transformations.
- Take data from the OLTP system as dump files, either a delta dump or a full dump
- Transform the data using Pig scripts and save the result into tables (see the sketch below)
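A minimal sketch of such a transformation in Pig Latin; the path /data/orders_dump and the fields order_id, customer_id, and amount are hypothetical:

  -- Load the raw dump file (hypothetical path and schema)
  raw_orders = LOAD '/data/orders_dump'
               USING PigStorage(',')
               AS (order_id:int, customer_id:int, amount:double);

  -- Drop records with no amount
  valid_orders = FILTER raw_orders BY amount IS NOT NULL;

  -- Aggregate: total amount per customer
  grouped = GROUP valid_orders BY customer_id;
  totals  = FOREACH grouped GENERATE
            group AS customer_id,
            SUM(valid_orders.amount) AS total_amount;

  -- Save the transformed data as a new data set in HDFS
  STORE totals INTO '/data/order_totals' USING PigStorage(',');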
Using Hive to query data from the dump file
- Take a dump file from the OLTP system, or load the data into external tables
- Copy the file into HDFS (e.g. with hadoop fs -put)
- Register the file with HCatalog as a new table; when doing this, it is much the same as defining a flat file in DataStage, with field delimiters
- HCatalog separates the schema and metadata information from the query; without it you would need to spell out full file paths and schemas in every script (see the example below)
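A sketch of that registration step in HiveQL; since HCatalog shares the Hive metastore, a statement like this makes the table visible to both Hive and Pig. The table name, columns, and location are hypothetical:

  -- Register the dump file already sitting in HDFS as an external table
  CREATE EXTERNAL TABLE orders_dump (
      order_id     INT,
      customer_id  INT,
      amount       DOUBLE
  )
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','   -- field delimiter, as with a DataStage flat file
  STORED AS TEXTFILE
  LOCATION '/data/orders_dump';

  -- The file can now be queried with plain SQL
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders_dump
  GROUP BY customer_id;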
Using Pig to transform the data
- The table information is stored in HCatalog
- We have to write a Pig script to transform this data; a sketch of one is shown below. Once the script is executed, the data is stored into another table in Hadoop
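A minimal sketch of such a script, run with pig -useHCatalog. It assumes the orders_dump table registered above and a target table order_totals already created in Hive with a matching schema; note that older releases keep these loader classes under org.apache.hcatalog.pig instead:

  -- Load the table by name through HCatalog; no path or schema needed here
  orders = LOAD 'orders_dump'
           USING org.apache.hive.hcatalog.pig.HCatLoader();

  -- Transform: total amount per customer
  grouped = GROUP orders BY customer_id;
  totals  = FOREACH grouped GENERATE
            group AS customer_id,
            SUM(orders.amount) AS total_amount;

  -- Store the result into another HCatalog-registered table
  STORE totals INTO 'order_totals'
        USING org.apache.hive.hcatalog.pig.HCatStorer();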
A site with a very good tutorial on Hive and Pig