Thursday 9 October 2014

Understanding Hive and Pig


Hi All,

Hive for data analysis, due to its structural similarity with SQL
Pig for data loading and transformation, due to its similarity with procedural languages
Sqoop - bulk data movement between relational databases and Hadoop
Flume - collecting and aggregating streaming data
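For example, a single Sqoop command can pull a whole table out of a relational database into HDFS. A minimal sketch; the connection string, credentials, table name and target directory below are all placeholders:

    sqoop import \
      --connect jdbc:mysql://oltp-host/salesdb \
      --username etl_user -P \
      --table orders \
      --target-dir /data/staging/orders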
  
Steps for loading data into Hadoop

Looking at Hadoop, Hive and Pig from a data warehousing perspective, the tasks below can be done quite easily.

Important note – Pig has a script-like structure which is not very friendly for querying data, while Hive offers a familiar SQL interface, much like a regular relational database. Pig scripts read like a procedural language, which makes it easy for users to code transformations.
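To make the contrast concrete, here is the same task (counting completed orders) sketched in both languages. The file path, table and column names are made up for illustration:

    -- HiveQL: declarative, a single SQL statement
    SELECT COUNT(*) FROM orders WHERE status = 'COMPLETE';

    -- Pig Latin: procedural, one named step at a time
    orders    = LOAD '/data/orders' USING PigStorage(',')
                AS (order_id:int, status:chararray, amount:double);
    completed = FILTER orders BY status == 'COMPLETE';
    counted   = FOREACH (GROUP completed ALL) GENERATE COUNT(completed);
    DUMP counted;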

  1. Take data from the OLTP system as dump files, either a delta dump or a full dump (see the sketch after this list)
  2. Transform the data using Pig scripts and save it into tables
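Step 1 typically ends with the dump file landing in HDFS. A minimal sketch using the standard HDFS shell; all paths are placeholders:

    # Create a staging directory in HDFS and copy the dump file into it
    hdfs dfs -mkdir -p /data/staging/orders
    hdfs dfs -put /exports/orders_full.csv /data/staging/orders/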


Using Hive to Query data from the Dump file

  1. Take a dump file from the OLTP system, or load the data into external tables
  2. Put the file into HDFS
  3. Register the file with HCatalog as a new table. This is much like defining a flat file in DataStage: you specify the field delimiters and the column layout (see the sketch after this list)
  4. HCatalog separates the schema and metadata information from the query itself. Without it you would need to spell out the file location and field layout in every script
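For step 3, registering the file is a one-time CREATE TABLE in Hive. A minimal sketch, assuming the comma-delimited dump file was staged as above; the table and column names are made up:

    -- Register the dump file as a table, declaring the field delimiter
    -- (much like a flat-file stage definition in DataStage)
    CREATE EXTERNAL TABLE orders (
      order_id    INT,
      customer_id INT,
      status      STRING,
      amount      DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/staging/orders';

    -- Once registered, the file can be queried like any relational table
    SELECT status, COUNT(*) FROM orders GROUP BY status;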

Using Pig to transform the data

  1. The table information is stored in HCatalog
  2. We write a Pig script to transform this data; a sketch of such a script is shown below. Once the script is executed, the transformed data is stored into another table in Hadoop
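A minimal Pig sketch along these lines, reading and writing through HCatalog (run with pig -useHCatalog); the table and column names are placeholders, and the target table must already exist in Hive:

    -- Load the table registered in HCatalog
    raw_orders = LOAD 'orders' USING org.apache.hive.hcatalog.pig.HCatLoader();

    -- Keep only completed orders and total the amount per customer
    completed  = FILTER raw_orders BY status == 'COMPLETE';
    grouped    = GROUP completed BY customer_id;
    totals     = FOREACH grouped GENERATE group AS customer_id,
                                          SUM(completed.amount) AS total_amount;

    -- Store the result into another HCatalog-registered table
    STORE totals INTO 'customer_totals' USING org.apache.hive.hcatalog.pig.HCatStorer();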


A site with a very good tutorial on Hive and Pig

