Thursday 9 October 2014

Understanding Hive and Pig


Hi All,

Hive for data analysis, due to its structural similarity with SQL
Pig for data loading and transformation, due to its similarity with procedural languages
Sqoop - bulk data movement between relational databases and Hadoop
Flume - collecting and aggregating streaming data
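For example, a single Sqoop command can pull a whole table out of a relational database into HDFS. A minimal sketch; the connection string, credentials, table name and target directory below are all placeholders:

    sqoop import \
      --connect jdbc:mysql://oltp-host/salesdb \
      --username etl_user -P \
      --table orders \
      --target-dir /data/staging/orders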
  
Steps for loading data into Hadoop

Looking at Hadoop, Hive and Pig from a data warehousing perspective, the tasks below can be done quite easily.

Important note – Pig has a script-like structure which is not very friendly for querying data, while Hive offers a familiar SQL interface, much like a regular relational database. Pig scripts read like a procedural language, which makes it easy for users to code transformations.
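To make the contrast concrete, here is the same task (counting completed orders) sketched in both languages. The file path, table and column names are made up for illustration:

    -- HiveQL: declarative, a single SQL statement
    SELECT COUNT(*) FROM orders WHERE status = 'COMPLETE';

    -- Pig Latin: procedural, one named step at a time
    orders    = LOAD '/data/orders' USING PigStorage(',')
                AS (order_id:int, status:chararray, amount:double);
    completed = FILTER orders BY status == 'COMPLETE';
    counted   = FOREACH (GROUP completed ALL) GENERATE COUNT(completed);
    DUMP counted;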

  1. Take data from the OLTP system as dump files, either a delta dump or a full dump (see the sketch after this list)
  2. Transform the data using Pig scripts and save it into tables
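Step 1 typically ends with the dump file landing in HDFS. A minimal sketch using the standard HDFS shell; all paths are placeholders:

    # Create a staging directory in HDFS and copy the dump file into it
    hdfs dfs -mkdir -p /data/staging/orders
    hdfs dfs -put /exports/orders_full.csv /data/staging/orders/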


Using Hive to Query data from the Dump file

  1. Take a dump file from the OLTP system, or load the data into external tables
  2. Put the file into HDFS
  3. Register the file with HCatalog as a new table. This is much like defining a flat file in DataStage: you specify the field delimiters and the column layout (see the sketch after this list)
  4. HCatalog separates the schema and metadata information from the query itself. Without it you would need to spell out the file location and field layout in every script
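For step 3, registering the file is a one-time CREATE TABLE in Hive. A minimal sketch, assuming the comma-delimited dump file was staged as above; the table and column names are made up:

    -- Register the dump file as a table, declaring the field delimiter
    -- (much like a flat-file stage definition in DataStage)
    CREATE EXTERNAL TABLE orders (
      order_id    INT,
      customer_id INT,
      status      STRING,
      amount      DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/staging/orders';

    -- Once registered, the file can be queried like any relational table
    SELECT status, COUNT(*) FROM orders GROUP BY status;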

Using Pig to transform the data

  1. The table information is stored in HCatalog
  2. We write a Pig script to transform this data; a sketch of such a script is shown below. Once the script is executed, the transformed data is stored into another table in Hadoop
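A minimal Pig sketch along these lines, reading and writing through HCatalog (run with pig -useHCatalog); the table and column names are placeholders, and the target table must already exist in Hive:

    -- Load the table registered in HCatalog
    raw_orders = LOAD 'orders' USING org.apache.hive.hcatalog.pig.HCatLoader();

    -- Keep only completed orders and total the amount per customer
    completed  = FILTER raw_orders BY status == 'COMPLETE';
    grouped    = GROUP completed BY customer_id;
    totals     = FOREACH grouped GENERATE group AS customer_id,
                                          SUM(completed.amount) AS total_amount;

    -- Store the result into another HCatalog-registered table
    STORE totals INTO 'customer_totals' USING org.apache.hive.hcatalog.pig.HCatStorer();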


A site with a very good tutorial on Hive and Pig

