Wednesday 31 July 2013

What is Hadoop & Big Data

Below is a good link

http://readwrite.com/2013/05/23/hadoop-what-it-is-and-how-it-works#awesm=~odc8IiyozQvfDZ

You can also check out these topics on YouTube:

Apache HDFS, MapReduce, Apache HBase

Below is an article by Akash Mitra. I liked it.

http://www.dwbiconcepts.com/data-warehousing/18-dwbi-basic-concepts/176-what-is-big-data.html

http://www.dwbiconcepts.com/data-warehousing/18-dwbi-basic-concepts/2-map-reduce.html

Vijay Thakorals blog post on the Hadoop NameNode web interface and its details

http://vijayjt.blogspot.in/2013/02/hadoop-namenode-web-interface.html

A good article on Hadoop from the Yahoo developer site

http://developer.yahoo.com/hadoop/tutorial/module1.html

MongoDB tutorial videos on YouTube

http://www.youtube.com/watch?v=bVRqd8mnQ6c&list=PLw2e3dFxewkIS1YjkLcdCUI5BPBg_YMwD






---------------------

GFS -- white paper

MapReduce -- white paper

Bigtable -- white paper

(all three are Google white papers)

---------------------------------

To read 1 terabyte of data from one machine that has 4 hard drives (so 4 I/O channels), with each channel reading at 100 MB per second:

4 × 100 MB/s = 400 MB/s = 24,000 MB (24 GB) per minute, so 240 GB every 10 minutes. That means roughly 43 minutes to read 1 terabyte of data (1,024 GB / 24 GB per minute ≈ 42.7 minutes).
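The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that follows the note's own rounding: it assumes decimal units for the transfer rate (24,000 MB = 24 GB), takes 1 TB as 1,024 GB, and ignores seek time and other overhead. The function names are just for illustration.

```python
def gb_per_minute(channels: int, mb_per_sec: float) -> float:
    """Aggregate read rate in GB/minute (decimal units: 1 GB = 1,000 MB)."""
    return channels * mb_per_sec * 60 / 1000


def minutes_to_read(data_gb: float, rate_gb_per_min: float) -> float:
    """Minutes to read data_gb at a constant aggregate rate."""
    return data_gb / rate_gb_per_min


# 4 channels x 100 MB/s x 60 s = 24,000 MB = 24 GB per minute
rate = gb_per_minute(channels=4, mb_per_sec=100)
print(rate)                                    # 24.0
print(rate * 10)                               # 240.0 GB every 10 minutes
print(round(minutes_to_read(1024, rate), 1))   # 1 TB taken as 1,024 GB -> 42.7
```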

--------------------------------

Hadoop lets you spread that read across 10 machines working in parallel, so reading 1 terabyte takes only about 4.3 minutes (roughly 43 minutes divided by 10).
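A minimal sketch of the ideal speedup, assuming the data is split evenly, every machine is like the one above (24 GB per minute), and there is no coordination overhead (a real cluster always adds some). Under those assumptions the read time simply divides by the number of machines.

```python
def parallel_minutes(data_gb: float, machines: int,
                     rate_gb_per_min_per_machine: float = 24.0) -> float:
    """Ideal time when each machine reads an equal share of the data in parallel."""
    share_gb = data_gb / machines  # each machine reads only its own slice
    return share_gb / rate_gb_per_min_per_machine


for n in (1, 10):
    print(n, round(parallel_minutes(1024, n), 2))  # 1 -> 42.67, 10 -> 4.27
```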

----------------------------------

The NameNode is the master of the file system, and the DataNodes are the slaves.
--------------------------------------

The NameNode holds the metadata for the data: which blocks each file is broken into and which DataNodes those blocks reside on.
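To make the idea concrete, here is a toy model of that metadata: a file is split into fixed-size blocks, and the NameNode keeps one map from file to block ids and another from block id to the DataNodes holding it. `ToyNameNode` and `add_file` are hypothetical names, not Hadoop's API. The 64 MB block size is the Hadoop 1.x default (newer releases default to 128 MB), and the simple round-robin placement is an assumption; real HDFS placement is rack-aware.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE_MB = 64  # assumption: Hadoop 1.x default block size


@dataclass
class ToyNameNode:
    datanodes: List[str]
    replication: int = 3
    # file path -> ordered list of block ids
    files: Dict[str, List[str]] = field(default_factory=dict)
    # block id -> DataNodes holding a replica of that block
    block_locations: Dict[str, List[str]] = field(default_factory=dict)
    _next_block: int = 0

    def add_file(self, path: str, size_mb: int) -> List[str]:
        """Record a file's blocks and pick DataNodes for each (simple round-robin)."""
        num_blocks = max(1, -(-size_mb // BLOCK_SIZE_MB))  # ceiling division
        blocks = []
        for _ in range(num_blocks):
            block_id = f"blk_{self._next_block}"
            start = self._next_block % len(self.datanodes)
            copies = min(self.replication, len(self.datanodes))
            nodes = [self.datanodes[(start + i) % len(self.datanodes)]
                     for i in range(copies)]
            self.block_locations[block_id] = nodes
            blocks.append(block_id)
            self._next_block += 1
        self.files[path] = blocks
        return blocks


nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
blocks = nn.add_file("/logs/day1.log", size_mb=200)  # 200 MB -> 4 blocks
for b in blocks:
    print(b, nn.block_locations[b])
```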

-------------------------------

Multiple DataNodes hold a copy of each block of a file (3 copies by default), so a single file is stored on multiple nodes.
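Because each block lives on several DataNodes, losing one node does not lose data. The sketch below (the function name is hypothetical, and it assumes the usual default replication factor of 3) lists the blocks that drop below the target replica count when a DataNode dies. In real HDFS the NameNode detects the dead node through missed heartbeats and schedules re-replication on its own.

```python
from typing import Dict, List


def under_replicated(block_locations: Dict[str, List[str]], dead_node: str,
                     target: int = 3) -> Dict[str, int]:
    """Return {block_id: live_replica_count} for blocks left with fewer than target replicas."""
    result = {}
    for block_id, nodes in block_locations.items():
        live = [n for n in nodes if n != dead_node]
        if len(live) < target:
            result[block_id] = len(live)
    return result


locations = {
    "blk_0": ["dn1", "dn2", "dn3"],
    "blk_1": ["dn2", "dn3", "dn4"],
    "blk_2": ["dn3", "dn4", "dn1"],
}
print(under_replicated(locations, dead_node="dn1"))  # {'blk_0': 2, 'blk_2': 2}
```

Every affected block still has two live copies, which is the point of storing one file on multiple nodes.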

-------------------------

The NameNode runs a web server that shows information such as how many DataNodes make up the cluster.
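Besides the HTML pages, the NameNode's web server also exposes its metrics as JSON under `/jmx`, which is handy for scripts. The sketch below is written under assumptions you should check against your Hadoop version: the `FSNamesystemState` bean and its `NumLiveDataNodes` attribute, and the default web port (50070 in Hadoop 1.x/2.x, 9870 in 3.x). The host name is hypothetical. The example runs offline on a trimmed sample response of the assumed shape; `fetch_jmx` is only called when a real cluster is available.

```python
import json
from urllib.request import urlopen


def live_datanodes(jmx_json: str) -> int:
    """Extract the live DataNode count from the NameNode's /jmx JSON output."""
    for bean in json.loads(jmx_json).get("beans", []):
        if "NumLiveDataNodes" in bean:
            return int(bean["NumLiveDataNodes"])
    raise ValueError("NumLiveDataNodes not found in response")


def fetch_jmx(host: str, port: int = 50070) -> str:
    """Fetch the FSNamesystemState bean (assumed name) from a NameNode's web server."""
    url = (f"http://{host}:{port}/jmx"
           "?qry=Hadoop:service=NameNode,name=FSNamesystemState")
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


# Offline example: a trimmed, illustrative response (not real cluster output).
sample = ('{"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystemState", '
          '"NumLiveDataNodes": 4, "NumDeadDataNodes": 0}]}')
print(live_datanodes(sample))  # 4

# Against a real cluster (hypothetical host name):
# print(live_datanodes(fetch_jmx("namenode-host")))
```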

---------------------------
What is the format of data in HDFS, the way we usually format our hard disk as NTFS?

---------------------------------

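For Hadoop, the disks on each node are formatted with a regular Linux file system such as ext3, ext4, or XFS; HDFS stores its blocks as ordinary files on top of it.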


------------------------------------

Your commands will depend on the file system you use, just as they do at the command prompt.
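For example, you don't browse HDFS with the local `ls` or `dir`. It has its own shell, `hadoop fs`, whose options mirror familiar Unix commands (`-ls`, `-mkdir`, `-cat`, `-rm`). The helper and the mapping table below are hypothetical and cover only a small subset; the helper just builds the command line without running anything.

```python
import shlex
from typing import List

# Local command -> equivalent `hadoop fs` option (small, non-exhaustive subset).
HADOOP_FS_EQUIVALENTS = {
    "ls": "-ls",
    "mkdir": "-mkdir",
    "cat": "-cat",
    "rm": "-rm",
}


def hdfs_command(local_cmd: str, *paths: str) -> List[str]:
    """Translate a local-style command into an argument list for `hadoop fs`."""
    if local_cmd not in HADOOP_FS_EQUIVALENTS:
        raise ValueError(f"no mapping for {local_cmd!r}")
    return ["hadoop", "fs", HADOOP_FS_EQUIVALENTS[local_cmd], *paths]


cmd = hdfs_command("ls", "/user/hadoop")
print(shlex.join(cmd))  # hadoop fs -ls /user/hadoop
```

On a machine with Hadoop installed, the list can be passed to `subprocess.run` to actually execute it.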

-----------------------------------

How do racks work, for example with core switches? Each DataNode is kept in a rack.
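HDFS uses this rack information when it places replicas. Its default policy, as described in the HDFS architecture guide, puts the first replica on the writer's node, the second on a node in a different rack, and the third on a different node in the same rack as the second. The sketch below mimics that rule with a hypothetical rack map. It picks the first matching node, whereas real HDFS chooses randomly and also considers node load. It also assumes there are at least two racks and that the remote rack has at least two nodes.

```python
from typing import Dict, List


def place_replicas(writer: str, rack_of: Dict[str, str]) -> List[str]:
    """Pick 3 DataNodes: the writer's node, a node on another rack, and a second node on that remote rack."""
    first = writer
    local_rack = rack_of[first]
    # Second replica: any node on a different rack from the writer.
    second = next(n for n, r in rack_of.items() if r != local_rack)
    remote_rack = rack_of[second]
    # Third replica: a different node on the same remote rack as the second.
    third = next(n for n, r in rack_of.items() if r == remote_rack and n != second)
    return [first, second, third]


racks = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
replicas = place_replicas("dn1", racks)
print(replicas)  # ['dn1', 'dn3', 'dn4']
```

Spreading copies over two racks means the data survives losing a whole rack (for example, a failed rack switch), while keeping two of the three copies on one rack limits traffic through the core switches.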

------------------------------------

A few things to note about Hadoop:

Hadoop can be deployed on RAID-configured drives. RAID is just a way of configuring hard drives; it is not a file system. Hadoop can also be deployed on the Cassandra File System.

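Apache HBase -- based on Google's Bigtable
Apache Flume -- collects and moves large volumes of log/event data into Hadoop (connecting an RDBMS to Hadoop is usually done with Apache Sqoop)
Oozie -- used for scheduling Hadoop job workflows
Hue -- graphical interface for Hadoop -- all of these are offered by Cloudera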

Pig and Hive: I am not sure what they do.


Latency -- a measure of time delay (how long you wait for a single result).

Throughput -- the amount of work a computer can do in a given time frame.

Hadoop is designed for throughput, not latency. It is like a train rather than a sports car: it can pull huge amounts of data, but you only notice its advantage when the data is really big.
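The train analogy can be made concrete with a simple model: total time = fixed startup delay + data size / throughput. The figures below are made up purely for illustration and are not benchmarks: a "sports car" that starts instantly but processes 1 GB per minute, and a "train" that needs 1 minute to get going but then processes 20 GB per minute.

```python
def job_minutes(data_gb: float, startup_min: float, gb_per_min: float) -> float:
    """Simple latency/throughput model: fixed startup delay plus transfer time."""
    return startup_min + data_gb / gb_per_min


# Hypothetical figures, chosen only to illustrate the trade-off.
SPORTS_CAR = {"startup_min": 0.0, "gb_per_min": 1.0}   # low latency, low throughput
TRAIN = {"startup_min": 1.0, "gb_per_min": 20.0}       # high latency, high throughput

for size_gb in (0.1, 1, 100):
    car = job_minutes(size_gb, **SPORTS_CAR)
    train = job_minutes(size_gb, **TRAIN)
    winner = "sports car" if car < train else "train"
    print(f"{size_gb:>6} GB: car {car:.2f} min, train {train:.2f} min -> {winner}")
```

With these assumed numbers the sports car wins for 0.1 GB and 1 GB, and the train only pulls ahead above about 1.05 GB (where 1 + d/20 = d). For 100 GB it takes 6 minutes against 100.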



