Below is a good link on what Hadoop is and how it works:
http://readwrite.com/2013/05/23/hadoop-what-it-is-and-how-it-works#awesm=~odc8IiyozQvfDZ
You can also check out YouTube videos on Apache HDFS, MapReduce, and Apache HBase.
Below are two articles by Akash Mitra. I liked them.
http://www.dwbiconcepts.com/data-warehousing/18-dwbi-basic-concepts/176-what-is-big-data.html
http://www.dwbiconcepts.com/data-warehousing/18-dwbi-basic-concepts/2-map-reduce.html
Vijay Thakorals blog on the Hadoop name node and its details:
http://vijayjt.blogspot.in/2013/02/hadoop-namenode-web-interface.html
Good article on Hadoop from the Yahoo developer site:
http://developer.yahoo.com/hadoop/tutorial/module1.html
MongoDB tutorial videos on YouTube:
http://www.youtube.com/watch?v=bVRqd8mnQ6c&list=PLw2e3dFxewkIS1YjkLcdCUI5BPBg_YMwD
---------------------
GFS -- white paper
MapReduce -- white paper
Bigtable -- white paper (all three are Google white papers)
---------------------------------
To read 1 terabyte of data from one machine that has 4 hard drives
(so 4 I/O channels), each channel reading at 100 MB per second:
4 x 100 MB/s = 400 MB/s, which is 24,000 MB (24 GB) per minute,
or 240 GB every 10 minutes. So it takes about 43 minutes to read
1 terabyte (1,024 GB) of data.
--------------------------------
Now Hadoop allows you to spread that read across 10 machines,
so reading 1 terabyte takes about 4.3 minutes, one tenth of
the single-machine time.
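The arithmetic above can be checked with a small sketch. It assumes 1 terabyte = 1,024 GB with 1 GB = 1,000 MB, and that reads scale perfectly across drives and machines, which a real cluster only approximates. The function name `read_minutes` is made up for illustration.

```python
def read_minutes(total_mb, drives_per_machine, mb_per_sec_per_drive, machines=1):
    """Minutes needed to read total_mb, assuming perfectly parallel reads."""
    aggregate_mb_per_sec = drives_per_machine * mb_per_sec_per_drive * machines
    return total_mb / aggregate_mb_per_sec / 60


# Assumption: 1 TB = 1,024 GB and 1 GB = 1,000 MB.
ONE_TB_MB = 1024 * 1000

single = read_minutes(ONE_TB_MB, drives_per_machine=4, mb_per_sec_per_drive=100)
cluster = read_minutes(ONE_TB_MB, drives_per_machine=4, mb_per_sec_per_drive=100,
                       machines=10)

print(f"1 machine:   {single:.1f} minutes")
print(f"10 machines: {cluster:.1f} minutes")
```

The only thing that changes between the two calls is the number of machines, so the speedup is exactly the machine count under these idealized assumptions.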
----------------------------------
The name node is the master of the file system,
and the data nodes are the slaves.
--------------------------------------
The name node holds the metadata for the data, such as which
blocks a file is broken into and where those blocks
reside.
-------------------------------
Multiple data nodes hold a copy of a file's blocks, so
a single file is stored on multiple nodes.
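As a rough mental model (not how HDFS is actually implemented), the name node's metadata can be pictured as two lookups: file to blocks, and block to the data nodes holding a copy. The class name, the 128 MB block size, and the round-robin placement below are illustrative assumptions; the replication factor of 3 matches the usual HDFS default.

```python
import itertools


class ToyNameNode:
    """Toy model of name node metadata; it stores no file data itself."""

    def __init__(self, data_nodes, block_size_mb=128, replication=3):
        self.data_nodes = list(data_nodes)
        self.block_size_mb = block_size_mb
        self.replication = replication
        self.file_blocks = {}      # file path -> list of block ids
        self.block_locations = {}  # block id -> list of data node names
        self._next_node = itertools.cycle(self.data_nodes)

    def add_file(self, path, size_mb):
        # Split the file into fixed-size blocks (the last one may be smaller).
        n_blocks = -(-size_mb // self.block_size_mb)  # ceiling division
        block_ids = [f"{path}#blk{i}" for i in range(n_blocks)]
        self.file_blocks[path] = block_ids
        for block_id in block_ids:
            # Simple round-robin placement; real HDFS placement is rack-aware.
            self.block_locations[block_id] = [
                next(self._next_node) for _ in range(self.replication)
            ]

    def locate(self, path):
        """Return, for each block of the file, the data nodes holding a copy."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
nn.add_file("/logs/day1.txt", size_mb=300)
for block_id, nodes in nn.locate("/logs/day1.txt"):
    print(block_id, "->", nodes)
```

A 300 MB file becomes three blocks here, and each block is listed on three different data nodes, which is the sense in which a single file lives on multiple nodes.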
-------------------------
The name node has a web server that gives
information such as how many data nodes make up
the cluster.
---------------------------
What is the on-disk format of data in HDFS, the way we
usually format a hard disk to NTFS?
---------------------------------
For Hadoop we have ext3, ext4, and XFS: HDFS sits on top of the native file system of each node.
------------------------------------
Your commands will depend on the file system you use,
just as they do at a command prompt.
-----------------------------------
How do your racks work, for example with core switches?
Each data node is kept in a rack.
------------------------------------
A few things to note about Hadoop:
Hadoop can be deployed on RAID-configured drives. RAID is just a way of configuring hard drives; it is not a file system. Hadoop can also be deployed on the Cassandra File System.
Apache HBase -- Hadoop's counterpart to Bigtable
Apache Flume -- collects log and event data into Hadoop (connecting an RDBMS to Hadoop is the job of Apache Sqoop)
Oozie -- used for scheduling
Hue -- graphical interface for Hadoop -- all of these are offered in Cloudera's distribution
Pig and Hive -- I am not sure what they do.
Latency -- the measure of time delay
Throughput -- the amount of work a computer can do in a given time frame
Hadoop is designed for throughput, not latency. It is like a train, not a sports car: it can pull a huge amount of data, and you only notice it moving when the data is really big.
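The train versus sports car idea can be sketched with a simple model: total time = fixed startup delay (latency) + data size / throughput. All numbers below are hypothetical, chosen only to show the crossover, and are not measurements of any real system.

```python
def job_seconds(data_gb, startup_sec, gb_per_sec):
    """Total time = fixed startup delay + time to stream the data."""
    return startup_sec + data_gb / gb_per_sec


# Hypothetical figures, for illustration only.
SPORTS_CAR = dict(startup_sec=1, gb_per_sec=0.4)  # one machine: quick start, low throughput
TRAIN = dict(startup_sec=30, gb_per_sec=4.0)      # cluster: slow start, high throughput

for data_gb in (1, 10, 100, 1000):
    car = job_seconds(data_gb, **SPORTS_CAR)
    train = job_seconds(data_gb, **TRAIN)
    winner = "train" if train < car else "sports car"
    print(f"{data_gb:>5} GB: sports car {car:8.1f}s, train {train:8.1f}s -> {winner}")
```

With these made-up figures the sports car wins on small inputs because of its low startup delay, and the train only pulls ahead once the data is large enough for throughput to dominate.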