Saturday, 14 September 2013

Hadoop vs. RAID

Hi guys,

HDFS lets you store data across multiple disks (in fact, across multiple machines). The first benefit is redundancy in case of failure: by default HDFS keeps three copies of every block of a file on different disks, so if one disk fails you still have a copy of the same data on another. The second benefit is speed of access. If one file is spread across two hard disks that can each read at about 100 MB/s, both parts can be read at the same time, so reading the whole file is roughly twice as fast.
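To make the two benefits concrete, here is a minimal sketch of the idea, not HDFS's actual code. It assumes a 64 MB block size (the Hadoop 1.x default; Hadoop 2.x uses 128 MB), a replication factor of 3, hypothetical node names, and a simple round-robin placement. Real HDFS uses a rack-aware placement policy. The read-time function uses the 100 MB/s per-disk figure from the paragraph above and ignores network and seek overhead.

```python
BLOCK_SIZE_MB = 64     # assumption: Hadoop 1.x default block size (128 MB in 2.x)
REPLICATION = 3        # assumption: HDFS default replication factor
DISK_SPEED_MB_S = 100  # per-disk read speed used in the post


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes of the fixed-size blocks a file is cut into."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Simplified round-robin placement: each block's copies land on distinct nodes.
    Real HDFS uses a rack-aware policy; this only illustrates the idea."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def parallel_read_seconds(file_size_mb, disks, speed_mb_s=DISK_SPEED_MB_S):
    """Ideal read time when the file is spread evenly over `disks` disks read at once."""
    return file_size_mb / (speed_mb_s * disks)


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical DataNode names
blocks = split_into_blocks(200)               # 200 MB file -> [64, 64, 64, 8]
placement = place_replicas(len(blocks), nodes)
for b, replicas in placement.items():
    print(f"block {b} ({blocks[b]} MB) -> {replicas}")

print(parallel_read_seconds(200, disks=1))  # 2.0 seconds from one disk
print(parallel_read_seconds(200, disks=2))  # 1.0 second from two disks in parallel
```

Losing any single node still leaves at least two copies of every block, which is the redundancy; reading different blocks from different disks at once is the speed-up.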

The same can be done with RAID, and most enterprise software, like your database or web server, is almost certainly running on RAID. RAID is a way of arranging your hard disks, somewhat like connecting them in parallel, so that there is redundancy in case of failure and access is faster.

It basically distributes a file over many hard disks (this is called striping), so that different parts of the file are read simultaneously, in parallel.
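A minimal sketch of the two basic RAID layouts, assuming in-memory byte arrays stand in for disks and a hypothetical 4-byte chunk size (real arrays use chunks of many KB and are handled by a RAID controller or the operating system, not application code). `stripe` deals chunks round-robin across disks the way RAID 0 does, and `mirror` keeps a full copy on every disk the way RAID 1 does. The sketch only shows the data layout; on real hardware the chunks on different disks are read at the same time.

```python
CHUNK = 4  # hypothetical stripe chunk size in bytes, tiny so the layout is visible


def stripe(data: bytes, num_disks: int, chunk: int = CHUNK):
    """RAID 0 style: cut data into chunks and deal them round-robin across disks."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks] += data[i:i + chunk]
    return disks


def unstripe(disks, chunk: int = CHUNK) -> bytes:
    """Rebuild the data by reading chunks back in the same round-robin order."""
    total = sum(len(d) for d in disks)
    out = bytearray()
    offsets = [0] * len(disks)
    i = 0
    while len(out) < total:
        d = i % len(disks)
        out += disks[d][offsets[d]:offsets[d] + chunk]
        offsets[d] += chunk
        i += 1
    return bytes(out)


def mirror(data: bytes, num_disks: int = 2):
    """RAID 1 style: every disk holds a full copy, so one disk can fail safely."""
    return [bytearray(data) for _ in range(num_disks)]


data = b"ABCDEFGHIJ"
disks = stripe(data, 2)
print(disks)  # [bytearray(b'ABCDIJ'), bytearray(b'EFGH')]
print(unstripe(disks) == data)  # True
print(mirror(data)[1] == data)  # True: the second disk alone has everything
```

Striping gives speed but no redundancy (lose one disk and every file is missing pieces); mirroring gives redundancy. Enterprise arrays usually combine them, for example RAID 10, or use parity as in RAID 5.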

For more information on RAID and how your office servers work, read the article below:

http://cognossimplified.blogspot.com/2013/08/basics-about-servers.html

So which one should you use: RAID, which we already have, or HDFS?

This may seem like a silly question, but I thought about it for a long time. What you need to understand is that RAID and HDFS are different things. RAID is a way of arranging hard disks for redundancy and speed. You can have your HDFS system sitting on RAID-configured hard disks; the only thing you would do is set the HDFS replication factor to 1, which means a single copy with no extra replicas (1 is the minimum HDFS allows). HDFS is able to recover from disk failure by keeping copies of the data on separate machines, but in an enterprise (corporate) environment you will not be using commodity hardware. So you don't need replication by HDFS: you already have high-end, highly fault-tolerant hardware that also has the speed. You still need HDFS, or another storage layer such as Cassandra, that your Hadoop MapReduce jobs can work on.
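One way to see why you would not stack HDFS replication on top of RAID is to count usable space. This is a back-of-the-envelope sketch assuming four hypothetical 1.5 TB drives and ignoring filesystem overhead; the RAID formulas are the standard ones (RAID 10 mirrors disks in pairs, RAID 5 spends one disk's worth of space on parity).

```python
def raid_usable_gb(disks: int, disk_gb: int, level: str) -> int:
    """Usable capacity of one array (simplified: no filesystem overhead)."""
    if level == "raid0":   # striping only, no redundancy
        return disks * disk_gb
    if level == "raid10":  # disks mirrored in pairs, then striped
        return disks // 2 * disk_gb
    if level == "raid5":   # one disk's worth of space holds parity
        return (disks - 1) * disk_gb
    raise ValueError(f"unsupported RAID level {level!r}")


def hdfs_usable_gb(raw_gb: int, replication: int) -> int:
    """HDFS keeps `replication` full copies of every block."""
    if replication < 1:
        raise ValueError("HDFS replication factor must be at least 1")
    return raw_gb // replication


DISKS, DISK_GB = 4, 1500  # hypothetical: four 1.5 TB drives
raw = DISKS * DISK_GB     # 6000 GB raw

layouts = {
    "HDFS, replication 3, plain disks": hdfs_usable_gb(raw, 3),
    "RAID 10 only": raid_usable_gb(DISKS, DISK_GB, "raid10"),
    "HDFS, replication 1, on RAID 5": hdfs_usable_gb(raid_usable_gb(DISKS, DISK_GB, "raid5"), 1),
    "HDFS, replication 3, on RAID 10": hdfs_usable_gb(raid_usable_gb(DISKS, DISK_GB, "raid10"), 3),
}
for name, gb in layouts.items():
    print(f"{name}: {gb} GB usable of {raw} GB raw")
```

Keeping three HDFS copies on top of RAID 10 leaves only 1000 GB of 6000 GB usable, because both layers pay for redundancy. The sketch only counts disk space; HDFS replicas on plain disks sit on different machines, so they also cover a whole-server failure, which RAID inside a single server does not.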

So HDFS replication was designed with low-cost commodity hardware in mind, for people who could not afford enterprise hardware.

You might wonder what happens if a processor fails. There is a backup server on the same rack that will take over. In most cases this is managed in the cloud.
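The "backup server takes over" idea is usually driven by heartbeats: each server reports in regularly, and when the active one goes quiet for too long, the standby becomes active. This is a minimal hypothetical sketch of that pattern, not any particular product's failover mechanism; the class name, the 3-second timeout, and the injectable clock (used here so the example is deterministic) are all assumptions.

```python
import time


class FailoverPair:
    """Hypothetical active/standby pair: the standby takes over when the
    active server misses heartbeats for longer than `timeout` seconds."""

    def __init__(self, primary, standby, timeout=3.0, clock=time.monotonic):
        self.servers = (primary, standby)
        self.active = primary
        self.timeout = timeout
        self.clock = clock
        now = clock()
        self.last_beat = {primary: now, standby: now}

    def heartbeat(self, server):
        """Record that `server` is alive right now."""
        self.last_beat[server] = self.clock()

    def check(self):
        """Switch to the other server if the active one has gone silent
        and the other one is still healthy; return the active server."""
        now = self.clock()
        other = self.servers[1] if self.active == self.servers[0] else self.servers[0]
        active_dead = now - self.last_beat[self.active] > self.timeout
        other_alive = now - self.last_beat[other] <= self.timeout
        if active_dead and other_alive:
            self.active = other
        return self.active


# Usage with a fake clock: server-a stops sending heartbeats after t=0.
t = [0.0]
pair = FailoverPair("server-a", "server-b", timeout=3.0, clock=lambda: t[0])
t[0] = 2.0
pair.heartbeat("server-b")
first = pair.check()   # "server-a": still within its 3 s timeout
t[0] = 4.0
pair.heartbeat("server-b")
second = pair.check()  # "server-b": server-a silent for 4 s, so the standby takes over
print(first, second)
```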

So you will not have a physical server but a virtual one, and everything down to the last component, such as a cooling fan, will have a backup. So you can set the replication factor to 1 and use Hadoop to work on your big data that is saved on your enterprise hardware.
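In Hadoop, the default replication factor for new files is the `dfs.replication` property in `hdfs-site.xml`. As a small sketch, the Python below generates a minimal version of that file with the value set to 1; a real cluster's `hdfs-site.xml` will contain other properties too, so treat this as an illustration of the one setting rather than a complete config. The setting applies to files written afterwards; existing files keep their factor unless you change it, for example with `hadoop fs -setrep`.

```python
import xml.etree.ElementTree as ET


def hdfs_site_xml(replication: int) -> str:
    """Build a minimal hdfs-site.xml body that sets dfs.replication."""
    if replication < 1:
        raise ValueError("dfs.replication must be at least 1")
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(replication)
    return ET.tostring(conf, encoding="unicode")


print(hdfs_site_xml(1))
# <configuration><property><name>dfs.replication</name><value>1</value></property></configuration>
```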

A good article on Hadoop from the Yahoo developer site:

http://developer.yahoo.com/hadoop/tutorial/module1.html