Why Hadoop?

    • #4757
      DataFlair Team
      Spectator

      Why is Hadoop so popular in the industry?
      Why is Hadoop lightning fast compared to traditional systems?
      Why is Hadoop used, and why is it needed?

    • #4759
      DataFlair Team
      Spectator

      Hadoop became the go-to solution for storing and processing big data because:
      1. It stores huge files as they are (raw), without requiring any schema up front.
      2. High scalability – nodes can be added as needed, enhancing performance dramatically.
      3. It’s economical, so it suits any budget, from a startup’s to a tech giant’s. Commodity hardware can be used efficiently with Hadoop.
      4. Reliable – There is little danger of losing data even if nodes in the cluster fail, so it’s highly reliable. Recovery and backup of data are automatic.
      5. Open source – No licensing headaches. Download it and enjoy the power of Hadoop.
      These and many more useful characteristics combine to make Hadoop so popular in the industry.

      Hadoop is lightning fast because of data locality: the computation is moved to the data rather than the data to the computation, which is cheaper and makes processing much faster. The same program is shipped to every node in the cluster, and each node processes the chunks of data stored locally. Processing therefore happens not on one big piece of data but on many smaller distributed pieces, enhancing performance through parallelism.
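      The "same program on every chunk" idea above can be sketched outside Hadoop. Real jobs are written against Hadoop's Java MapReduce API; the following is only a minimal Python simulation of the map/shuffle/reduce model (the chunk contents are made up for illustration):

```python
from collections import defaultdict

def map_phase(chunk):
    """Map step: emit (word, 1) pairs from one locally stored chunk."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each "node" runs the same map function on its own local chunk
# (data locality); only the small intermediate pairs move between nodes.
chunks = ["big data big", "data locality", "big cluster"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(mapped))  # {'big': 3, 'data': 2, 'locality': 1, 'cluster': 1}
```

      In Hadoop the three map calls would run in parallel on the nodes that hold the chunks, which is where the speedup comes from.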

      For more details, please visit: http://data-flair.training/blogs/hadoop-introduction-tutorial-quick-guide/

    • #4760
      DataFlair Team
      Spectator

      Hadoop handles big data well for the following reasons:

      1. Robust and Scalable – We can add new nodes as needed, as well as modify existing ones.
      2. Affordable and Cost-Effective – We do not need any special hardware for running Hadoop; commodity servers are enough.
      3. Adaptive and Flexible – Hadoop is built to handle both structured and unstructured data.
      4. Highly Available and Fault Tolerant – When a node fails, the Hadoop framework automatically redirects its work to another node.

    • #4763
      DataFlair Team
      Spectator

      The Hadoop framework allows distributed processing of large datasets across clusters of computers using simple programming models.
      Some of the advantages of using Hadoop are listed below:
      1) Processing huge amounts of data – Using Hadoop we can process terabytes and even petabytes of data.
      2) Storing diverse sets of data – Structured, unstructured, semi-structured, text, binary, etc. can all be stored in HDFS.
      3) Parallel data processing – With the MapReduce model we can process large datasets in parallel.
      4) Open source – Hadoop can be downloaded and used directly, and it runs on computer clusters built from commodity hardware.
      5) Fault tolerant – Multiple copies of each data block are kept in HDFS across different racks, so if a block gets corrupted, one of its replicas can be used instead.
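      The replica fallback in point 5 is not something you code yourself – HDFS does it internally – but a toy Python sketch (rack names and block contents invented for illustration) shows the idea of checksum-verified reads falling back to a healthy copy:

```python
import hashlib

REPLICATION = 3  # HDFS's default replication factor

def checksum(data: bytes) -> str:
    """Fingerprint used to detect a corrupted replica."""
    return hashlib.md5(data).hexdigest()

# One block replicated on three simulated "racks".
block = b"block-0 contents"
replicas = {f"rack-{i}": block for i in range(REPLICATION)}
expected = checksum(block)

# Corrupt one replica; a read must skip it.
replicas["rack-0"] = b"corrupted!"

def read_block():
    for rack, data in replicas.items():
        if checksum(data) == expected:
            return data  # first replica that passes its checksum
    raise IOError("all replicas corrupt")

assert read_block() == block  # the healthy copy is served
```

      Because the replicas live on different racks, even a whole rack failing leaves readable copies elsewhere.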

      Hadoop is lightning fast compared to traditional systems because HDFS is optimized to read and write large data blocks (64 MB by default in Hadoop v1, 128 MB in Hadoop v2) at a time. Data in HDFS cannot be modified in place – files are write-once – so processing produces new data rather than updating existing data.
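      The block sizes above translate directly into how many blocks (and hence how many map tasks) a file produces. A quick sketch of the arithmetic, using a hypothetical 1 GB file:

```python
import math

def num_blocks(file_size_mb: int, block_size_mb: int) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)

# A 1 GB (1024 MB) file:
print(num_blocks(1024, 64))   # 16 blocks with Hadoop v1's 64 MB default
print(num_blocks(1024, 128))  # 8 blocks with Hadoop v2's 128 MB default
```

      Fewer, larger blocks mean less metadata on the NameNode and fewer seeks per file, which is why HDFS favors large blocks over the kilobyte-sized blocks of traditional filesystems.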
