Why Hadoop?

    • #4757
      DataFlair Team
      Spectator

      Why is Hadoop so popular in the industry?
      Why is Hadoop lightning fast compared to traditional systems?
      Why is Hadoop used, and why is it needed?

    • #4759
      DataFlair Team
      Spectator

      Hadoop became the go-to solution for storing and processing big data because:
      1. It stores huge files as they are (raw), without requiring any schema up front.
      2. High scalability – nodes can be added as needed, enhancing performance dramatically.
      3. It’s economical, so it suits any budget, from a startup’s to a tech giant’s. Commodity hardware can be used efficiently with Hadoop.
      4. Reliable – There is little danger of losing data even if nodes in the cluster fail, so it’s highly reliable. Recovery and backup of data are automatic.
      5. Open source – No licensing headaches. Download it and enjoy the power of Hadoop.
      These and many more useful characteristics combine to make Hadoop so popular in the industry.

      Hadoop is lightning fast because of data locality: the computation is moved to the data rather than the data to the computation, which is cheaper and makes processing much faster. The same program is shipped to every node in the cluster, and each node processes the chunks of data stored locally. Processing therefore happens not on one big piece of data but on many smaller distributed pieces, enhancing performance through parallelism.
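      The "same program on every chunk" idea above can be sketched outside Hadoop. Real jobs are written against Hadoop's Java MapReduce API; the following is only a minimal Python simulation of the map/shuffle/reduce model (the chunk contents are made up for illustration):

```python
from collections import defaultdict

def map_phase(chunk):
    """Map step: emit (word, 1) pairs from one locally stored chunk."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each "node" runs the same map function on its own local chunk
# (data locality); only the small intermediate pairs move between nodes.
chunks = ["big data big", "data locality", "big cluster"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(mapped))  # {'big': 3, 'data': 2, 'locality': 1, 'cluster': 1}
```

      In Hadoop the three map calls would run in parallel on the nodes that hold the chunks, which is where the speedup comes from.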

      For more details, please visit: http://data-flair.training/blogs/hadoop-introduction-tutorial-quick-guide/

    • #4760
      DataFlair Team
      Spectator

      Hadoop handles big data well for the following reasons:

      1. Robust and Scalable – We can add new nodes as needed, as well as modify existing ones.
      2. Affordable and Cost-Effective – We do not need any special hardware for running Hadoop; commodity servers are enough.
      3. Adaptive and Flexible – Hadoop is built to handle both structured and unstructured data.
      4. Highly Available and Fault Tolerant – When a node fails, the Hadoop framework automatically redirects its work to another node.

    • #4763
      DataFlair Team
      Spectator

      The Hadoop framework allows distributed processing of large datasets across clusters of computers using simple programming models.
      Some of the advantages of using Hadoop are listed below:
      1) Processing huge amounts of data – Using Hadoop we can process terabytes and even petabytes of data.
      2) Storing diverse sets of data – Structured, unstructured, semi-structured, text, binary, etc. can all be stored in HDFS.
      3) Parallel data processing – With the MapReduce model we can process large datasets in parallel.
      4) Open source – Hadoop can be downloaded and used directly, and it runs on computer clusters built from commodity hardware.
      5) Fault tolerant – Multiple copies of each data block are kept in HDFS across different racks, so if a block gets corrupted, one of its replicas can be used instead.
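      The replica fallback in point 5 is not something you code yourself – HDFS does it internally – but a toy Python sketch (rack names and block contents invented for illustration) shows the idea of checksum-verified reads falling back to a healthy copy:

```python
import hashlib

REPLICATION = 3  # HDFS's default replication factor

def checksum(data: bytes) -> str:
    """Fingerprint used to detect a corrupted replica."""
    return hashlib.md5(data).hexdigest()

# One block replicated on three simulated "racks".
block = b"block-0 contents"
replicas = {f"rack-{i}": block for i in range(REPLICATION)}
expected = checksum(block)

# Corrupt one replica; a read must skip it.
replicas["rack-0"] = b"corrupted!"

def read_block():
    for rack, data in replicas.items():
        if checksum(data) == expected:
            return data  # first replica that passes its checksum
    raise IOError("all replicas corrupt")

assert read_block() == block  # the healthy copy is served
```

      Because the replicas live on different racks, even a whole rack failing leaves readable copies elsewhere.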

      Hadoop is lightning fast compared to traditional systems because HDFS is optimized to read and write large data blocks (64 MB by default in Hadoop v1, 128 MB in Hadoop v2) at a time. Data in HDFS cannot be modified in place – files are write-once – so processing produces new data rather than updating existing data.
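      The block sizes above translate directly into how many blocks (and hence how many map tasks) a file produces. A quick sketch of the arithmetic, using a hypothetical 1 GB file:

```python
import math

def num_blocks(file_size_mb: int, block_size_mb: int) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)

# A 1 GB (1024 MB) file:
print(num_blocks(1024, 64))   # 16 blocks with Hadoop v1's 64 MB default
print(num_blocks(1024, 128))  # 8 blocks with Hadoop v2's 128 MB default
```

      Fewer, larger blocks mean less metadata on the NameNode and fewer seeks per file, which is why HDFS favors large blocks over the kilobyte-sized blocks of traditional filesystems.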
