What is the difference between Hadoop 2.x and Hadoop 3.x?

This topic has 3 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

- September 20, 2018 at 5:27 pm #6240
  
  DataFlair Team
  
  What are the major differences between Hadoop 2 and Hadoop 3?
  How do Hadoop 2.x and Hadoop 3.x compare?
  How is Hadoop 2.0 different from Hadoop 3.0?
- September 20, 2018 at 5:27 pm #6241
  
  DataFlair Team
  
  Hadoop 2:
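  • Minimum supported Java version is 6 (raised to Java 7 from Hadoop 2.7 onward)
  • Fault tolerance is achieved using replication
  • Older shell scripts, with many long-standing bugs
  • YARN Timeline Service introduced
  • Default ports of several services fall inside the Linux ephemeral port range, which can cause port conflicts
  • The HDFS Balancer only balances data across DataNodes, not across the disks within a DataNode
  • Doesn’t support the Microsoft Azure Data Lake filesystem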
  
  Hadoop 3:
  • Minimum supported Java version is 8
  • Fault tolerance can also be achieved using erasure coding, which needs less storage than replication because parity blocks replace full copies of the data
  • Shell scripts rewritten, fixing many long-standing bugs and adding new features
  • YARN Timeline Service v.2, with improved scalability and reliability
  • Default ports moved out of the Linux ephemeral port range
  • Intra-DataNode balancing is added, which is invoked via the HDFS disk balancer CLI
  • Supports the Microsoft Azure Data Lake filesystem
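  Intra-DataNode balancing is driven by a per-node imbalance metric. As we understand the HDFS Disk Balancer documentation, each volume gets a "volume data density": the node's ideal used ratio (total used / total capacity) minus that volume's own used ratio, and the node's density is the sum of the absolute values. The Python sketch below computes that metric only, assuming a single storage type and ignoring how the real planner chooses block moves; the `Volume` class, mount paths, and sizes are hypothetical. On a real cluster the balancer is run from the CLI with `hdfs diskbalancer -plan`, `-execute`, and `-query`.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical model of one DataNode disk (single storage type assumed).
    name: str
    capacity_bytes: int
    used_bytes: int


def volume_data_densities(volumes):
    """Return {volume name: density} for one DataNode.

    ideal ratio = total used / total capacity across all volumes;
    density = ideal ratio - this volume's used ratio.
    Positive means under-utilized, negative means over-utilized.
    """
    total_capacity = sum(v.capacity_bytes for v in volumes)
    total_used = sum(v.used_bytes for v in volumes)
    ideal_ratio = total_used / total_capacity
    return {v.name: ideal_ratio - v.used_bytes / v.capacity_bytes for v in volumes}


def node_data_density(volumes):
    # Sum of absolute volume densities: 0 means the disks are perfectly balanced.
    return sum(abs(d) for d in volume_data_densities(volumes).values())


TB = 10**12
# Hypothetical node: a full-ish old disk and a newly added, mostly empty disk.
disks = [Volume("/data1", 4 * TB, 3 * TB), Volume("/data2", 4 * TB, 1 * TB)]

if __name__ == "__main__":
    for name, density in volume_data_densities(disks).items():
        print(f"{name}: {density:+.2f}")  # /data1: -0.25, /data2: +0.25
    print(f"node density: {node_data_density(disks):.2f}")  # node density: 0.50
```

  Here `/data1` is over-utilized and `/data2` under-utilized relative to the node's 50% ideal, which is exactly the situation the Hadoop 2 HDFS Balancer could not fix because it only moves blocks between DataNodes.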
  
- September 20, 2018 at 5:28 pm #6243
  
  DataFlair Team
  
  Major differences between Hadoop 2 and Hadoop 3
  
  Hadoop 2:
  1. Hadoop 2 requires at least Java 7 (from Hadoop 2.7; earlier 2.x releases also ran on Java 6)
  2. In Hadoop 2, fault tolerance is handled by HDFS's built-in replication: each block is stored several times to ensure high data availability
  3. By default, HDFS uses a replication factor of 3 to achieve fault tolerance
  4. With 3x replication, HDFS has a 200% storage overhead (two extra copies of every block)
  5. For data balancing, Hadoop 2 uses the HDFS Balancer, which moves blocks between DataNodes; there is no intra-DataNode disk balancer
  6. Scalability: Hadoop 2 can scale up to about 10,000 nodes per cluster
  
  Hadoop 3:
  1. Hadoop 3 requires at least Java 8
  2. In Hadoop 3, fault tolerance can also be handled by erasure coding
  3. With erasure coding in HDFS (for example the RS-6-3 policy: 6 data cells plus 3 parity cells per stripe), the storage cost is 1.5x the raw data instead of 3x
  4. For erasure-coded data, HDFS therefore has only a 50% storage overhead
  5. For data balancing, Hadoop 3 adds an intra-DataNode disk balancer, invoked via the HDFS disk balancer command line interface (hdfs diskbalancer), alongside the existing HDFS Balancer
  6. Scalability: Hadoop 3 can scale beyond 10,000 nodes per cluster
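  The overhead figures in points 3 and 4 are simple arithmetic: extra storage divided by raw data. The Python sketch below computes them and adds a toy single-parity XOR code to show how a lost block is rebuilt from parity instead of from a full copy. It only illustrates the idea: the RS-6-3 policy uses Reed-Solomon coding, which tolerates up to three lost cells per stripe and needs finite-field arithmetic not shown here. Hadoop 3 also ships an XOR-2-1 policy similar in spirit to the toy stripe; the function names and sample data are hypothetical.

```python
from fractions import Fraction
from functools import reduce


def storage_overhead(data_units: int, redundancy_units: int) -> Fraction:
    """Extra storage needed for redundancy, as a fraction of the raw data size."""
    return Fraction(redundancy_units, data_units)


# 3x replication: every block is stored once plus 2 extra copies.
replication_3x = storage_overhead(1, 2)
# RS(6,3) erasure coding: each stripe has 6 data cells plus 3 parity cells.
rs_6_3 = storage_overhead(6, 3)


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (a single-parity erasure code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy stripe: two data blocks and one XOR parity block (hypothetical data).
data = [b"HADO", b"OP-3"]
parity = xor_blocks(data)

# Simulate losing data[1] and rebuild it from the surviving block plus parity.
recovered = xor_blocks([data[0], parity])

if __name__ == "__main__":
    print(f"3x replication overhead: {float(replication_3x):.0%}")  # 200%
    print(f"RS(6,3) overhead: {float(rs_6_3):.0%}")  # 50%
    print(f"RS(6,3) storage factor: {float(1 + rs_6_3)}x")  # 1.5x
    print("recovered:", recovered)  # recovered: b'OP-3'
```

  Fractions keep the ratios exact, so 6 data plus 3 parity cells come out as exactly 3/2 of the raw size. The trade-off not visible here is that rebuilding a lost cell costs CPU and network reads from the surviving cells, whereas a replica can simply be copied.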
  
- September 20, 2018 at 5:28 pm #6244
  DataFlair Team
  
  I would like to add a few more differences, apart from the ones already mentioned above.
  
  Hadoop 2:
  1. Addresses the NameNode Single Point of Failure (SPOF) with HDFS High Availability: one active NameNode and one standby NameNode. With automatic failover configured, the standby takes over when the active NameNode fails. (The Secondary NameNode only checkpoints metadata; it is not a failover node.)
  2. YARN separated resource management from MapReduce, which improved cluster utilization and made MapReduce jobs faster.
  3. Default ports of several services were in the Linux ephemeral port range, so a service could fail to bind its port at startup if another process had already taken it.
  4. Heap sizes for the Java daemons and Hadoop tasks must be set manually.
  5. Compatible file systems: HDFS (the default), the FTP file system (stores all its data on remotely accessible FTP servers), the Amazon S3 (Simple Storage Service) file system, and the Windows Azure Storage Blobs (WASB) file system.
  
  Hadoop 3:
  1. The SPOF is further reduced by supporting more than two NameNodes (one active NameNode and multiple standbys).
  2. MapReduce became faster: a native implementation of the map output collector can speed up shuffle-intensive jobs by 30% or more.
  3. Default ports were moved out of the Linux ephemeral port range, avoiding these conflicts.
  4. New methods for configuring daemon heap sizes. Notably, auto-tuning is now possible based on the memory size of the host, and the HADOOP_HEAPSIZE variable has been deprecated.
  5. Hadoop 3.x supports all the file systems supported by Hadoop 2.x, plus the Microsoft Azure Data Lake filesystem.
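  The port conflict in point 3 can be checked directly on a Linux host. The Python sketch below compares the Hadoop 2 and Hadoop 3 defaults for the HDFS services against the kernel's ephemeral range read from /proc/sys/net/ipv4/ip_local_port_range. The port pairs are the ones we believe the Hadoop 3.0 release notes list for the NameNode, Secondary NameNode, and DataNode; please verify them against the hdfs-default.xml of your version. The fallback range 32768-60999 is an assumption (a common default on current kernels), and the helper names are hypothetical.

```python
# Default ports that moved in Hadoop 3: (Hadoop 2 value, Hadoop 3 value).
DEFAULT_PORTS = {
    "NameNode HTTP UI": (50070, 9870),
    "NameNode HTTPS UI": (50470, 9871),
    "Secondary NameNode HTTP": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
    "DataNode HTTP": (50075, 9864),
    "DataNode HTTPS": (50475, 9865),
}

# Assumed fallback: the default ephemeral range on many current Linux kernels.
FALLBACK_EPHEMERAL_RANGE = (32768, 60999)


def read_ephemeral_range(
    path: str = "/proc/sys/net/ipv4/ip_local_port_range",
) -> tuple[int, int]:
    """Read the host's ephemeral port range, or return the assumed fallback."""
    try:
        with open(path) as f:
            low, high = map(int, f.read().split())
        return low, high
    except (OSError, ValueError):
        return FALLBACK_EPHEMERAL_RANGE


def in_ephemeral_range(port: int, port_range: tuple[int, int]) -> bool:
    # A listening port inside this range may already be taken by an outgoing connection.
    low, high = port_range
    return low <= port <= high


if __name__ == "__main__":
    ephemeral = read_ephemeral_range()
    print(f"Ephemeral range: {ephemeral[0]}-{ephemeral[1]}")
    for service, (h2_port, h3_port) in DEFAULT_PORTS.items():
        print(
            f"{service:<24} Hadoop 2: {h2_port} "
            f"(conflict risk: {in_ephemeral_range(h2_port, ephemeral)}), "
            f"Hadoop 3: {h3_port} "
            f"(conflict risk: {in_ephemeral_range(h3_port, ephemeral)})"
        )
```

  With the fallback range, every Hadoop 2 port in the table falls inside the ephemeral range and every Hadoop 3 port falls below it, which is the conflict the new defaults avoid.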