What is the difference between Hadoop 2.x and Hadoop 3.x?


    • #6240
      DataFlair Team
      Spectator

      What are the major differences between Hadoop 2 and Hadoop 3?
      Comparison Between Hadoop 2.x vs Hadoop 3.x?
      How Hadoop 2.0 is different from Hadoop 3.0?

    • #6241
      DataFlair Team
      Spectator

      Hadoop 2:
      • Minimum Java version is 7 (early 2.x releases also ran on Java 6)
      • Fault tolerance is achieved using replication
      • Limited shell scripting
      • YARN Timeline Service was introduced
      • Default ports conflicted with the Linux ephemeral port range
      • The HDFS balancer is used to even out data between DataNodes
      • Doesn’t support the Microsoft Azure Data Lake filesystem

      Hadoop 3:
      • Minimum Java version is 8
      • Fault tolerance can also be achieved using erasure coding, which needs less storage space than replication
      • Shell scripts were reworked, fixing long-standing bugs
      • Improved YARN Timeline Service with better scalability and reliability
      • Default ports were moved out of the Linux ephemeral port range
      • Intra-DataNode balancing is added, which is invoked via the HDFS disk balancer CLI
      • Supports the Microsoft Azure Data Lake filesystem
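
To make the intra-DataNode balancing point more concrete: the disk balancer CLI is normally driven in three steps (plan, execute, query). The sketch below only builds the command lines and prints them; it does not run anything. The helper name, the DataNode hostname, and the plan file path are placeholders made up for illustration (in practice the plan step reports where it wrote the plan file).

```python
def diskbalancer_commands(datanode, plan_file):
    """Return the three `hdfs diskbalancer` invocations as argv lists.

    `datanode` and `plan_file` are caller-supplied placeholders.
    """
    return [
        # 1. Compute a plan for moving data between the disks of one DataNode.
        ["hdfs", "diskbalancer", "-plan", datanode],
        # 2. Submit the generated plan file for execution.
        ["hdfs", "diskbalancer", "-execute", plan_file],
        # 3. Ask the DataNode how the running plan is progressing.
        ["hdfs", "diskbalancer", "-query", datanode],
    ]


# Hypothetical host and path, for illustration only.
for argv in diskbalancer_commands("dn1.example.com", "/system/diskbalancer/dn1.plan.json"):
    print(" ".join(argv))
```

The lists can be passed to `subprocess.run` on a machine with a configured Hadoop client, but check the plan file location on your own cluster first.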


    • #6243
      DataFlair Team
      Spectator

      Major differences between Hadoop 2 and Hadoop 3

      Hadoop 2:
      1. The minimum supported Java version in Hadoop 2 is 7.
      2. In Hadoop 2, fault tolerance is built into the HDFS architecture: blocks are replicated a number of times to ensure high data availability.
      3. To achieve fault tolerance, Hadoop 2 uses a default replication factor of 3.
      4. Because of 3x replication, HDFS in Hadoop 2 has a 200% overhead in storage space.
      5. For data balancing, Hadoop 2 uses the HDFS balancer, which evens out data between DataNodes.
      6. Scalability: in Hadoop 2 we can scale up to about 10,000 nodes per cluster.

      Hadoop 3:
      1. The minimum supported Java version in Hadoop 3 is 8.
      2. In Hadoop 3, fault tolerance can also be handled by erasure coding.
      3. With erasure coding in HDFS, Hadoop 3 achieves comparable fault tolerance while storing only about 1.5x the raw data (for example, 6 data blocks plus 3 parity blocks) instead of 3x.
      4. With erasure coding, HDFS in Hadoop 3 has only 50% overhead in storage space.
      5. For data balancing, Hadoop 3 adds an intra-DataNode balancer, which is invoked via the HDFS disk balancer command-line interface.
      6. Scalability: in Hadoop 3 we can scale beyond 10,000 nodes per cluster.
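
The 200% and 50% figures above follow from simple arithmetic. Here is a small sketch, assuming 3x replication on the Hadoop 2 side and a Reed-Solomon layout with 6 data units and 3 parity units on the Hadoop 3 side; the function names are made up for illustration.

```python
def replication_overhead(replicas):
    """Extra storage, as a percentage of the raw data, when every block is stored `replicas` times."""
    return (replicas - 1) * 100.0


def erasure_coding_overhead(data_units, parity_units):
    """Extra storage, as a percentage of the raw data, for a data+parity erasure-coded layout."""
    return parity_units / data_units * 100.0


print(replication_overhead(3))        # 3 copies: 2 extra copies per block
print(erasure_coding_overhead(6, 3))  # 3 parity units for every 6 data units
```

With these inputs the first call gives 200.0 and the second gives 50.0, which matches the overhead numbers quoted in both lists.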


    • #6244
      DataFlair Team
      Spectator

      I would like to add a few more differences, apart from the ones already mentioned above.

      Hadoop 2:

      1. Handles the Single Point of Failure (SPOF) in the NameNode by running a standby NameNode alongside the active one (not to be confused with the Secondary NameNode, which only performs checkpoints). When the active NameNode fails, the standby can take over automatically.
      2. MapReduce became faster because YARN took over cluster resource management.
      3. Default ports conflicted with the Linux ephemeral port range, which could lead to failures in port reservation.
      4. Heap sizes for the Hadoop daemons and for tasks have to be set manually.
      5. The compatible file systems are: HDFS (the default FS); the FTP file system, which stores all its data on remotely accessible FTP servers; the Amazon S3 (Simple Storage Service) file system; and the Windows Azure Storage Blobs (WASB) file system.

      Hadoop 3:

      1. The NameNode Single Point of Failure (SPOF) is reduced further by allowing more than 2 NameNodes (one active and multiple standbys).
      2. MapReduce became faster thanks to a native implementation of the map output collector; shuffle-intensive jobs can see an improvement of around 30%.
      3. The default ports were moved out of the Linux ephemeral port range, so the conflicts no longer occur.
      4. New methods for configuring daemon heap sizes. Notably, auto-tuning is now possible based on the memory size of the host, and the HADOOP_HEAPSIZE variable has been deprecated.
      5. Hadoop 3.x supports all the file systems supported by Hadoop 2.x and also the Microsoft Azure Data Lake filesystem.
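
The port conflict in point 3 of both lists can be checked with a few lines of Python. This is only a sketch: it assumes the stock Linux ephemeral range of 32768 to 60999 (your host may be configured differently, see `/proc/sys/net/ipv4/ip_local_port_range`) and covers a handful of well-known HDFS default ports, not every service.

```python
# Assumption: default Linux ephemeral port range, 32768..60999 inclusive.
EPHEMERAL_RANGE = range(32768, 61000)

# service -> (Hadoop 2.x default port, Hadoop 3.x default port)
DEFAULT_PORTS = {
    "NameNode HTTP UI": (50070, 9870),
    "Secondary NameNode HTTP": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
    "DataNode HTTP UI": (50075, 9864),
}


def in_ephemeral_range(port):
    """True if the OS might hand this port out to an outgoing connection."""
    return port in EPHEMERAL_RANGE


for service, (old_port, new_port) in DEFAULT_PORTS.items():
    print(f"{service}: {old_port} conflicts={in_ephemeral_range(old_port)}, "
          f"{new_port} conflicts={in_ephemeral_range(new_port)}")
```

A port inside the ephemeral range can already be in use by some client connection when the daemon starts, which is the reservation failure described for Hadoop 2.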