What is the difference between Hadoop 2.x and Hadoop 3.x?
September 20, 2018 at 5:27 pm #6240 DataFlair Team (Spectator)
What are the major differences between Hadoop 2 and Hadoop 3?
Comparison between Hadoop 2.x and Hadoop 3.x?
How is Hadoop 2.0 different from Hadoop 3.0?
September 20, 2018 at 5:27 pm #6241 DataFlair Team (Spectator)
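Hadoop 2:
• Minimum supported Java version is Java 7 (Hadoop 2.7 and later; earlier 2.x releases also ran on Java 6)
• Fault tolerance is achieved using replication
• Limited shell scripting support
• YARN Timeline Service introduced
• Default ports of several daemons fall within the Linux ephemeral port range, so they can conflict with ports the OS assigns to other processes
• The HDFS Balancer balances data across DataNodes only; there is no balancing across the disks within a single DataNode
• Does not support Microsoft Azure Data Lake (Azure Blob Storage is supported through WASB)
Hadoop 3:
• Minimum supported Java version is Java 8
• Fault tolerance can also be achieved using erasure coding, which is more storage-efficient than replication because parity blocks take the place of full extra copies of each block
• Shell scripts have been rewritten, fixing many long-standing bugs
• YARN Timeline Service v.2, with improved scalability and reliability
• Default ports have been moved out of the Linux ephemeral port range
• Intra-DataNode balancing is added, which is invoked via the HDFS disk balancer CLI (hdfs diskbalancer)
• Supports Microsoft Azure Data Lake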
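To make the port point concrete, here is a small Python sketch that checks several default HDFS ports against the Linux ephemeral range. It assumes the default `net.ipv4.ip_local_port_range` of 32768 to 60999 (older kernels used 61000 as the upper bound) and uses the HDFS defaults for Hadoop 2.x and 3.x; `in_ephemeral_range` is a hypothetical helper, not part of Hadoop.

```python
# Sketch: which default HDFS ports fall inside the Linux ephemeral range?
# Assumption: net.ipv4.ip_local_port_range is "32768 60999" (older kernels
# used 61000 as the upper bound; the 500xx ports are inside either way).
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999

# Default HDFS ports as (Hadoop 2.x, Hadoop 3.x), keyed by hdfs-site.xml property.
HDFS_DEFAULT_PORTS = {
    "dfs.namenode.http-address": (50070, 9870),
    "dfs.namenode.https-address": (50470, 9871),
    "dfs.namenode.secondary.http-address": (50090, 9868),
    "dfs.datanode.address": (50010, 9866),
    "dfs.datanode.http.address": (50075, 9864),
    "dfs.datanode.ipc.address": (50020, 9867),
}


def in_ephemeral_range(port: int) -> bool:
    """True if the OS may assign this port as the local end of an outgoing connection."""
    return EPHEMERAL_LOW <= port <= EPHEMERAL_HIGH


if __name__ == "__main__":
    for prop, (v2, v3) in HDFS_DEFAULT_PORTS.items():
        print(f"{prop}: 2.x={v2} ephemeral={in_ephemeral_range(v2)}, "
              f"3.x={v3} ephemeral={in_ephemeral_range(v3)}")
```

Every Hadoop 2.x port in the table lands inside the range, while every Hadoop 3.x port sits below it, which is why a Hadoop 2 daemon could fail to bind if the OS had already handed its port to another connection.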
September 20, 2018 at 5:28 pm #6243 DataFlair Team (Spectator)
Major differences between Hadoop 2 and Hadoop 3
Hadoop 2:
1. Hadoop 2 requires a minimum Java version of 7.
2. In Hadoop 2, fault tolerance is handled by the built-in HDFS architecture: each block is replicated several times to ensure high data availability.
3. To achieve fault tolerance, Hadoop 2 uses a default replication factor of 3.
4. With a replication factor of 3, HDFS has a 200% storage overhead (two extra copies of every block).
5. For data balancing, Hadoop 2 uses the HDFS Balancer, which moves blocks between DataNodes.
6. Scalability: Hadoop 2 can scale up to about 10,000 nodes per cluster.
Hadoop 3:
1. Hadoop 3 requires a minimum Java version of 8.
2. In Hadoop 3, fault tolerance can also be handled by erasure coding.
3. With erasure coding in HDFS, the storage cost drops from 3x to 1.5x: the default Reed-Solomon policy (RS-6-3) stores 3 parity blocks for every 6 data blocks.
4. In that case HDFS has only a 50% storage overhead.
5. For balancing data across the disks of a single node, Hadoop 3 adds the intra-DataNode balancer, which is invoked via the HDFS disk balancer command-line interface.
6. Scalability: Hadoop 3 can scale beyond 10,000 nodes per cluster.
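The 200% and 50% figures follow from simple arithmetic, and the recovery idea behind erasure coding can be shown with a toy parity code. The Python sketch below is illustrative only: HDFS's default policy is Reed-Solomon RS-6-3, which survives the loss of any 3 of its 9 blocks, whereas the XOR parity used here (the same idea as HDFS's simpler XOR-2-1 policy) survives the loss of just one block. All function names are hypothetical, not HDFS APIs.

```python
from functools import reduce


def replication_overhead(replicas: int) -> float:
    """Extra storage beyond the raw data: 3 replicas -> 2.0 (200%)."""
    return float(replicas - 1)


def ec_overhead(data_units: int, parity_units: int) -> float:
    """Extra storage for k data + m parity blocks: RS(6,3) -> 0.5 (50%)."""
    return parity_units / data_units


def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte into a single parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_lost_block(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block: parity XOR all surviving blocks."""
    return xor_parity(survivors + [parity])


if __name__ == "__main__":
    print(replication_overhead(3), ec_overhead(6, 3))  # 2.0 0.5
    data = [b"blk0", b"blk1", b"blk2"]
    parity = xor_parity(data)
    print(recover_lost_block([data[0], data[2]], parity))  # b'blk1'
```

Storing one parity block for three data blocks costs 33% extra space instead of the 200% that two extra replicas would cost; Reed-Solomon generalizes this so that several lost blocks can be rebuilt at once.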
September 20, 2018 at 5:28 pm #6244 DataFlair Team (Spectator)
I would like to add a few more differences to the ones already mentioned above.
Hadoop 2:
- Handles the NameNode single point of failure (SPOF) with High Availability: an active NameNode is paired with a standby NameNode, and if the active one fails, the standby takes over (automatically when ZooKeeper-based failover is configured). Note that the Secondary NameNode only checkpoints the namespace and is not a failover node.
- MapReduce runs on YARN, which replaced the fixed map and reduce slots of Hadoop 1 and improves cluster utilization.
- Several default ports fall within the Linux ephemeral port range, which can cause a daemon to fail to bind its port at startup.
- Heap sizes must be set explicitly, both for the Java daemons and for each Hadoop task.
- The compatible file systems are: HDFS (the default file system); the FTP file system, which stores all its data on remotely accessible FTP servers; the Amazon S3 (Simple Storage Service) file system; and the Windows Azure Storage Blobs (WASB) file system.
Hadoop 3:
- The SPOF is further reduced by supporting more than two NameNodes (one active and multiple standbys), so the cluster can survive the failure of more than one NameNode.
- MapReduce became faster: a native implementation of the map output collector can speed up shuffle-intensive jobs by 30% or more.
- Default ports have been moved out of the ephemeral range, so these conflicts no longer occur.
- New methods for configuring daemon heap sizes: notably, heap size can now be auto-tuned based on the memory size of the host, and the HADOOP_HEAPSIZE variable has been deprecated.
- Hadoop 3.x supports all the file systems supported by Hadoop 2.x, plus the Microsoft Azure Data Lake file system.
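Hadoop 3 also simplifies heap configuration for MapReduce tasks: the task JVM heap no longer has to be specified both as container memory and as a Java option. A minimal Python sketch of that rule, assuming the documented default `mapreduce.job.heap.memory-mb.ratio` of 0.8 and that an explicit `-Xmx` in the task's Java options takes precedence; `task_heap_mb` is a hypothetical helper, not Hadoop code, and it parses only `m`/`g` suffixes.

```python
import re

# Assumed default of mapreduce.job.heap.memory-mb.ratio in Hadoop 3.
DEFAULT_HEAP_RATIO = 0.8


def task_heap_mb(container_mb: int, java_opts: str = "",
                 ratio: float = DEFAULT_HEAP_RATIO) -> int:
    """Hypothetical helper: heap (MB) for a map or reduce task JVM.

    An explicit -Xmx in the task's Java options wins; otherwise the heap
    is derived as container memory * ratio. Only 'm'/'g' suffixes are parsed.
    """
    match = re.search(r"-Xmx(\d+)([mMgG])\b", java_opts)
    if match:
        size, unit = int(match.group(1)), match.group(2).lower()
        return size * 1024 if unit == "g" else size
    return int(container_mb * ratio)


if __name__ == "__main__":
    print(task_heap_mb(2048))             # 1638 (derived from the ratio)
    print(task_heap_mb(4096, "-Xmx3g"))   # 3072 (explicit -Xmx wins)
```

Leaving roughly 20% of the container outside the heap gives room for JVM overhead such as thread stacks and metaspace, which is the reasoning behind a ratio below 1.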