What are the core components of Apache Hadoop?


  • Author
    Posts
    • #6182
      DataFlair Team
      Spectator

      What are the different components of Hadoop Framework?

    • #6184
      DataFlair Team
      Spectator

      Two Core Components of Hadoop are:

      1. HDFS: Distributed Data Storage Framework of Hadoop
      2. MapReduce: Distributed Data Processing Framework of Hadoop

      HDFS is the storage unit of Hadoop; users can store large datasets in HDFS in a distributed manner. Several replicas of each data block are distributed across different nodes of the cluster for data availability.
      HDFS consists of two components:

      a) NameNode: acts as the master node, storing the metadata that keeps track of the storage cluster (a Secondary NameNode periodically checkpoints this metadata; a true standby NameNode is available with HDFS high availability from Hadoop 2 onward)
      b) DataNode: acts as a slave node, storing the actual blocks of data
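
      To make this concrete, here is a minimal Java sketch, assuming a Hadoop client on the classpath (the NameNode address and file path are hypothetical), that writes a small file to HDFS through the FileSystem client; the NameNode handles the metadata while the DataNodes receive the actual blocks:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class HdfsWriteExample {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Hypothetical NameNode address; normally picked up from core-site.xml.
              conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
              FileSystem fs = FileSystem.get(conf);

              // The NameNode records the file's metadata; DataNodes store its blocks.
              Path file = new Path("/data/example.txt");
              try (FSDataOutputStream out = fs.create(file)) {
                  out.writeUTF("hello hdfs");
              }
              System.out.println("File exists: " + fs.exists(file));
              fs.close();
          }
      }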

      MapReduce is the processing unit of Hadoop. It is a Java-based system in which the actual data from the HDFS store gets processed. The principle of operation behind MapReduce is that the MAP job sends a processing task to the various nodes holding the data, and the REDUCE job collects all the results into a single output. Scheduling, monitoring, and re-execution of failed tasks are taken care of by the MapReduce framework.
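
      To make the MAP/REDUCE split concrete, here is a minimal word-count sketch written against the Hadoop Java MapReduce API (class names are illustrative): the map tasks emit (word, 1) pairs on the nodes holding the data, and the reduce tasks sum the pairs for each word into a single value.

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;
      import org.apache.hadoop.mapreduce.Reducer;

      public class WordCount {

          // MAP: runs where the input splits live and emits (word, 1) pairs.
          public static class TokenizerMapper
                  extends Mapper<LongWritable, Text, Text, IntWritable> {
              private static final IntWritable ONE = new IntWritable(1);
              private final Text word = new Text();

              @Override
              protected void map(LongWritable key, Text value, Context context)
                      throws IOException, InterruptedException {
                  for (String token : value.toString().split("\\s+")) {
                      if (!token.isEmpty()) {
                          word.set(token);
                          context.write(word, ONE);
                      }
                  }
              }
          }

          // REDUCE: collects all counts for a word and sums them into one value.
          public static class SumReducer
                  extends Reducer<Text, IntWritable, Text, IntWritable> {
              @Override
              protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                      throws IOException, InterruptedException {
                  int sum = 0;
                  for (IntWritable v : values) {
                      sum += v.get();
                  }
                  context.write(key, new IntWritable(sum));
              }
          }
      }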

      Along with HDFS and MapReduce, there are also Hadoop Common (which provides the Java libraries, utilities, and scripts needed to run Hadoop) and Hadoop YARN (which enables dynamic resource utilization across the cluster).


    • #6186
      DataFlair Team
      Spectator

      The core components of Hadoop are:

      1. HDFS (Hadoop Distributed File System)
      HDFS is the storage layer of Hadoop, which provides storage of very large files across multiple machines. It was derived from the Google File System (GFS). HDFS is highly fault-tolerant, reliable, scalable, and designed to run on low-cost commodity hardware. It divides each file into blocks and stores these blocks on multiple machines; the blocks are replicated for fault tolerance. The block size and replication factor can be specified in HDFS. The defaults are a block size of 64 MB (128 MB from Hadoop 2 onward) and a replication factor of 3.
      HDFS works in a master-slave architecture. An HDFS cluster consists of a master node (NameNode) and slave nodes (DataNodes). The NameNode stores the metadata of HDFS and is responsible for managing all the DataNodes in the cluster. Before Hadoop 2, the NameNode was a single point of failure in an HDFS cluster. DataNodes store the actual data in HDFS and are responsible for block creation, deletion, and replication, based on requests from the NameNode.
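
      As a small sketch of how those settings can be inspected or overridden from a Java client (dfs.replication and dfs.blocksize are the standard HDFS property keys; the NameNode address and file path are hypothetical):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class HdfsBlockSettings {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // hypothetical NameNode
              // Client-side overrides; cluster-wide defaults normally live in hdfs-site.xml.
              conf.set("dfs.replication", "3");        // copies kept of each block
              conf.set("dfs.blocksize", "134217728");  // 128 MB block size, in bytes

              FileSystem fs = FileSystem.get(conf);
              Path file = new Path("/data/large-input.txt"); // hypothetical path
              System.out.println("Block size:  " + fs.getDefaultBlockSize(file));
              System.out.println("Replication: " + fs.getDefaultReplication(file));
              fs.close();
          }
      }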

      2. MapReduce
      MapReduce is the processing layer of Hadoop. It is used to process large volumes of data in parallel. MapReduce splits a large data set into independent chunks, which are processed in parallel by map tasks. The output of the map tasks is further processed by the reduce tasks to generate the final output. MapReduce works in key-value pairs. In MapReduce 2 (running on YARN), a ResourceManager runs on the master node and a NodeManager runs on each data node.
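
      A minimal driver sketch showing how the map and reduce tasks are wired into a job with input and output paths (it reuses the TokenizerMapper and SumReducer classes from the word-count sketch earlier in this thread; the paths come from the command line):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCountDriver {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              Job job = Job.getInstance(conf, "word count");
              job.setJarByClass(WordCountDriver.class);

              // Map and reduce classes from the word-count sketch above.
              job.setMapperClass(WordCount.TokenizerMapper.class);
              job.setReducerClass(WordCount.SumReducer.class);

              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(IntWritable.class);

              // Each input split becomes an independent map task; reducers write here.
              FileInputFormat.addInputPath(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));

              System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
      }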

      The other components of Hadoop are,

      1. YARN – YARN stands for Yet Another Resource Negotiator. It manages cluster resources and schedules jobs. YARN consists of a central ResourceManager and a per-node NodeManager.

      2. HIVE – It is a data warehouse infrastructure built on top of Hadoop. It provides an SQL-like language called HiveQL.

      3. PIG – It is a platform for analyzing large data sets. It uses MapReduce to execute its data processing.

      4. FLUME – It is used for collecting, aggregating, and moving large volumes of log data.

      5. Sqoop – It is a tool for bulk data transfer between HDFS and relational databases (RDBMS).

      6. Oozie – It is a workflow scheduler for MapReduce jobs.

      7. HBase – It is a non-relational, distributed database built on top of HDFS. It provides random, real-time access to data; a minimal client sketch follows.
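
      As a rough illustration of that random, real-time access, here is a minimal HBase Java client sketch (the table, column family, and row key names are hypothetical):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.Get;
      import org.apache.hadoop.hbase.client.Put;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.Table;
      import org.apache.hadoop.hbase.util.Bytes;

      public class HBaseRandomAccess {
          public static void main(String[] args) throws Exception {
              Configuration conf = HBaseConfiguration.create();
              try (Connection conn = ConnectionFactory.createConnection(conf);
                   Table table = conn.getTable(TableName.valueOf("users"))) { // hypothetical table

                  // Write one cell keyed by row; HBase serves such point writes in real time.
                  Put put = new Put(Bytes.toBytes("user123"));
                  put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                  table.put(put);

                  // Random read of the same row by key.
                  Result result = table.get(new Get(Bytes.toBytes("user123")));
                  System.out.println(Bytes.toString(
                          result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
              }
          }
      }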


    • #6187
      DataFlair Team
      Spectator

      Two core components of Hadoop are

      HDFS and MapReduce

      HDFS (Hadoop Distributed File System)
      HDFS is the storage layer of Hadoop, used to store large data sets with a streaming data access pattern on clusters of commodity hardware. HDFS is a highly reliable storage system for this data.
      It works on a master/slave architecture, where the NameNode is the master and the DataNodes are the slaves.

      MapReduce
      MapReduce is a programming model for processing large volumes of data in parallel by dividing the work into a set of independent tasks. MapReduce is also known as the computation or processing layer of Hadoop. It processes the data in two phases, Map and Reduce, which together answer a query over the data stored in HDFS.
      The Map task is responsible for reading data from the input location and, based on the input format, generating key/value pairs (the intermediate output) on the local machine.
      The Reduce task is responsible for processing this intermediate output and generating the final output.
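      For example, for the input line "to be or not to be", the Map phase emits the intermediate pairs (to,1), (be,1), (or,1), (not,1), (to,1), (be,1); after the framework groups them by key, the Reduce phase sums each group and produces (be,2), (not,1), (or,1), (to,2).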

      Other components of hadoop ecosystem are:

      YARN (Yet Another Resource Negotiator): MapReduce running on YARN is also called MapReduce 2.0.
      Unlike the MapReduce 1.0 JobTracker, resource management and job scheduling/monitoring are handled by separate daemons (a ResourceManager plus a per-application ApplicationMaster).

      Ambari: For Management & Monitoring

      PIG: For Scripting

      HIVE: For Query

      Mahout: Machine Learning

      Oozie: Workflow & scheduling

      Zookeeper: Coordination

      HBase: NoSQL database

      Sqoop: Data integration
