1. Features of Hadoop and Design Principles – Objective
In this features of Hadoop and design principles tutorial we will discuss the Hadoop Features, characteristics and design principles of Hadoop. In this feratures of hadoop blog we will also discuss the assumptions around which the Hadoop in built. So let’s get started with the characteristics of Hadoop and design principles tutorial.
To install and configure Hadoop follow this installation guide.
2. Features of Hadoop
Apache Hadoop is the most popular and powerful big data tool, Hadoop provides world’s most reliable storage layer – HDFS, a batch Processing engine – MapReduce and a Resource Management Layer – YARN. In this section of features of Hadoop, Let us discuss important features of Hadoop which are given below-
2.1. Open Source
Apache Hadoop is an open source project. It means its code can be modified according to business requirements.
2.2. Distributed Processing
As data is stored in a distributed manner in HDFS across the cluster, data is processed in parallel on a cluster of nodes.
2.3. Fault Tolerance
This is one of the very important features of Hadoop. By default 3 replicas of each block is stored across the cluster in Hadoop and it can be changed also as per the requirement. So if any node goes down, data on that node can be recovered from other nodes easily with the help of this characteristic. Failures of nodes or tasks are recovered automatically by the framework. This is how Hadoop is fault tolerant.
Due to replication of data in the cluster, data is reliably stored on the cluster of machine despite machine failures. If your machine goes down, then also your data will be stored reliably due to this charecteristic of Hadoop.
2.5. High Availability
Data is highly available and accessible despite hardware failure due to multiple copies of data. If a machine or few hardware crashes, then data will be accessed from another path.
Hadoop is highly scalable in the way new hardware can be easily added to the nodes. This feature of Hadoop also provides horizontal scalability which means new nodes can be added on the fly without any downtime.
Apache Hadoop is not very expensive as it runs on a cluster of commodity hardware. We do not need any specialized machine for it. Hadoop also provides huge cost saving also as it is very easy to add more nodes on the fly here. So if requirement increases, then you can increase nodes as well without any downtime and without requiring much of pre-planning.
2.8. Easy to use
No need of client to deal with distributed computing, the framework takes care of all the things. So this feature of Hadoop is easy to use.
2.9. Data Locality
This one is a unique features of Hadoop that made it easily handle the Big Data. Hadoop works on data locality principle which states that move computation to data instead of data to computation. When a client submits the MapReduce algorithm, this algorithm is moved to data in the cluster rather than bringing data to the location where the algorithm is submitted and then processing it.
These were the Characteristics of Hadoop that differentiates it from the other Data Management systems. Now, after the features of Hadoop, lets go through some Hadoop Assumptions that are to be considered before using Hadoop.
3. Hadoop Assumptions
Hadoop is written with large clusters of computers in mind and is built around the following hadoop assumptions:
- Hardware may fail, (as commodity hardware can be used)
- Processing will be run in batches. Thus there is an emphasis on high throughput as opposed to low latency.
- Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size.
- Applications need a write-once-read-many access model.
- Moving Computation is Cheaper than Moving Data.
Read: How Hadoop Works?
4. Design Principles of Hadoop
Below are the design principles of Hadoop on which it works:
a) System shall manage and heal itself
- Automatically and transparently route around failure (Fault Tolerant)
- Speculatively execute redundant tasks if certain nodes are detected to be slow
b) Performance shall scale linearly
- Proportional change in capacity with resource change (Scalability)
c) Computation should move to data
- Lower latency, lower bandwidth (Data Locality)
d) Simple core, modular and extensible (Economical)
5. Conclusion – Characteristics of Hadoop and Design Principles Tutorial
In this tutorial we have not only covered features of Hadoop but also its design principle on which Hadoop works.Now when you know about the characteristics of hadoop and its design principles, go through our Hadoop installation guide to use Hadoop functionality.
This was all about the characteristics of Hadoop and design principles tutorial. Hope you like the characteristics of Hadoop tutorial.
Got any queries or feedback about this features of hadoop and design principle tutorial? Just leave your message in the comment section and we will get back to you.