Free Online Certification Courses – Learn Today. Lead Tomorrow. › Forums › Apache Hadoop › Why Hadoop? What are the features of Hadoop?
- This topic has 2 replies, 1 voice, and was last updated 5 years, 6 months ago by DataFlair Team.
September 20, 2018 at 5:43 pm #6321 by DataFlair Team (Spectator)
Why Hadoop?
Why is Hadoop the poster boy of Big Data?
What are the features of Hadoop that differentiate it from other frameworks?
September 20, 2018 at 5:43 pm #6322 by DataFlair Team (Spectator)
Before Hadoop, traditional databases were widely used. Traditional databases are very efficient at handling structured data in small to medium file sizes.
But over time, data became larger, semi-structured, and unstructured, and the speed of data generation also grew enormously. These became limitations of traditional databases. The following are a few problems with traditional databases:
1. Limited processing capacity
2. Limited storage capacity
3. Sequential processing
4. An RDBMS is capable of handling only structured data (in the form of rows and columns)
5. It requires pre-processing of data
6. Single point of failure (if a few files are missing, you can’t process the whole data)
7. No scalability
There is another way to handle the data, i.e. a distributed file system. However, a typical distributed system has its own limitations:
1. Heavy dependency on the network (a network failure is something one cannot handle)
2. Data transportation over the network is costly
3. Data synchronization is a must
4. Partial failures of resources are difficult to handle
5. Scaling up and down is not a smooth process
Because of these issues that traditional and distributed databases have in handling big data, there was a need for a new technology: Hadoop.
Hadoop solves the above problems/limitations of traditional databases very efficiently. No other tool or technology matches Hadoop in handling big data. That makes Hadoop the poster boy of Big Data.
Following are the key characteristics of Hadoop:
1. Hadoop is flexible and fast at data processing
2. Hadoop is scalable
3. Hadoop is fault tolerant
4. Hadoop ecosystem is robust and rich
5. Hadoop is cost-effective
Follow the link for more details: Hadoop Features
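The fault-tolerance point above comes from HDFS-style block replication: each block is stored on several nodes, so losing one node does not lose the data. Below is a minimal toy model of that idea in Python (illustrative only; `MiniHDFS` and its methods are made-up names for this sketch, not a real Hadoop API):

```python
import random

class MiniHDFS:
    """Toy model of HDFS-style block replication (illustrative only)."""

    def __init__(self, nodes, replication=3):
        self.replication = replication
        self.nodes = {n: set() for n in nodes}  # node name -> block ids held

    def put(self, block_id):
        # Place replicas on `replication` distinct nodes,
        # roughly as the NameNode would direct.
        targets = random.sample(list(self.nodes), self.replication)
        for n in targets:
            self.nodes[n].add(block_id)

    def fail_node(self, node):
        # Simulate a node crash: all replicas on it disappear.
        del self.nodes[node]

    def readable(self, block_id):
        # A block stays readable if any surviving node holds a replica.
        return any(block_id in blocks for blocks in self.nodes.values())

cluster = MiniHDFS(["n1", "n2", "n3", "n4"], replication=3)
cluster.put("block-0")
cluster.fail_node("n1")
print(cluster.readable("block-0"))  # True: at least two replicas survive
```

With a replication factor of 3, any single node failure still leaves at least two live replicas, which is why the read succeeds without any manual recovery.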
-
September 20, 2018 at 5:43 pm #6324 by DataFlair Team (Spectator)
Why Hadoop?
Over time, data sizes have increased tremendously into the range of tera-, peta-, and exabytes. An RDBMS finds it challenging to handle such huge data volumes.
Yes, an RDBMS can be scaled up vertically by adding more central processing units (CPUs) or more memory to the database management system. But processing still becomes slow (because it is sequential), and the system remains a single point of failure (if a few files are missing, you can’t process the whole data).
Another thing is that the majority of data (almost 80%) comes in a semi-structured or unstructured format from social media, audio, video, texts, and emails. The problem of unstructured data is completely outside the purview of an RDBMS, because relational databases simply cannot categorize unstructured data. They are designed and structured to accommodate structured data such as weblog, sensor, and financial data.
Also, data is generated at a very high velocity. An RDBMS falls short at high velocity because it is designed for steady data retention rather than rapid growth.
Even if an RDBMS is used to handle and store big data, it turns out to be very expensive.
As a result, the inability of relational databases to handle Big Data led to the emergence of new technologies.
Hadoop resolves almost all of the limitations and issues users were facing with traditional databases, and it does so very efficiently.
Why is Hadoop the poster boy of Big Data?
No other tool or technology matches Hadoop in handling big data. That makes Hadoop the poster boy of Big Data.
Key features of Hadoop that differentiate it from other frameworks:
Open source – Hadoop is an open-source project by Apache. It can be modified to fit the business needs of an individual organization. Proprietary distributions are also available.
Distributed processing – As data is stored in a distributed manner in HDFS across the cluster, it is processed in parallel on a cluster of nodes.
Fault tolerance – Multiple replicas of each block are stored across the cluster, so if any node goes down, the data on that node can easily be recovered from other nodes. Failures of nodes or tasks are recovered automatically by the framework.
Reliability – Due to replication of data in the cluster, data is stored reliably on the cluster of machines despite machine failures. Even if your machine goes down, your data is still stored reliably.
High availability – Data is highly available and accessible despite hardware failure, because multiple copies of it exist. If a machine or some hardware crashes, the data can be accessed from another path.
Scalability – Hadoop is highly scalable in that new hardware can easily be added to the nodes. It provides both vertical scalability (adding new disks to an existing node) and horizontal scalability (adding new nodes).
Economical – Hadoop is not very expensive, as it runs on a cluster of commodity hardware. It also provides huge cost savings, since it is very easy to add more nodes on the fly.
Easy to use – The client does not need to deal with distributed computing; the framework takes care of everything, so Hadoop is easy to use.
Data locality – Hadoop works on the principle of moving computation to the data instead of moving data to the computation, which is known as data locality. As data is not moved back and forth over the network, this makes Hadoop more efficient.
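The data-locality principle can be sketched as a tiny scheduling rule: prefer a free node that already holds a replica of the block, and only fall back to a remote node when none is available. A minimal Python sketch of that idea (the `schedule` function is hypothetical, not Hadoop’s actual scheduler):

```python
def schedule(block_locations, free_nodes):
    """Pick a node for a task, preferring one that already holds the block
    (a toy version of Hadoop's "move computation to data" principle)."""
    local = [n for n in free_nodes if n in block_locations]
    if local:
        return local[0]   # data-local: no network transfer needed
    return free_nodes[0]  # fall back to any free node (remote read)

# Block "b1" is replicated on n2 and n3; nodes n1 and n3 are free.
print(schedule({"n2", "n3"}, ["n1", "n3"]))  # n3 — the task runs where the data lives
```

Because the task runs on a node that already stores the block, no data crosses the network, which is exactly why data locality makes processing more efficient.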
Follow the link for more details: Hadoop Features