Why Hadoop? What are the features of Hadoop?


    • #6321
      DataFlair Team

      Why Hadoop?
      Why Hadoop is the poster boy of Big Data?
      What are the features of Hadoop which differentiate it from other frameworks?

    • #6322
      DataFlair Team

      Before Hadoop, traditional databases were widely used. They are very efficient at handling structured data of small to medium size.
      But over time, data grew larger and became semi-structured and unstructured, and the speed of data generation also increased enormously. These became limitations of traditional databases.

      Following are a few problems with traditional databases:
      1. Limited processing capacity
      2. Limited storage capacity
      3. Sequential processing
      4. An RDBMS can handle only structured data (in the form of rows and columns)
      5. It requires pre-processing of data
      6. Single point of failure (if a few files are missing, you cannot process the whole data set)
      7. No scalability

      Another way to handle data is a distributed file system. However, a typical distributed system has its own limitations, such as:
      1. Heavy dependency on the network (a network failure cannot be handled)
      2. Data transportation over the network is costly
      3. Data synchronization is a must
      4. Partial failure of resources is difficult to handle
      5. Scaling up and down is not a smooth process

      Because of these issues that traditional and distributed databases have in handling big data, a new technology was needed: Hadoop.

      Hadoop solves the above problems and limitations of traditional databases very efficiently. No other tool or technology matches Hadoop in handling big data. That makes Hadoop the poster boy of Big Data.

      Following are the key characteristics of Hadoop:
      1. Hadoop is flexible and fast at data processing
      2. Hadoop is scalable
      3. Hadoop is fault tolerant
      4. Hadoop ecosystem is robust and rich
      5. Hadoop is cost effective

      Follow the link for more details: Hadoop Features

    • #6324
      DataFlair Team

      Why Hadoop?

      Over time, data size has increased tremendously, into the range of tera-, peta-, and exabytes. An RDBMS finds it challenging to handle such huge data volumes.
      True, an RDBMS can be scaled up vertically by adding more central processing units (CPUs) or more memory to the database server. But processing remains slow (because it is sequential), and the system remains a single point of failure (if a few files are missing, you cannot process the whole data set).

      Another issue is that the majority of data (almost 80%) comes in semi-structured or unstructured formats from social media, audio, video, text, and email. Unstructured data is completely outside the purview of an RDBMS, because relational databases simply cannot categorize it. They are designed and structured to accommodate structured data such as weblog, sensor, and financial data.

      Also, data is generated at a very high velocity. An RDBMS struggles with high velocity because it is designed for steady data retention rather than rapid growth.

      Even if an RDBMS is used to handle and store big data, it turns out to be very expensive.

      As a result, the inability of relational databases to handle Big Data led to the emergence of new technologies.

      Hadoop resolves almost all of the limitations and issues users were facing with traditional databases, and does so very efficiently.

      Why Hadoop is the poster boy of Big Data?

      No other tool or technology matches Hadoop in handling big data. That makes Hadoop the poster boy of Big Data.

      Key features of Hadoop which differentiate it from other frameworks:

      Open-source – Hadoop is an open-source project by Apache. It can be modified to suit individual business needs. It is also available in proprietary versions.

      Distributed Processing – As data is stored in a distributed manner in HDFS across the cluster, it is processed in parallel on a cluster of nodes.
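
      The idea of processing independent chunks in parallel and then combining the results can be sketched in plain Python. This is a toy single-machine simulation of a MapReduce-style word count, not Hadoop's actual API; the chunk contents are invented for illustration:

```python
from collections import defaultdict

def map_phase(chunk):
    # Each node emits (word, 1) pairs for its own local chunk of data.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Counts for the same word are summed across the output of all nodes.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# The input is split into chunks; each chunk can be processed independently,
# just as HDFS blocks are processed in parallel across a cluster.
chunks = ["big data big", "data processing", "big processing"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
result = reduce_phase(mapped)
print(result)  # {'big': 3, 'data': 2, 'processing': 2}
```

      Because each chunk is mapped independently, the map calls could run on different machines with no coordination; only the final reduce needs to see the combined output.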

      Fault Tolerance – Multiple replicas of each block are stored across the cluster, so if any node goes down, the data on that node can easily be recovered from the other nodes. Failures of nodes or tasks are recovered automatically by the framework.
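
      As a rough illustration, here is a toy Python sketch of replica-based recovery. The node and block names are made up, and the replication factor of 3 is the HDFS default; in a real cluster, re-replication is managed by the NameNode and is far more involved:

```python
# Toy sketch of HDFS-style block replication and recovery (not real HDFS).
REPLICATION_FACTOR = 3
nodes = ["node1", "node2", "node3", "node4"]

# Each block is stored on REPLICATION_FACTOR different nodes.
block_locations = {
    "block_A": ["node1", "node2", "node3"],
    "block_B": ["node2", "node3", "node4"],
}

def handle_node_failure(failed, locations, live_nodes):
    """Re-replicate every block that lost a copy on the failed node."""
    for block, holders in locations.items():
        if failed in holders:
            holders.remove(failed)
            # Copy the block to a live node that does not already hold it.
            spare = next(n for n in live_nodes if n not in holders)
            holders.append(spare)

live = [n for n in nodes if n != "node1"]
handle_node_failure("node1", block_locations, live)
print(block_locations["block_A"])  # block_A is back to 3 copies
```

      The key point is that no single copy is critical: losing a node only triggers copying from the surviving replicas, so clients never notice the failure.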

      Reliability – Due to replication of data in the cluster, data is stored reliably on the cluster of machines despite machine failures. Even if your machine goes down, your data is still stored reliably.

      High Availability – Data is highly available and accessible despite hardware failure, due to the multiple copies of data. If a machine or some hardware crashes, the data can be accessed from another path.

      Scalability – Hadoop is highly scalable: new hardware can easily be added to the nodes. It provides both vertical and horizontal scalability, which means adding new disks to an existing node or adding new nodes, respectively.

      Economic – Hadoop is not very expensive, as it runs on a cluster of commodity hardware. It also provides huge cost savings, since it is very easy to add more nodes on the fly.

      Easy to use – The client does not need to deal with distributed computing; the framework takes care of all of it. So it is easy to use.

      Data Locality – Hadoop works on the principle of moving computation to the data instead of moving data to the computation, which is known as data locality. Since data is not moved back and forth over the network, this makes Hadoop more efficient.
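
      A minimal sketch of the scheduling idea, assuming invented node and block names (in a real cluster, YARN makes these placement decisions):

```python
# Toy sketch of data locality: run each task on a node that already holds
# the block it needs, so the block is not shipped over the network.
block_locations = {
    "block_A": {"node1", "node2"},
    "block_B": {"node2", "node3"},
    "block_C": {"node1", "node3"},
}

def schedule(block, preferred_order):
    """Return a node that holds the block, i.e. a data-local assignment."""
    for node in preferred_order:
        if node in block_locations[block]:
            return node
    # Fallback: any node works, but then the block travels over the network.
    return preferred_order[0]

assignments = {b: schedule(b, ["node1", "node2", "node3"])
               for b in block_locations}
print(assignments)  # {'block_A': 'node1', 'block_B': 'node2', 'block_C': 'node1'}
```

      Every task here lands on a node that already stores its block, so computation moves to the data and network traffic stays minimal.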

      Follow the link for more details: Hadoop Features
