What are the differences between traditional RDBMS and Hadoop?

Free Online Certification Courses – Learn Today. Lead Tomorrow. Forums Apache Hadoop What are the differences between traditional RDBMS and Hadoop?

Viewing 2 reply threads
  • Author
    Posts
    • #6229
      DataFlair TeamDataFlair Team
      Spectator

      Apache Hadoop and Databases both handle the data, they store and process the data. What is the difference between Hadoop and RDBMS ?
      Comparison between Hadoop vs RDBMS?

    • #6231
      DataFlair TeamDataFlair Team
      Spectator

      Architecture
      • RDBMS have ACID properties.
      Hadoop is distributed computing framework having two main components: Distributed file system (HDFS) and MapReduce.
      Data Size
      • RDMS: Giga bytes of data
      • Hadoop: petabytes of data
      Updates
      • RDMS: we can able to read and write many times
      • Hadoop: we can read many times and writeis limited

      Data acceptance
      • RDBMS has reasonable data sets.
      • Hadoop has inexpensive data storage,(structured, semi-structured). So we can able store everything in our database and there will be no data loss.
      Scalability
      • RDBMS is provided vertical scalability. So if the data increases for storing then we have to increase particular system configuration.
      • Hadoop provides horizontal scalability. So we just have to add one or more node to the cluster if there is any requirement for an increase in data.
      OLTP
      • RDMS support OLTP (Real-time data processing).
      • OLTP is not supported in Apache Hadoop.
      • Hadoop supports large scale Batch Processing workloads (OLAP).
      Cost
      • RDMS -Licensed software, therefore we have to pay for the software.
      • Hadoop is open source framework, so we don’t need to pay for software.

    • #6233
      DataFlair TeamDataFlair Team
      Spectator

      Firstly, Hadoop is not a database, it is basically a distributed file system which is used to process and store large data sets across the computer cluster.

      It has two main core components (HDFS) and MapReduce. HDFS is the storage layer which is used to store a large amount of data across computer clusters.

      MapReduce is a programming model that processes the large data sets by splitting them into several Blocks of data. These blocks are then distributed across the nodes on different machines present inside the computer cluster.

      On the other hand, RDBMS is a database which is used to store data in the form of tables comprising of several rows and columns. It uses SQL, Structured Query Language, to update and access the data present in these tables.

      Lets compare hadoop and RDBMS with following parameter:

      Data Volume-
      Hadoop was meant to handle very large data size . So Hadoop works better when the data size is big. It can easily process and store large amount of data quite effectively as compared to the traditional RDBMS.
      RDBMS works better when the volume of data is low(in Gigabytes). But when the data size is huge i.e, in Terabytes and Petabytes, RDBMS fails to give the desired results.

      Architecture-

      Hadoop as following core components : HDFS(Hadoop Distributed File System), Hadoop MapReduce(a programming model to process large data sets) and Hadoop YARN(used to manage computing resources in computer clusters).

      Traditional RDBMS possess ACID properties which are Atomicity, Consistency, Isolation, and Durability.

      These properties are responsible to maintain and ensure data integrity and accuracy when a transaction takes place in a database.

      Throughput-

      Throughput means the total volume of data processed in a particular period of time so that the output is maximum. RDBMS fails to achieve a higher throughput as compared to the Apache Hadoop Framework.

      This is one of the reason behind the heavy usage of Hadoop than the traditional Relational Database Management System.

      Data Variety-

      Data Variety generally means the type of data to be processed. It may be structured, semi-structured and unstructured.

      Hadoop has the ability to process and store all variety of data whether it is structured, semi-structured or unstructured. Although, it is mostly used to process large amount of unstructured data.

      Traditional RDBMS is used only to manage structured and semi-structured data. It cannot be used to manage unstructured data. So we can say Hadoop is way better than the traditional Relational Database Management System.

      Latency/ Response Time –

      Hadoop has higher throughput, you can quickly access batches of large data sets than traditional RDBMS, but you cannot access a particular record from the data set very quickly. Thus Hadoop is said to have low latency.

      But the RDBMS is comparatively faster in retrieving the information from the data sets. It takes a very little time to perform the same function provided that there is a small amount of data.

      Scalability-

      RDBMS provides vertical scalability which is also known as ‘Scaling Up’ a machine. It means you can add more resources or hardwares such as memory, CPU to a machine in the computer cluster.

      Whereas, Hadoop provides horizontal scalability which is also known as ‘Scaling Out’ a machine. It means adding more machines to the existing computer clusters as a result of which Hadoop becomes a fault tolerant. There is no single point of failure. Due to the presence of more machines in the cluster, you can easily recover data irrespective of the failure of one of the machines.

      Data Processing-

      Apache Hadoop supports OLAP(Online Analytical Processing), which is used in Data Mining techniques.

      OLAP involves very complex queries and aggregations. The data processing speed depends on the amount of data which can take several hours. The database design is de-normalized having fewer tables.

      On the other hand, RDBMS supports OLTP(Online Transaction Processing), which involves comparatively fast query processing. The database design is highly normalized having a large number of tables.

      Cost-

      Hadoop is a free and open source software framework, you don’t have to pay in order to buy the license of the software.

      Whereas RDBMS is a licensed software, you have to pay in order to buy the complete software license.

Viewing 2 reply threads
  • You must be logged in to reply to this topic.