Comparison between Hadoop and RDBMS

Viewing 2 reply threads
  • Author
    Posts
    • #4827
      DataFlair Team
      Spectator

      Compare Hadoop and RDBMS from the following perspectives:
      1. Volume of data
      2. Latency
      3. Throughput
      4. ACID property
      5. Schema
      6. Variety of data handling
      7. Response time

    • #4828
      DataFlair Team
      Spectator

      First of all, Hadoop is not a database management system, so a direct comparison is not possible. Still, I’ll try to compare them from the perspectives you have asked about.

      1) Volume of data: For lower volumes of data, such as a few GB, an RDBMS is the best choice if it fulfills your requirements. When the data size grows beyond what it can handle, an RDBMS becomes very slow. In contrast, the Hadoop framework’s processing power comes into its own when the files are very large and the workload demands streaming reads and processing.

      2) Latency: An RDBMS can give a very quick response when the data size is within its processing power. Hadoop is very different: it is efficient for batch processing of data, so results are available only after a large amount of data has been processed. Therefore, Hadoop is not the ideal platform when immediate results are expected.

      3) Throughput: Throughput refers to the amount of data processed in a given period of time. Hadoop’s throughput is higher than that of an RDBMS.
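      To make the definition concrete, here is a minimal sketch of the arithmetic. The function name and all the numbers are hypothetical, chosen only for illustration; they are not measurements of Hadoop or of any RDBMS.

```python
def throughput_mb_per_s(total_mb: float, elapsed_s: float) -> float:
    """Throughput = amount of data processed / time taken."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return total_mb / elapsed_s


# Hypothetical numbers, for illustration only.
# A batch job that scans a lot of data but takes a long time to finish.
batch = throughput_mb_per_s(total_mb=1_000_000, elapsed_s=2_000)
# A single small query that answers quickly but touches very little data.
lookup = throughput_mb_per_s(total_mb=1, elapsed_s=0.5)

print(batch, lookup)
```

      The point of the sketch is that the slow job can still have the higher throughput: time to first answer (latency) and data processed per second (throughput) are separate measures.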

      4) ACID Property: The ACID properties apply to transaction-based systems. Hadoop itself provides nothing like ACID. In the context of distributed databases, however, there is the BASE property (Basically Available, Soft state, Eventually consistent). You can dig into it for more info, or we can discuss it in a separate thread.
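      As a rough illustration of what the "A" (atomicity) in ACID means on the RDBMS side, here is a small sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer helper are hypothetical examples, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the earlier credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

      The second transfer credits bob first and then fails on the debit, yet bob's balance is unchanged afterwards: either both updates happen or neither does.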

      5) Schema: An RDBMS is used to store structured data, or semi-structured data with null values in certain columns of its tables. Hadoop is used to store semi-structured and unstructured data in files. In Hadoop, all processing algorithms run on the files stored in HDFS. In an RDBMS, query languages such as SQL are used to fetch data from the tables.
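      To give a feel for what "processing algorithms on files" looks like, here is a word count written in the map/reduce style. This is plain Python running in one process, not the Hadoop API: the function names and sample lines are made up, and a real job would run the map and reduce steps as tasks spread across the cluster over files in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Stand-in for the lines of a text file; hypothetical sample data.
lines = [
    "Hadoop stores files",
    "RDBMS stores tables",
    "hadoop processes files",
]

pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(pairs)
print(word_counts)
```

      In an RDBMS the equivalent would be a single SQL query with GROUP BY over a table, which is the contrast drawn in point 5.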

      6) Variety of Data Handling: An RDBMS can store only data that can be represented in a fixed format as rows and columns of a table. In Hadoop, any kind of data can be stored, but it is only productive if you can process it with a MapReduce job. There are two terms I’d like to discuss. One is schema-on-write, used by traditional RDBMSs, where data must be in a specific format before it is written to the table. Hadoop uses schema-on-read, where you can store any data in raw format and the structure is imposed at processing time based on the requirements of the processing application.
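      The two terms can be sketched side by side. This is only an illustration under assumptions: sqlite3 stands in for an RDBMS, a Python list of raw lines stands in for a file in HDFS, and the events table, the field names, and the read_clicks helper are all hypothetical.

```python
import json
import sqlite3

# Schema-on-write: the table's structure is fixed before any row is stored,
# and a row that does not fit is rejected at write time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user TEXT NOT NULL, clicks INTEGER NOT NULL)")
db.execute("INSERT INTO events VALUES (?, ?)", ("ann", 3))
try:
    db.execute("INSERT INTO events VALUES (?, ?)", ("bob", None))  # violates NOT NULL
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Schema-on-read: raw lines are stored exactly as they arrived.
raw_lines = [
    '{"user": "ann", "clicks": 3}',
    '{"user": "bob"}',
    "not even json",
]


def read_clicks(line):
    """Impose structure at read time; return (user, clicks) or None if unusable."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "user" not in record:
        return None
    # This particular reader decides that a missing click count means zero.
    return record["user"], int(record.get("clicks", 0))


parsed = [r for r in map(read_clicks, raw_lines) if r is not None]
print(rejected, parsed)
```

      With schema-on-read, the decision about how to treat the incomplete and malformed lines is made by the reading application, and a different application could read the same raw data with different rules.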

      7) Response Time: Response time for an RDBMS is very low if the data is within its processing limits, whereas Hadoop is very fast at processing very large files, but its jobs are executed in batches from time to time.

    • #4830
      DataFlair Team
      Spectator

      1. Volume of data: An RDBMS handles data in the gigabyte range well but struggles beyond that, whereas Hadoop can handle and process even terabytes and petabytes of data.

      2. Latency: Latency refers to the delay between submitting a request and receiving the response to it. Hadoop is not ideal when an immediate response is needed. For that purpose, we can use an RDBMS, provided the data size is limited.

      3. Throughput: Throughput refers to the amount of data processed in a given period of time. Hadoop’s throughput is higher than that of an RDBMS.

      4. ACID property: An RDBMS follows the ACID properties, but it is typically practical only for data up to a few gigabytes, not for data in the tera- or petabyte range.
      Hadoop is not really meant for real-time transactions with ACID properties; rather, it is used for business reports or batch processing.

      5. Schema: Unlike an RDBMS, Hadoop works as a schema-less storage system. An RDBMS must have a schema defined before it can store and process data, and it handles only structured or semi-structured data. That is not true of Hadoop: it handles even unstructured data and is scalable, so it does not need any particular schema to be defined for it.

      6. Variety of data handling: Data comes from various sources, so it can have any structure (e.g., audio, video, social media posts).
      Such data cannot be handled or efficiently processed by traditional database systems, as they can process only structured data (in rows and columns), whereas Hadoop is designed to process huge amounts of any kind of data efficiently.

      7. Response time: The response time of an RDBMS can be very fast if the data is in the gigabyte range, but beyond that limit an RDBMS is not very helpful. Hadoop, on the other hand, can process this huge amount of data, but it is not helpful when we need an immediate response, as Hadoop is used for analysing huge data sets and creating reports from them.
