Is Hadoop a database?

  • Author
    Posts
    • #4765
      DataFlair Team
      Spectator

      Is Hadoop just another database like Oracle or DB2?

    • #4766
      DataFlair Team
      Spectator

      A database is a system that typically stores structured data. Hadoop is not a database. It is a broader framework for distributed storage and processing that can hold data in any format. The Hadoop ecosystem covers:

      Data Storage
      Data Access
      Data Serialization
      Data Intelligence
      Data Integration
      Management, Monitoring, and Orchestration
      Interaction, Visualization, Execution, and Development

      Now, let's compare Hadoop's storage layer (HDFS) with a traditional database.

      Hadoop’s HDFS:
      Hadoop stores very large amounts of structured, semi-structured, and unstructured data on HDFS as flat files spread across a cluster. HDFS stores data reliably: each file is broken into blocks, and the blocks are distributed across the nodes of the cluster. Each block is then replicated, meaning that copies of it are placed on different machines. If a machine goes down or crashes, the data can still be read from the other machines that hold copies. By default, each block is stored three times (a replication factor of 3), which makes HDFS highly fault tolerant. We use MapReduce to process the data in HDFS, but it does not return results quickly: it is a batch-oriented model that scans whole files, and HDFS is not designed for fast random lookups of individual records.
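To make the block and replica idea above concrete, here is a minimal Python sketch. It is a toy model, not how HDFS is implemented: the round-robin placement, the node names, the 500 MB file size, and the helper names `place_blocks` and `readable_after_failure` are all assumptions made for illustration. Real HDFS uses a rack-aware placement policy, and the 128 MB block size used here is a common default that can be configured differently.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size for this sketch (a common HDFS default)
REPLICATION = 3      # default replication factor described above


def place_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                 replication=REPLICATION):
    """Split a file into blocks and assign each block to `replication` nodes.

    Uses simple round-robin placement (a simplification; real HDFS is
    rack-aware). Returns {block_index: [node, ...]}.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes, wrapping around, so replicas land on distinct machines.
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(any(node != failed_node for node in replicas)
               for replicas in placement.values())


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_blocks(500, nodes)          # hypothetical 500 MB file

for block, replicas in placement.items():
    print(f"block {block}: {replicas}")
print("readable if node2 fails:", readable_after_failure(placement, "node2"))
```

Because every block lives on three different machines, losing any single node still leaves two copies of each block available.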

      Native Database:
      A traditional database handles only structured data, and the data must be cleaned and shaped to fit a schema before it is loaded. As the volume of data grows, this approach becomes inefficient. Data is stored in tables and accessed using SQL.

    • #4768
      DataFlair Team
      Spectator

      Hadoop is not a database or a relational store. It is mainly used for processing huge amounts of data across distributed servers. It stores files in HDFS (Hadoop Distributed File System), which does not qualify as a relational database. Relational databases store information in tables defined by a specific schema. Hadoop can store unstructured, semi-structured, and structured data, while traditional databases store only structured data. Files in HDFS are written once and cannot be updated in place, whereas a traditional database lets you update or modify individual records.

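      There are components such as Hive that work on top of HDFS and let users query the data stored there with an SQL-like language called HiveQL. Internally, Hive has traditionally translated each query into MapReduce jobs to produce the results (newer versions can also run on engines such as Tez or Spark).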
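As a rough illustration of what "uses MapReduce internally" means, the Python sketch below mimics the map, shuffle, and reduce steps for a query along the lines of `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. The `employees` rows, the column names, and the function names are hypothetical, and this is a conceptual sketch only, not the code Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows standing in for records read from files in HDFS.
rows = [
    {"name": "asha", "dept": "sales"},
    {"name": "ben", "dept": "eng"},
    {"name": "chen", "dept": "sales"},
]


def map_phase(records):
    """Map: emit a (key, 1) pair for every record, keyed by department."""
    for record in records:
        yield record["dept"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many machines, each working on the blocks stored closest to it, which is why this model suits large batch scans rather than quick single-row lookups.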
