Can I run Apache Spark without Hadoop?

Viewing 1 reply thread
  • Author
    Posts
    • #5690
      DataFlair Team
      Spectator

      Can Spark be deployed without Hadoop, or do we need to use Spark and Hadoop together?

    • #5691
      DataFlair Team
      Spectator

      Yes, Apache Spark can run without Hadoop, either standalone or in the cloud. Spark does not need a Hadoop cluster to work: it can read and process data from other file systems as well, and HDFS is just one of the file systems Spark supports.

      Spark has no storage layer of its own, so for distributed computing it relies on an external distributed storage system such as HDFS or Cassandra.
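      To make this concrete, here is a minimal sketch of launching Spark with no Hadoop cluster at all. The host name, file paths, and application name below are hypothetical examples, and the commands assume a plain Spark distribution unpacked locally:

      ```shell
      # 1. Local mode: everything runs in a single JVM, no cluster manager at all.
      ./bin/spark-shell --master "local[*]"

      # 2. Spark's own standalone cluster manager, instead of YARN:
      ./sbin/start-master.sh
      ./sbin/start-worker.sh spark://master-host:7077      # hypothetical host:port
      ./bin/spark-submit --master spark://master-host:7077 app.py

      # In either mode, jobs can read from non-HDFS storage, for example:
      #   spark.read.text("file:///tmp/data.txt")          # plain local file system
      #   spark.read.text("s3a://my-bucket/data.txt")      # S3 (needs the hadoop-aws
      #                                                    #  client jars, not a cluster)
      ```

      Note that on older Spark releases the worker script is named start-slave.sh rather than start-worker.sh.
      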

      However, there are many advantages to running Spark on top of Hadoop (HDFS for storage plus YARN as the resource manager), although it is not a mandatory requirement. Spark is meant for distributed computing: the data is distributed across the machines, and Hadoop's distributed file system, HDFS, is used to store data that does not fit in memory.
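      For contrast with the standalone setups above, submitting the same application to a Hadoop/YARN cluster looks like this. The configuration path is a hypothetical example, and a running YARN ResourceManager is assumed:

      ```shell
      # Point Spark at the Hadoop cluster's configuration (example path),
      # then let YARN allocate the executors:
      export HADOOP_CONF_DIR=/etc/hadoop/conf
      ./bin/spark-submit --master yarn --deploy-mode cluster app.py
      ```
      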

      One more reason for using Hadoop with Spark is that both are open source, and they integrate with each other more easily than with other data storage systems.

      For more details, please refer:
      Apache Spark Compatibility with Hadoop
