Can I run Apache Spark without Hadoop?

Viewing 1 reply thread
  • Author
    Posts
    • #5690
      DataFlair Team
      Spectator

      Can Spark be deployed without Hadoop, or do we need to use Spark and Hadoop together?

    • #5691
      DataFlair Team
      Spectator

      Yes, Apache Spark can run without Hadoop, either standalone or in the cloud. Spark does not need a Hadoop cluster to work: it can read and process data from other file systems as well, and HDFS is just one of the file systems Spark supports.

      Spark has no storage layer of its own, so for distributed computing it relies on an external distributed storage system such as HDFS or Cassandra.
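      To make this concrete, here is a minimal sketch of launching Spark with no Hadoop cluster at all. The host name, file paths, and application name below are hypothetical examples, and the commands assume a plain Spark distribution unpacked locally:

      ```shell
      # 1. Local mode: everything runs in a single JVM, no cluster manager at all.
      ./bin/spark-shell --master "local[*]"

      # 2. Spark's own standalone cluster manager, instead of YARN:
      ./sbin/start-master.sh
      ./sbin/start-worker.sh spark://master-host:7077      # hypothetical host:port
      ./bin/spark-submit --master spark://master-host:7077 app.py

      # In either mode, jobs can read from non-HDFS storage, for example:
      #   spark.read.text("file:///tmp/data.txt")          # plain local file system
      #   spark.read.text("s3a://my-bucket/data.txt")      # S3 (needs the hadoop-aws
      #                                                    #  client jars, not a cluster)
      ```

      Note that on older Spark releases the worker script is named start-slave.sh rather than start-worker.sh.
      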

      However, there are many advantages to running Spark on top of Hadoop (HDFS for storage plus YARN as the resource manager), although it is not a mandatory requirement. Spark is meant for distributed computing: the data is distributed across the machines, and Hadoop's distributed file system, HDFS, is used to store data that does not fit in memory.
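      For contrast with the standalone setups above, submitting the same application to a Hadoop/YARN cluster looks like this. The configuration path is a hypothetical example, and a running YARN ResourceManager is assumed:

      ```shell
      # Point Spark at the Hadoop cluster's configuration (example path),
      # then let YARN allocate the executors:
      export HADOOP_CONF_DIR=/etc/hadoop/conf
      ./bin/spark-submit --master yarn --deploy-mode cluster app.py
      ```
      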

      One more reason for using Hadoop with Spark is that both are open source, and they integrate with each other more easily than with other data storage systems.

      For more details, please refer:
      Apache Spark Compatibility with Hadoop
