Can I run Apache Spark without Hadoop?
September 20, 2018 at 3:57 pm · #5690 · DataFlair Team (Spectator)
Can Spark be deployed without Hadoop, or do Spark and Hadoop have to be used together?
September 20, 2018 at 3:57 pm · #5691 · DataFlair Team (Spectator)
Yes, Apache Spark can run without Hadoop: standalone, locally, or in the cloud. Spark does not need a Hadoop cluster to work; it can read and process data from other file systems as well, and HDFS is just one of the file systems Spark supports.
Spark has no storage layer of its own, so for distributed computing it relies on a distributed storage system such as HDFS or Cassandra.
That said, there are many advantages to running Spark on top of Hadoop (HDFS for storage plus YARN as the resource manager), but it is not a mandatory requirement. Spark is meant for distributed computing: the data is spread across the machines of the cluster, and Hadoop's distributed file system, HDFS, is used to store data that does not fit in memory.
One more reason for using Hadoop with Spark is that both are open source and integrate with each other rather easily compared to other data storage systems.
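The deployment choice above comes down to the `--master` setting when submitting an application. The following command-line sketch is illustrative (the script name `my_app.py` and host name are hypothetical), shown only to make the options concrete; it is not meant to be run as-is.

```shell
# 1. Local mode: no Hadoop anywhere, Spark runs on this machine's cores.
spark-submit --master "local[*]" my_app.py

# 2. Spark's own standalone cluster manager: still no Hadoop required.
spark-submit --master spark://master-host:7077 my_app.py

# 3. YARN: the only one of these that needs a Hadoop installation.
spark-submit --master yarn my_app.py
```

Only the third option ties Spark to Hadoop; the first two run entirely without it.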
For more details, please refer:
Apache Spark Compatibility with Hadoop