Forums › Apache Spark › Why does the picture of Spark come into existence?
This topic has 2 replies, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.
September 20, 2018 at 10:10 pm #6443 — DataFlair Team (Spectator)
What is the need for Apache Spark?
What are the drawbacks of Apache Hadoop?
September 20, 2018 at 10:10 pm #6444 — DataFlair Team (Spectator)
Let’s first discuss some major issues with Apache Hadoop:
1. Problems handling large numbers of small files
2. Slow processing speed
3. Support for batch processing only
4. No support for real-time data processing
There are more limitations of Apache Hadoop; to learn them all, follow the link: 13 Big Limitations of Hadoop
To overcome these issues, Apache Spark came into the picture. Another reason behind the evolution of Apache Spark is that the existing general-purpose computing engines could each handle only one kind of workload, so each carried built-in limitations.
For example:
1. Hadoop MapReduce is limited to batch processing.
2. Apache Storm / S4 supports only stream processing.
3. Apache Impala / Apache Tez allows only interactive processing.
4. Neo4j / Apache Giraph supports only graph processing.
Therefore, using them together reduces efficiency and increases complexity. So there was a strong demand for a single powerful engine that could process data in real time (streaming) as well as in batch mode, respond with sub-second latency, and perform in-memory processing.
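The value of one engine serving both modes is that the same logic gives the same answer whether the data arrives all at once or incrementally. This is a toy sketch in plain Python (not the Spark API) of that idea: a streaming-style job that folds micro-batches into a running state converges to exactly what a batch job computes in one pass.

```python
from collections import Counter

def batch_count(records):
    """Batch mode: process the complete dataset in a single pass."""
    return Counter(word for line in records for word in line.split())

def streaming_count(chunks):
    """Streaming mode: fold incoming micro-batches into running state."""
    state = Counter()
    for chunk in chunks:  # each chunk arrives over time
        state.update(word for line in chunk for word in line.split())
    return state

data = ["spark hadoop", "spark storm", "spark"]
# Both modes agree on the final word counts.
assert batch_count(data) == streaming_count([data[:2], data[2:]])
```

In Spark this unification is real, not just conceptual: the same DataFrame-style transformations can be applied to a static dataset or to a stream, which is what the single-engine demand above is about.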
To meet this demand, the Apache Software Foundation introduced Apache Spark: a powerful open-source engine offering real-time stream processing, interactive processing, graph processing, and in-memory processing as well as batch processing, combined with high speed, ease of use, and a standard interface. This is what sets Hadoop vs Spark apart, and it also drives the comparison of Spark vs Storm.
For detailed insights about Apache Spark, follow the link: Apache Spark – A Complete Spark Tutorial for Beginners