What are the limitations of Spark?

    • #6002
      DataFlair Team
      Spectator

      Write the shortcomings of Apache Spark.
      What are the constraints of Apache Spark?

    • #6003
      DataFlair Team
      Spectator

      The main disadvantages of Apache Spark are:

      • There is no support for true real-time processing in Spark; it supports near real-time processing of live data. The incoming real-time data is divided into batches of a predefined interval (micro-batching), and the results of the computation are likewise returned in batches.
      • The small-files problem arises when Spark is used with a large number of small files, as HDFS is designed for a limited number of large files. Spark also lags when data is stored gzipped in S3: this pattern works well except when there are lots of small gzipped files, since gzip files cannot be split for parallel reading.
      • There is no dedicated file management system. Spark does not ship its own file management system, so it relies on another platform, for example Hadoop HDFS or a cloud-based store.
      • It is expensive, because keeping data in-memory is costly. Memory consumption is very high and is not handled in a user-friendly manner. Apache Spark requires a lot of RAM to run in-memory, so the cost of running Spark is quite high.
      • Apache Spark lags behind in the number of available algorithms. MLlib offers fewer algorithms than some alternatives, lacking, for example, Tanimoto distance.
      • Jobs must be manually optimized and tuned for specific datasets. Partitioning and caching have to be controlled by hand to get good performance.
      • In Spark, data is iterated over in batches, and the scheduling and execution of each iteration happen separately.
      • Higher latency than Apache Flink.
      • Spark does not support record-based window criteria; it only has time-based window criteria.
      • Back pressure handling – Back pressure is a buildup of data at an input/output channel when the buffer is full and cannot receive additional incoming data; no data is transferred until the buffer is emptied. Apache Spark is not capable of handling back pressure implicitly; it has to be handled manually.
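
      The micro-batching behavior described in the first point can be sketched in plain Python (no Spark required; the event layout and interval are illustrative, not Spark's API):

      ```python
      from itertools import groupby

      def micro_batches(events, interval):
          """Group (timestamp, value) events into fixed-interval batches,
          mimicking how Spark Streaming discretizes a live stream: results
          only become available once per batch, hence "near real-time"."""
          # Each event is assigned to the batch window covering its timestamp.
          return [
              [v for _, v in batch]
              for _, batch in groupby(events, key=lambda e: e[0] // interval)
          ]

      # A stream of (timestamp, value) pairs; with interval=2 the value "e"
      # is not visible until its whole batch closes.
      stream = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (5, "e")]
      print(micro_batches(stream, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
      ```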
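
      The record-based windows that Spark Streaming lacks can be illustrated with a small count-based sliding window in plain Python (a conceptual sketch, not any engine's API):

      ```python
      def record_windows(values, n, slide):
          """Record-based (count) sliding windows: emit a window every `slide`
          records, each covering the last `n` records. Spark Streaming only
          offers time-based windows, not this count-based variant."""
          return [values[i:i + n] for i in range(0, len(values) - n + 1, slide)]

      print(record_windows([1, 2, 3, 4, 5], n=3, slide=1))
      # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
      ```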
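
      The back pressure point can be made concrete with a toy bounded-buffer simulation in plain Python (an illustrative sketch; the capacity and drain rate are made-up parameters):

      ```python
      from collections import deque

      def simulate(incoming, capacity, drain_per_tick):
          """Simulate a bounded input buffer. Items arriving while the buffer
          is full must wait (back pressure) rather than being accepted --
          the kind of throttling Spark leaves to manual handling."""
          buffer, waiting, processed = deque(), deque(incoming), []
          while waiting or buffer:
              # Consumer drains up to drain_per_tick items each tick.
              for _ in range(min(drain_per_tick, len(buffer))):
                  processed.append(buffer.popleft())
              # Producer may only push while there is room in the buffer.
              while waiting and len(buffer) < capacity:
                  buffer.append(waiting.popleft())
          return processed

      # All five items are eventually processed, but the producer is
      # throttled to the buffer's capacity the whole way through.
      print(simulate(range(5), capacity=2, drain_per_tick=1))
      # [0, 1, 2, 3, 4]
      ```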