What are the limitations of Spark?

    • #6002
      DataFlair Team
      Spectator

      Write the shortcomings of Apache Spark.
      What are the constraints of Apache Spark?

    • #6003
      DataFlair Team
      Spectator

      The main disadvantages of Apache Spark are:

      • There is no support for true real-time processing in Spark; it supports near real-time processing of live data. The incoming real-time data is divided into batches of a predefined interval (micro-batching), and the results of the computation are likewise returned in batches.
      • The small-files problem arises when Spark is used with a large number of small files, as HDFS is designed for a limited number of large files. Spark also lags when data is stored gzipped in S3: this pattern works well except when there are lots of small gzipped files, since gzip files cannot be split for parallel reading.
      • There is no dedicated file management system. Spark does not ship its own file management system, so it relies on another platform, for example Hadoop HDFS or a cloud-based store.
      • It is expensive, because keeping data in-memory is costly. Memory consumption is very high and is not handled in a user-friendly manner. Apache Spark requires a lot of RAM to run in-memory, so the cost of running Spark is quite high.
      • Apache Spark lags behind in the number of available algorithms. MLlib offers fewer algorithms than some alternatives, lacking, for example, Tanimoto distance.
      • Jobs must be manually optimized and tuned for specific datasets. Partitioning and caching have to be controlled by hand to get good performance.
      • In Spark, data is iterated over in batches, and the scheduling and execution of each iteration happen separately.
      • Higher latency than Apache Flink.
      • Spark does not support record-based window criteria; it only has time-based window criteria.
      • Back pressure handling – Back pressure is a buildup of data at an input/output channel when the buffer is full and cannot receive additional incoming data; no data is transferred until the buffer is emptied. Apache Spark is not capable of handling back pressure implicitly; it has to be handled manually.
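
      The micro-batching behavior described in the first point can be sketched in plain Python (no Spark required; the event layout and interval are illustrative, not Spark's API):

      ```python
      from itertools import groupby

      def micro_batches(events, interval):
          """Group (timestamp, value) events into fixed-interval batches,
          mimicking how Spark Streaming discretizes a live stream: results
          only become available once per batch, hence "near real-time"."""
          # Each event is assigned to the batch window covering its timestamp.
          return [
              [v for _, v in batch]
              for _, batch in groupby(events, key=lambda e: e[0] // interval)
          ]

      # A stream of (timestamp, value) pairs; with interval=2 the value "e"
      # is not visible until its whole batch closes.
      stream = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (5, "e")]
      print(micro_batches(stream, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
      ```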
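
      The record-based windows that Spark Streaming lacks can be illustrated with a small count-based sliding window in plain Python (a conceptual sketch, not any engine's API):

      ```python
      def record_windows(values, n, slide):
          """Record-based (count) sliding windows: emit a window every `slide`
          records, each covering the last `n` records. Spark Streaming only
          offers time-based windows, not this count-based variant."""
          return [values[i:i + n] for i in range(0, len(values) - n + 1, slide)]

      print(record_windows([1, 2, 3, 4, 5], n=3, slide=1))
      # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
      ```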
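
      The back pressure point can be made concrete with a toy bounded-buffer simulation in plain Python (an illustrative sketch; the capacity and drain rate are made-up parameters):

      ```python
      from collections import deque

      def simulate(incoming, capacity, drain_per_tick):
          """Simulate a bounded input buffer. Items arriving while the buffer
          is full must wait (back pressure) rather than being accepted --
          the kind of throttling Spark leaves to manual handling."""
          buffer, waiting, processed = deque(), deque(incoming), []
          while waiting or buffer:
              # Consumer drains up to drain_per_tick items each tick.
              for _ in range(min(drain_per_tick, len(buffer))):
                  processed.append(buffer.popleft())
              # Producer may only push while there is room in the buffer.
              while waiting and len(buffer) < capacity:
                  buffer.append(waiting.popleft())
          return processed

      # All five items are eventually processed, but the producer is
      # throttled to the buffer's capacity the whole way through.
      print(simulate(range(5), capacity=2, drain_per_tick=1))
      # [0, 1, 2, 3, 4]
      ```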