What are common mistakes developers make when using Apache Spark?


    • #6071
      DataFlair Team
      Moderator

      What mistakes do developers commonly make when using Apache Spark?

    • #6073
      DataFlair Team
      Moderator

      1) DAG management: developers often make mistakes in how they shape the DAG. Prefer reduceByKey over groupByKey. The two can produce similar results, but groupByKey sends every value for a key across the network, while reduceByKey combines values within each partition before the shuffle. Use reduceByKey wherever it fits. Reduce the data on the map side as much as possible, avoid unnecessary repartitioning, shuffle as little as possible, and avoid skewed data and badly sized partitions.

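      2) Keep shuffle blocks at a reasonable size. Too few partitions make each shuffle block very large (older Spark versions fail when a single shuffle block exceeds 2 GB), so increase the number of partitions when blocks grow too big.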
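To illustrate point 1, here is a minimal sketch in plain Python (not Spark itself) that models why the RDD method reduceByKey moves fewer records through the shuffle than groupByKey followed by a sum. The partition layout, the sample data, and the function names `group_by_key_then_sum` and `reduce_by_key` are hypothetical and exist only for this illustration; real Spark shuffles are more involved.

```python
from collections import defaultdict
from operator import add


def group_by_key_then_sum(partitions):
    """Model of groupByKey + sum: every (key, value) pair crosses the shuffle."""
    grouped = defaultdict(list)
    shuffled = 0
    for part in partitions:
        for key, value in part:
            shuffled += 1  # each record is sent as-is
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}, shuffled


def reduce_by_key(partitions, func):
    """Model of reduceByKey: combine per key inside each partition first."""
    # Map-side combine: at most one record per distinct key per partition.
    combined = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        combined.append(local)

    # "Shuffle" the pre-combined records and merge them on the reduce side.
    result = {}
    shuffled = 0
    for local in combined:
        for key, value in local.items():
            shuffled += 1
            result[key] = func(result[key], value) if key in result else value
    return result, shuffled


# Hypothetical input: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

print(group_by_key_then_sum(partitions))  # ({'a': 3, 'b': 3}, 6)
print(reduce_by_key(partitions, add))     # ({'a': 3, 'b': 3}, 4)
```

Both approaches give the same totals, but in this model the reduceByKey-style path shuffles 4 records instead of 6. The gap widens as the number of values per key grows.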

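For point 2, the post does not say what the required size is. As a rough sketch, one way to pick a partition count is to divide the expected shuffle volume by a target size per partition. The 128 MiB default below is an assumption (a commonly cited rule of thumb, not a value from this thread), and `shuffle_partitions` is a hypothetical helper, not a Spark API. It treats each reduce-side partition as the unit, which bounds the size of any single shuffle block from above.

```python
def shuffle_partitions(total_shuffle_bytes, target_bytes=128 * 1024 * 1024):
    """Estimate a partition count so each partition is at most target_bytes.

    target_bytes defaults to 128 MiB, an assumed rule of thumb.
    """
    if total_shuffle_bytes < 0 or target_bytes <= 0:
        raise ValueError("sizes must be non-negative and target must be positive")
    # Integer ceiling division; always return at least one partition.
    return max(1, -(-total_shuffle_bytes // target_bytes))


GIB = 1024 ** 3
print(shuffle_partitions(50 * GIB))  # 400
```

The result could then be applied with `rdd.repartition(n)` or, for DataFrame and SQL shuffles, the `spark.sql.shuffle.partitions` setting. Measure actual shuffle sizes in the Spark UI before settling on a number.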