Forums › Apache Spark › What are the common faults of developers while using Apache Spark?

This topic contains 1 reply, has 1 voice, and was last updated by dfbdteam5 9 months, 4 weeks ago.

Viewing 2 posts - 1 through 2 (of 2 total)
  • Author
    Posts
  • #6071

    dfbdteam5
    Moderator

    What mistakes do developers generally commit while using Apache Spark?

    #6073

    dfbdteam5
    Moderator

    1) Management of DAGs – Developers often make mistakes in controlling the DAG. Prefer reduceByKey over groupByKey: both produce the same grouped result, but groupByKey shuffles every record across the network, whereas reduceByKey combines values on the map side first, so far less data is moved. Also keep the map side as small as possible, avoid unnecessary repartitioning, minimize shuffles, and watch out for data skew across partitions.
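    To make the reduceByKey vs. groupByKey point concrete, here is a minimal sketch in plain Python (not Spark itself) that simulates the map-side combine: `group_by_key` ships every record across the shuffle, while `reduce_by_key` first collapses each partition locally, so fewer records move. All function and variable names here are illustrative, not Spark APIs.

    ```python
    from collections import defaultdict
    from operator import add

    def group_by_key(partitions):
        # groupByKey-style: every (key, value) record crosses the shuffle as-is.
        shuffled = defaultdict(list)
        records_shuffled = 0
        for part in partitions:
            for key, value in part:
                shuffled[key].append(value)   # one shuffled record per input pair
                records_shuffled += 1
        return {k: sum(vs) for k, vs in shuffled.items()}, records_shuffled

    def reduce_by_key(partitions, func=add):
        # reduceByKey-style: combine locally first (map-side combine), then
        # shuffle at most one record per (partition, key) pair.
        combined_parts = []
        for part in partitions:
            local = {}
            for key, value in part:
                local[key] = func(local[key], value) if key in local else value
            combined_parts.append(local)
        shuffled = {}
        records_shuffled = 0
        for local in combined_parts:
            for key, value in local.items():
                shuffled[key] = func(shuffled[key], value) if key in shuffled else value
                records_shuffled += 1
        return shuffled, records_shuffled

    # Two "partitions" of word-count style pairs.
    parts = [[("a", 1), ("a", 1), ("b", 1)], [("a", 1), ("b", 1), ("b", 1)]]
    totals_g, moved_g = group_by_key(parts)    # moves all 6 records
    totals_r, moved_r = reduce_by_key(parts)   # moves only 4 combined records
    ```

    Both paths compute identical totals; the difference is only in how many records cross the simulated shuffle, which is exactly why reduceByKey is usually the better choice for aggregations.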

    2) Maintain the required size of shuffle blocks – in many Spark versions a single shuffle block cannot exceed 2 GB, so choose a partition count that keeps each block well under that limit.
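    One common lever for keeping shuffle blocks small is raising the shuffle partition count so the same data is spread across more, smaller blocks. A minimal PySpark configuration sketch (the app name and partition count here are illustrative assumptions, not recommendations from the post):

    ```python
    # Hypothetical PySpark config fragment: spread shuffle output across
    # more partitions so individual shuffle blocks stay small.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("shuffle-tuning")                       # illustrative name
        .config("spark.sql.shuffle.partitions", "400")   # default is 200
        .getOrCreate()
    )
    ```

    The right value depends on data volume; the point is that more partitions means less data per shuffle block.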

