What are the common mistakes developers make while using Apache Spark?

This topic has 1 reply, 1 voice, and was last updated 5 years, 7 months ago by DataFlair Team.

Author

Posts
- September 20, 2018 at 5:04 pm #6071
  
  DataFlair Team
  Spectator
  
  What mistakes do developers commonly make while using Apache Spark?
- September 20, 2018 at 5:04 pm #6073
  
  DataFlair Team
  Spectator
  
  1) DAG management: Developers often make mistakes in how they structure the DAG. Prefer reduceByKey over groupByKey. Both can produce similar results, but groupByKey sends every value for a key across the network, while reduceByKey combines values within each partition before the shuffle, so far less data moves. Keep the map-side output as small as possible, avoid spending more time on partitioning than necessary, and shuffle as little as you can. Also watch out for data skew and unevenly sized partitions.
  
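  2) Keep shuffle blocks at an appropriate size. Oversized shuffle blocks usually mean there are too few partitions, and older Spark releases fail outright when a single shuffle block exceeds 2 GB.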
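
To make the reduceByKey versus groupByKey advice concrete, here is a small sketch in plain Python. It is not Spark code: the two functions (`shuffle_group_by_key` and `shuffle_reduce_by_key`, both hypothetical names) only model how many records would cross the shuffle with and without a per-partition combine step, on made-up sample data. In a real job you would call `rdd.groupByKey()` or `rdd.reduceByKey(func)` instead.

```python
from collections import defaultdict


def shuffle_group_by_key(partitions):
    """Model of groupByKey: every (key, value) record crosses the shuffle."""
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    # Summing happens only after all values have been moved and grouped.
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_reduce_by_key(partitions, func):
    """Model of reduceByKey: combine inside each partition, then shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        # Only one record per key per partition leaves the map side.
        shuffled.extend(local.items())
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)


# Made-up input: three partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
    [("c", 1), ("a", 1), ("c", 1)],
]

group_totals, group_shuffled = shuffle_group_by_key(partitions)
reduce_totals, reduce_shuffled = shuffle_reduce_by_key(partitions, lambda x, y: x + y)

print(group_totals == reduce_totals)    # True: both give the same totals
print(group_shuffled, reduce_shuffled)  # 10 6
```

Both approaches give the same totals, but the combine-first version moves 6 records instead of 10. The gap widens as the number of repeated keys per partition grows, which is the main reason to prefer reduceByKey when the aggregation allows it.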