Explain shared variables in Spark.

    • #6401
      DataFlair Team

      What are shared variables?
      What is the need for shared variables in Apache Spark?

    • #6402
      DataFlair Team

      Apache Spark provides two types of abstractions. The main abstraction is the Resilient Distributed Dataset (RDD); the other is shared variables.

      Shared variables:
      Shared variables are variables that need to be used by many functions and methods running in parallel. They are designed specifically for use in parallel operations.

      Spark splits a job into the smallest possible units of work, closures (tasks), which run on different nodes, each receiving its own copy of all the variables used in the job. Changes made to these copies are not reflected back in the driver program. To overcome this limitation, Spark provides two special types of shared variables: broadcast variables and accumulators.
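
      As a concrete illustration of this pitfall, here is a minimal Scala sketch (the object name, variable names, and local[*] master are our own, illustrative choices):

      import org.apache.spark.{SparkConf, SparkContext}

      object ClosureCopyDemo {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(
            new SparkConf().setAppName("closure-demo").setMaster("local[*]"))

          var counter = 0
          val data = sc.parallelize(1 to 100)

          // Each task increments its OWN deserialised copy of `counter`,
          // not the driver's original.
          data.foreach(x => counter += x)

          // On a cluster this prints 0, because the tasks' updates never
          // reach the driver; even in local mode the result is not
          // guaranteed, so this pattern must not be relied on.
          println(s"Counter value: $counter")
          sc.stop()
        }
      }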

      Broadcast variables:
      Used to cache a value in memory on all nodes. Only one read-only instance of the variable is shared by all computations throughout the cluster.
      Spark ships the broadcast variable once to each node involved in the related tasks, and each node caches it locally in serialised form.
      Before executing each planned task, the node then retrieves the value from its local cache instead of fetching it from the driver.
      Broadcast variables are:

      Immutable (unchangeable)
      Distributed, i.e. broadcast to the cluster
      Expected to fit in memory on each node

      Syntax to create a broadcast variable:
      SparkContext.broadcast(value)
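
      A minimal Scala sketch of this API, assuming a local SparkContext (the lookup-table contents and all names are illustrative):

      import org.apache.spark.{SparkConf, SparkContext}

      object BroadcastDemo {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(
            new SparkConf().setAppName("broadcast-demo").setMaster("local[*]"))

          // A lookup table we want available on every node: shipped once
          // per node and cached locally, instead of once per task.
          val countryNames = Map("IN" -> "India", "FR" -> "France", "US" -> "United States")
          val bcNames = sc.broadcast(countryNames)

          val codes = sc.parallelize(Seq("IN", "US", "IN", "FR"))
          // Tasks read the cached value through .value; it is read-only.
          val resolved = codes.map(code => bcNames.value.getOrElse(code, "Unknown"))

          resolved.collect().foreach(println)
          sc.stop()
        }
      }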

      Accumulators:
      As the name suggests, an accumulator's main role is to accumulate values. Accumulators are variables used to implement counters and sums; out of the box, Spark provides accumulators of numeric types only.
      The user can create named or unnamed accumulators.
      Unlike broadcast variables, accumulators are writable from tasks. However, the written values can be read only in the driver program, which is why accumulators work well as data aggregators.

      Syntax to create an accumulator:
      SparkContext.accumulator(initialValue)
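
      The accumulator(...) call shown above is the classic API; since Spark 2.0 it is deprecated in favour of sc.longAccumulator / sc.doubleAccumulator. A minimal Scala sketch using the named-accumulator form (the accumulator name and sample data are illustrative):

      import org.apache.spark.{SparkConf, SparkContext}

      object AccumulatorDemo {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(
            new SparkConf().setAppName("accumulator-demo").setMaster("local[*]"))

          // A named Long accumulator; named accumulators also appear in the web UI.
          val blankLines = sc.longAccumulator("blank.lines")

          val lines = sc.parallelize(Seq("spark", "", "shared variables", "", "rdd"))
          lines.foreach { line =>
            // Tasks may only add to the accumulator ...
            if (line.isEmpty) blankLines.add(1)
          }

          // ... while its value is read back only on the driver.
          println(s"Blank lines: ${blankLines.value}")
          sc.stop()
        }
      }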
