Explain shared variables in Spark.

    • #6401
      DataFlair Team

      What are shared variables?
      What is the need for shared variables in Apache Spark?

    • #6402
      DataFlair Team

      Apache Spark provides two types of abstractions. The main abstraction is the Resilient Distributed Dataset (RDD); the other is shared variables.

      Shared variables:
      Shared variables are variables that need to be used by many functions and methods running in parallel. They are designed specifically for use in parallel operations.

      Spark splits a job into the smallest possible units of work, closures (tasks), which run on different nodes, each receiving its own copy of all the variables used in the job. Changes made to these copies are not reflected back in the driver program. To overcome this limitation, Spark provides two special types of shared variables: broadcast variables and accumulators.
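
      As a concrete illustration of this pitfall, here is a minimal Scala sketch (the object name, variable names, and local[*] master are our own, illustrative choices):

      import org.apache.spark.{SparkConf, SparkContext}

      object ClosureCopyDemo {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(
            new SparkConf().setAppName("closure-demo").setMaster("local[*]"))

          var counter = 0
          val data = sc.parallelize(1 to 100)

          // Each task increments its OWN deserialised copy of `counter`,
          // not the driver's original.
          data.foreach(x => counter += x)

          // On a cluster this prints 0, because the tasks' updates never
          // reach the driver; even in local mode the result is not
          // guaranteed, so this pattern must not be relied on.
          println(s"Counter value: $counter")
          sc.stop()
        }
      }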

      Broadcast variables:
      Used to cache a value in memory on all nodes. Only one read-only instance of the variable is shared by all computations throughout the cluster.
      Spark ships the broadcast variable once to each node involved in the related tasks, and each node caches it locally in serialised form.
      Before executing each planned task, the node then retrieves the value from its local cache instead of fetching it from the driver.
      Broadcast variables are:

      Immutable (unchangeable)
      Distributed, i.e. broadcast to the cluster
      Expected to fit in memory on each node

      Syntax to create a broadcast variable:
      SparkContext.broadcast(value)
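
      A minimal Scala sketch of this API, assuming a local SparkContext (the lookup-table contents and all names are illustrative):

      import org.apache.spark.{SparkConf, SparkContext}

      object BroadcastDemo {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(
            new SparkConf().setAppName("broadcast-demo").setMaster("local[*]"))

          // A lookup table we want available on every node: shipped once
          // per node and cached locally, instead of once per task.
          val countryNames = Map("IN" -> "India", "FR" -> "France", "US" -> "United States")
          val bcNames = sc.broadcast(countryNames)

          val codes = sc.parallelize(Seq("IN", "US", "IN", "FR"))
          // Tasks read the cached value through .value; it is read-only.
          val resolved = codes.map(code => bcNames.value.getOrElse(code, "Unknown"))

          resolved.collect().foreach(println)
          sc.stop()
        }
      }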

      Accumulators:
      As the name suggests, an accumulator's main role is to accumulate values. Accumulators are variables used to implement counters and sums; out of the box, Spark provides accumulators of numeric types only.
      The user can create named or unnamed accumulators.
      Unlike broadcast variables, accumulators are writable from tasks. However, the written values can be read only in the driver program, which is why accumulators work well as data aggregators.

      Syntax to create an accumulator:
      SparkContext.accumulator(initialValue)
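
      The accumulator(...) call shown above is the classic API; since Spark 2.0 it is deprecated in favour of sc.longAccumulator / sc.doubleAccumulator. A minimal Scala sketch using the named-accumulator form (the accumulator name and sample data are illustrative):

      import org.apache.spark.{SparkConf, SparkContext}

      object AccumulatorDemo {
        def main(args: Array[String]): Unit = {
          val sc = new SparkContext(
            new SparkConf().setAppName("accumulator-demo").setMaster("local[*]"))

          // A named Long accumulator; named accumulators also appear in the web UI.
          val blankLines = sc.longAccumulator("blank.lines")

          val lines = sc.parallelize(Seq("spark", "", "shared variables", "", "rdd"))
          lines.foreach { line =>
            // Tasks may only add to the accumulator ...
            if (line.isEmpty) blankLines.add(1)
          }

          // ... while its value is read back only on the driver.
          println(s"Blank lines: ${blankLines.value}")
          sc.stop()
        }
      }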
