Name the two types of shared variables available in Apache Spark.
There are two types of shared variables available in Apache Spark:
(1) Accumulators: used to aggregate information from the worker nodes back to the driver program.
(2) Broadcast variables: used to efficiently distribute large, read-only values to the worker nodes.
When we pass a function to a Spark operation such as filter(), the function can use variables that are defined outside it but within the driver program. However, when the task is submitted to the cluster, each worker node gets its own copy of those variables, and updates made to these copies are not propagated back to the driver program.
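Shared variables address this limitation in two ways. Accumulators let tasks add to a value that the driver program can read afterwards, so updates do reach the driver. Broadcast variables work in the other direction: they send a read-only value to each worker node once, instead of shipping a copy with every task.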