<li style=”list-style-type: none”>
- It is an action.
- It returns the count of each unique value in an RDD as a local Map of (value, count) pairs, delivered to the driver program.
- Use this API with care: because the whole result is returned to the driver program, it is suitable only when the number of distinct values is small.
Example:
val rdd1 = sc.parallelize(Seq(("HR",5),("RD",4),("ADMIN",5),("SALES",4),("SER",6),("MAN",8)))
rdd1.countByValue
Output:
scala.collection.Map[(String, Int),Long] = Map((HR,5) -> 1, (RD,4) -> 1, (SALES,4) -> 1, (ADMIN,5) -> 1, (MAN,8) -> 1, (SER,6) -> 1)
val rdd2 = sc.parallelize(Seq(10,4,3,3))
rdd2.countByValue
Output:
scala.collection.Map[Int,Long] = Map(4 -> 1, 3 -> 2, 10 -> 1)
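When an RDD may hold too many distinct values for the driver's memory, the same counts can be kept distributed instead of being collected into a local Map. The following is a minimal sketch, assuming a spark-shell session where `sc` is already defined; the names `rdd3` and `counts` are illustrative and not part of the example above.

```scala
// Assumption: running inside spark-shell, so `sc` (SparkContext) already exists.
val rdd3 = sc.parallelize(Seq(10, 4, 3, 3))

// Pair every value with 1L and sum the ones per distinct value.
// The result is an RDD[(Int, Long)], so nothing is sent to the driver yet.
val counts = rdd3.map(v => (v, 1L)).reduceByKey(_ + _)

// Pull only a bounded number of (value, count) pairs back to the driver.
counts.take(3).foreach(println)
```

Because `counts` is still an RDD, it can be filtered, sorted, or saved with further transformations and actions before any data reaches the driver.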
For more actions in Apache Spark, refer to Actions in Apache Spark.