Explain countByValue() operation in Apache Spark RDD.

    • #5061
      DataFlair Team
      Moderator

      Explain countByValue() operation in Apache Spark RDD.

    • #5062
      DataFlair Team
      Moderator
      • countByValue() is an action.
      • It returns the count of each unique value in the RDD as a local Map of (value, count) pairs, sent back to the driver program.
      • Use this API with care: because the entire result is returned to the driver program, it is suitable only for RDDs with a small number of distinct values.
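      The behavior described above can be sketched with plain Scala collections, no Spark required; `data` here is a hypothetical local sequence standing in for the RDD's contents:

      ```scala
      // Local sketch of countByValue() semantics: group equal elements,
      // then map each distinct value to its occurrence count (as a Long,
      // matching the Map[T, Long] that Spark returns to the driver).
      val data = Seq(10, 4, 3, 3)
      val counts: Map[Int, Long] =
        data.groupBy(identity).map { case (v, occurrences) => v -> occurrences.size.toLong }
      // counts == Map(10 -> 1, 4 -> 1, 3 -> 2)
      ```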

      Example:

      val rdd1 = sc.parallelize(Seq(("HR",5),("RD",4),("ADMIN",5),("SALES",4),("SER",6),("MAN",8)))
      rdd1.countByValue()

      Output:
      scala.collection.Map[(String, Int),Long] = Map((HR,5) -> 1, (RD,4) -> 1, (SALES,4) -> 1, (ADMIN,5) -> 1, (MAN,8) -> 1, (SER,6) -> 1)

      val rdd2 = sc.parallelize(Seq(10,4,3,3))
      rdd2.countByValue()

      Output:
      scala.collection.Map[Int,Long] = Map(4 -> 1, 3 -> 2, 10 -> 1)
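      Note how the first example counts whole (department, number) tuples: countByValue() treats the entire element as the value, whereas countByKey() (available on pair RDDs) counts occurrences per key. A local Scala sketch of the two semantics, using hypothetical pair data with a repeated "HR" key so the difference is visible:

      ```scala
      // Contrast between countByValue() and countByKey() semantics on pair data.
      val pairs = Seq(("HR", 5), ("HR", 7), ("RD", 4))

      // countByValue(): each full tuple is counted once here, since all three tuples are distinct.
      val byValue: Map[(String, Int), Long] =
        pairs.groupBy(identity).map { case (t, g) => t -> g.size.toLong }
      // byValue == Map((HR,5) -> 1, (HR,7) -> 1, (RD,4) -> 1)

      // countByKey(): counts per key (first tuple element), ignoring the second element.
      val byKey: Map[String, Long] =
        pairs.groupBy(_._1).map { case (k, g) => k -> g.size.toLong }
      // byKey == Map(HR -> 2, RD -> 1)
      ```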

      For more actions in Apache Spark, refer to Actions in Apache Spark.
