It is an action operation
> Returns (key, noofkeycount) pairs.
From :
http://data-flair.training/blogs/rdd-transformations-actions-apis-apache-spark/#38_CountByKey
It counts the value of RDD consisting of two components tuple for each distinct key. It actually counts the number of elements for each key and return the result to the master as lists of (key, count) pairs.
val rdd1 = sc.parallelize(Seq(("Spark",78),("Hive",95),("spark",15),("HBase",25),("spark",39),("BigData",78),("spark",49)))
rdd1.countByKey
Output:
scala.collection.Map[String,Long] = Map(Hive -> 1, BigData -> 1, HBase -> 1, spark -> 3, Spark -> 1)