val rdd1 = sc.textFile("/home/hdadmin/wc-data.txt")
Suppose wc-data.txt is 1280 MB and the default block size is 128 MB. The file is then stored as 10 blocks, and Spark creates 10 partitions by default (1 per block).
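For better performance, we can ask for more partitions than there are blocks. The code below requests 20 partitions over the 10 blocks (2 partitions per block). The second argument is a minimum, so Spark may create more partitions than requested. The extra partitions improve performance only if there are enough cores to run the additional tasks in parallel, so make sure each node in the cluster has at least 2 cores.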
val rdd1 = sc.textFile("/home/hdadmin/wc-data.txt", 20)
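
The partition counts above can be checked with a little arithmetic. The sketch below is a simplified model of the split-size rule in Hadoop's older FileInputFormat, which sc.textFile relies on: the split size is the smaller of the block size and (file size / requested partitions). The object name PartitionEstimate and the function estimatePartitions are hypothetical helpers written for this illustration, not part of Spark or Hadoop, and the model assumes a single splittable, uncompressed file.

```scala
// Simplified, illustrative model of how input splits are sized for one file.
// PartitionEstimate and estimatePartitions are hypothetical names, not Spark APIs.
object PartitionEstimate {
  // Assumption: the last split is allowed to be up to 10% larger than splitSize.
  private val SplitSlop = 1.1

  def estimatePartitions(fileBytes: Long,
                         blockBytes: Long,
                         minPartitions: Int,
                         minSplitBytes: Long = 1L): Int = {
    // Target size if the file were cut into exactly minPartitions pieces.
    val goalSize = fileBytes / math.max(minPartitions, 1)
    // A split never exceeds one block and never drops below the minimum size.
    val splitSize = math.max(math.max(minSplitBytes, 1L), math.min(goalSize, blockBytes))

    var remaining = fileBytes
    var splits = 0
    // Cut full-size splits while clearly more than one split's worth remains.
    while (remaining.toDouble / splitSize > SplitSlop) {
      splits += 1
      remaining -= splitSize
    }
    // Whatever is left becomes the final split.
    if (remaining > 0) splits += 1
    splits
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    // 1280 MB file, 128 MB blocks, small default minimum: one partition per block.
    println(estimatePartitions(1280 * mb, 128 * mb, minPartitions = 2))  // prints 10
    // Same file with 20 partitions requested: two partitions per block.
    println(estimatePartitions(1280 * mb, 128 * mb, minPartitions = 20)) // prints 20
  }
}
```

In the Spark shell, rdd1.getNumPartitions reports the number of partitions Spark actually created, which is the reliable way to confirm the result on a real file.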