In the well-known word count example for Spark Streaming, the Spark configuration object is initialized as follows:
/* Create a local StreamingContext
with two working thread and batch
interval of 1 second.
The master requires 2 cores
to prevent from a starvation scenario. */
val sparkConf = new SparkConf().setMaster("local[2]")
Here, if I change the master from local[2] to local, or do not set the master at all, I do not get the expected output; in fact, word counting doesn’t happen at all.
The comment says “The master requires 2 cores to prevent from a starvation scenario”, which is why they have used setMaster("local[2]").
Can somebody explain to me why it requires 2 cores and what a starvation scenario is?
In Spark Streaming, the local master needs two cores because one core is used to run the receiver, and a receiver holds its core for as long as the application runs. At least one more core is then needed to process the received data. If the number of allocated cores is not greater than the number of receivers, the system can receive data but cannot process it.
Therefore, whether running locally or on a cluster, the application needs at least 2 cores allocated to it (more generally, more cores than receivers).
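To make this concrete, here is a rough sketch of how the word count setup might look with two local threads. The application name, host and port are assumptions for illustration (the port assumes a text server such as `nc -lk 9999` is running), so treat this as a sketch rather than the exact code from the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread is held by the socket receiver,
    // the other is left free to process each 1-second batch.
    // The app name is an assumed placeholder.
    val sparkConf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("NetworkWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // Assumed source: a text server on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Split each line into words and count them per batch.
    val wordCounts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With `setMaster("local")` the same program would still start, but the only thread would go to the receiver and no batch would ever be processed.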
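Now, on to the starvation scenario in Spark Streaming.
Starvation is the situation where some tasks never get to execute at all while others keep making progress. Here, with local (a single thread), the receiver takes the only thread and keeps it, so the tasks that process the received batches are never scheduled, which is why the word count never produces any output.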
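The same effect can be reproduced without Spark. The sketch below is only an analogy, not how Spark schedules work internally: a fixed thread pool stands in for the cores, one task that never finishes stands in for the receiver, and a second task stands in for batch processing. The names `StarvationSketch` and `processingRuns` are made up for this illustration.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object StarvationSketch {
  // Returns true if the "processing" task got to run within the timeout.
  def processingRuns(threads: Int): Boolean = {
    val pool = Executors.newFixedThreadPool(threads)
    val processed = new CountDownLatch(1)
    val stop = new CountDownLatch(1)
    try {
      // Stand-in for a receiver: holds its thread until told to stop.
      pool.submit(new Runnable { def run(): Unit = stop.await() })
      // Stand-in for batch processing: needs a free thread to run at all.
      pool.submit(new Runnable { def run(): Unit = processed.countDown() })
      // Wait briefly to see whether the processing task was ever scheduled.
      processed.await(500, TimeUnit.MILLISECONDS)
    } finally {
      stop.countDown() // release the "receiver" so the pool can shut down
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"1 thread:  processing ran = ${processingRuns(1)}")
    println(s"2 threads: processing ran = ${processingRuns(2)}")
  }
}
```

With one thread the processing task sits in the queue behind the "receiver" and is starved, so the first call returns false; with two threads it runs right away and the second call returns true.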