PySpark SparkConf – Attributes and Applications

1. Objective

In our last PySpark tutorial, we saw PySpark Serializers. Today, we will discuss PySpark SparkConf. To run a Spark application locally or on a cluster, we need to set a few configurations and parameters, and SparkConf is what we use for that. We will look at the attributes of SparkConf, work through a PySpark SparkConf example, and see how to run Spark applications with it.
So, let's start with PySpark SparkConf.

2. What is PySpark SparkConf?

To run a Spark application locally or on a cluster, we need to set a few configurations and parameters. SparkConf holds these settings as key-value pairs and provides them to the application.

  • Code

For PySpark, here is the signature of the SparkConf class:

class pyspark.SparkConf(
 loadDefaults = True,
 _jvm = None,
 _jconf = None
)

Most of the time, we first create a SparkConf object with SparkConf(), which also loads values from spark.* Java system properties. Any parameter we then set directly on the SparkConf object takes priority over the system properties.
All setter methods in the SparkConf class support chaining. For example, we can write conf.setAppName("PySpark App").setMaster("local"). However, once we pass a SparkConf object to Apache Spark, it is cloned and can no longer be modified by the user.

3. Attributes of PySpark SparkConf

Here are the most commonly used attributes of SparkConf:

i. set(key, value)

It helps to set a configuration property.

ii. setMaster(value)

It sets the master URL to connect to.

iii. setAppName(value)

We use it to set an application name.

iv. get(key, defaultValue=None)

It returns the configuration value for a key, or defaultValue if the key is not set.

v. setSparkHome(value)

It sets the path where Spark is installed on worker nodes.
The following code creates SparkConf and SparkContext objects as part of an application. We can validate it by running the script with spark-submit, or by entering the lines in a plain Python session that has pyspark installed:

from pyspark import SparkConf, SparkContext

# Name the application and run it on the local machine
conf = SparkConf().setAppName("Spark Demo").setMaster("local")
sc = SparkContext(conf=conf)

4. Running Spark Applications Using SparkConf

Here are the different masters against which we can run Spark applications:

  • local – conf

SparkConf().setAppName("Spark Demo").setMaster("local")

  • yarn-client – conf (deprecated since Spark 2.0 in favor of the master "yarn" with deploy mode "client")

SparkConf().setAppName("Spark Demo").setMaster("yarn-client")

  • mesos URL
  • spark URL – conf

SparkConf().setAppName("Spark Demo").setMaster("spark master URL")

  • Code snippet to get all the properties

for key, value in sc.getConf().getAll():
    print(key, value)

So, this was all about PySpark SparkConf. We hope you liked our explanation.

5. Conclusion

Hence, we have learned about PySpark SparkConf, including the code that creates one. We also discussed the different attributes of SparkConf and how to run Spark applications with it. If you still have any doubts, ask in the comments below.