PySpark SparkConf – Attributes and Applications
In our last PySpark tutorial, we covered PySpark Serializers. Today, we will discuss PySpark SparkConf: its attributes, how it is used when running Spark applications, and a short example. To run a Spark application locally or on a cluster, we need to set a few configurations and parameters, and SparkConf is what we use for that. So, this document will help you learn to use SparkConf with PySpark.
So, let's start with PySpark SparkConf.
2. What is PySpark SparkConf?
To run a Spark application locally or on a cluster, we need to set a few configurations and parameters, and this is what SparkConf helps with. Basically, it holds the configuration of a Spark application as key-value pairs.
For PySpark, here is the signature of the SparkConf class:
class pyspark.SparkConf ( loadDefaults = True, _jvm = None, _jconf = None )
Basically, we first create a SparkConf object with SparkConf(), which also loads values from spark.* Java system properties. Then, using the SparkConf object, we can set different parameters, and the parameters set this way take priority over the system properties.
Moreover, all setter methods in the SparkConf class support chaining. For example, we can write conf.setAppName("PySpark App").setMaster("local"). However, once we pass a SparkConf object to Apache Spark, it is cloned and can no longer be modified by the user.
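The chaining described above can be sketched as a small helper. This is only a minimal illustration: the function name build_conf and its extra parameter are our own and not part of PySpark, and spark.executor.memory is used purely as an example property.

```python
def build_conf(app_name, master, extra=None):
    """Return a SparkConf built with chained setters.

    `extra` is an optional dict of additional Spark properties
    (a hypothetical convenience parameter, not a PySpark feature).
    """
    # Imported here so that defining the helper does not itself require Spark.
    from pyspark import SparkConf

    # Each setter returns the same SparkConf object, which is what
    # makes the chained call below possible.
    conf = SparkConf().setAppName(app_name).setMaster(master)

    # Values set explicitly on the object take priority over any
    # spark.* system properties that were loaded by default.
    for key, value in (extra or {}).items():
        conf.set(key, value)
    return conf
```

For instance, build_conf("PySpark App", "local", {"spark.executor.memory": "1g"}) should give a conf whose get("spark.app.name") returns the application name we passed in.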
3. Attributes of PySpark SparkConf
Here are the most commonly used attributes of SparkConf:
i. set(key, value)
It sets a configuration property.
ii. setMaster(value)
It sets the master URL.
iii. setAppName(value)
It sets the application name.
iv. get(key, defaultValue=None)
It returns the configured value for a key, or defaultValue if the key is not set.
v. setSparkHome(value)
It sets the Spark installation path on worker nodes.
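As a rough sketch of how the attributes listed above fit together, the snippet below sets a few values and reads them back with get(key, defaultValue). The helper names read_settings and example_conf, the "<not set>" placeholder, and the /opt/spark path are all assumptions made for illustration.

```python
def read_settings(conf, keys, default="<not set>"):
    """Look up each key with get(key, default) and return a plain dict."""
    return {key: conf.get(key, default) for key in keys}


def example_conf():
    """Build a SparkConf that uses each setter from the list above."""
    # Imported here so that defining the helper does not itself require Spark.
    from pyspark import SparkConf

    return (
        SparkConf()
        .setMaster("local[2]")               # master URL
        .setAppName("Attributes Demo")       # application name
        .set("spark.executor.memory", "1g")  # arbitrary property
        .setSparkHome("/opt/spark")          # hypothetical install path
    )
```

Calling read_settings(example_conf(), ["spark.master", "spark.app.name", "spark.home"]) should return the three values set above, while a key that was never set comes back as the default placeholder.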
We can use the following code to create SparkConf and SparkContext objects as part of our application. Also, we can validate it by running it with the Python interpreter from the base directory of our application:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName("Spark Demo").setMaster("local")
sc = SparkContext(conf=conf)
4. Running Spark Applications Using SparkConf
In addition, here are some of the different contexts in which we can run Spark applications:
- local – conf
- yarn-client – conf
- mesos URL
- spark URL – conf
conf = SparkConf().setAppName("Spark Demo").setMaster("spark master URL")
- Code snippet to get all the properties
for i in sc.getConf().getAll():
    print(i)
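To make the contexts listed above more concrete, here is a small sketch that builds the master URL string passed to setMaster(). The helper master_url is our own and not part of PySpark, and any host names or ports you pass in are up to you. Note also that recent Spark releases use plain yarn together with a separate deploy mode, whereas yarn-client is the older spelling.

```python
def master_url(mode, host=None, port=None, cores=None):
    """Return a master URL string for the given deployment mode.

    mode  -- one of "local", "yarn", "spark" (standalone) or "mesos"
    host  -- master host name, required for "spark" and "mesos"
    port  -- master port, required for "spark" and "mesos"
    cores -- optional worker thread count for "local" (e.g. 4 or "*")
    """
    if mode == "local":
        # "local" runs with one thread, "local[N]" with N threads.
        return "local" if cores is None else f"local[{cores}]"
    if mode == "yarn":
        # The cluster location comes from the Hadoop configuration,
        # so no host or port is part of the URL.
        return "yarn"
    if mode in ("spark", "mesos"):
        if host is None or port is None:
            raise ValueError(f"{mode} mode needs both host and port")
        return f"{mode}://{host}:{port}"
    raise ValueError(f"unknown mode: {mode}")
```

The result can then be passed straight to the setter, for example SparkConf().setAppName("Spark Demo").setMaster(master_url("local", cores=4)).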
So, this was all about PySpark SparkConf. Hope you liked our explanation.
Hence, we have learned about PySpark SparkConf, including the code to create one. Moreover, we discussed the different attributes of SparkConf and how to run Spark applications with it. Still, if you have any doubts, comment below.