PySpark Serializers and Its Types – Marshal & Pickle


Today, in this PySpark article, “PySpark Serializers and its Types”, we will discuss the whole concept of PySpark serializers. PySpark supports two types of serializers, MarshalSerializer and PickleSerializer, and we will learn about both in detail.

So, let’s begin PySpark Serializers.

What Are PySpark Serializers?

Basically, serialization is used for performance tuning on Apache Spark. All data that is sent over the network, written to disk, or persisted in memory must be serialized. In other words, serialization plays an important role in costly operations.

Types of PySpark Serializers

However, for performance tuning, PySpark supports custom serializers. These are the two serializers that PySpark supports:

  • MarshalSerializer
  • PickleSerializer

So, let’s understand the types of PySpark serializers in detail.

MarshalSerializer

The PySpark MarshalSerializer serializes objects using Python’s marshal module. Compared with PickleSerializer, this serializer is faster, but it supports fewer datatypes.

import marshal

from pyspark.serializers import FramedSerializer

class MarshalSerializer(FramedSerializer):
    """
    Serializes objects using Python's marshal module:
    http://docs.python.org/2/library/marshal.html
    """
    # marshal is fast but supports only built-in datatypes.
    dumps = marshal.dumps
    loads = marshal.loads

i. Instance Methods

Inherited from FramedSerializer: __init__, dump_stream, load_stream
Inherited from Serializer: __eq__, __ne__
Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

ii. Class Variables

dumps = marshal.dumps

Generally, it serializes an object into a byte array.

loads = marshal.loads

Whereas, it deserializes an object from a byte array.
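To see MarshalSerializer in action, here is a minimal sketch that passes it to SparkContext (a local Spark installation is assumed, and the app name “Marshal demo” is our own choice):

from pyspark import SparkContext
from pyspark.serializers import MarshalSerializer

# Ask SparkContext to use MarshalSerializer for Python data.
sc = SparkContext("local", "Marshal demo", serializer=MarshalSerializer())
# marshal handles built-in types such as ints quickly.
print(sc.parallelize(range(1000)).map(lambda x: x * 2).take(10))
sc.stop()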

PickleSerializer

The PySpark PickleSerializer serializes objects using Python’s pickle module. Its best feature is that it supports nearly any Python object, although it is not as fast as more specialized serializers.

import cPickle

from pyspark.serializers import FramedSerializer

class PickleSerializer(FramedSerializer):
    """
    Serializes objects using Python's pickle module:
    http://docs.python.org/2/library/pickle.html
    """
    def dumps(self, obj):
        # Protocol 2 gives a compact binary pickle.
        return cPickle.dumps(obj, 2)

    loads = cPickle.loads

i. Instance Methods

dumps(self, obj)

It serializes an object into a byte array. When batching is used, this will be called with an array of objects.
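For intuition, here is a minimal round-trip sketch of dumps and loads (the sample dictionary is just an illustration):

from pyspark.serializers import PickleSerializer

ser = PickleSerializer()
data = {"key": [1, 2, 3]}
blob = ser.dumps(data)          # serialize to a byte array
assert ser.loads(blob) == data  # deserialize back to the original object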

Inherited from FramedSerializer: __init__, dump_stream, load_stream
Inherited from Serializer: __eq__, __ne__
Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

ii. Class Variables

loads = cPickle.loads

Basically, it deserializes an object from a byte array.
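Note that PickleSerializer is PySpark’s default serializer, so passing it explicitly, as in this minimal sketch, only makes that default visible (the app name “Pickle demo” is our own choice):

from pyspark import SparkContext
from pyspark.serializers import PickleSerializer

sc = SparkContext("local", "Pickle demo", serializer=PickleSerializer())
# Pickle copes with arbitrary Python objects, e.g. mixed-type pairs.
rdd = sc.parallelize([("a", 1), ("b", [2, 3])])
print(rdd.collect())
sc.stop()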
So, this is all about PySpark Serializers. Hope you like our explanation.

Conclusion

Hence, we have covered all about PySpark serializers in this article. We have also learned about both types, the Marshal and Pickle serializers, which PySpark supports, along with their code.

Still, if any doubt occurs regarding PySpark Serializers, feel free to ask in the comment section. We will definitely get back to you.

2 Responses

  1. Dinesh says:

Can I use the Kryo serializer with PySpark?

    • Pablo says:

No, since Kryo is a Java serializer. It can be used with Scala, for example, but for PySpark we need to use Python serializers.
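(For JVM-side tuning, Kryo can still be enabled via the spark.serializer configuration; this is a minimal sketch, and it affects only Spark’s internal Java/Scala objects, not the Python records handled by PySpark’s serializers.)

from pyspark import SparkConf, SparkContext

# Kryo applies to the JVM side of Spark only; Python data still goes
# through PySpark's MarshalSerializer or PickleSerializer.
conf = SparkConf().set("spark.serializer",
                       "org.apache.spark.serializer.KryoSerializer")
sc = SparkContext(conf=conf)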
