Python Pickle | What is Serialization in Python with Example
1. Python Pickle Tutorial
In the last tutorial, we talked about Python Virtual Environment. In this Python Pickle tutorial, we will study what pickling is in Python and how the ‘pickle’ module handles serialization. Finally, we will go through some Python Pickle examples.
So, let’s start the Python Pickle Tutorial.
2. What is Serialization in Python?
In Python, when we want to serialize and de-serialize a Python object, we use functions and methods from the pickle module. Pickling is the act of converting a Python object into a byte stream. We also call this ‘serialization’, ‘marshalling’, or ‘flattening’. Unpickling is its inverse, i.e., converting a byte stream from a binary file or bytes-like object back into an object. Let’s start by comparing pickle with other Python modules.
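As a minimal sketch of pickling and unpickling, the snippet below round-trips a dictionary through a bytes object. The variable names and sample data are made up for illustration.

```python
import pickle

# A nested structure built only from built-in types (sample data).
record = {"name": "Ada", "scores": [90, 85.5], "active": True, "tags": {"a", "b"}}

# Pickling: object -> bytes.
payload = pickle.dumps(record)

# Unpickling: bytes -> a new, equal object.
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == record)      # True
print(restored is record)      # False: it is a copy, not the same object
```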
3. Comparing Python Pickle to Other Python Modules
a. Comparing Python pickle to marshal
‘marshal’ is a more primitive module for serialization in Python, and its purpose is to support .pyc files. However, we prefer Python pickle. The two differ in the following ways:
- Python pickle tracks the objects it has serialized. Because of this, it doesn’t have to serialize the same objects again when it references them again. This is unlike marshal.
- marshal cannot serialize user-defined classes and their instances. pickle can save and restore class instances, provided the class definition is importable and lives in the same module as when we stored the object.
- The serialization format for pickle in Python is backwards-compatible. The marshal format, by contrast, is not guaranteed to be portable across Python versions.
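The first two differences can be seen in a short sketch. It assumes fractions.Fraction as a stand-in for "an instance of an importable class"; any such class would do.

```python
import marshal
import pickle
from fractions import Fraction

# pickle remembers objects it has already written, so a list that
# appears twice comes back as one shared list, not two copies.
inner = [1, 2]
outer = [inner, inner]
copy = pickle.loads(pickle.dumps(outer))
shared_after_pickle = copy[0] is copy[1]

# pickle can round-trip instances of importable classes ...
third = pickle.loads(pickle.dumps(Fraction(1, 3)))

# ... whereas marshal rejects them with a ValueError.
try:
    marshal.dumps(Fraction(1, 3))
    marshal_error = None
except ValueError as exc:
    marshal_error = exc

print(shared_after_pickle)           # True
print(third)                         # 1/3
print(type(marshal_error).__name__)  # ValueError
```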
b. Comparing Python pickle to json
json is a standard library module for serialization and deserialization with Python.
- Where Python pickle has a binary serialization format, json has a text serialization format.
- Python pickle isn’t human-readable, but json is.
- pickle is Python-specific, but JSON is interoperable.
- pickle can represent a very large number of Python types. However, json can only represent a subset of Python’s built-in types.
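A small sketch of these differences, using made-up sample data: pickle round-trips a set and a complex number, while json refuses them and turns a tuple into a list.

```python
import json
import pickle

data = {"point": (1, 2), "seen": {3, 4}, "z": 1 + 2j}

# pickle: binary output, every value and type survives.
blob = pickle.dumps(data)
same = pickle.loads(blob)

# json: sets and complex numbers are not supported.
try:
    json.dumps(data)
    json_error = None
except TypeError as exc:
    json_error = exc

# With JSON-friendly values the result is readable text,
# though the tuple comes back as a list.
text = json.dumps({"point": (1, 2)})
back = json.loads(text)

print(same == data)  # True
print(text)          # {"point": [1, 2]}
print(back)          # {'point': [1, 2]}
```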
4. Python Pickle Supports Data Stream Format
Python pickle uses a Python-specific data format. So, it is free of the restrictions that external standards like JSON or XDR impose. But this also means that non-Python programs may not be able to reconstruct pickled Python objects.
As noted above, pickle uses a relatively compact binary representation, which we can compress efficiently when size matters.
Complementary to Python pickle is the module ‘pickletools’ for analyzing data streams that it generates.
pickle supports several protocols. This tutorial covers versions 0 to 4:
- Protocol version 0: Original, human-readable protocol; backwards-compatible with earlier versions of Python.
- Protocol version 1: Old, binary format; compatible with earlier versions of Python.
- Protocol version 2: Added in Python 2.3; provides more efficient pickling of new-style classes.
- Protocol version 3: Introduced in Python 3.0; the default from Python 3.0 to 3.7; supports bytes objects; cannot be unpickled by Python 2.x.
- Protocol version 4: Introduced in Python 3.4; the default since Python 3.8; supports very large objects, more kinds of objects, and certain data format optimizations.
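To see how the protocol choice changes the output, this sketch pickles one object with each of the protocols listed above. The sample object is arbitrary; only the protocol argument of pickle.dumps() matters here.

```python
import pickle

obj = {"id": 7, "tags": ["a", "b"]}

for proto in range(0, 5):
    blob = pickle.dumps(obj, protocol=proto)
    # Every protocol decodes to the same object; only the encoding differs.
    assert pickle.loads(blob) == obj
    print(proto, len(blob), blob[:12])

# Protocol 0 is plain ASCII; protocols 2 and later start with b"\x80"
# followed by the protocol number.
ascii_blob = pickle.dumps(obj, protocol=0)
binary_blob = pickle.dumps(obj, protocol=4)
ascii_text = ascii_blob.decode("ascii")
```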
5. Python Pickle Module Interface
To serialize and deserialize, we use the functions dump() and load() for files, or dumps() and loads() for bytes objects. Alternatively, we can create our own Pickler and Unpickler objects for more control over this.
Python pickle has two constants:
- HIGHEST_PROTOCOL: This is an integer, and it holds the highest protocol version that is available. We can pass this as a protocol value to dump() and dumps(), and to the Pickler constructor.
- DEFAULT_PROTOCOL: Also an integer, this holds the default protocol version for pickling. The default is protocol 3 in Python 3.0 to 3.7, and protocol 4 from Python 3.8 onward.
It also has the following functions:
i. dump(obj, file, protocol=None, *, fix_imports=True)
This writes a pickled representation of object obj to file, an open file object. Consider this equivalent to Pickler(file, protocol).dump(obj).
>>> x=7
>>> import os
>>> os.chdir('C:\\Users\\lifei\\Desktop')
>>> import pickle
>>> f=open('abcde.txt','r+b')  # opened in binary mode to pickle
>>> pickle.dump(x,f)
When we checked in the file abcde.txt, we found this:
file, here, must have a write() method accepting a single bytes argument. So, it can be a file you opened in binary mode, an io.BytesIO instance, or a custom object meeting this interface. The protocol argument lets us choose which protocol to use.
When fix_imports is true and we use a protocol less than 3, pickle maps new Python 3 names to old module names in Python 2. This lets Python 2 read the pickle data stream.
ii. dumps(obj, protocol=None, *, fix_imports=True)
This returns the pickled representation of obj as a bytes object. This does not write it to a file.
iii. load(file, *, fix_imports=True, encoding="ASCII", errors="strict")
load() takes in file, an open file object, reads a pickled representation from it, and returns the reconstructed object hierarchy. Consider this equivalent to Unpickler(file).load()
file must have two methods: read(), which takes one integer argument, and readline(), which takes no arguments. Both of these methods must return bytes. So, file can be a file object opened in binary reading mode, an io.BytesIO object, or a custom object that meets this interface.
fix_imports, encoding, and errors help control compatibility support for pickle streams generated by Python 2. When fix_imports is true, pickle maps old Python 2 names to new Python 3 names. The other two guide pickle with decoding 8-bit string instances pickled by Python 2. The default encoding is ‘ASCII’, and the default value for errors is ‘strict’. To read such 8-bit string instances as bytes objects, we can set the encoding to ‘bytes’.
Let’s try doing this.
Traceback (most recent call last):
File "<pyshell#63>", line 1, in <module>
EOFError: Ran out of input
Uh-oh. Let’s get to the beginning of the file.
Now, we can successfully load it.
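The sequence above (dump, a failed load, rewind, a successful load) can be reproduced with the following sketch. It assumes a temporary directory and a hypothetical file name instead of the Desktop path used earlier, and opens the file in 'w+b' mode so that it is created if missing.

```python
import os
import pickle
import tempfile

x = 7
path = os.path.join(tempfile.mkdtemp(), "abcde.pkl")  # hypothetical file name

# "w+b" creates the file and allows both writing and reading in binary mode.
with open(path, "w+b") as f:
    pickle.dump(x, f)

    # The file position is now at the end, so there is nothing left to read.
    try:
        pickle.load(f)
        first_attempt = "loaded"
    except EOFError as exc:
        first_attempt = str(exc)

    # Rewind to the start of the file, then load again.
    f.seek(0)
    value = pickle.load(f)

print(first_attempt)  # Ran out of input
print(value)          # 7
```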
iv. loads(bytes_object, *, fix_imports=True, encoding="ASCII", errors="strict")
This function takes in a bytes object, reads a pickled object hierarchy, and returns the reconstructed object hierarchy.
fix_imports, encoding, and errors help control compatibility support for pickle streams that Python 2 generates. When fix_imports is true, pickle maps old Python 2 names to new Python 3 names. encoding and errors guide pickle with decoding 8-bit string instances pickled by Python 2. The default for encoding is ‘ASCII’, and that for errors is ‘strict’. To read such 8-bit string instances as bytes objects, we can set the encoding to ‘bytes’.
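The effect of encoding can be sketched without a Python 2 interpreter. The byte string below is a hand-written protocol 0 pickle of the kind Python 2 would produce for the 8-bit string 'caf\xe9'; treat it as an illustrative assumption, not output captured from a real Python 2 session.

```python
import pickle

# Hand-written protocol 0 pickle of a Python 2 str holding Latin-1 bytes.
py2_pickle = b"S'caf\\xe9'\n."

# The default encoding="ASCII" cannot decode the 0xE9 byte.
try:
    pickle.loads(py2_pickle)
    ascii_result = "decoded"
except UnicodeDecodeError:
    ascii_result = "UnicodeDecodeError"

# Decode the 8-bit string as text, or keep it as raw bytes.
as_text = pickle.loads(py2_pickle, encoding="latin1")
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(ascii_result)
print(repr(as_text))
print(repr(as_bytes))
```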
6. Python Pickle Exceptions
The Python pickle module also defines three kinds of exceptions:
PickleError is the common parent class for all other pickling exceptions. It, in turn, inherits from Exception.
When the Pickler encounters an unpicklable object, it raises a PicklingError. This class inherits from PickleError.
When Python pickle cannot unpickle an object due to data corruption or a security violation, it raises an UnpicklingError. This inherits from PickleError.
Some other exceptions we may observe when unpickling include EOFError (which we saw above), AttributeError, ImportError, and IndexError.
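A short sketch of the exception hierarchy and of two common failures when unpickling. The helper try_load() is a made-up name used only for this illustration.

```python
import pickle

# The two specific errors share PickleError as their base class.
assert issubclass(pickle.PicklingError, pickle.PickleError)
assert issubclass(pickle.UnpicklingError, pickle.PickleError)
assert issubclass(pickle.PickleError, Exception)

def try_load(data):
    """Return the name of the exception raised while unpickling data, or 'ok'."""
    try:
        pickle.loads(data)
        return "ok"
    except pickle.UnpicklingError:
        return "UnpicklingError"
    except EOFError:
        return "EOFError"

print(try_load(pickle.dumps([1, 2, 3])))  # ok
print(try_load(b"<html>"))                # UnpicklingError: not pickle data
print(try_load(b""))                      # EOFError: nothing to read
```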
7. Imported Classes in Python Pickle
Python pickle provides two classes: Pickler and Unpickler.
a. Pickler(file, protocol=None, *, fix_imports=True)
Pickler takes in a binary file and writes a pickle data stream.
file must have a write() method accepting a single bytes argument. This can be a file object for a file opened for writing in binary mode, an io.BytesIO instance, or a custom object meeting this interface.
protocol is an integer, and informs the pickler about which protocol to use (0 to HIGHEST_PROTOCOL). If we don’t specify it, the pickler uses DEFAULT_PROTOCOL. On providing a negative number, it selects HIGHEST_PROTOCOL.
When fix_imports is true and the protocol version is less than 3, pickle maps new Python 3 names to old Python 2 names. This makes the pickle data stream readable by Python 2.
Python Pickler has the following members:
dump(obj) takes in obj and writes a pickled representation of it to the open file object specified in the constructor of Pickler.
persistent_id(obj) does nothing by default; it exists only so that subclasses can override it. If it returns None, pickle pickles obj as usual. Otherwise, Pickler emits the returned value as a persistent ID for obj. Unpickler.persistent_load() defines what this persistent ID means.
dispatch_table: for a Pickler object, a dispatch table is a registry of reduction functions of the kind we can declare with copyreg.pickle(). This mapping has classes as its keys, and reduction functions as its values.
A reduction function takes a single argument of the associated class, and should conform to the same interface as a __reduce__() method.
But Pickler objects don’t have a dispatch_table by default. Instead, they make use of the global dispatch table that the copyreg module manages. To customize pickling for a specific Pickler object, we can set its dispatch_table attribute to a dict-like object. Or, if a subclass of Pickler has a dispatch_table attribute, then this serves as the default dispatch table for instances of that class.
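As a sketch of a per-Pickler dispatch table, the example below teaches a single Pickler how to handle memoryview, a built-in type that pickle rejects by default. The reduction function reduce_memoryview() is a hypothetical helper written for this illustration; the choice of memoryview is also just an assumption that keeps the example self-contained.

```python
import copyreg
import io
import pickle

def reduce_memoryview(view):
    # Reduction function: return (callable, args) that rebuilds the object.
    return memoryview, (bytes(view),)

view = memoryview(b"abc")

# With the default (global) dispatch table, memoryview is not picklable.
try:
    pickle.dumps(view)
    default_result = "pickled"
except TypeError:
    default_result = "TypeError"

# Give one Pickler its own table: a copy of the global one plus our entry.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
pickler.dispatch_table = copyreg.dispatch_table.copy()
pickler.dispatch_table[memoryview] = reduce_memoryview
pickler.dump(view)

restored = pickle.loads(buffer.getvalue())
print(default_result)      # TypeError
print(restored.tobytes())  # b'abc'
```

Other Pickler objects, and pickle.dumps(), are unaffected because the global table in copyreg was only copied, not modified.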
fast is deprecated (no longer advised), but it enables fast mode when set to true. This mode disables the memo, thereby speeding up pickling as it doesn’t generate extra PUT opcodes. However, do not use it with self-referential objects, as it can send Pickler into infinite recursion.
For more compact pickles, we can use pickletools.optimize().
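A minimal sketch of pickletools.optimize(), using arbitrary sample data. The optimized pickle loads to the same object and is never longer than the original.

```python
import pickle
import pickletools

data = [("x", 1), ("y", 2), ("z", 3)]

raw = pickle.dumps(data, protocol=2)
# optimize() drops PUT opcodes whose memo entries are never read back.
small = pickletools.optimize(raw)

print(len(raw), len(small))
print(pickle.loads(small) == data)  # True
```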
b. Unpickler(file, *, fix_imports=True, encoding="ASCII", errors="strict")
The Unpickler takes in a binary file and reads a pickle data stream.
file must have two methods: read(), which takes an integer argument, and readline(), which takes no arguments. Both of these methods must return bytes. So, file can be a file object opened for reading in binary mode, an io.BytesIO object, or a custom object meeting this interface.
pickle automatically detects the version of protocol used; we don’t need an argument for that.
fix_imports, encoding, and errors help control compatibility support for pickle streams generated by Python 2. When fix_imports is true, pickle maps old Python 2 names to new Python 3 names. encoding and errors guide pickle with decoding 8-bit string instances pickled by Python 2. The default for encoding is ‘ASCII’, and that for errors is ‘strict’. When we want to read such 8-bit string instances as bytes objects, we can set the encoding to ‘bytes’.
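The two constructors can be used together as in this sketch, which assumes an in-memory io.BytesIO buffer in place of a real file. Note that only the Pickler is told which protocol to use.

```python
import io
import pickle

buffer = io.BytesIO()  # any object with write(), read() and readline() works

# One Pickler can write several objects to the same stream.
pickler = pickle.Pickler(buffer, protocol=2)
pickler.dump({"step": 1})
pickler.dump(["a", "b"])

# The Unpickler is not told which protocol was used; it reads that
# from the stream itself. Each load() returns the next object.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()

print(first)   # {'step': 1}
print(second)  # ['a', 'b']
```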
Unpickler has the following members:
load() reads a pickled object representation from the open file object given in the constructor, and returns the reconstructed object hierarchy.
persistent_load(pid) raises an UnpicklingError by default. When we define it, however, it must return the object that the persistent ID pid refers to. If it encounters an invalid persistent ID, it should raise an UnpicklingError.
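Persistent IDs tie Pickler.persistent_id() and Unpickler.persistent_load() together. The sketch below is one possible arrangement: Resource, REGISTRY, RefPickler and RefUnpickler are all hypothetical names, and the registry dictionary stands in for whatever external store would hold the real objects.

```python
import io
import pickle

class Resource:
    """Hypothetical object that lives outside the pickle, looked up by name."""
    def __init__(self, name):
        self.name = name

REGISTRY = {"db": Resource("db")}  # stand-in for an external store

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, Resource):
            return ("Resource", obj.name)  # store a reference, not the object
        return None                        # pickle everything else as usual

class RefUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        kind, name = pid
        if kind == "Resource" and name in REGISTRY:
            return REGISTRY[name]
        raise pickle.UnpicklingError(f"unsupported persistent id: {pid!r}")

buffer = io.BytesIO()
RefPickler(buffer).dump({"conn": REGISTRY["db"], "retries": 3})
buffer.seek(0)
loaded = RefUnpickler(buffer).load()

print(loaded["conn"] is REGISTRY["db"])  # True
print(loaded["retries"])                 # 3
```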
find_class(module, name) imports module if necessary, and returns the object called name from it. Here, module and name are str objects. Despite its name, find_class() is also used to find functions.
A subclass can override this to control what kinds of objects it loads, and how it loads them. This can reduce security risks.
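Overriding find_class() might look like the following sketch, which allows only a short list of built-ins. The allow-list, RestrictedUnpickler and restricted_loads() are illustrative choices, and the hostile byte string is a hand-written pickle that is rejected before anything in it runs.

```python
import builtins
import io
import pickle

SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Allow only a short list of harmless built-ins; refuse the rest.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but using the restricted unpickler."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

allowed = restricted_loads(pickle.dumps([1, 2, range(15)]))

# A hand-written pickle that would call os.system() if loaded normally.
hostile = b"cos\nsystem\n(S'echo hello world'\ntR."
try:
    restricted_loads(hostile)
    outcome = "loaded"
except pickle.UnpicklingError as exc:
    outcome = str(exc)

print(allowed)  # [1, 2, range(0, 15)]
print(outcome)  # global 'os.system' is forbidden
```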
8. What Can We Pickle and Unpickle?
We can pickle the following types:
- None, True, and False
- integers, floating point numbers, complex numbers
- strings, bytes, bytearrays
- tuples, lists, sets, and dictionaries holding only picklable objects
- functions defined at a module’s top level (using def, not lambda)
- built-in functions defined at a module’s top level
- classes defined at a module’s top level
- instances of such classes whose __dict__ or the result of calling __getstate__() is picklable
When we try to pickle an unpicklable object, pickle raises the PicklingError exception. In this process, it is possible that an unspecified number of bytes have already been written to the file.
In trying to pickle a highly-recursive data structure, we may exceed the maximum recursion depth. Such a case raises a RecursionError. However, we can raise that limit with sys.setrecursionlimit().
Let’s take a quick look.
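As a sketch of an unpicklable object, consider a lambda. It has no importable name, so pickle cannot store a reference to it, and a container holding it fails as well. The variable names are made up for this example.

```python
import pickle

square = lambda n: n * n  # its internal name is "<lambda>", not "square"

try:
    pickle.dumps(square)
    result = "pickled"
except (pickle.PicklingError, AttributeError) as exc:
    # PicklingError is the documented error; AttributeError is caught too,
    # as a precaution, in case a version reports the failed lookup that way.
    result = type(exc).__name__

# Containers are only picklable if everything inside them is.
try:
    pickle.dumps({"callback": square})
    nested = "pickled"
except (pickle.PicklingError, AttributeError):
    nested = "failed"

print(result)
print(nested)  # failed
```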
We pickle functions by their ‘fully-qualified’ name references, not by their values. This way, we only pickle the function name and the module it resides in. We do not pickle the function’s code or attributes. So, we should be able to import the defining module in the unpickling environment. This module must hold the named object. Otherwise, it raises an exception.
We pickle classes by named reference. This way, the same restrictions apply in the unpickling environment. We do not pickle the class’ code or data. We only pickle instance data.
Such restrictions mandate that we define picklable classes and functions in a module’s top level.
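Pickling by reference can be observed directly. This sketch uses textwrap.dedent and fractions.Fraction from the standard library as stand-ins for "a top-level function" and "a top-level class"; any importable function or class would behave the same way.

```python
import pickle
import textwrap
from fractions import Fraction

# A function is stored as "module name + function name", nothing more.
blob = pickle.dumps(textwrap.dedent)
print(b"textwrap" in blob, b"dedent" in blob)  # True True

# Unpickling imports the module and looks the name up again, so we
# get back the very same function object, not a copy of its code.
restored = pickle.loads(blob)
print(restored is textwrap.dedent)  # True

# For an instance, the pickle names the class and stores the instance data.
half_blob = pickle.dumps(Fraction(1, 2))
print(b"fractions" in half_blob, b"Fraction" in half_blob)  # True True
print(pickle.loads(half_blob))  # 1/2
```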
This was all on Python Pickle and Python Serialization. Hope you now understand Python Serialization.
9. Conclusion: Python Pickle and Serialization
Simply speaking, Python serialization is the act of converting a Python object into a byte stream. In Python, we use the module ‘pickle’, which has a binary serialization format. We can also pickle classes and functions, which are stored by reference to their names. We have also studied Python pickle in detail and compared it with other modules.