In Python, pickling and unpickling are the processes of serializing and deserializing objects, respectively. They are commonly used in data analysis and machine learning to save Python objects, such as trained models and data structures, to disk and to load them back.
Pickling converts a Python object into a byte stream, which can then be stored in a file or transferred over a network. Unpickling converts a byte stream back into a Python object. Let's first look at what serializing and deserializing objects mean in Python.
Serialize and Deserialize
Serializing and deserializing objects in Python refers to the process of converting Python objects into a format that can be easily stored, transferred, or communicated over different platforms, networks, or systems.
Serializing is the process of converting a Python object into a byte stream or a string representation. Serialized data is often in a standardized format that preserves the state and structure of the original object. Serialized objects can be stored persistently in files or databases, or transmitted over networks. Serialization is commonly used for tasks such as saving model state and persisting data.
Deserializing is the process of reconstructing a Python object from its serialized form. It involves parsing the serialized data and recreating the original object with its original state and structure.
Python can work with various serialization formats, such as JSON, XML, Protocol Buffers, YAML, and pickle. We are going to use the pickle format here.
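As a minimal sketch of the serialize/deserialize round trip described above, the standard library functions pickle.dumps() and pickle.loads() work on an in-memory byte string instead of a file. The dictionary is only illustrative data.

```python
import pickle

# Any picklable object works here; this dictionary is just illustrative data.
original = {"name": "Akki", "age": 30, "scores": [15, 95, 98]}

# Serialize: Python object -> bytes
payload = pickle.dumps(original)

# Deserialize: bytes -> a new Python object with the same state
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == original)    # True: same contents
print(restored is original)    # False: a separate object was rebuilt
```

The restored object is equal to the original but is not the same object in memory, which is what "recreating the original object" means in practice.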
Important Functions for Pickling and Unpickling
- pickle.dump() − Writes the pickled representation of an object to an open file object.
- pickle.load() − Reads the pickled representation of an object from an open file object and returns the reconstructed object.
- pickle.HIGHEST_PROTOCOL − An integer giving the highest protocol version available.
Syntax:
pickle.dump(object, file_obj, protocol)
This function takes three arguments:
- The Python object to serialize.
- The file object in which the serialized object is stored. It must be opened in binary write mode.
- The protocol (optional). If no protocol is specified, pickle.DEFAULT_PROTOCOL is used; in Python 3 this is a binary protocol whose exact version depends on the Python release, not protocol 0. If the protocol is given as a negative value or as HIGHEST_PROTOCOL, the highest protocol version available is used.
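The protocol argument can be illustrated with a short sketch. It uses an in-memory io.BytesIO buffer in place of a real file (an assumption made only to keep the example self-contained) and prints the default and highest protocol numbers of the running interpreter, which vary between Python versions.

```python
import io
import pickle

data = {"name": "Akki", "age": 30, "scores": [15, 95, 98]}

# Both values depend on the installed Python version.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# BytesIO stands in for a file opened in 'wb' mode.
buffer = io.BytesIO()
pickle.dump(data, buffer, protocol=pickle.HIGHEST_PROTOCOL)

# A negative protocol value selects the highest protocol as well.
same_bytes = pickle.dumps(data, protocol=-1) == buffer.getvalue()

# Rewind and read the object back; load() detects the protocol itself.
buffer.seek(0)
loaded = pickle.load(buffer)
print(loaded == data)  # True
```

No protocol is passed to load(), because the protocol version is recorded in the pickled data.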
Example of Pickling and Unpickling
import pickle
# Data to pickle
data = {'name': 'Akki', 'age': 30, 'scores': [15, 95, 98]}
# Pickle the data and save it to a file
with open('new_file.pickle', 'wb') as f:
    pickle.dump(data, f)
In the code above, we take a dictionary named data and create a new file named new_file.pickle, opened in 'wb' (write binary) mode. The dump() method writes the pickled object to the file; this is pickling. Now let's unpickle the same object from new_file.pickle.
import pickle
# Load pickled data from a file
with open('new_file.pickle', 'rb') as f:
    loaded_data = pickle.load(f)
print(loaded_data)
In the code above, we load the pickled data from the file with the load() method. This time the file is opened in 'rb' (read binary) mode.
Use Cases
- Saving and loading machine learning models for inference or deployment.
- Storing intermediate results or preprocessing steps in data analysis pipelines.
- Caching expensive computations to avoid recomputation.
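The caching use case can be sketched as follows. The helper cached_call, the function expensive_computation, and the file name squares.pickle are all hypothetical names chosen for illustration; a temporary directory is used so the example leaves no files behind.

```python
import os
import pickle
import tempfile


def cached_call(func, cache_path):
    """Hypothetical helper: return func()'s result, reusing a pickle cache if one exists."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = func()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []  # records how many times the computation actually runs


def expensive_computation():
    # Stand-in for a slow step such as model training or heavy preprocessing.
    calls.append(1)
    return sum(i * i for i in range(1000))


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = os.path.join(tmp_dir, "squares.pickle")
    first = cached_call(expensive_computation, cache_file)   # computes and writes the cache
    second = cached_call(expensive_computation, cache_file)  # reads the cached pickle

print(first == second, len(calls))  # True 1
```

Only the result is pickled, not the function, so the second call returns the stored value without running the computation again.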
Pickling and Unpickling in Data Analysis
Example:
import pickle
import seaborn as sns
df = sns.load_dataset('tips')
df.head()
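filename = 'demo.pickle'
# Pickle the DataFrame; the with block closes the file afterwards
with open(filename, 'wb') as f:
    pickle.dump(df, f)
# Unpickle the DataFrame from the same file
with open(filename, 'rb') as f:
    df = pickle.load(f)
df.head()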
In the code above, we load the sample tips dataset from the seaborn library and pickle it into the file demo.pickle with the dump() method. Then we unpickle it with the load() method.
The following data types can be pickled:
- Booleans,
- Integers,
- Floats,
- Complex numbers,
- Strings, bytes, and bytearrays,
- Tuples,
- Lists,
- Sets and dictionaries, provided that they, like tuples and lists, contain only picklable objects,
- Functions (built-in and user-defined) defined at the top level of a module with def, not lambda.
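A quick way to check the list above is to try pickling a few objects. The helper can_pickle below is a hypothetical name introduced for this sketch; it simply reports whether pickle.dumps() accepts an object.

```python
import pickle


def can_pickle(obj):
    """Hypothetical helper: report whether pickle.dumps() accepts obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A container holding only picklable values is itself picklable.
print(can_pickle({"a": [1, 2.5, True, 3 + 4j]}))  # True

# A built-in function is pickled by reference to its name.
print(can_pickle(len))  # True

# A lambda has no importable name, so it cannot be pickled.
print(can_pickle(lambda x: x))  # False
```

Functions are pickled by name rather than by value, which is why an anonymous lambda fails while a named built-in succeeds.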