Python Data Persistence - Object Serialization



The file object returned by Python's built-in open() function has one important shortcoming: when a file is opened in 'w' (text) mode, its write() method accepts only a string (str) object.

This means that data in any non-string form cannot be written to the file directly, whether it is an instance of a built-in class (numbers, dictionaries, lists or tuples) or of a user-defined class. It must first be converted to its string representation.

    numbers = [10, 20, 30, 40]
    # write() on a text-mode file accepts only str, so convert the list first
    file = open('numbers.txt', 'w')
    file.write(str(numbers))
    file.close()
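The sketch below shows both halves of the problem: passing the list itself to write() raises a TypeError, and reading the file back yields only a string. Converting that string back into a list is an extra step; ast.literal_eval() is used here as one safe way to do it, which is an assumption of this sketch rather than part of the original example. A temporary directory is used so the snippet runs without leaving files behind.

```python
import ast
import os
import tempfile

numbers = [10, 20, 30, 40]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'numbers.txt')

    with open(path, 'w') as file:
        try:
            file.write(numbers)          # a list is not a str, so this fails
        except TypeError as exc:
            print('write() rejected the list:', exc)
        file.write(str(numbers))         # writes the text "[10, 20, 30, 40]"

    with open(path, 'r') as file:
        text = file.read()               # read() returns a str, not a list

# Reverse conversion (assumption: literal_eval, which only parses literals)
restored = ast.literal_eval(text)
print(type(text).__name__, type(restored).__name__, restored)
```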

For a file opened in binary mode ('wb'), the argument to write() must be a bytes-like object such as bytes or bytearray. For example, the list of integers can be converted to bytes by the bytearray() function and then written to the file.

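    numbers = [10, 20, 30, 40]
    # bytearray() turns each integer (0 to 255) into one byte
    data = bytearray(numbers)
    # the file must be opened in binary mode ('wb') to accept bytes
    file = open('numbers.txt', 'wb')
    file.write(data)
    file.close()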

To read the data back from the file as the original data type, the reverse conversion has to be done.

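    file = open('numbers.txt', 'rb')
    data = file.read()      # read() on a binary file returns a bytes object
    file.close()
    print(list(data))       # convert the bytes back to a list of integers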
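The bytearray() route only works for a narrow kind of data: every element must be an integer from 0 to 255. The minimal sketch below uses a hypothetical helper, try_bytearray(), to show what happens with other values; anything outside that range, or anything that is not an integer, is rejected.

```python
def try_bytearray(values):
    """Hypothetical helper: return bytearray(values), or the exception raised."""
    try:
        return bytearray(values)
    except (ValueError, TypeError) as exc:
        return exc

print(try_bytearray([10, 20, 30, 40]))  # fine: every value fits in one byte
print(try_bytearray([10, 300]))         # ValueError: 300 is larger than 255
print(try_bytearray([-1]))              # ValueError: negative values do not fit
print(try_bytearray(['a', 'b']))        # TypeError: strings are not integers
print(try_bytearray([[1, 2]]))          # TypeError: nested lists are not integers
```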

This kind of manual conversion of an object to string or byte format (and back again) is cumbersome and tedious. Instead, the state of a Python object can be stored directly as a byte stream, either in a file or in a memory stream, and later restored to its original state. These two processes are called serialization and deserialization, respectively.
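As a minimal sketch of serialization with the standard pickle module, the snippet below stores a small dictionary (sample data chosen for illustration) in an in-memory byte stream (io.BytesIO) and in a binary file, then restores it from both. Unlike the manual conversions above, the nested list, float and tuple all come back with their original types.

```python
import io
import os
import pickle
import tempfile

# Sample data for illustration only
record = {'numbers': [10, 20, 30, 40], 'label': 'sample', 'ratio': 0.5, 'tags': ('a', 'b')}

# pickle.dumps() returns the serialized object as a bytes object
blob = pickle.dumps(record)

# Serialize to an in-memory byte stream and read it back
buffer = io.BytesIO()
pickle.dump(record, buffer)
buffer.seek(0)                      # rewind before reading
from_memory = pickle.load(buffer)

# Serialize to a file opened in binary mode and read it back
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'record.pkl')
    with open(path, 'wb') as file:
        pickle.dump(record, file)
    with open(path, 'rb') as file:
        from_file = pickle.load(file)

print(from_memory == record, from_file == record)
```

Note that pickle data should only be loaded from trusted sources, because unpickling can execute arbitrary code.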

Python's standard library contains several modules for serialization and deserialization.

| Sr.No. | Name    | Description                                                  |
|--------|---------|--------------------------------------------------------------|
| 1      | pickle  | Python-specific serialization library                        |
| 2      | marshal | Library used internally by Python for serialization          |
| 3      | shelve  | Pythonic object persistence                                  |
| 4      | dbm     | Library offering an interface to Unix "databases"            |
| 5      | csv     | Library for storing and retrieving Python data in CSV format |
| 6      | json    | Library for serialization to the universal JSON format       |
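To illustrate the difference between a Python-specific format and a universal one, the sketch below serializes the same sample data with pickle and with json. pickle restores Python types such as tuple and set exactly, while json produces portable text but accepts only JSON-compatible types: a set is rejected, and a tuple comes back as a list.

```python
import json
import pickle

# Sample data for illustration only
data = {'numbers': (10, 20, 30, 40), 'unique': {1, 2, 3}}

# pickle keeps Python-specific types such as tuple and set
restored = pickle.loads(pickle.dumps(data))
print(restored == data)

# json only handles JSON-compatible types, so the set is rejected
try:
    json.dumps(data)
except TypeError as exc:
    print('json failed:', exc)

# A tuple is written as a JSON array and read back as a list
text = json.dumps({'numbers': (10, 20, 30, 40)})
print(text)
print(json.loads(text))
```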
