Protobuf - Introduction



Before we jump into Protocol Buffers (Protobuf), let us briefly go over serialization, which is the job Protobuf performs.

What is Serialization and Deserialization?

Serialization is the process of converting an object (in any language) into bytes, typically so that they can be stored in a persistent storage system. This storage could be a file on disk, a messaging queue, or a database. The main purpose of serializing an object is to reuse its data and recreate the object on the same or a different machine. Deserialization is the reverse step: we convert the stored bytes back into an object.
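To make the round trip concrete, here is a minimal Python sketch. It uses JSON from the standard library as the byte format (not Protobuf), and the `Person` class, the helper function names, and the file name `person.bin` are all assumptions made up for illustration.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Person:
    # Hypothetical object we want to store and later recreate.
    name: str
    age: int


def serialize(person: Person) -> bytes:
    """Convert the object into bytes (here: UTF-8 encoded JSON)."""
    return json.dumps(asdict(person)).encode("utf-8")


def deserialize(data: bytes) -> Person:
    """Convert the stored bytes back into an object."""
    return Person(**json.loads(data.decode("utf-8")))


def round_trip_via_file(person: Person) -> Person:
    """Write the bytes to a file on disk, read them back, and rebuild the object."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "person.bin")
        with open(path, "wb") as handle:
            handle.write(serialize(person))
        with open(path, "rb") as handle:
            return deserialize(handle.read())


original = Person(name="Alice", age=30)
restored = round_trip_via_file(original)
```

After the round trip, `restored` compares equal to `original`, even though it was rebuilt purely from the bytes in the file. The same bytes could just as well have been sent to another machine.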

Why do we need Serialization and Deserialization?

While there are a few other use cases, the most basic and important one is transferring object data over a network to a different service or machine and recreating the object there for further use. Transferring object data via an API, a database, or a messaging queue requires the object to be converted into bytes so that it can be sent over the network. This is where serialization becomes important.

In a microservice architecture, the application is broken down into small services that communicate with each other via messaging queues and APIs. All of this communication happens over a network, which requires frequent conversion of objects to bytes and back to objects. So, serialization and deserialization become critical in a distributed environment.

Why Google Protobuf?

Google Protobuf serializes objects into bytes that can be transferred over the network, and deserializes those bytes back into objects. But there are other libraries and mechanisms that do the same job.

So, what makes Google Protobuf special? Here are some of its important features −

  • Language independent − Protobuf libraries exist for multiple languages, popular ones being Java, Python, Go, etc. So, an object can be serialized into bytes by a Java program and deserialized into a Python object.

  • Efficient data compaction − In a microservice environment, where many messages are exchanged over a network, it is critical that the data we send is as compact as possible. We need to avoid superfluous information so that the data is transferred quickly. Compact encoding is one of Protobuf's focus areas.

  • Efficient serialization and deserialization − Because services exchange messages so frequently, how fast we can serialize and deserialize the data is also critical. Protobuf is designed to make both operations as quick as possible.

  • Simple to use − The Protobuf tooling auto-generates the serialization code (as we will see in the upcoming chapters), and it has a versioning scheme that lets the creator of the data and the user of the data work with different versions of the serialization definition.
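To see where the compactness comes from, here is a small hand-written sketch of how Protobuf's binary wire format encodes a single integer field. It is an illustration only, not the Protobuf library API: the function names are hypothetical, and the field name `a`, the field number 1, and the value 150 are example values chosen for the comparison. The key ideas are that a field is identified by a small number instead of its name, and that integers are written as variable-length "varints".

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint: 7 bits per byte,
    with the high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one integer field as a tag followed by the value.
    The tag packs the field number with wire type 0 (varint)."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


# Field number 1 holding the value 150, encoded by hand.
proto_bytes = encode_int_field(1, 150)

# The same information as JSON text, which spells out the field name.
json_bytes = json.dumps({"a": 150}).encode("utf-8")

print(proto_bytes.hex(" "), len(proto_bytes))  # 08 96 01 3
print(json_bytes.decode("utf-8"), len(json_bytes))  # {"a": 150} 10
```

Here the hand-encoded message takes 3 bytes, while the equivalent JSON text takes 10 bytes, mostly because JSON repeats the field name and punctuation in every message.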

Protobuf vs Others (JSON/XML)

Let's take a look at how other ways of transferring data over a network stack up against Protobuf.

| Feature | Protobuf | JSON | XML |
|---|---|---|---|
| Language independent | Yes | Yes | Yes |
| Serialized data size | Smallest of the three | Smaller than XML | Largest of the three |
| Human readable | No, as it uses a binary encoding | Yes, as it uses a text-based format | Yes, as it uses a text-based format |
| Serialization speed | Typically fastest of the three | Typically faster than XML | Typically slowest of the three |
| Data type support | Richer than the other two; supports complex types such as Any and oneof | Supports basic data types | Supports basic data types |
| Support for evolving schema | Yes, built in | No built-in mechanism | No built-in mechanism |
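The schema evolution row can be illustrated with another hand-written sketch. It is a simplified model of the binary wire format rather than the Protobuf library itself, it handles only integer (varint) fields, and the field names `id`, `age`, and `score` as well as all function names are hypothetical. Because every value on the wire is tagged with its field number, a reader built against an older definition can simply skip the numbers it does not know.

```python
def encode_varint(value):
    """Encode a non-negative integer as a varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fields(fields):
    """Encode {field_number: int_value} as a sequence of tag/value pairs."""
    out = bytearray()
    for number, value in fields.items():
        out += encode_varint(number << 3)  # wire type 0 = varint
        out += encode_varint(value)
    return bytes(out)


def decode_known(data, schema):
    """Decode only the fields listed in schema ({field_number: name});
    fields with unknown numbers are read and then ignored."""
    result = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        if number in schema:
            result[schema[number]] = value
    return result


V1_SCHEMA = {1: "id", 2: "age"}              # older reader
V2_SCHEMA = {1: "id", 2: "age", 3: "score"}  # newer writer added field 3

payload = encode_fields({1: 7, 2: 30, 3: 99})  # written with the newer schema
old_view = decode_known(payload, V1_SCHEMA)
new_view = decode_known(payload, V2_SCHEMA)
```

The older reader still recovers `id` and `age` from the newer payload and ignores field 3, while the newer reader sees all three fields. This is the idea behind letting the creator and the user of the data run different versions of the definition.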