- Serialization is the process of transforming objects into a structure that can be transmitted through a network or stored in a file. Objects are instances of classes or basic types of a programming language.
- This article discusses how to use Google's Protocol Buffers as a serialization solution from a Python program.
- Since several serialization formats are available to a Python programmer, such as Pickle, XML, JSON and Protocol Buffers, a comparison explaining why to choose Protocol Buffers over the others is provided along with an example Python program.
Why use Protocol Buffers over Python's Pickle, XML and JSON:
- Python provides its own process of serialization called pickling.
- Pickling uses a binary format.
- Through pickling, Python objects are transformed and serialized into byte streams.
- Serialized objects are converted back into Python objects by unpickling.
- However, pickling is of little use in a heterogeneous environment where the serialized objects are consumed by non-Python systems.
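As a minimal sketch of the pickling round trip described above, using only the standard library `pickle` module (the `quote` dictionary is a made-up sample record, not part of any API):

```python
import pickle

# Hypothetical sample record used only for illustration.
quote = {"scripName": "MSFT", "quantity": 100, "unitPrice": 92.63}

# Pickling: Python object -> byte stream.
payload = pickle.dumps(quote)

# Unpickling: byte stream -> a new, equal Python object.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == quote)
```

The byte stream in `payload` follows a Python-specific format, which is why a non-Python consumer cannot readily make use of it.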
XML and JSON:
- Though XML and JSON are widely used, human-readable data exchange formats, they are not as compact as the binary format used by Protocol Buffers.
- Because Protocol Buffers messages are encoded in a binary format, they are typically smaller and faster to parse than the equivalent XML or JSON.
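The space argument can be illustrated without installing anything. The sketch below encodes one sample record as JSON and as a hand-rolled binary layout built with the standard library `struct` module. The binary layout is an assumption made for this illustration and is not the Protocol Buffers wire format. The record values and the helper `pack_text` are likewise hypothetical.

```python
import json
import struct

# Hypothetical sample record used only for illustration.
quote = {
    "scripName": "MSFT",
    "scripCode": "1234",
    "quantity": 100,
    "unitPrice": 92.63,
    "venue": "NASDAQ GS",
    "orderSide": "Buy",
    "timeStamp": 1521458280,
}

# Text encoding: field names and punctuation are repeated in every record.
as_json = json.dumps(quote).encode("utf-8")


def pack_text(text):
    """Encode a short string as a 1-byte length followed by its UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack("<B", len(data)) + data


# Ad hoc binary encoding (an assumption for this sketch, NOT protobuf's format):
# four length-prefixed strings, then int32, float64 and int64 with no padding.
as_binary = (
    pack_text(quote["scripName"])
    + pack_text(quote["scripCode"])
    + pack_text(quote["venue"])
    + pack_text(quote["orderSide"])
    + struct.pack("<idq", quote["quantity"], quote["unitPrice"], quote["timeStamp"])
)

print(len(as_json), len(as_binary))
```

The binary form is smaller mainly because it carries no field names or delimiters. How much is saved in practice depends on the data and on the encoding actually used.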
How to use Protocol Buffers in a Python Program:
- Define one or more message types using protobuf syntax and save the file with a .proto extension.
- The message type syntax is somewhat similar to writing C++ classes.
- Compile the .proto file using the protobuf compiler protoc.
- The compilation will produce a Python file named <message file name>_pb2.py.
- Now use the protobuf messages just like Python classes.
- Import the messages, which are Python classes, from the newly produced file <message file name>_pb2.py.
- Instantiate as many message instances as you want and assign values to the message attributes.
- Serialize the messages by calling the method SerializeToString() on the message instances. The method returns the encoded message as bytes.
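The compile step in the list above can also be scripted from Python. The following is a sketch under the assumption that `protoc` is installed and available on the PATH. The helper names `protoc_command`, `generated_module_name` and `compile_proto` are hypothetical, while `--proto_path` and `--python_out` are options of the protoc command line.

```python
import shutil
import subprocess
from pathlib import Path


def protoc_command(proto_file, out_dir="."):
    """Build the protoc argument list that generates Python code for one .proto file."""
    proto_path = Path(proto_file)
    return [
        "protoc",
        f"--proto_path={proto_path.parent}",  # directory in which to look for the file
        f"--python_out={out_dir}",            # where the *_pb2.py file is written
        str(proto_path),
    ]


def generated_module_name(proto_file):
    """Return the file name protoc produces, e.g. quote.proto -> quote_pb2.py."""
    return Path(proto_file).stem + "_pb2.py"


def compile_proto(proto_file, out_dir="."):
    """Run protoc and return the path of the generated module."""
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc was not found on the PATH")
    subprocess.run(protoc_command(proto_file, out_dir), check=True)
    return Path(out_dir) / generated_module_name(proto_file)
```

Calling `compile_proto("quote.proto")` would be equivalent to running `protoc --proto_path=. --python_out=. quote.proto` in a shell.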
The message definition using Protocol Buffers:
syntax = "proto2";

message Quote {
    required string scripName = 1;
    required string scripCode = 2;
    required int32 quantity = 3;
    required double unitPrice = 4;
    required string venue = 5;
    required string orderSide = 6;
    required int64 timeStamp = 7;
}

message Quotes {
    repeated Quote quotes = 1;
}
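The numbers after each field name (= 1 through = 7) are field numbers, not default values. In the protobuf binary encoding, every field value is preceded by a tag that combines the field number with a wire type. The following is a hand-rolled sketch of that idea using only the standard library. It is not the protobuf library, it handles only non-negative integers, and all function names are hypothetical.

```python
import struct

# Wire types used by the protobuf encoding.
VARINT = 0            # int32, int64, ...
FIXED64 = 1           # double
LENGTH_DELIMITED = 2  # string, bytes, nested messages


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """A tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number, text):
    data = text.encode("utf-8")
    return encode_tag(field_number, LENGTH_DELIMITED) + encode_varint(len(data)) + data


def encode_double_field(field_number, value):
    return encode_tag(field_number, FIXED64) + struct.pack("<d", value)


def encode_varint_field(field_number, value):
    return encode_tag(field_number, VARINT) + encode_varint(value)


# scripName = 1 and quantity = 3, as in the Quote definition.
name_bytes = encode_string_field(1, "MSFT")
quantity_bytes = encode_varint_field(3, 100)
```

Because only the field number is written and the field name is not, field numbers should stay stable once a message type is in use.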
A Python Program that uses Protocol Buffers:
import quote_pb2

# Create an empty quote collection
quoteCollection = quote_pb2.Quotes()

# Create an MSFT quote
quoteMSFT = quoteCollection.quotes.add()
quoteMSFT.scripName = "MSFT"
quoteMSFT.scripCode = "1234"
quoteMSFT.quantity = 100
quoteMSFT.unitPrice = 92.63
quoteMSFT.venue = "NASDAQ GS"
quoteMSFT.orderSide = "Buy"
quoteMSFT.timeStamp = 1521458280

# Create a GOOG quote
quoteGOOG = quoteCollection.quotes.add()
quoteGOOG.scripName = "GOOG"
quoteGOOG.scripCode = "5678"
quoteGOOG.quantity = 50
quoteGOOG.unitPrice = 1096.25
quoteGOOG.venue = "NASDAQ GS"
quoteGOOG.orderSide = "Buy"
quoteGOOG.timeStamp = 1521458283

# Serialize the whole collection and write the bytes to a file
with open("quotes.txt", "wb") as quotesFile:
    quotesFile.write(quoteCollection.SerializeToString())
The contents of quotes.txt (binary data shown as text, so only the string fields are readable):
MSFT1234d!∏ÖÎQ(W@* NASDAQ GS2Buy8Ë∏æ’
GOOG56782!!ë@* NASDAQ GS2Buy8Î∏æ’
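Reading the quotes back is the reverse operation. Generated protobuf message classes provide a `ParseFromString()` method that fills a message from serialized bytes. The sketch below wraps that call in a hypothetical helper, `load_quotes`, which takes the message class as a parameter so that it does not depend on the name of the generated module.

```python
def load_quotes(path, quotes_cls):
    """Read serialized bytes from `path` and parse them into a new `quotes_cls` message."""
    with open(path, "rb") as quotes_file:
        data = quotes_file.read()
    collection = quotes_cls()
    collection.ParseFromString(data)  # populate the message from the bytes
    return collection
```

Assuming the generated module is named quote_pb2, `load_quotes("quotes.txt", quote_pb2.Quotes)` would return a Quotes message whose `quotes` field can be iterated to access `scripName`, `quantity` and the other attributes of each quote.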