Python Google Protocol Buffer

This article introduces how to use Google Protocol Buffer (hereinafter abbreviated as PB) in Python, covering the following topics:

Why use PB?

PB (Protocol Buffer) is a structured-data serialization format developed by Google, and it is the standard write format for Tencent Cloud Log Service. Before log data can be written, the raw data must be serialized into a PB stream and then sent to the server through the API. Because manipulating the PB format directly in every client program is inconvenient, it is useful to add a PB conversion layer between the client and the log service.

The PB format also has advantages of its own: it is simple and fast. For concrete test results, see Google's serialization benchmark analysis.

Install Google PB

To use PB in Python, you need the PB compiler protoc to compile your .proto files. The installation method is as follows:

Download and install the latest protobuf release package (3.5.1 at the time of writing). The installation steps are as follows:

wget https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.tar.gz
tar xvfz protobuf-all-3.5.1.tar.gz
cd protobuf-3.5.1
./configure --prefix=/usr
make
make check
make install

If all the check steps pass, the build succeeded.

Next, install protobuf's Python module:

cd ./python 
python setup.py build 
python setup.py test 
python setup.py install

After installation, verify the protoc command:

root@ubuntu:~# protoc --version
libprotoc 3.5.1

Protobuf installs to /usr/local by default, and /usr/local/lib is not on Ubuntu's default LD_LIBRARY_PATH. If you did not pass --prefix=/usr to configure on Ubuntu, the following error will occur:

protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared object file: No such file or directory

This can be fixed with the ldconfig command (see "Protobuf cannot find shared libraries"; the error is also mentioned in the README of the installation package). Alternatively, you can reinstall with the correct prefix.

Verify that the Python module is installed correctly

import google.protobuf

If this import succeeds in the Python interpreter without error, the installation is working.

Custom .proto file

First, we need to write a .proto file to define the structured data our program will handle. In protobuf terminology, structured data is called a Message. The .proto file closely resembles a data definition in Java or C++. The example file cls.Log.proto is as follows:

syntax ="proto2";package cls;
message Log
{
 optional uint64 time =1;// UNIX Time Format
 required string topic_id =2;
 required string content =3;}

The .proto file begins with a package declaration, which helps prevent naming conflicts between projects. In Python, packages are normally determined by the directory structure, so the package declared in the .proto file has no effect on the generated Python code; however, it is still officially recommended to declare it, mainly to prevent name collisions in the PB namespace. Here the package name is cls, and it defines one message, Log, with three members, whose meanings are as follows:

| Field name | Type | Location | Required | Meaning |
|---|---|---|---|---|
| time | uint64 | body | No | Log time; if not specified, the time when the server received the request is used |
| topic_id | string | body | Yes | ID of the log topic that the log is reported to |
| content | string | body | Yes | Log content |

A good habit is to name the .proto file carefully, for example according to the rule packageName.MessageName.proto.
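As an aside, the field numbers in the table above are what actually appear on the wire: each serialized field is prefixed with a varint tag computed as (field_number << 3) | wire_type. A small illustrative sketch in pure Python (no protobuf dependency; the field numbers follow the cls.Log definition above):

```python
# Sketch: how protobuf field numbers map to wire-format tags.
# Wire type 0 = varint (e.g. uint64), wire type 2 = length-delimited (e.g. string).

def wire_tag(field_number, wire_type):
    """Compute the tag value for a field (fits in one byte for small field numbers)."""
    return (field_number << 3) | wire_type

# For the cls.Log message above:
print(hex(wire_tag(1, 0)))  # time: field 1, varint
print(hex(wire_tag(2, 2)))  # topic_id: field 2, length-delimited
print(hex(wire_tag(3, 2)))  # content: field 3, length-delimited
```

These tag bytes (0x08, 0x12, 0x1a) can be spotted directly in the serialized_pb blob of the generated code later in this article.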

Compile the .proto file

Compile directly with the protoc compiler, specifying the source file path and the destination path:

SRC_DIR=/tmp/src_dir
DST_DIR=/tmp/dst_dir
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/cls.Log.proto

Use the --python_out option to generate Python classes; use --cpp_out to generate C++ classes.

Examine the generated py file

The directory structure generated in the destination folder is as follows:

root@ubuntu:/tmp/dst_dir# tree
.
└── cls
 └── Log_pb2.py

1 directory, 1 file

The content of Log_pb2.py is as follows (do not edit it):

# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: cls.Log.proto

import sys
_b = sys.version_info[0] < 3 and (lambda x: x) or (lambda x: x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
from google.protobuf import descriptor_pb2
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()

DESCRIPTOR = _descriptor.FileDescriptor(
  name='cls.Log.proto',
  package='cls',
  syntax='proto2',
  serialized_pb=_b('\n\rcls.Log.proto\x12\x03\x63ls\"6\n\x03Log\x12\x0c\n\x04time\x18\x01 \x01(\x04\x12\x10\n\x08topic_id\x18\x02 \x02(\t\x12\x0f\n\x07\x63ontent\x18\x03 \x02(\t'))

_LOG = _descriptor.Descriptor(
  name='Log',
  full_name='cls.Log',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='time', full_name='cls.Log.time', index=0,
      number=1, type=4, cpp_type=4, label=1,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='topic_id', full_name='cls.Log.topic_id', index=1,
      number=2, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='content', full_name='cls.Log.content', index=2,
      number=3, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
  ],
  extensions=[],
  nested_types=[],
  enum_types=[],
  options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[],
  serialized_start=22,
  serialized_end=76,
)

DESCRIPTOR.message_types_by_name['Log'] = _LOG
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

Log = _reflection.GeneratedProtocolMessageType('Log', (_message.Message,), dict(
  DESCRIPTOR=_LOG,
  __module__='cls.Log_pb2'
  # @@protoc_insertion_point(class_scope:cls.Log)
  ))
_sym_db.RegisterMessage(Log)

# @@protoc_insertion_point(module_scope)

An analysis of the source code of the generated .py file is beyond the scope of this article; refer to the material in the attachment.

Serialization and deserialization

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Created on 1/30/18 4:23 PM
@author: Chen Liang
@function: pb test
"""

import sys

reload(sys)                          # Python 2 only
sys.setdefaultencoding('utf-8')      # Python 2 only
import json

import Log_pb2


def serialize_to_string(msg_obj):
    ret_str = msg_obj.SerializeToString()
    return ret_str


def parse_from_string(s):
    log = Log_pb2.Log()
    log.ParseFromString(s)
    return log


if __name__ == '__main__':
    # serialize_to_string
    content_dict = {"live_id": "1239182389648923", "identify": "zxc_unique"}
    tencent_log = Log_pb2.Log()
    tencent_log.time = 1510109254
    tencent_log.topic_id = "John Doe"
    tencent_log.content = json.dumps(content_dict)
    ret_s = serialize_to_string(tencent_log)
    print(type(ret_s))
    print(ret_s)

    # parse_from_string
    log_obj = parse_from_string(ret_s)
    print(log_obj)

The key operations are writing and reading message objects: the serialization function SerializeToString and the deserialization function ParseFromString.
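To get a feel for what SerializeToString actually produces, the proto2 wire format can be sketched by hand in pure Python. This is an illustrative sketch, not the library's implementation; the field numbers follow the cls.Log definition above, and the sample values are hypothetical:

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf varint (7 bits per byte, LSB first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_uint64_field(field_number, n):
    """Varint field (wire type 0): tag, then value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(n)

def encode_string_field(field_number, s):
    """Length-delimited field (wire type 2): tag, length, then UTF-8 bytes."""
    data = s.encode('utf-8')
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data

# A cls.Log-like record encoded by hand:
payload = (encode_uint64_field(1, 1510109254)   # time
           + encode_string_field(2, "topic")    # topic_id
           + encode_string_field(3, "hello"))   # content
print(payload)
```

A byte string produced this way for the same field values should match what the generated Log class emits, which is why PB output is so compact: no field names, only numeric tags.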

More complex Messages

So far, we have only given a simple log-upload example. In practice, applications often need to define more complex Messages. By "complex" we mean not merely more fields or more field types, but more complex data structures.

Two such structures, nested Messages and imported Messages, are introduced below.

Message nesting

Nesting is a powerful concept: with nesting, Messages can express much richer structures. A concrete example of a nested Message:

message Person {
    required string name = 1;
    required int32 id = 2;      // Unique ID number for this person.
    optional string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phone = 4;
}

Within the Person message, a nested message PhoneNumber is defined and used as the type of the phone field. This makes it possible to define much more complex data structures.
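On the wire, nesting is implemented by length-delimited embedding: the nested message is serialized first, then wrapped as a wire-type-2 field of its parent. A pure-Python sketch with hypothetical values (not the library's implementation):

```python
def encode_varint(n):
    """Encode a non-negative integer as a protobuf varint (7 bits per byte, LSB first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def length_delimited(field_number, data):
    """Wrap raw bytes as a length-delimited field (wire type 2): tag, length, payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data

# PhoneNumber { number = "555"; type = HOME }  -- hypothetical values
phone = (length_delimited(1, b"555")     # number: field 1, string
         + encode_varint((2 << 3) | 0)   # type: field 2, varint tag
         + encode_varint(1))             # HOME = 1
# Embedded in Person as repeated field 4: the whole sub-message is length-delimited.
person_phone_field = length_delimited(4, phone)
print(person_phone_field)
```

Because the sub-message is just a length-prefixed byte string, a parser can skip unknown nested fields without understanding their contents, which is part of what makes PB forward-compatible.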

Import Message

In a .proto file, you can also use the import keyword to bring in messages defined in other .proto files; these may be called imported Messages or dependency Messages. A concrete example:

import "common.header.proto";

message youMsg {
    required common.info_header header = 1;
    required string youPrivateData = 2;
}

Here, common.info_header is the info_header message defined in the imported common.header package.

The main purpose of importing Messages is convenient code management, similar to header files in C. You can define public messages in one package, import that package from other .proto files, and then use its message definitions.

Google Protocol Buffer can well support nested Message and the introduction of Message, which makes the work of defining complex data structures very easy and pleasant.

Dynamic compilation#

In general, Protobuf users write the .proto file first, then use the Protobuf compiler to generate source files in the target language, and build the generated code into the application.

In some cases, however, the .proto file is not known in advance, and unknown .proto files must be handled dynamically. For example, a generic message-forwarding middleware cannot predict what kinds of messages it will process. This requires compiling .proto files dynamically and using the Messages defined in them.

For a detailed explanation, see Use and Principle of Google Protocol Buffer: https://www.ibm.com/developerworks/cn/linux/l-cn-gpb/

References:

  1. https://developers.google.com/protocol-buffers/docs/reference/python/
  2. https://developers.google.com/protocol-buffers/docs/reference/python-generated
  3. http://hzy3774.iteye.com/blog/2323428
  4. https://github.com/google/protobuf/tree/master/python
  5. https://github.com/google/protobuf/tree/master/examples
  6. https://blog.csdn.net/losophy/article/details/17006573
  7. https://www.ibm.com/developerworks/cn/linux/l-cn-gpb/
  8. https://github.com/google/protobuf
  9. https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.tar.gz
  10. Python Google Protocol Buffer: https://developers.google.com/protocol-buffers/docs/pythontutorial
