This article introduces how to use Google Protocol Buffer (hereinafter abbreviated as PB) in Python, covering installation, writing a .proto file, compiling it, serializing and deserializing messages, and defining more complex message structures.
PB (Protocol Buffer) is a structured data interchange format developed by Google, and it is the standard write format for Tencent Cloud Log Service. Before log data can be written, the raw log must be serialized into a PB data stream and then sent to the server through the API. Because manipulating the PB format directly in every client program is inconvenient, it is worth adding a PB conversion layer between the clients and the log service.
Of course, the PB format also has its own advantages, chiefly that it is simple and fast. For test results, see Google's serialization benchmark analysis.
To use PB in Python, you first need to install the PB compiler protoc to compile your .proto files. Download the latest protobuf release package and install it; the version used here is 3.5.1. The steps are as follows:
wget https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.tar.gz
tar xvfz protobuf-all-3.5.1.tar.gz
cd protobuf-3.5.1
./configure --prefix=/usr
make
make check
make install
If all the check steps pass, the build succeeded. Next, install protobuf's Python module:
cd ./python
python setup.py build
python setup.py test
python setup.py install
After the installation completes, verify protoc with the protoc --version command:
root@ubuntu:~# protoc --version
libprotoc 3.5.1
The default installation prefix of protobuf is /usr/local, and /usr/local/lib is not on Ubuntu's default LD_LIBRARY_PATH. If you did not pass --prefix=/usr to configure on Ubuntu, you will see the following error:
protoc: error while loading shared libraries: libprotoc.so.8: cannot open shared object file: No such file or directory
You can fix this by running the ldconfig command (see "Protobuf cannot find shared libraries"; the error is also mentioned in the package's README), or simply reinstall with the /usr prefix.
Verify that the Python module is installed correctly
import google.protobuf
If this import succeeds in the Python interpreter without errors, the module is installed correctly.
First, we need to write a .proto file to define the structured data our program will handle. In protobuf terminology, structured data is called a Message. A proto file looks very similar to a data definition in Java or C++. The example file cls.Log.proto is as follows:
syntax = "proto2";
package cls;

message Log
{
    optional uint64 time = 1;  // UNIX Time Format
    required string topic_id = 2;
    required string content = 3;
}
The proto file begins with a package declaration, which helps prevent naming conflicts between projects. In Python, packages are normally determined by the directory structure, so the package declared in the .proto file has no effect on the generated Python code; the official recommendation is to declare it anyway, mainly to prevent name collisions in the PB namespace. Here the package name is cls, and the file defines a message Log with three members, whose meanings are as follows:
Field name | type | location | is required | meaning |
---|---|---|---|---|
time | uint64 | body | No | log time, if not specified, the time when the server received the request is used |
topic_id | string | body | Yes | The id of the log topic reported by the log |
content | string | body | Yes | log content |
A good habit is to name proto files thoughtfully, for example with the naming rule packageName.MessageName.proto.
Compile directly with the protoc compiler, specifying the source file path and the target file path:
SRC_DIR=/tmp/src_dir
DST_DIR=/tmp/dst_dir
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/cls.Log.proto
Use the --python_out option to generate Python classes; use the --cpp_out option to generate C++ classes.
The file directory generated in the target folder corresponds to the following:
root@ubuntu:/tmp/dst_dir# tree
.
└── cls
└── Log_pb2.py
1 directory, 1 file
The contents of the generated Log_pb2.py file are as follows (do not edit it):
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: cls.Log.proto
import sys
_b = sys.version_info[0] < 3 and (lambda x: x) or (lambda x: x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
from google.protobuf import descriptor_pb2
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()

DESCRIPTOR = _descriptor.FileDescriptor(
  name='cls.Log.proto',
  package='cls',
  syntax='proto2',
  serialized_pb=_b('\n\rcls.Log.proto\x12\x03\x63ls\"6\n\x03Log\x12\x0c\n\x04time\x18\x01 \x01(\x04\x12\x10\n\x08topic_id\x18\x02 \x02(\t\x12\x0f\n\x07\x63ontent\x18\x03 \x02(\t'))

_LOG = _descriptor.Descriptor(
  name='Log',
  full_name='cls.Log',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='time', full_name='cls.Log.time', index=0,
      number=1, type=4, cpp_type=4, label=1,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='topic_id', full_name='cls.Log.topic_id', index=1,
      number=2, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='content', full_name='cls.Log.content', index=2,
      number=3, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      options=None, file=DESCRIPTOR),
  ],
  extensions=[],
  nested_types=[],
  enum_types=[],
  options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[],
  serialized_start=22,
  serialized_end=76,
)

DESCRIPTOR.message_types_by_name['Log'] = _LOG
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

Log = _reflection.GeneratedProtocolMessageType('Log', (_message.Message,), dict(
  DESCRIPTOR=_LOG,
  __module__='cls.Log_pb2'
  # @@protoc_insertion_point(class_scope:cls.Log)
  ))
_sym_db.RegisterMessage(Log)

# @@protoc_insertion_point(module_scope)
A detailed analysis of the generated .py source is out of scope here; refer to the material in the attachment. A test script that uses the generated module is shown below:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Created on 1/30/18 4:23 PM
@author: Chen Liang
@function: pb test
"""

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

import Log_pb2
import json


def serialize_to_string(msg_obj):
    ret_str = msg_obj.SerializeToString()
    return ret_str


def parse_from_string(s):
    log = Log_pb2.Log()
    log.ParseFromString(s)
    return log


if __name__ == '__main__':
    # serialize_to_string
    content_dict = {"live_id": "1239182389648923", "identify": "zxc_unique"}
    tencent_log = Log_pb2.Log()
    tencent_log.time = 1510109254
    tencent_log.topic_id = "John Doe"
    tencent_log.content = json.dumps(content_dict)
    ret_s = serialize_to_string(tencent_log)
    print(type(ret_s))
    print(ret_s)

    # parse_from_string
    log_obj = parse_from_string(ret_s)
    print(log_obj)
The key operations are writing and reading message objects: the serialization function SerializeToString and the deserialization function ParseFromString.
So far, we have only given a simple log-upload example. In practical applications, you often need to define more complex Messages. "Complex" here means not merely more fields or more field types, but structurally richer messages: nested Messages and imported Messages. Each is introduced below.
Nesting is a powerful concept: once you can nest, the expressive power of Messages grows enormously. A concrete example of a nested Message is as follows:
message Person {
    required string name = 1;
    required int32 id = 2;  // Unique ID number for this person.
    optional string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phone = 4;
}
In the message Person, a nested message PhoneNumber is defined and then used for the phone field of Person. This makes it possible to define more complex data structures.
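On the wire, a nested message is nothing special: the sub-message is serialized on its own and then embedded in the parent as an ordinary length-delimited field. The following stdlib-only sketch illustrates this for the PhoneNumber example above; it is not the real protobuf library, and the helper names are invented for illustration.

```python
def _varint(n):
    # Minimal base-128 varint encoder.
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

HOME = 1  # PhoneType.HOME from the example enum

def encode_phone_number(number, phone_type=HOME):
    """Encode a PhoneNumber sub-message: field 1 (string), field 2 (enum)."""
    data = number.encode('utf-8')
    buf = _varint((1 << 3) | 2) + _varint(len(data)) + data  # number
    buf += _varint((2 << 3) | 0) + _varint(phone_type)       # type
    return buf

def embed_in_person(phone_bytes):
    """Person.phone (field 4) embeds the sub-message as a length-delimited blob."""
    return _varint((4 << 3) | 2) + _varint(len(phone_bytes)) + phone_bytes
```

Because the nested message is just a tagged byte blob, nesting can go arbitrarily deep without changing the encoding rules.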
In a .proto file, you can also use the import keyword to bring in messages defined in other .proto files; these can be called Import Messages, or Dependency Messages. An example of importing a message is as follows:
import "common.header.proto";

message youMsg {
    required common.info_header header = 1;
    required string youPrivateData = 2;
}
Here, common.info_header is defined in the common.header package.
The main purpose of Import Message is to provide a convenient code management mechanism, similar to header files in C language. You can define some public messages in a package, and then import the package in other .proto files, and then use the message definitions in it.
Google Protocol Buffer can well support nested Message and the introduction of Message, which makes the work of defining complex data structures very easy and pleasant.
In general, people who use Protobuf will write the .proto file first, and then use the Protobuf compiler to generate the source code files needed by the target language. Compile these generated codes with the application.
However, in certain circumstances, people cannot know the .proto file in advance, and they need to dynamically process some unknown .proto files. For example, a general message forwarding middleware cannot predict what kind of message needs to be processed. This requires dynamically compiling the .proto file and using the Message in it.
For a detailed explanation, please refer to: [Use and Principle of Google Protocol Buffer](https://www.ibm.com/developerworks/cn/linux/l-cn-gpb/)
reference: