MicroPython: Effective Serial Data Processing
Sending serial data between microcontrollers and single-board computers is an easy way to exchange information. In the last article, I showed three options for connecting the Raspberry Pi to the Raspberry Pi Pico. At the time of writing, the best way is to use a USB-FTL connector on the Pi and connect it directly to TX/RX. This way, you do not need to configure anything special on your Pi, and you get a reliable connection.
Setup is one thing. But how do you use a serial connection effectively? What are the best practices? In this article, I describe how to work with bitfields, text, Python objects, and interchangeable data formats like JSON. I also run a performance test across all of these methods to find out which one is best for timing-critical applications.
This article originally appeared at my blog admantium.com.
The choice of a suitable data structure to send between multiple Python programs is simple: any data transmitted is a string, and what this string represents is up to you. You can transmit just numbers, e.g. a bitmask that is mapped to data or state on the receiver side. You can invent your own mini language to represent text commands. Or you can work with concrete mutable or immutable Python objects, convert them to their string representation, and transmit them. And finally, you can use a well-defined, interchangeable data format like JSON or YAML.
That's plenty of options. Which should you use and why? What computational costs are incurred for serializing and deserializing the data? The following sections briefly explain each of these formats using a small example: sending movement commands to a robot, like moving forward or backward with a certain speed, and turning left or right. Finally, I measure the performance of each method and discuss the results.
When you use plain text, the format is up to you. You need to define your own language to represent the data that you are transmitting.
In the context of our robot example, let's define these statements to represent movements of a robot:
MOVE_FORWARD=1
MOVE_BACKWARD=2
TURN_LEFT=45
TURN_RIGHT=90
STOP
The speed values are absolute values from 1 to 10. The turn values represent degrees.
Commands in this form will be transmitted one at a time. The receiver needs to parse the text, extract the command and values, and instruct the robot.
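As a rough sketch of what this could look like on both sides, the following helpers build and parse one command per line. The function names and the two command sets are my own assumptions for illustration; only the command names come from the list above.

```python
# Hypothetical helpers; only the command names are taken from the article.
COMMANDS_WITH_VALUE = {"MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT"}
COMMANDS_WITHOUT_VALUE = {"STOP"}


def format_command(name, value=None):
    """Sender side: build one text command such as 'TURN_LEFT=45'."""
    return name if value is None else "{}={}".format(name, value)


def parse_command(line):
    """Receiver side: split one line into (name, value); value is None for STOP."""
    name, sep, raw_value = line.strip().partition("=")
    if name in COMMANDS_WITHOUT_VALUE and not sep:
        return name, None
    if name in COMMANDS_WITH_VALUE and raw_value.isdigit():
        return name, int(raw_value)
    raise ValueError("unknown or malformed command: {!r}".format(line))


print(parse_command(format_command("MOVE_FORWARD", 1)))
print(parse_command("STOP\n"))
```

Rejecting anything that is not in the known command set keeps a corrupted line from being interpreted as a movement.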
Bit fields encode data as binary, so all you need to do is to encode your commands in a suitable binary representation. The sender encodes, the receiver decodes the data.
Continuing with our example, we need to create a bitmask with these steps:
- Determine command encoding: How is each command represented in binary?
- Determine value encoding: How are the values represented in binary?
- Determine field length: What is the highest integer that will be transmitted?
Let's answer these questions step-by-step.
# Command Encoding
MOVE_FORWARD = 1
MOVE_BACKWARD = 2
TURN_LEFT = 3
TURN_RIGHT = 4
STOP = 5
# Command Encoding Field Length
BIGGEST_INTEGER_VALUE = 5
BIGGEST_INTEGER_VALUE_IN_BINARY = 101
BITFIELD_LENGTH = 3
# Value Encoding
MOVEMENTS = Integer
TURNS = Integer
# Value Encoding Field Length
BIGGEST_INTEGER_VALUE = 360
BIGGEST_INTEGER_VALUE_IN_BINARY = 101101000
BITFIELD_LENGTH = 9
Therefore, our bit field format consists of a 3-bit command and a 9-bit value. Example commands:
# Command Encoding
MOVE_FORWARD_SPEED_7 = 0b001000000111
TURN_RIGHT_242 = 0b100011110010
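The 3-bit command and 9-bit value layout can also be built with integer shifts and masks instead of writing the bit strings by hand. This is a minimal sketch under that assumption; the names pack, unpack, to_wire and from_wire are hypothetical, and sending the 12 bits as two big-endian bytes is my choice, not something the article prescribes.

```python
COMMAND_BITS = 3
VALUE_BITS = 9
VALUE_MASK = (1 << VALUE_BITS) - 1  # nine 1-bits: 0b111111111


def pack(command, value):
    """Combine a command code and a value into one 12-bit integer."""
    if not 0 <= command < (1 << COMMAND_BITS):
        raise ValueError("command does not fit into 3 bits")
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit into 9 bits")
    return (command << VALUE_BITS) | value


def unpack(message):
    """Split a 12-bit integer back into (command, value)."""
    return message >> VALUE_BITS, message & VALUE_MASK


def to_wire(message):
    """Assumed wire format: two bytes, most significant byte first."""
    return message.to_bytes(2, "big")


def from_wire(data):
    return int.from_bytes(data, "big")


move_forward_7 = pack(1, 7)
print(format(move_forward_7, "012b"))  # 001000000111
print(unpack(from_wire(to_wire(pack(4, 242)))))  # (4, 242)
```

Compared to sending the characters of the bit string, two raw bytes per command keep the message short, at the cost of being unreadable in a serial monitor.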
In Python, any built-in immutable data type (integers, floats, tuples) or mutable data type (lists, dictionaries, sets) can be converted to a string with the repr function. See the following examples:
>>> f = 3.12345245
>>> repr(f)
'3.12345245'
>>> lst = ["hello", "from", "Pi", 4]
>>> repr(lst)
"['hello', 'from', 'Pi', 4]"
>>> components = set()
>>> components.add("Pi4")
>>> components.add("Pico")
>>> components.add("D435")
>>> repr(components)
"{'Pico', 'Pi4', 'D435'}"
It is also possible to define the __repr__ method on a class so that its instances can be represented as a string. However, this is only meaningful if the same class definition is available on both the sender and the receiver side. For exchanging small messages, this is unnecessary overhead.
Considering our robot example, a suitable data structure to submit commands would be a list where key-value pairs follow each other, or even a dictionary with fixed key-value pairs.
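To illustrate the point about class definitions, here is a small sketch with a hypothetical Command class (it does not appear in the original article). The string produced by __repr__ can only be turned back into an object if the receiver knows the same class, which is made explicit by passing it to eval.

```python
class Command:
    """Hypothetical message class, used only for illustration."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        # Produce a string that looks like the constructor call.
        return "Command({!r}, {!r})".format(self.name, self.value)

    def __eq__(self, other):
        return (isinstance(other, Command)
                and (self.name, self.value) == (other.name, other.value))


# Sender side
message = repr(Command("MOVE_FORWARD", 7))
print(message)  # Command('MOVE_FORWARD', 7)

# Receiver side: the class must be known here too. Only use eval on trusted input.
received = eval(message, {"Command": Command})
print(received.name, received.value)
```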
>>> command_list = ['MOVE_FORWARD', '7', 'TURN_RIGHT', '242']
>>> repr(command_list)
"['MOVE_FORWARD', '7', 'TURN_RIGHT', '242']"
>>> command_dict = {'MOVE_FORWARD': 7, 'TURN_RIGHT': 242}
>>> repr(command_dict)
"{'MOVE_FORWARD': 7, 'TURN_RIGHT': 242}"
Using these data structures gives additional programming benefits: modifications, like renaming commands or changing the value types, and future additions are simpler to implement. Also, messages can include any number of commands and metadata. Dictionaries are especially powerful in this regard, as they can be looked up by key, manipulated, and traversed with iterators. This makes them versatile in serial communication, where you work with string data anyway.
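On the receiving side, the string has to be turned back into an object before it can be traversed. One possible sketch uses ast.literal_eval, which only accepts Python literals and is therefore a more cautious choice than a plain eval. This assumes the receiver runs CPython; the ast module may not be available on MicroPython builds, and the variable names are mine.

```python
import ast

# Sender side: stringify the dictionary and convert it to bytes.
commands = {"MOVE_FORWARD": 7, "TURN_RIGHT": 242}
wire_bytes = repr(commands).encode("utf-8")

# Receiver side: decode the bytes and rebuild the dictionary from its literal form.
received = ast.literal_eval(wire_bytes.decode("utf-8"))

# Traverse the commands in the order they were sent.
for name, value in received.items():
    print(name, value)
```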
The final option is to choose an interchangeable data format. A very common format is JSON, short for JavaScript Object Notation. JSON can be used to serialize literals (numbers, strings, booleans) and structures (lists and key-value objects). YAML is a superset of JSON and is intended to be human readable.
Following our example, a simple YAML data structure to transfer robot commands is this:
MOVE_FORWARD: 7
TURN_RIGHT: 242
JSON and YAML offer the same benefits as Python data structures: modifications and extensions are simpler. In addition, these data structures can be read by other programming languages as well. In the context of our example, however, we will stick to Python for serialization and deserialization.
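For comparison, the same commands can be expressed as JSON. The sketch below uses the json module from the Python standard library; the variable names are assumptions for illustration.

```python
import json

commands = {"MOVE_FORWARD": 7, "TURN_RIGHT": 242}

# Sender side: serialize to a JSON string, then to bytes for the serial line.
payload = json.dumps(commands).encode("utf-8")
print(payload)

# Receiver side: decode the bytes and parse the JSON back into a dictionary.
received = json.loads(payload.decode("utf-8"))
print(received["TURN_RIGHT"])
```

Because JSON is language independent, the receiver could just as well be written in another language, as long as it agrees on the key names.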
To compare the performance of these different approaches, we can use Python's built-in timeit module. As documented, its repeat function receives a string of statements, a setup string, the number of times the statement should be executed, and the number of repetitions. Simple example:
import timeit
context = """
def plus(a, b):
    return a + b
"""
results = timeit.repeat(stmt='plus(6,36)', setup=context, repeat=5, number=1000000)
print("Average Time:", sum(results)/5, "\nMeasurements:", results)
Calling this function yields the following output. As we see, the average time for one million executions is 0.25 seconds.
Average Time: 0.2519046256085858
Measurements: [0.25533306901343167, 0.25123326701577753, 0.2510213520145044, 0.2509820369887166, 0.25095340301049873]
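Each measurement is the total time for all executions of one repetition, so 0.25 seconds for 1,000,000 calls corresponds to roughly 0.25 microseconds per call. The following sketch makes that conversion explicit. It passes the function through the globals parameter of timeit.repeat instead of a setup string, and it uses a smaller number of executions than the example above; both are my choices for illustration.

```python
import timeit


def plus(a, b):
    return a + b


REPEAT = 5
NUMBER = 100_000  # executions per repetition (smaller than in the article)

# Each entry in results is the total time in seconds for NUMBER executions.
results = timeit.repeat(
    stmt="plus(6, 36)",
    globals={"plus": plus},
    repeat=REPEAT,
    number=NUMBER,
)

average_total = sum(results) / len(results)
best_per_call_us = min(results) / NUMBER * 1_000_000

print("Average time per repetition (s):", average_total)
print("Best time per call (microseconds):", best_per_call_us)
```

Dividing by len(results) instead of a hard-coded count keeps the average correct when the repeat value changes.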
With this tool, we can measure encoding and decoding of text, bit fields, and Python objects.
Let's make a concrete example for one specific decoding.
Encoding a text is simple: It's a string that needs to be converted to a byte array.
def encode(text):
    return text.encode('utf-8')
Decoding involves several steps. First, we decode the byte array back to a string. Second, we execute a regular expression on the string. Third, we convert the matches into a tuple.
from re import match
def decode_text(text):
    decoded_text = text.decode('utf-8')
    reg_exp = r'(\w+)=(\d+)'
    matches = match(reg_exp, decoded_text)
    result = matches.groups()
    return result
To measure this code, we apply the following steps:
- Define the statement to be executed: t = encode("MOVE_FORWARD=1"); decode_text(t)
- Define the complete encoding and decoding code as a multiline string (called text_context below)
- Call the timeit.repeat function with the statement, the context, the number of executions, and the repeat cycles
text_decoding_measurements = timeit.repeat(
    stmt='t = encode("MOVE_FORWARD=1"); decode_text(t)',
    setup=text_context,
    repeat=20,
    number=10_000)
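Put together, the steps above can be sketched as one runnable script. The setup string is the encoding and decoding code from this section; the reduced repeat count and the way the average is computed are my assumptions, not necessarily the exact script behind the published numbers.

```python
import timeit

# Setup code as a raw string, so the regular expression backslashes stay intact.
text_context = r"""
from re import match

def encode(text):
    return text.encode('utf-8')

def decode_text(text):
    decoded_text = text.decode('utf-8')
    matches = match(r'(\w+)=(\d+)', decoded_text)
    return matches.groups()
"""

REPEAT = 5       # the article uses 20 repetitions
NUMBER = 10_000  # executions per repetition

text_decoding_measurements = timeit.repeat(
    stmt='t = encode("MOVE_FORWARD=1"); decode_text(t)',
    setup=text_context,
    repeat=REPEAT,
    number=NUMBER,
)

# Divide by the actual number of measurements to get the mean.
average = sum(text_decoding_measurements) / len(text_decoding_measurements)
print("Average Time:", average)
print("Measurements:", text_decoding_measurements)
```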
In the same manner, we can perform tests for all the cases.
def encode(bitfield):
    return bitfield.encode('utf-8')
def decode_bitfield(msg):
    bitfield = msg.decode('utf-8')
    # Skip the leading '0b': the next 3 characters are the command bits
    cmd_bit = '0b' + bitfield[2:5]
    # The remaining (up to 9) characters are the value bits
    value_bit = '0b' + bitfield[5:14]
    mapping = {1: 'MOVE_FORWARD', 2: 'MOVE_BACKWARD', 3: 'TURN_LEFT', 4: 'TURN_RIGHT', 5: 'STOP'}
    result = (mapping.get(int(cmd_bit,2)), int(value_bit,2))
    return result
Bitfield Decoding of 'encode("0b00100000011")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Average Time: 0.05363158800173551
Measurements: [0.027031786972656846, 0.02717018499970436, 0.026509653020184487, 0.026926272024866194, 0.0266216280288063, 0.027101150946691632, 0.026846155989915133, 0.02704080700641498, 0.026566963992081583, 0.02656059304717928, 0.026734582032077014, 0.026688970043323934, 0.02639739098958671, 0.0269411489716731, 0.026987641002051532, 0.026755536964628845, 0.02668587298830971, 0.026785144000314176, 0.026891918969340622, 0.02707247802754864]
Source
from re import match
def encode(text):
    return text.encode('utf-8')
def decode_text(text):
    decoded_text = text.decode('utf-8')
    reg_exp = r'(\w+)=(\d+)'
    matches = match(reg_exp, decoded_text)
    result = matches.groups()
    return result
Text Decoding of 'encode("MOVE_FORWARD=1")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Average Time: 0.08896405659033917
Measurements: [0.04545394500019029, 0.044554901018273085, 0.04442428599577397, 0.0443288889946416, 0.04432642401661724, 0.043912817956879735, 0.04431018000468612, 0.04420615697745234, 0.044510716979857534, 0.044194838963449, 0.04418196598999202, 0.0441048729699105, 0.044894682010635734, 0.044447703985497355, 0.0445377750438638, 0.044568741985131055, 0.0446226799977012, 0.045044841011986136, 0.04473581199999899, 0.044278335000853986]
Source
def encode(text):
    return text.encode('utf-8')
def decode_python_objects(text):
    decoded_text = text.decode('utf-8')
    result = eval(decoded_text)
    return result
Python Objects Decoding of 'encode("('MOVE_FORWARD', 1)")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Average Time: 0.5119784157956019
Measurements: [0.25575865199789405, 0.2549030850059353, 0.2579314360045828, 0.25870296999346465, 0.2604327359586023, 0.2600087499595247, 0.2584023640374653, 0.2558257740456611, 0.25510250200750306, 0.2583176919724792, 0.2549058750155382, 0.255189977993723, 0.25448778801364824, 0.2551257850136608, 0.2546310239704326, 0.25484405900351703, 0.25408411998068914, 0.25427312596002594, 0.2534371940419078, 0.2534192479797639]
Source
from yaml import safe_load
def encode(yml):
    return yml.encode('utf-8')
def decode_yml(msg):
    yml = safe_load(msg)
    result = tuple(yml)
    return result
YAML Decoding of 'encode("['MOVE_FORWARD', 7]")'
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Average Time: 9.058417272800579
Measurements: [4.193098113988526, 4.122215965995565, 4.285432767006569, 4.330240387003869, 4.076146796985995, 4.159796627995092, 4.560074345965404, 4.498862796986941, 4.709009921003599, 4.322364090010524, 4.69364814698929, 4.772696384985466, 4.7846884350292385, 4.738021294993814, 4.648917622980662, 4.780966740043368, 4.605022739968263, 4.846610993030481, 4.693909451016225, 4.7624491060269065]
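In short, here is the runtime for each type of decoding. Each reported average is about twice the individual measurements listed above, so read it as a relative score rather than the mean of a single run; since all four methods are affected equally, the ranking is unaffected: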
Method | Average Time (s) | Relative Performance
---|---|---
Bitfield | 0.053631588001736 | 100%
Text | 0.088964056590339 | 166%
Python Objects | 0.511978415795602 | 955%
YAML | 9.05841727280058 | 16890%
Passing and interpreting bits is the fastest method, but you need to write more code, and the command set is harder to extend without refactoring. Working with plain text is about 66% slower and requires you to be comfortable with regular expressions, but this approach is versatile, interchangeable, and keeps your language design extensible. The third method, sending stringified Python objects and evaluating them, has a major impact on performance: it is almost ten times slower than bitfields. Although it is the easiest to implement and highly extensible, you should not use this approach in applications that operate on a microsecond scale or that process huge amounts of data. The final method, YAML, is roughly 170 times slower than bitfields. Well, don't use it.
When working with serial data, you can use different methods for designing the data format, the language that you want to transmit over the wire. These methods are: a) encode everything as bit fields, b) send text that represents commands, c) serialize and evaluate complete Python objects, and d) work with a data exchange format like YAML or JSON. But how do these methods perform? To find out, this article showed how the built-in timeit module can be used for simple and effective measurements. Comparing all methods shows two winners: bitfields are the fastest but require more coding and are less extensible, followed by text, which is easy to program with and extensible.