Logging
A valuable tool in debugging is the debug log: a journal of the events that have taken place, recorded in sequence. Each log record often contains a timestamp, a source identifier, a log level, and a status message.
The following log was generated using ChatGPT to give a stylized sample.
[2024-01-17 10:15:22.880] INFO TaskA Initializing...
[2024-01-17 10:16:05.968] WARNING TaskB Connection lost.
[2024-01-17 10:17:30.981] ERROR TaskC An unexpected error occurred.
[2024-01-17 10:18:12.406] INFO TaskD Processing data...
[2024-01-17 10:19:05.689] WARNING TaskE Resource usage high.
[2024-01-17 10:20:40.690] INFO TaskF Task completed successfully.
[2024-01-17 10:21:55.092] ERROR TaskG Database connection failed.
[2024-01-17 10:22:40.085] INFO TaskH Loading configuration...
[2024-01-17 10:23:15.101] INFO TaskI Task execution started.
[2024-01-17 10:24:02.057] WARNING TaskJ Potential data inconsistency detected.
Log record content
You can see in the above example the key fields you would expect in a log message. There is a date and time, letting you know when the event happened. There is a log level: INFO, WARNING, or ERROR in this sample. There is a source identifier, here the name of the task that emitted the record. Then there is the message text describing what is happening. Let us look at these fields in more detail.
Timestamp
Understanding the role of timestamps is crucial in logging. These timestamps should be set when the event occurs. Most logging systems are adept at keeping log messages from concurrent tasks in order, but I've encountered instances in homemade loggers where records can become jumbled.
Typically, timestamps might be represented as system ticks (sys-ticks), a Coordinated Universal Time (UTC) timestamp, or some form of ad hoc metric. In practice, we often convert sys-ticks to milliseconds (ms) to provide a more standard time measurement.
It's important to note that while free-running tickers start almost immediately once the clock tree is stable (this happens post Phase-Locked Loop (PLL) lock), initiating the Real-Time Clock (RTC) interface is a separate process. If the RTC is an external component, its interface requires configuration before the Microcontroller Unit (MCU) can begin accurately tracking wall clock time.
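As a minimal sketch, assuming a free-running tick counter that starts once the PLL has locked and an RTC interface that only becomes valid later, a timestamp capture routine might look like the following. The hardware hook names (hw_ticks, rtc_valid, rtc_unix_seconds) and the roughly 32 ticks-per-millisecond rate are placeholders, not a real HAL.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical hardware hooks -- names are placeholders, not a real HAL:
 * hw_ticks() returns a free-running counter that starts after PLL lock,
 * rtc_valid()/rtc_unix_seconds() become usable once the RTC is configured. */
extern uint32_t hw_ticks(void);
extern bool     rtc_valid(void);
extern uint32_t rtc_unix_seconds(void);

#define TICKS_PER_MS 32u   /* assumption: ~32 ticks/ms from a 32.768 kHz source */

/* Capture a timestamp at the moment the event occurs. Before the RTC is up
 * we only have ticks since boot; afterwards we can also record wall-clock time. */
typedef struct {
    uint32_t ms_since_boot;
    uint32_t unix_seconds;   /* 0 until the RTC interface is running */
} log_timestamp_t;

log_timestamp_t log_timestamp_now(void)
{
    log_timestamp_t ts;
    ts.ms_since_boot = hw_ticks() / TICKS_PER_MS;
    ts.unix_seconds  = rtc_valid() ? rtc_unix_seconds() : 0u;
    return ts;
}
```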
Log level
The log level qualifies how much attention you need to pay to the message. Log levels may also be regarded as severity levels for what is being reported. Typical log levels include:
debug
  - extra information
  - branch paths
  - parameters
  - interim values
  - normally not shown
info
  - progress
  - state changes
  - branch paths
  - results
warning
  - improper requests
  - boundary conditions nearing
  - watermarks (pre-overflow)
error
  - out of bounds values/inputs
  - use of resource before available
  - conflicts
fatal
  - crash
Logging levels are sometimes used to gate which log messages are actually recorded or output. You will often see that debug-level messages are available in debug builds of your code, but not in release builds.
Depending on your logging framework, active log levels may be run-time selectable. In less complex systems, compile-time options determine how verbose your logs are.
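A minimal sketch of compile-time gating is shown below. The macro names and the fixed build-level constant are illustrative, not from any particular framework; timestamps are omitted for brevity.

```c
#include <stdio.h>

/* Log levels in increasing severity. */
#define LOG_LEVEL_DEBUG   0
#define LOG_LEVEL_INFO    1
#define LOG_LEVEL_WARNING 2
#define LOG_LEVEL_ERROR   3

/* Build-time threshold: messages below this level compile away entirely,
 * so their format strings take no code space in a release build. */
#ifndef LOG_BUILD_LEVEL
#define LOG_BUILD_LEVEL LOG_LEVEL_INFO
#endif

#define LOG(level, name, tag, ...)                          \
    do {                                                    \
        if ((level) >= LOG_BUILD_LEVEL) {                   \
            printf("%-7s %s ", (name), (tag));              \
            printf(__VA_ARGS__);                            \
            printf("\n");                                   \
        }                                                   \
    } while (0)

#define LOG_DEBUG(tag, ...)   LOG(LOG_LEVEL_DEBUG,   "DEBUG",   tag, __VA_ARGS__)
#define LOG_INFO(tag, ...)    LOG(LOG_LEVEL_INFO,    "INFO",    tag, __VA_ARGS__)
#define LOG_WARNING(tag, ...) LOG(LOG_LEVEL_WARNING, "WARNING", tag, __VA_ARGS__)
#define LOG_ERROR(tag, ...)   LOG(LOG_LEVEL_ERROR,   "ERROR",   tag, __VA_ARGS__)

/* With LOG_BUILD_LEVEL set to INFO, the DEBUG line below is folded away by
 * the compiler; the WARNING line is emitted. */
void task_b_connect(void)
{
    LOG_DEBUG("TaskB", "retry count = %d", 3);
    LOG_WARNING("TaskB", "Connection lost.");
}
```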
Source Identifier
For a log message to be useful, we need to know where it originated. The source identifier can be as simple as the function name. If a function is used in multiple tasks, it may be beneficial to add the context in which the function was called.
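One way to capture that context, assuming a FreeRTOS-based system where pcTaskGetName(NULL) returns the name of the calling task, is sketched below; on other RTOSes the equivalent call differs, and the LOG_SRC_INFO macro is hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include "FreeRTOS.h"   /* assumption: a FreeRTOS-based system */
#include "task.h"

/* Tag each record with both the calling task and the function name, so a
 * shared helper can be traced back to the task that called it. */
#define LOG_SRC_INFO(...)                                        \
    do {                                                         \
        printf("INFO %s:%s ", pcTaskGetName(NULL), __func__);    \
        printf(__VA_ARGS__);                                     \
        printf("\n");                                            \
    } while (0)

/* Called from several tasks; the task name disambiguates the records. */
void flash_write(uint32_t addr)
{
    LOG_SRC_INFO("writing block at 0x%08lx", (unsigned long)addr);
}
```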
Message
The body of the log record is the message about what is happening. The message is normally some terse text, and it may contain data values.
Depending on system resources, you may allow formatted printing of variables and data structures as the program is executing. On small footprint devices, you will be limited to pre-formatted text. Under extreme limitations, the message may just be an error code.
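The two ends of that spectrum might look like the sketch below: a printf-style record that formats variables at run time, against a fixed-size code-plus-argument record for severely constrained devices. The record layout, the code values, and the record_store() helper are invented for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder for however a record is persisted (hypothetical). */
extern void record_store(const void *data, unsigned len);

/* Rich message: format variables into text at run time (needs printf,
 * which costs RAM and CPU cycles). */
void log_rich(int sensor_id, int reading)
{
    printf("INFO SensorTask sensor %d read %d counts\n", sensor_id, reading);
}

/* Extremely constrained alternative: store only a numeric code plus one
 * argument; a host-side tool translates codes back to readable text. */
typedef struct {
    uint16_t code;   /* e.g. 0x0102 = "sensor read out of range" (made up) */
    uint16_t arg;
} log_code_record_t;

void log_code(uint16_t code, uint16_t arg)
{
    log_code_record_t rec = { code, arg };
    record_store(&rec, sizeof rec);
}
```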
The log may be recorded on-board or sent directly to another device for recording. Logs may grow "without limit", that is, until all space has been consumed. Alternatively, logs might be constrained to a few entries, with older messages being discarded to make room for the last few actions. The system you are running on will guide these limits.
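A bounded log is often implemented as a ring buffer, as in this sketch; the slot count and message length are arbitrary choices, and a real implementation would also need to guard against concurrent writers.

```c
#include <string.h>

/* A bounded log: the newest records overwrite the oldest when full, so the
 * buffer always holds the last few actions leading up to a failure. */
#define LOG_SLOTS   32
#define LOG_MSG_LEN 64

static char     log_ring[LOG_SLOTS][LOG_MSG_LEN];
static unsigned log_head;   /* next slot to write */

void log_ring_add(const char *msg)
{
    strncpy(log_ring[log_head], msg, LOG_MSG_LEN - 1);
    log_ring[log_head][LOG_MSG_LEN - 1] = '\0';
    log_head = (log_head + 1) % LOG_SLOTS;
}
```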
Log destination
I worked on a system that sent the log messages to various endpoints, such as a UART-connected serial console, a local file, and cloud storage. The debug log output was sent to the logging system from the active task. The logging system would filter the messages against a rule set based on severity and source, and then forward them to the selected output streams.
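A sketch of that kind of routing is shown below. The rule table contents and the sink functions are invented for illustration; the actual rule set on that system is not reproduced here.

```c
#include <stdbool.h>
#include <string.h>

/* One routing rule: records at or above min_level from a matching source
 * are forwarded to the given sink. */
typedef void (*log_sink_fn)(const char *record);

typedef struct {
    int         min_level;     /* numeric severity, e.g. 2 = WARNING */
    const char *source;        /* NULL matches any source */
    log_sink_fn sink;          /* console, file, or cloud writer */
} log_route_t;

extern void sink_console(const char *record);
extern void sink_file(const char *record);
extern void sink_cloud(const char *record);

static const log_route_t routes[] = {
    { 0 /* DEBUG */, NULL,    sink_console },  /* everything to the console   */
    { 1 /* INFO  */, NULL,    sink_file    },  /* info and above to the file  */
    { 3 /* ERROR */, "TaskG", sink_cloud   },  /* only TaskG errors to cloud  */
};

void log_dispatch(int level, const char *source, const char *record)
{
    for (unsigned i = 0; i < sizeof routes / sizeof routes[0]; i++) {
        bool level_ok  = level >= routes[i].min_level;
        bool source_ok = routes[i].source == NULL ||
                         strcmp(routes[i].source, source) == 0;
        if (level_ok && source_ok) {
            routes[i].sink(record);
        }
    }
}
```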
Console
Connecting the logger output to a serial port and viewing it in a terminal program is about the easiest way to see system activity.
The local console is the cheapest interface to operate, intended to output to the developer's PC for local monitoring and debugging. QA recorded this output for bug reports too.
The low cost of console output is measured both in resources within the device under test (DUT) and on the receiving side, which requires only a PC you already own and a $20 USB-UART cable.
The real price you pay for a debug UART and console interface is the need for a dedicated UART port (minimum one pin, TX) and a UART peripheral controller.
You also pay in code space for the debug strings. I was coding a bootloader that was at the limit, and to get the 50 bytes of instruction space I needed, I reduced the verbose log strings. The price in performance varies depending on whether you can allocate a DMA engine to output the bytes, and whether you use a formatted print like printf() instead of an unformatted puts(). printf() can place high demand on RAM and CPU cycles.
Cloud storage
Cloud storage can be expensive, with a dollar cost proportional to the frequency of access and the bytes stored. To keep costs down, a limited set of log messages is sent to the cloud: only messages essential for field failure analysis and customer support get routed that way.
Cloud logging might not be enabled until the network, BLE, or WiFi stack is fully integrated. This level of integration can happen fairly early in a project, especially if a solid SDK is used, but may come very late where a lot of custom code is involved.
On-board
The on-board storage cost is measured differently: it is limited by available space. If the storage is in flash, then the limited number of erase cycles will force rules on how we can update the records. Another expense in flash storage is SPI bus bandwidth, especially if the bus or the flash is a shared resource. With flash, erase cycle time can bottleneck logging and impact system performance, since a page erase is slow. You really do not want your logging system task blocking the task it is logging.
Enabling logs
Logging output, depending on your system resources, may be enabled at run-time or at compile time. Logging may be globally enabled, or might be on a per-task or per-module basis. You can make it as simple or as complicated as you like. The more control you have over what is logged, the simpler your log files will be, but you may miss important information if too much of the output is disabled.
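Per-module run-time control can be as small as a bitmask, as in this sketch; the module names and the idea of changing the mask from a CLI or test hook are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-module run-time switches: each module owns one bit, and a bitmask
 * gates its output. */
enum {
    LOG_MOD_SENSOR = 1u << 0,
    LOG_MOD_RADIO  = 1u << 1,
    LOG_MOD_STORE  = 1u << 2,
};

static uint32_t log_enabled_modules = LOG_MOD_SENSOR | LOG_MOD_RADIO;

bool log_module_enabled(uint32_t module_bit)
{
    return (log_enabled_modules & module_bit) != 0u;
}

void log_set_modules(uint32_t mask)   /* e.g. called from a debug CLI */
{
    log_enabled_modules = mask;
}
```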
What logging does under the hood
Depending on the logging framework in place, the following actions could be happening:
- reading the system time
- reading the task control block for the current task ID
- formatted printing of the message to include data from variables
- block copy of the log record to a queue
- initiating a DMA transfer over a serial channel
- initiating a block transfer over a wireless link
- initiating a flash erase and write cycle
When your code wants to log, it takes a detour: consuming stack depth, maybe allocating from the heap, and building the message. In a single-threaded system, the output may happen immediately. All this immediate work adds latency to whatever your function is doing.
Within an RTOS it is more likely that the message is queued for the logging thread to handle some time later. Many threads may process information and add to the log queue before you hit a time when the logging thread can save your message, or output it. While waiting for save or output, other messages may appear on the console, or a crash could occur.
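A common shape for that deferred path, sketched here assuming FreeRTOS, is a queue drained by a low-priority logger task. The queue depth, the record size, and the printf() stand-in for the real sink are assumptions; a real system would hand the record to a UART/DMA, file, or network driver.

```c
#include <stdio.h>
#include "FreeRTOS.h"   /* assumption: FreeRTOS; other RTOSes have equivalents */
#include "queue.h"
#include "task.h"

#define LOG_MSG_LEN 64

static QueueHandle_t log_queue;

/* Producer side: format the record, copy it into the queue, and return.
 * The calling task pays for the formatting and the copy, not the output. */
void log_post(const char *tag, const char *text)
{
    char record[LOG_MSG_LEN];
    snprintf(record, sizeof record, "%s %s", tag, text);
    /* Drop the record rather than block if the queue is full (timeout 0). */
    (void)xQueueSend(log_queue, record, 0);
}

/* Consumer side: a low-priority logger task drains the queue and performs
 * the slow output (UART, flash, or network) outside the producers' paths. */
static void logger_task(void *arg)
{
    char record[LOG_MSG_LEN];
    (void)arg;
    for (;;) {
        if (xQueueReceive(log_queue, record, portMAX_DELAY) == pdTRUE) {
            printf("%s\n", record);   /* stand-in for the real sink */
        }
    }
}

void log_init(void)
{
    log_queue = xQueueCreate(16, LOG_MSG_LEN);
    xTaskCreate(logger_task, "logger", 512, NULL, tskIDLE_PRIORITY + 1, NULL);
}
```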
Limitations on logging
Since a puts() or printf() outputs to the console immediately, while log output is typically deferred, there is a risk of prints and log messages appearing out of sequence in your console. Prints often lack the same formatting and omit timestamps, so it is hard to spot when the order is reversed.
If a crash occurs between the creation of a log message and the message being recorded to the screen, serial port, or file, you may never have the information you need.