Logging

A valuable tool in debugging is the debug log. The debug log is a journal of events that have taken place, recorded in sequence. A log record often contains a timestamp, a source identifier, a log level, and a status message.
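
As a rough sketch, such a record can be modelled as a small C structure. The field names and sizes below are assumptions for illustration, not the layout of any particular logging framework.

#include <stdint.h>

/* Illustrative log record; field names and sizes are assumptions. */
typedef enum {
    LVL_DEBUG,
    LVL_INFO,
    LVL_WARNING,
    LVL_ERROR,
    LVL_FATAL
} log_level_t;

typedef struct {
    uint64_t    timestamp_ms;   /* when the event occurred            */
    log_level_t level;          /* how much attention it needs        */
    const char *source;         /* task, module or function name      */
    char        message[64];    /* terse status text, possibly values */
} log_record_t;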

The following log was generated using ChatGPT to give a stylized sample.

[2024-01-17 10:15:22.880] INFO     TaskA      Initializing...
[2024-01-17 10:16:05.968] WARNING  TaskB      Connection lost.
[2024-01-17 10:17:30.981] ERROR    TaskC      An unexpected error occurred.
[2024-01-17 10:18:12.406] INFO     TaskD      Processing data...
[2024-01-17 10:19:05.689] WARNING  TaskE      Resource usage high.
[2024-01-17 10:20:40.690] INFO     TaskF      Task completed successfully.
[2024-01-17 10:21:55.092] ERROR    TaskG      Database connection failed.
[2024-01-17 10:22:40.085] INFO     TaskH      Loading configuration...
[2024-01-17 10:23:15.101] INFO     TaskI      Task execution started.
[2024-01-17 10:24:02.057] WARNING  TaskJ      Potential data inconsistency detected.

Log record content

You can see in the above example the key fields you would expect in a log message. There is a date and time, letting you know when the event happened. There is a log level: INFO, WARNING, or ERROR in this sample. There is a source identifier, here the name of the task that emitted the record. Finally, there is the message itself, a terse line describing progress or what went wrong. Let us look at these fields in more detail.

Timestamp

Understanding the role of timestamps is crucial in logging. These timestamps should be set when the event occurs, not when the record is finally written out. Most logging systems are adept at keeping log messages from concurrent tasks in order, but I've encountered instances in homemade loggers where records can become jumbled.

Typically, timestamps might be represented as system ticks (sys-ticks), a Coordinated Universal Time (UTC) timestamp, or some other ad hoc measure. In practice, we often convert sys-ticks to milliseconds (ms) to provide a more standard time measurement.
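
As a sketch of that conversion, assuming a free-running tick counter read through a hypothetical hw_read_systick() and a known tick rate:

#include <stdint.h>

extern uint32_t hw_read_systick(void);   /* assumed free-running counter */
#define TICKS_PER_SECOND 32768u          /* assumed tick rate            */

/* Convert raw ticks to milliseconds for use as a log timestamp.
 * The 64-bit multiply avoids overflowing before the divide. */
static uint64_t log_timestamp_ms(void)
{
    uint64_t ticks = hw_read_systick();
    return (ticks * 1000u) / TICKS_PER_SECOND;
}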

It's important to note that while free-running tickers start almost immediately once the clock tree is stable (this happens post Phase-Locked Loop (PLL) lock), initiating the Real-Time Clock (RTC) interface is a separate process. If the RTC is an external component, its interface requires configuration before the Microcontroller Unit (MCU) can begin accurately tracking wall clock time.

Log level

The log level qualifies how much attention you need to pay to the message. It may also be regarded as the severity of what is being reported. Typical log levels include:

  • debug

    • extra information

    • branch paths

    • parameters

    • interim values

    • normally not shown

  • info

    • progress

    • state changes

    • branch paths

    • results

  • warning

    • improper requests

    • boundary conditions nearing

    • watermarks (pre-overflow)

  • error

    • out of bounds values/inputs

    • use of resource before available

    • conflicts

  • fatal

    • crash

Log levels are sometimes used to gate which log messages are actually recorded or output. You will often see that debug-level messages are available in debug builds of your code, but not in release builds.

Depending on your logging framework, the active log levels could be run-time selectable. In less complex systems, compile-time options determine how verbose your logs are.
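
As a sketch of compile-time gating, a threshold macro can turn the chattier calls into empty statements so their strings do not take up code space in release builds. The macro names here are illustrative, not from a specific framework.

#include <stdio.h>

/* Numeric severities; lower numbers are chattier. */
#define LOG_LEVEL_DEBUG   0
#define LOG_LEVEL_INFO    1
#define LOG_LEVEL_WARNING 2
#define LOG_LEVEL_ERROR   3

/* Build-time threshold; a release build might pass
 * -DLOG_BUILD_LEVEL=LOG_LEVEL_WARNING on the compiler command line. */
#ifndef LOG_BUILD_LEVEL
#define LOG_BUILD_LEVEL LOG_LEVEL_DEBUG
#endif

#if LOG_BUILD_LEVEL <= LOG_LEVEL_DEBUG
#define LOG_DEBUG(tag, msg) printf("DEBUG    %-10s %s\n", (tag), (msg))
#else
#define LOG_DEBUG(tag, msg) ((void)0)   /* compiled away, string not emitted */
#endif

#if LOG_BUILD_LEVEL <= LOG_LEVEL_INFO
#define LOG_INFO(tag, msg)  printf("INFO     %-10s %s\n", (tag), (msg))
#else
#define LOG_INFO(tag, msg)  ((void)0)
#endif

/* WARNING and ERROR follow the same pattern and usually stay enabled. */
#define LOG_WARNING(tag, msg) printf("WARNING  %-10s %s\n", (tag), (msg))
#define LOG_ERROR(tag, msg)   printf("ERROR    %-10s %s\n", (tag), (msg))

A call such as LOG_DEBUG("TaskA", "interim value 42") then costs nothing in a build where LOG_BUILD_LEVEL is set to LOG_LEVEL_WARNING.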

Source Identifier

For a log message to be useful, we need to know where it was emitted. The source identifier can be as simple as the function name. If a function is used by multiple tasks, it may be beneficial to add the context in which the function was called.
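
One cheap way to capture the source is to let the compiler supply the function name via __func__ and have the caller pass its context, such as a task name. A minimal sketch (the macro name is made up):

#include <stdio.h>

/* Tag the record with a caller-supplied context plus the emitting function. */
#define LOG_INFO_FROM(ctx, msg) \
    printf("INFO     %s:%s  %s\n", (ctx), __func__, (msg))

static void load_config(const char *task_name)
{
    LOG_INFO_FROM(task_name, "Loading configuration...");
}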

Message

The body of the log record is a message about what is happening. The message is normally some terse text, and it may contain data values.

Depending on system resources, you may allow formatted printing of variables and data structures as the program is executing. On small footprint devices, you will be limited to pre-formatted text. Under extreme limitations, the message may just be an error code.
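
At the most constrained end of that range, the whole message can shrink to a couple of numbers that are decoded off-line. A sketch with made-up module and error codes:

#include <stdint.h>

#define SRC_STORAGE  0x03u   /* hypothetical module identifier */
#define ERR_CRC      0x11u   /* hypothetical error code        */

/* Eight bytes per record instead of a formatted string; the codes are
 * translated back to text on the host, not on the device. */
typedef struct {
    uint32_t timestamp_ms;
    uint8_t  module;     /* which subsystem reported it     */
    uint8_t  code;       /* what happened                   */
    uint16_t detail;     /* optional extra value, e.g. size */
} tiny_log_record_t;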

The log may be recorded on-board or sent directly to another device for recording. Logs may grow "without limit", that is, until all space has been consumed. Alternatively, logs might be constrained to a few entries, with older messages being discarded to make space for the last few actions. The system you are running on will guide these limits.

Log destination

I worked on a system that sent the log messages to various endpoints: a UART-connected serial console, a local file, and cloud storage. The debug log output was sent to the logging system from the active task. The logging system would filter the messages against a rule set based on severity and source, and then forward them to the selected output streams.
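
A sketch of that kind of routing is shown below: each rule maps a source and a minimum severity to a set of sinks. The rule table and sink handling here are hypothetical, not the actual system described above.

#include <stdio.h>
#include <string.h>

enum { SINK_CONSOLE = 1, SINK_FILE = 2, SINK_CLOUD = 4 };
typedef enum { LVL_DEBUG, LVL_INFO, LVL_WARNING, LVL_ERROR } level_t;

/* One routing rule: messages from `source` at or above `min_level`
 * go to every sink whose bit is set in `sinks`. */
typedef struct {
    const char *source;      /* "*" matches any source */
    level_t     min_level;
    unsigned    sinks;
} log_rule_t;

static const log_rule_t rules[] = {
    { "TaskB", LVL_DEBUG,   SINK_CONSOLE },              /* everything locally   */
    { "TaskB", LVL_ERROR,   SINK_CLOUD },                /* only errors upstream */
    { "*",     LVL_WARNING, SINK_CONSOLE | SINK_FILE },  /* default rule         */
};

static void log_dispatch(const char *source, level_t level, const char *msg)
{
    unsigned sinks = 0;

    /* Collect every sink that any matching rule selects. */
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(rules[i].source, "*") != 0 && strcmp(rules[i].source, source) != 0)
            continue;
        if (level >= rules[i].min_level)
            sinks |= rules[i].sinks;
    }

    if (sinks & SINK_CONSOLE) printf("%s %s\n", source, msg);
    if (sinks & SINK_FILE)    { /* append to local log file */ }
    if (sinks & SINK_CLOUD)   { /* queue for cloud upload   */ }
}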

Console

Connecting the logger output to a serial port and viewing it in a terminal program is about the easiest way to see system activity.

The local console is the cheapest interface to operate, intended to output to the developer's PC for local monitoring and debug. QA recorded this output for bug reports too.

The low cost of console output is both in terms of resources within the device under test (DUT) and on the host side, requiring only a PC you already own and a $20 USB UART cable.

The real price you pay for a debug UART and console interface is the need for a dedicated UART port (minimum one pin, TX) and a UART peripheral controller.

You also pay in code space for the debug strings. I was coding a bootloader that was at the limit, and to get the 50 bytes of instruction space I needed, I trimmed the verbose log strings. The price in performance varies depending on whether you can allocate a DMA engine to output the bytes, and whether you use a formatted print like printf() instead of an unformatted puts(). printf() can place a high demand on RAM and CPU cycles.

Cloud storage

Cloud storage can be expensive, with a dollar cost proportional to frequency of access and bytes stored. To keep costs down, a limited set of log messages is sent to the cloud. Only messages essential for field failure analysis and customer support get routed that way.

Cloud logging might not be enabled until the network, BLE, or WiFi stack is fully integrated. This level of integration can happen fairly early in a project, especially if a solid SDK is used, but may be very late where a lot of custom code is involved.

On-board

The on-board storage cost is measured differently: it is limited by available space. If the storage is in flash, then the limited number of erase cycles will force rules on how we can update the records. Another expense in flash storage is SPI bus bandwidth, especially if the bus or the flash is a shared resource. With flash, erase cycle time can bottleneck logging and impact system performance, since a page erase is slow. You really do not want your logging task blocking the task it is logging.
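
One common compromise is to keep a small circular buffer of fixed-size records in RAM and let a background task copy it out to flash a page at a time, so erases stay infrequent and never block the task doing the logging. A minimal sketch of the circular part, with the flash flushing left out and the sizes assumed:

#include <stdint.h>
#include <string.h>

#define LOG_SLOTS    64          /* power of two keeps the index math cheap */
#define LOG_MSG_LEN  48

typedef struct {
    uint32_t timestamp_ms;
    uint8_t  level;
    char     text[LOG_MSG_LEN];
} log_entry_t;

static log_entry_t ring[LOG_SLOTS];
static uint32_t    head;          /* next slot to write */

/* Overwrite the oldest entry when the buffer is full; a background task
 * could later copy full pages of entries out to flash in one burst. */
void log_store(uint32_t ts_ms, uint8_t level, const char *text)
{
    log_entry_t *e = &ring[head & (LOG_SLOTS - 1)];
    e->timestamp_ms = ts_ms;
    e->level        = level;
    strncpy(e->text, text, LOG_MSG_LEN - 1);
    e->text[LOG_MSG_LEN - 1] = '\0';
    head++;
}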

Enabling logs

Logging output, depending on your system resources, may be enabled at run-time or at compile time. Logging may be globally enabled, or it might be controlled on a per-task or per-module basis. You can make it as simple or as complicated as you like. The more control you have over what is logged, the simpler your log files will be, but you may miss important information if too much of the output is disabled.
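
As an illustration of per-module, run-time control, each logging call can check a bitmask before doing any work; the module names and the macro are made up for this sketch.

#include <stdint.h>
#include <stdio.h>

/* One bit per module; which modules exist is up to you. */
enum { MOD_COMMS = 0, MOD_STORAGE = 1, MOD_UI = 2 };

static uint32_t log_enable_mask = 0xFFFFFFFFu;   /* everything on by default */

void log_enable(int module, int on)
{
    if (on) log_enable_mask |=  (1u << module);
    else    log_enable_mask &= ~(1u << module);
}

#define MOD_LOG(module, msg)                              \
    do {                                                  \
        if (log_enable_mask & (1u << (module)))           \
            printf("mod%-2d %s\n", (module), (msg));      \
    } while (0)

/* e.g. silence a chatty module from a debug shell:
 *   log_enable(MOD_COMMS, 0);                            */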

What logging does under the hood

Depending on the logging framework in place, the following actions could be happening:

  • reading system time

  • reading task control block for current task ID

  • formatted printing of message statement to include data from variables

  • block copy of log to queue

  • initiate a DMA transfer over a serial channel

  • initiate a block transfer over a wireless link

  • initiate a flash erase and write cycle

When your code wants to log, it takes a detour: consuming stack depth, perhaps allocating from the heap, and building the message. In a single-threaded system, the output may happen immediately. All this immediate work adds latency to whatever your function is doing.

Within an RTOS, it is more likely that the message is queued for the logging thread to handle some time later. Many threads may add to the log queue before the logging thread gets a chance to save or output your message. While it waits, other messages may appear on the console, or a crash could occur.
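
A sketch of that deferral, using FreeRTOS queue calls for illustration (the record layout, queue depth and priorities are assumptions): the application task only copies the record into a queue, while a low-priority logger task does the slow output later.

#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t timestamp_ms;
    char     text[48];
} log_msg_t;

static QueueHandle_t log_queue;

/* Called from application tasks: copy the record into the queue and
 * return immediately; no UART, flash or network work happens here. */
void log_post(uint32_t ts_ms, const char *text)
{
    log_msg_t m = { .timestamp_ms = ts_ms };
    strncpy(m.text, text, sizeof m.text - 1);
    m.text[sizeof m.text - 1] = '\0';
    (void)xQueueSend(log_queue, &m, 0);   /* drop the message rather than block */
}

/* Low-priority logger task: drains the queue and does the slow output. */
static void logger_task(void *arg)
{
    (void)arg;
    log_msg_t m;
    for (;;) {
        if (xQueueReceive(log_queue, &m, portMAX_DELAY) == pdPASS)
            printf("[%lu] %s\n", (unsigned long)m.timestamp_ms, m.text);
    }
}

void log_init(void)
{
    log_queue = xQueueCreate(32, sizeof(log_msg_t));
    xTaskCreate(logger_task, "logger", 512, NULL, tskIDLE_PRIORITY + 1, NULL);
}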

Limitations on logging

Since puts() or printf() output to the console immediately, while log output is typically deferred, there is a risk of prints and log messages appearing out of sequence in your console. Prints often lack the same formatting and omit timestamps, so it is hard to spot when the order is reversed.

If a crash occurs between a log message being created and that message being recorded to the screen, serial port, or file, you may never get the information you need.
