Thursday, April 12, 2012

Learning By Doing

I figured out a long time ago that I'm not happy unless I'm learning new things. It's more than just being happy, really. For me, learning new stuff is very much self-medication. And the only way I can learn new stuff, really internalize new information, is by applying the stuff I learn. And the best way for me to do that is to generate deliverables. This has resulted in a bunch of projects that contain all sorts of useful collateral, some of which has found its way into real products of paying clients. Even that which hasn't has served me well as a kind of reference design that I routinely go back to when I'm working in related areas. I attribute much of my career success (and I've had a lot of it) to this approach.

Amigo, my foray into low-power eight-bit microcontrollers, has been no different. Here, in no particular order, are some of the lessons I've learned, relearned, or have had reinforced.

C++ works just fine for embedded applications.

C++ isn't just usable for embedded applications, even those with real-time requirements, that run on resource-constrained targets. It's superior to alternatives like C or even assembler. I've been using FreeRTOS, a popular real-time microkernel with a tiny footprint, on the Freetronics EtherMega board with the Atmel AVR ATmega2560 microcontroller. This is a platform with 256KB of flash and only 8KB of SRAM. I've written a C++ layer around the FreeRTOS facilities to provide classes like Queue, Task, MutexSemaphore, etc. I've written interrupt-driven device drivers in C++ for AVR hardware features like USART and SPI.

Not only does C++ work, but the result was a cleaner, simpler, easier-to-use design than I could have accomplished in C. How much SRAM overhead does C++ add over using the FreeRTOS C API? For example: Queue, four bytes; Task, eight bytes; MutexSemaphore, four bytes. Two bytes of each of those could have been eliminated by not having virtual methods. The remaining extra bytes would have been added in a C layer as well. Thanks to inline C++ methods, there is frequently no additional overhead in flash to using the C++ API.
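
To make the cost of virtual methods concrete, here is a minimal sketch that compares two wrapper classes, one with a virtual destructor and one without. PlainWrapper, VirtualWrapper, virtualOverhead(), and checkOverhead() are hypothetical names for illustration, not Amigo classes. On the AVR a vtable pointer is two bytes, which is consistent with the two bytes mentioned above; on a desktop it is typically four or eight.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for an RTOS wrapper class; not Amigo's code.
class PlainWrapper {
public:
    explicit PlainWrapper(void * h) : handle(h) {}
    void * get() const { return handle; }   // Inline, so no call overhead.
private:
    void * handle;                           // The underlying RTOS handle.
};

class VirtualWrapper {
public:
    explicit VirtualWrapper(void * h) : handle(h) {}
    virtual ~VirtualWrapper() {}             // One virtual method adds a vtable pointer.
    void * get() const { return handle; }
private:
    void * handle;
};

// Extra bytes each object pays for having virtual methods on this target.
inline std::size_t virtualOverhead() {
    return sizeof(VirtualWrapper) - sizeof(PlainWrapper);
}

// With common ABIs the overhead is exactly one pointer per object.
inline void checkOverhead() {
    assert(virtualOverhead() == sizeof(void *));
}
```

The vtable itself is shared by all objects of the class, so it is only the per-object pointer that eats into SRAM for every instance.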

Formal unit testing is the best thing since sliced bread.

I wish I could just port my favorite C++ unit testing framework, Google Test, to the AVR. I have considered writing a framework around Google Test to have it run on my desktop but execute unit tests on the target. But so far a few carefully written preprocessor macros like UNITTEST(__NAME__), FAILED(__LINE__), and PASSED() have been more than adequate. A single unit test C++ main() program that exercises most of what I've written so far consumes only about 27KB of flash and 6KB of SRAM, most of which is stack space for main() and four concurrent tasks. Below (mostly to record it for my own reference, truth be told) is the output of the unit test suite. (Update 2012-04-26: I pasted in the latest version.)
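
For readers who want to try the same approach, here is a minimal sketch of what macros like these could look like. The macro bodies, the errors counter, and runExampleTest() are assumptions made for illustration; the actual Amigo macros may differ. The sketch uses standard printf so that it also builds on a desktop.

```cpp
#include <cstdio>

// Running count of failed checks (hypothetical; not Amigo's variable).
static int errors = 0;

// Announce a test. No newline, so the verdict follows on the same line.
#define UNITTEST(name) std::printf("Unit Test %s ", (name))

// Record a failure and report the line at which it was detected.
#define FAILED(line) \
    do { ++errors; std::printf("FAILED at line %d!\n", (line)); } while (0)

// Report success.
#define PASSED() std::printf("PASSED.\n")

// One test segment of the kind that would be added to main().
// Returns the number of errors this segment detected.
int runExampleTest() {
    const int before = errors;
    UNITTEST("arithmetic");
    if ((2 + 2) != 4) {
        FAILED(__LINE__);
    } else {
        PASSED();
    }
    return errors - before;
}
```

Wrapping each segment in its own #if block is one way to switch tests off individually when flash runs short.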

Unit Test Console
Unit Test Morse PASSED.
Unit Test Task PASSED.
Unit Test Sink
Now is the time for all good men to come to the aid of their country.
Unit Test sizeof
sizeof(signed char)=1
sizeof(unsigned char)=1
sizeof(signed short)=2
sizeof(unsigned short)=2
sizeof(signed int)=2
sizeof(unsigned int)=2
sizeof(signed long)=4
sizeof(unsigned long)=4
sizeof(long long)=8
sizeof(signed long long)=8
sizeof(unsigned long long)=8
Unit Test stack PASSED.
Unit Test heap PASSED.
Unit Test littleendian and byteorder PASSED.
Unit Test Low Precision delay PASSED.
Unit Test High Precision busywait PASSED.
Unit Test Dump
Unit Test Uninterruptible PASSED.
Unit Test BinarySemaphore PASSED.
Unit Test CountingSemaphore PASSED.
Unit Test MutexSemaphore PASSED.
Unit Test CriticalSection PASSED.
Unit Test PeriodicTimer PASSED.
Unit Test OneShotTimer PASSED.
Unit Test Digital I/O (requires text fixture on EtherMega) PASSED.
Unit Test Analog Output (uses red LED on EtherMega) PASSED.
Unit Test Analog Output (uses pin 9 on EtherMega) L M H M L PASSED.
Unit Test Analog Output (uses pin 8 on EtherMega) L M H M L PASSED.
Unit Test SPI (requires WIZnet W5100) PASSED.
Unit Test W5100 (requires WIZnet W5100) PASSED.
Unit Test Socket (requires internet connectivity) PASSED.
Unit Test errors=0 (so far)
Unit Test Source (type control-D to exit)
England expects each man to do his duty.
Unit Test errors=0
Type "<control-a><control-\>y" to exit Mac screen utility.

(Update 2013-03-22: that line above that says text fixture should be test fixture, and refers to the wiring on the board that allows the unit test -- really a functional test -- to succeed when it controls the actual hardware.)

When I make a change to existing code, I just build, upload, and execute the entire unit test suite. When it gets to the end in thirty seconds or so and declares Unit Test errors=0 I am pretty confident that I haven't screwed something up. When I add new functionality, I add another unit test code segment to main(). If I start to run out of space, I'll start deactivating selected tests by turning off the #if conditional compilation statements I've put in the code. But so far that isn't a problem, and doesn't look likely to become one in the near future.

Unit tests do more than assure me I haven't done something stupid. I get a lot of feedback about my design by eating my own dog food. If I find using a feature that I've written to be cumbersome in the unit test, I know that I've botched the API.

The unit tests also serve as a living, functional example of how I expect my code to be used. I deliberately try to use my software in a way I expect it to be used in an application, or even to suggest ways in which it might be used. I often go back to my own unit tests to remind myself how to use my own code.

Tool chains for embedded projects can still be problematic.

I've written recently about my adventures in AVR tool chains when I discovered that my code worked just fine with GCC 4.5.1 but failed mysteriously and catastrophically with GCC 4.3.4. I don't have much to add to that except that the AVR, with its Harvard architecture, and the broad range of configurations of microcontrollers in the AVR product line, can make code generation for these targets challenging. Unfortunately, this is likely true for a lot of microcontrollers, and indeed for any processor other than the mainstream Intel x86. My reading of disassembled code suggests to me that GNU C++ doesn't quite support virtual methods in the upper 128KB of flash on the AVR. I'd be happy to be proven wrong.

I'd be extremely reluctant to ever say I'd found a compiler bug. My reading and writing about memory models and support for the C and C++ volatile keyword has convinced me that these areas are subtle and fraught with peril not just for embedded developers but potentially for everyone, even with perfectly working compilers. But I am still puzzled why a function that returned a pointer to a volatile variable returned NULL when checking the value just before it was returned showed it to be correct. And why I had to cast the result of a sizeof() operator in order to print something other than a monstrously large number when sizeof(size_t) is two bytes. When you are writing code close to bare metal, strange and unexpected things can sometimes happen.

Lexical scoping can be like having a superpower.

C programmers (and just about everyone else) already know about scoping. When a local variable comes into scope, the compiler generates code to allocate it on the stack. When it goes out of scope, the compiler generates code to deallocate it from the stack.

/* foo is out of scope. */
{
    int foo = 0; /* foo comes into scope. */

    /*
     * :
     * foo is in scope.
     * :
     */

    /* foo is about to go out of scope. */
}
/* foo is out of scope. */

C++ extends this to objects by automatically calling the class constructor when an object of that class comes into scope, and it automatically calls the class destructor when that object goes out of scope. The great thing about this is that constructors and destructors are methods that you write that can do all sorts of things, including things that may or may not be related to the object being allocated and deallocated. The Resource Acquisition is Initialization idiom is a way to exploit this.

For example, Amigo implements the class MutexSemaphore with give() and take() methods. The class CriticalSection stores a reference to a MutexSemaphore and calls take() against it in its constructor, and calls give() in its destructor. This is how you implement a critical section to protect data shared between concurrent tasks.

MutexSemaphore mutex; // Shared among tasks.

{
    CriticalSection cs(mutex);

    // Code accessing shared data goes here.
}

That's it. Everything else is done for you. No matter how you exit that lexical block, the compiler guarantees that the destructor will be called to release the recursive mutex semaphore.

Similarly, Uninterruptible saves the current interrupt state (by saving a copy of SREG, the status register) and disables interrupts in its constructor, and restores the interrupt state in its destructor. So here's a section of code that runs with interrupts disabled.

{
    Uninterruptible ui;

    // Code to run without interruption goes here.
}
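
Here is a minimal host-side sketch of how a class like this can be built, and why saving and restoring the state (rather than blindly re-enabling interrupts) makes nesting safe. It is not Amigo's implementation. The variable fakeSREG stands in for the real SREG register from <avr/io.h>, and clearing its top bit stands in for the cli() call from <avr/interrupt.h>; both stand-ins, along with interruptsEnabled() and nestedSectionsRestoreCorrectly(), are assumptions made so the sketch can run on a desktop.

```cpp
#include <cassert>
#include <stdint.h>

// Bit 7 of the AVR status register is the global interrupt enable flag.
static const uint8_t INTERRUPT_ENABLE = 0x80;

// Host-side stand-in for SREG (assumption); starts with interrupts enabled.
static volatile uint8_t fakeSREG = INTERRUPT_ENABLE;

class Uninterruptible {
public:
    // Save the current state, then disable interrupts.
    Uninterruptible()
    : saved(fakeSREG)
    {
        fakeSREG = static_cast<uint8_t>(saved & ~INTERRUPT_ENABLE);
    }
    // Restore whatever state was in effect on entry.
    ~Uninterruptible() { fakeSREG = saved; }
private:
    uint8_t saved;
    Uninterruptible(const Uninterruptible &);             // Not copyable.
    Uninterruptible & operator=(const Uninterruptible &);
};

bool interruptsEnabled() { return (fakeSREG & INTERRUPT_ENABLE) != 0; }

// Nested sections: leaving the inner one must not re-enable interrupts.
bool nestedSectionsRestoreCorrectly() {
    bool ok = interruptsEnabled();
    {
        Uninterruptible outer;
        ok = ok && !interruptsEnabled();
        {
            Uninterruptible inner;
            ok = ok && !interruptsEnabled();
        }
        ok = ok && !interruptsEnabled();  // Inner restored "disabled".
    }
    return ok && interruptsEnabled();     // Outer restored "enabled".
}
```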

Endianness is sometimes a function of the tool chain.

It is legitimate to say that the megaAVR architecture is little-endian: the first or lowest address of a multi-byte variable points to the least-significant byte in that variable. Except on the AVR, there is no such thing as a multi-byte variable. Everything is done in byte-sized chunks using eight-bit registers.

Some registers are split into multiple eight-bit registers that are logically concatenated. For example, the sixteen-bit stack pointer is split into SPL (low) memory-mapped to address 0x3d and SPH (high) at address 0x3e. That's little-endian. But there is no way for an application to atomically access both of these chunks at one time as a variable. It is the code generated by the compiler that assumes a byte ordering of short or long variables stored in two or four consecutive bytes of memory. This was a new idea to me.
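
Since the byte order is a property of the generated code, it can be checked at run time. The following is a minimal sketch of such a check; isLittleEndian(), load16(), and checkByteOrder() are hypothetical names, not the functions exercised by the Amigo unit tests.

```cpp
#include <cassert>
#include <stdint.h>
#include <string.h>

// True if the compiler stores the least-significant byte of a
// multi-byte variable at the lowest address.
bool isLittleEndian() {
    const uint16_t probe = 0x0102;
    uint8_t bytes[sizeof(probe)];
    memcpy(bytes, &probe, sizeof(probe));
    return bytes[0] == 0x02;
}

// Reassemble a sixteen-bit value from two bytes laid out in the
// byte order that the compiler itself uses.
uint16_t load16(const uint8_t * bytes) {
    return isLittleEndian()
        ? static_cast<uint16_t>(bytes[0] | (bytes[1] << 8))
        : static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
}

// Round trip: what the compiler wrote, load16() reads back.
void checkByteOrder() {
    const uint16_t value = 0xBEEF;
    uint8_t bytes[sizeof(value)];
    memcpy(bytes, &value, sizeof(value));
    assert(load16(bytes) == value);
}
```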

Harvard architecture requires some new thinking.

Executable code in the megaAVR architecture resides in flash memory or program memory. Non-persistent data resides in static random access memory or data memory. This is generically known as Harvard architecture, as opposed to von Neumann architecture where everything is accessed from a common memory, or at least a common memory bus. On the megaAVR, program memory is (two-byte) word addressed, while data memory is byte addressed. Persistent constant data can be stored in program memory too, but requires special functions that call dedicated machine instructions to bridge the gap from flash to SRAM for processing.

Just to make it even more complicated, some megaAVRs have a three-byte program counter instead of two-byte because their flash memory exceeds 128KB; word addressed, remember? So pointers to stuff in program memory may be three bytes, depending on the model of megaAVR, but pointers to stuff in data memory will always be two bytes. You can have two pointers with exactly the same numerical value, but one points to data in flash, the other to data in SRAM; if you mix up their usage, wackiness ensues.

It's up to you to keep this straight. The GCC AVR tool chain does not deal with this automatically. That matters because the SRAM is so small -- a whopping 8KB on the ATmega2560, but a minuscule 2KB on the ATmega328P used on the Arduino Uno -- that you absolutely must store large constant data in flash, or you'll find all your SRAM taken up with stuff like character strings. Ask me how I know this.

The GCC C and C++ compilers and the AVR C library include extensions, attributes, type definitions, functions, and preprocessor macros to enable you to write code to deal with all of this. But write code you must. You will find your source files littered with stuff like PROGMEM, PSTR(), strlen_P(), pgm_read_byte() and the like. Once you get used to it, it's actually pretty straightforward. But it is definitely a different way of thinking. You also hope you'll never have to port this code to a different microcontroller architecture.
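
As an illustration of what that littering looks like, here is a minimal sketch that keeps a string in flash and copies it into an SRAM buffer one byte at a time. PROGMEM and pgm_read_byte() are the real avr-libc facilities from <avr/pgmspace.h>. GREETING, copyFromFlash(), and copyGreeting() are hypothetical names, and the fallback definitions in the #else branch are assumptions added only so the sketch also builds on a desktop.

```cpp
#include <stddef.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Host fallbacks (assumptions): no separate program memory off-target.
#define PROGMEM
#define pgm_read_byte(p) (*(const unsigned char *)(p))
#endif

// On the AVR this string lives in flash and consumes no SRAM.
static const char GREETING[] PROGMEM = "Hello from flash";

// Copy at most (size - 1) characters from program memory into a data
// memory buffer, always NUL terminating it. Returns the length copied.
size_t copyFromFlash(char * to, const char * from, size_t size) {
    size_t index = 0;
    if (size == 0) {
        return 0;
    }
    while (index < (size - 1)) {
        const char ch = static_cast<char>(pgm_read_byte(from + index));
        if (ch == '\0') {
            break;
        }
        to[index++] = ch;
    }
    to[index] = '\0';
    return index;
}

// Usage: bring the flash string into a caller-supplied SRAM buffer.
size_t copyGreeting(char * buffer, size_t size) {
    return copyFromFlash(buffer, GREETING, size);
}
```

In practice avr-libc already provides strcpy_P() and friends for this; the loop is spelled out here only to show that every read from flash goes through a dedicated accessor instead of a plain pointer dereference.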

Resource constraints make you a better developer.

It is a lot easier to write code for a whopping big server with gigabytes of real memory, gigabytes more of virtual memory, terabytes of disk space, and many processing cores each of which runs at many gigahertz, and a virtual machine that hides all the hardware from you. Heck, anybody can do that. Shoehorning a complex multi-tasking C++ application into 8KB of SRAM, that takes some real thought.

You are forced to make architectural decisions up front and careful design and implementation decisions as you go. You are forced into understanding the consequences of your actions as you decide: do I really need virtual functions in this class? You have to figure out how things actually work under the hood as you ponder: what happens when I go past the 128KB flash boundary in my application?

As they do in so many problem domains, constraints force you to confront the implications of your decisions head on. That makes you a better developer, and that is why I like the megaAVR as a teaching platform. I like to say "all the interesting problems are really scalability problems", and resource constraints allow you to see scalability problems while spending a lot less money.

C++ templates are a win in the eight-bit realm.

C++ templates are a form of code generation, like the C preprocessor but more structured, and so they must be used judiciously, especially in a resource-constrained environment. But they can be used to solve some of the very problems that resource constraints bring to the table. And they can be used to make your software more reliable with little or no additional overhead.

When you have only 8KB of data memory into which you must squeeze all your variables, a sizable stack for each task, and a heap from which memory may be dynamically allocated, you may come to realize that your heap isn't going to be very big. C++ templates are a way of implementing variable sized objects as local variables on a stack, instead of using malloc() to dynamically allocate them. I've written about this before.
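
A minimal sketch of the general idea, assuming nothing about how Amigo actually does it: the capacity is a template parameter, so the storage is part of the object itself and the object can live on the stack with no call to malloc(). Buffer and fillSmallBuffer() are hypothetical names.

```cpp
#include <cassert>
#include <stddef.h>

// A character buffer whose capacity is fixed at compile time.
template <size_t CAPACITY>
class Buffer {
public:
    Buffer() : used(0) {}
    size_t capacity() const { return CAPACITY; }
    size_t length() const { return used; }
    // Append one character; fails rather than overflowing when full.
    bool append(char ch) {
        if (used >= CAPACITY) {
            return false;
        }
        data[used++] = ch;
        return true;
    }
private:
    size_t used;
    char data[CAPACITY];   // Storage is inside the object: no heap.
};

// Usage: a four-byte buffer as a local variable; two of the six
// appends are refused. Returns the number accepted.
size_t fillSmallBuffer() {
    Buffer<4> small;
    size_t accepted = 0;
    for (char ch = 'a'; ch <= 'f'; ++ch) {
        if (small.append(ch)) {
            ++accepted;
        }
    }
    assert(small.length() == small.capacity());
    return accepted;
}
```

Each distinct capacity instantiates its own copy of the class, which is where the "used judiciously" caveat comes in: small inline methods like these cost little, but large template bodies get duplicated in flash.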

Templates can make your software more reliable by allowing you to implement generic code in a base class, then make it type specific in a derived class generated by a template.  For example, I wrote the C++ wrapper Queue around the FreeRTOS queue facility. FreeRTOS queues are synchronized ring buffers that can be used by an application to pass data back and forth with an interrupt-driven device driver, or for two concurrent tasks to pass data back and forth. Amigo uses them in both ways. A FreeRTOS queue can contain any number of fixed length objects. The Queue class is as generic as the underlying FreeRTOS functions. But the TypedQueue class extends Queue for a specific data type, and makes all of the Queue operations type safe. This makes it a lot harder to screw up and send the wrong message to the wrong queue.

template <typename _TYPE_>
class TypedQueue
: public Queue
{

public:

    explicit TypedQueue(Count count, const signed char * name = 0)
    : Queue(count, sizeof(_TYPE_), name)
    {}

    virtual ~TypedQueue() {}

    bool peek(_TYPE_ * buffer, Ticks timeout = IMMEDIATELY) { return Queue::peek(buffer, timeout); }

    bool receive(_TYPE_ * buffer, Ticks timeout = NEVER) { return Queue::receive(buffer, timeout); }

    bool receiveFromISR(_TYPE_ * buffer, bool & woken = unused.b) { return Queue::receiveFromISR(buffer, woken); }

    bool send(const _TYPE_ * datum, Ticks timeout = NEVER) { return Queue::send(datum, timeout); }

    bool sendFromISR(const _TYPE_ * datum, bool & woken = unused.b) { return Queue::sendFromISR(datum, woken); }

    bool express(const _TYPE_ * datum, Ticks timeout = NEVER) { return Queue::express(datum, timeout); }

    bool expressFromISR(const _TYPE_ * datum, bool & woken = unused.b) { return Queue::expressFromISR(datum, woken); }

};

Ring buffers are a fundamental interprocess communication mechanism.

By the way, synchronized ring buffers, that is, buffers that provide atomic reads and writes with synchronized access to concurrent tasks and that wrap around the underlying storage, have long been useful interprocess communication (IPC) mechanisms for solving general producer-consumer problems. Queues like those in FreeRTOS are most often, in my experience, used to store individual bytes of data. But they are equally adept at storing pointers to buffers or even to objects, and so can be thought of as a more general asynchronous message passing scheme.
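
For readers new to the data structure, here is a minimal sketch of just the wrap-around part. It is deliberately unsynchronized and is not how FreeRTOS implements its queues; a real implementation would guard send() and receive() against concurrent tasks and interrupt service routines. Ring is a hypothetical name.

```cpp
#include <stddef.h>

// A fixed-capacity ring buffer of COUNT objects of type TYPE.
// Unsynchronized: for illustration of the wrap-around logic only.
template <typename TYPE, size_t COUNT>
class Ring {
public:
    Ring() : head(0), tail(0), used(0) {}
    // Add a datum at the tail; fails if the ring is full.
    bool send(const TYPE & datum) {
        if (used >= COUNT) {
            return false;
        }
        slots[tail] = datum;
        tail = (tail + 1) % COUNT;   // Wrap around the storage.
        ++used;
        return true;
    }
    // Remove the oldest datum from the head; fails if the ring is empty.
    bool receive(TYPE & datum) {
        if (used == 0) {
            return false;
        }
        datum = slots[head];
        head = (head + 1) % COUNT;   // Wrap around the storage.
        --used;
        return true;
    }
    size_t count() const { return used; }
private:
    TYPE slots[COUNT];
    size_t head;   // Index of the oldest datum.
    size_t tail;   // Index of the next free slot.
    size_t used;   // Number of slots occupied.
};
```

Because TYPE can be a pointer as easily as a byte, the same structure carries either a character stream or messages that refer to larger buffers.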

C++ references are better than pointers.

C++ has pointers, just like C. But it also has references, which actually are pointers but with some useful restrictions, like: there is no such thing as a NULL reference. Yes, if you are quite clever, you can create a NULL reference, but your code will soon be on its way to a fatal error. Using references instead of pointers can make your code simpler and more reliable.

A common idiom for optional function arguments in C is to declare them to be pointers. If they aren't used in a particular call to a function, you pass a NULL pointer. Your function has to check for this. Sometimes you forget. Wackiness ensues.

In C++ you can use references and default parameters instead.

bool Queue::sendFromISR(const void * datum, bool & woken = unused);

This method of the Queue class is used to send data to a synchronized ring buffer from an interrupt service routine. It has two arguments: a pointer to the data to be sent, and a reference to a boolean variable that is returned to the caller with a value indicating whether or not this operation woke up a higher priority task. (Users of FreeRTOS will already be familiar with this idiom.)

C++ turns the second parameter into what is effectively a pointer, although the syntax for its use inside the instance method makes it look just like a variable. The pointer dereferencing stuff is all handled automatically by the compiler. That's why there is no possibility of a NULL pointer: there is no way syntactically for you to specify it, and hence you can't even check for it in the method.

Sometimes the application cares about the returned boolean value, and sometimes it doesn't. When it does, it passes its own boolean variable as the second argument, overriding the default parameter. When it doesn't, C++ passes the default parameter, a reference to the boolean variable unused. The instance method never has to check for a NULL pointer, because a NULL pointer can't ever be passed in. There is always a reference to a boolean variable for the method to use.

And what the deuce is unused? It's just a dummy variable, defined elsewhere, which is write-only: it's written to by Queue::sendFromISR() but no one ever reads it. It could be implemented, for example, as a private class (static) variable of the Queue class.
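
Here is a minimal, self-contained sketch of the idiom with the dummy implemented as a private static class variable. Sender is a hypothetical stand-in for Queue, and its waiting flag is an arbitrary substitute for whatever FreeRTOS would report; only the shape of the default reference parameter is the point.

```cpp
#include <cassert>

// Hypothetical stand-in for a queue-like class; not Amigo's Queue.
class Sender {
private:
    static bool unused;   // Write-only dummy: written, never read.
public:
    explicit Sender(bool w) : waiting(w), last(0) {}
    // The caller may pass its own flag, or omit it and get the dummy.
    // Either way woken always refers to a real variable.
    bool sendFromISR(int datum, bool & woken = unused) {
        last = datum;
        woken = waiting;   // No NULL check needed or even possible.
        return true;
    }
    int latest() const { return last; }
private:
    bool waiting;   // Pretend a higher priority task is waiting.
    int last;
};

bool Sender::unused = false;

// Usage: one caller that cares about the flag, and one that doesn't.
void checkSender() {
    Sender sender(true);
    bool woken = false;
    assert(sender.sendFromISR(1, woken));
    assert(woken);
    assert(sender.sendFromISR(2));   // The flag goes to the dummy.
    assert(sender.latest() == 2);
}
```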

I should mention: there are some odd things about references too. C++ deliberately makes it hard to have an instance variable to which a reference has not yet been assigned. The syntax for assigning a reference to the variable can look like an assignment statement, whereas an actual assignment statement is actually assigning something not to the reference variable but to the thing to which it refers. You may really only be able to tell them apart in context. That throws a lot of folks new to C++. When I use references in constructor arguments (which I do routinely) I actually prefer to convert them to pointers to be stored in pointer instance variables. I find that leads to fewer mistakes, both on my part and on the parts of maintenance developers who come after me.

Here's an example that does just that.

class CriticalSection
{

public:

    CriticalSection(MutexSemaphore & mutex)
    : mutexp(&mutex)
    {
        if (!mutexp->take()) {
            mutexp = 0;
        }
    }

    ~CriticalSection() {
        if (mutexp != 0) {
            mutexp->give();
        }
    }

private:

    MutexSemaphore * mutexp;

};
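
For contrast, here is a minimal sketch of the confusion described earlier when a reference is kept as a reference: binding and assigning look alike but do different things. Holder, assignThroughReference(), and checkHolder() are hypothetical names used only for illustration.

```cpp
#include <cassert>

// A class that stores a reference as an instance variable.
class Holder {
public:
    // The reference can only be bound here, in the initializer list.
    explicit Holder(int & target) : ref(target) {}
    // This assigns to the thing referred to; it does not rebind ref.
    void set(int value) { ref = value; }
private:
    int & ref;
};

// Two statements that look alike but mean different things.
int assignThroughReference() {
    int a = 1;
    int b = 2;
    int & r = a;   // Initialization: r now refers to a, permanently.
    r = b;         // Assignment: copies the value of b into a.
    b = 3;         // r was never bound to b, so a is unaffected.
    return a;
}

// Usage: writing through the stored reference changes the original.
void checkHolder() {
    int x = 0;
    Holder holder(x);
    holder.set(5);
    assert(x == 5);
}
```

With a pointer instance variable, the two operations are spelled differently (mutexp = &mutex versus *mutexp = value), which is the readability argument for converting the reference to a pointer.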


Doxygen is great even if you don't use Doxygen.

My love affair with Doxygen goes back more than a decade. Inspired by javadoc, Doxygen is a tool that scans your source code for comments written in a very specific format, and generates API documentation based on your code and those comments. You can use Doxygen with any of several programming languages (including Java) to automatically generate documentation in the form of HTML web pages, TeX files, PDF documents, etc. It works great with C and C++.

For example:

/**
 * This is the function from which nothing ever returns.
 * It disables interrupts, takes over the console serial port,
 * prints a message if it can using busy waiting, and infinite
 * loops. This version can be called from either C or C++
 * translation units.
 * @param file points to a file name in program space,
 * typically PSTR(__FILE__).
 * @param line is a line number, typically __LINE__.
 */
CXXCAPI void amigo_fatal(PGM_P file, int line);


Screen shot: amigo_fatal Doxygen comments

But I would still love Doxygen even if I never used any of the documentation that it generates. Doxygen enforces a very specific comment format and discipline for commenting functions, methods, parameters, classes, files, and even preprocessor symbols and macros. Running Doxygen against the source code base yields warnings about undocumented source code. Doxygen is like an automated code inspector that lets me know when I've slipped up. It's one of the many ways I keep myself honest.

Since the public and protected API is defined in header files, that is typically where I put the bulk of my Doxygen comments. Documenting my public API helps refine its design just like unit tests do: if while writing Doxygen comments I find myself thinking "This is rubbish! Who is the cretin that designed this?" I know my API design is lacking in credibility.

Big city techniques work just fine in small town microcontrollers.

I've discovered that with some care and discipline, the techniques I have used for the past decade or two for embedded and real-time development on larger platforms work just fine on tiny eight-bit microcontrollers, and they bring all the same advantages to the table.

I continue to learn.

Update 2012-05-14

Since writing this I have run my entire unit test suite without changes on an Arduino Mega ADK board with an Arduino Ethernet shield. Getting it to work took all of maybe ten minutes, and almost all of that was trying to figure out the pin alignment when plugging in the Ethernet Shield onto the Mega ADK. This says a lot about the compatibility of the Freetronics EtherMega board, which is supposed to behave like an Arduino Mega board with an Ethernet shield. Apparently it does.


Dogburt said...


I am eager to try Amigo on Arduino Mega2560. I have the code but documentation is scarce. I am very familiar with FreeRTOS and need no help however, the "basic" Arduino IDE configuration for multiple files is not obvious. Can you add some first run instruction for the Amigo release?


Dogburt said...


I am familiar with FreeRTOS but just delving into Arduino. I have the Amigo release but I have no info on how to integrate the large file set in the simple Arduino IDE. Do you have any advice on how to get Amigo running in this environment? I have the Mega2560 so I think we are close enough to get this going but the IDE is so simple, it may be complicated, if you know what I mean.

Thanks in advance.

Chip Overclock said...

I chose _not_ to use the Arduino IDE for the Amigo work. You can have multiple files in the IDE -- the "Add File" option under the "Sketch" tab. But while I think the IDE is the best thing since sliced bread for small, standalone projects, it's not easily scalable up to the kinds of things I routinely do for a living. Amigo was all developed using the GNU tool chain (on which Arduino is also based), Make, and Eclipse. That environment is more typical of the large product line/code reuse kind of environment that I routinely work in. It's not built on top of the Arduino run-time, which implements a simple task loop while doing I/O using busy waiting. Amigo is interrupt driven and multi-threaded. It uses the Arduino HW ecosystem. I find this combination really appealing: start with the Arduino IDE to learn the basics, then gradually evolve to using the full tool chain with FreeRTOS and Make.