Wednesday, April 11, 2012

Hitting a Moving Target While Flying Solo

Embedded developers inevitably seem to have to worry a lot more about their tool chains than folks who develop desktop or server-side applications. Tool chains are that vast collection of compilers and utilities you use to get from source code that you write to a binary executable image that runs on the actual hardware that you care about.

For one thing, when you are writing code that is close to bare metal, things that might seem otherwise trivial are actually really important. Like whether or not the machine code implementation of a function falls in the first 128 kilobytes of memory. Or whether an access can change the state of a variable that happens to be a memory-mapped I/O register.

But the other big issue is that configuring and building a tool chain is no small feat. Configuring and building a tool chain for cross compilation -- that is, one that generates and processes executable machine code for a different hardware target than the one on which it is running -- is even more fraught with peril. It typically requires careful selection and configuration of a number of large, complex, and independent components, such as a specific GNU compiler collection package (from whence come the C and C++ compilers), a specific binary utilities package (which provides the linker among other necessary tools), a run-time library package, and whatnot.

This is a troublesome issue that desktop and server-side developers seldom have to worry about. Not that those folks don't have their own problems. Back in my Enterprise Java days I did have to give some thought before clicking on "Install" when a pop-up announced that a new version of Java was available. And the many open source frameworks a large application might rely upon changed at an astounding pace. But many of the desktop and server-side developers I hang out with today don't even know whether their system has a C or C++ compiler. And they rightly don't care.

I have had much success recently at building and running a multithreaded, interrupt-driven C++ application using FreeRTOS on the Freetronics EtherMega 2560 board, which uses the Atmel AVR ATmega2560 microcontroller. I have been using the AVR CrossPack package of GNU cross compilers on my desktop Mac with no problems whatsoever. I have also had the occasion to build and test my application under Windows 7 using the AVR Studio 5.1 IDE, which includes a very slightly older version of the GCC tool chain. So it seemed like a no-brainer to try building on my big multicore Ubuntu 10.04 server using the cross-compilation tool chain installed by the Synaptic package manager.

Uh oh.

Yeah, the application built just fine, but went seriously south right about the time my code enabled interrupts on the microcontroller. South as in jumping to the reset vector and entering into a rolling reboot. My expensive JTAG debugger was useless in this case, since it's only supported on Windows, and the Windows build worked.

The AVR CrossPack for my Mac uses GCC 4.5.1 and AVR libc 1.8.0. AVR Studio on Windows uses GCC 4.5.1 and AVR libc 1.7.1. Those worked just fine. Ubuntu uses GCC 4.3.4 and AVR libc 1.6.7. Those sucked, at least for my application. I was so bold as to run the Mac and Ubuntu binary executable images through the AVR disassembler, figuring what the heck, I had a passing familiarity with AVR assembler, and how different could they be? Yeah, right. The graphical diff tool ran for a long time before finally disabusing me of that notion.

Crap.

So last night, after futzing around for the better part of an afternoon with visions of having to spend a few days trying to lovingly handcraft a whole new tool chain, I posted a query to AVR Freaks, an international forum of AVR users. By this morning I had many suggestions from folks in the U.S., Denmark, Sweden, and Germany, one of which pointed me to a pre-built Debian package on a British web site that included just the versions of the tool chain I needed. It took me all of maybe fifteen minutes to go from reading that comment, through installing the package, modifying my makefile, and building and downloading my application, to all of my unit tests passing.

You guys rock.

That's the good news. Here's the bad news: this is not uncommon. Open source software, including tool chains, is a rapidly moving target. How many times have you heard a manager say "we don't have to develop any of that code, it's all open source, and it's free"? It's seldom that simple, for embedded developers, or for any other kind of developer for that matter. It's only free in the sense that the manager doesn't have to cut a purchase order. Or in the sense that their employees' time isn't considered valuable.

Whether using open source is easy or not will depend on your very specific requirements and the exact combination of tools, utilities, and libraries that you need, and even what operating system distribution and release you are running on the machine upon which you want to install this stuff. The level of difficulty can range from a few minutes' work (see above) to something completely outside of your schedule. You may not be able to reliably gauge where you are on this spectrum until you are deeply into it. Meanwhile: it's all mutating, each independent package changing at its own rate, driven by someone else's requirements which may or may not jibe with your own.

This can be an even more vexing issue for embedded developers. A few years ago I was debugging a hang during boot with my custom Linux 2.6 build for a client's embedded project using a Freescale PowerPC processor. I traced it down to a bug inside the Linux kernel in processor-specific code that handled the hardware clock; depending on what the initial non-deterministic value was in a hardware register, kernel initialization code executed during boot would wait until the clock wrapped around. That could take a while. Like maybe hours or days. This code, by sheer random chance, might work the first time you executed it. Maybe even the second time. But eventually your processor would go away and not come back until you lost patience and hit the reset button. It didn't take much testing to notice.

When you are using open source software on a mainstream processor -- which these days means an Intel x86 of some vintage -- you can be reasonably sure that hundreds if not thousands of people have already been using the same software on a daily basis before you ever laid eyes on it. But in the embedded domain, you have to accept the fact that it is entirely possible that you are the only guy in the entire world who is using, or has ever used, that exact version of that exact software on that exact processor model.

In which case the adage "with enough eyes all bugs are shallow", while perhaps true, isn't helpful.
