Tuesday, September 02, 2008

Choosing Software for Diminuto

Diminuto is my attempt to put together a platform on which to teach real-time software design, systems programming, embedded software development and assembly language programming, all using real commercial hardware and open source software. In Diminuto Right Out of the Box, I described how to get the Atmel AT91RM9200-EK evaluation kit (EK) running with the software it ships with, which includes a Linux version 2.4 kernel and a tool chain based on the GNU Compiler Collection version 2.95.3. In this article, I'll describe the software I used to build a reduced-memory footprint system using the Linux version 2.6 kernel and a tool chain based on GCC 4.2.4.

My career as a technologist has been pretty much all over the map. I've always thought of that as a good thing, although I've been accused by my betters of being everything from "a renaissance man" to "unfocused". My career seems to shuttle back and forth between developing for high-end server-side systems and doing real-time and embedded development. Consequently, the systems for which I've developed have ranged from processors with eight kilobytes of memory to distributed networks of servers to supercomputers.

Even among embedded systems, the range has been kind of startling. In the 1970s I wrote standalone assembler code for PDP-11s with 8KB of core memory. Core memory was kind of neat because it maintained its state across power cycles. We used to patch our code by toggling in new machine instructions from the switch register on the front panel. More recently, and more typically, I've found myself writing mostly in C, C++ and occasionally even Java, for PowerPC, i960, or ARM microprocessors running some commercial real-time operating system (RTOS) like VxWorks, pSOS, C-Executive, or RTX, all of which run completely memory resident. Meanwhile, the server-side portion of my career evolved from using just about every UNIX variant known to man to being 100% Linux based. Probably no surprise there.

With the availability of cheaper random access memory (RAM), and relatively inexpensive microprocessors that have a memory management unit (MMU), perhaps it was inevitable that these two career paths merge with the advent of embedded Linux. But even among embedded Linux systems there is a broad range of configurations, ranging from systems that still run completely memory resident and which are reminiscent of those VxWorks systems, to those that have persistent storage in the form of solid-state "disks" such that they seem more like a Linux server.

Whether you are accustomed to developing for Linux on a PC, or you are an embedded developer used to VxWorks, embedded Linux will seem like a weird combination of the familiar and the bizarre, not quite fitting in with either world. The kernel seems familiar to the PC developer, but there may be no persistent storage and hence no paging of virtual memory, and you may do everything logged in as root while the system runs in single-user mode. The lack of persistent storage seems natural to the embedded developer, but the system has an MMU which maps between physical and virtual memory addresses, the processor executes code in privileged and non-privileged modes, it insists on having a root file system even though it may be completely memory resident in something called a RAM disk, and there is an unusual wealth of sophisticated applications and tools that can run on the system. The need to compile and link on a host system (like a conventional Linux PC with an Intel processor and a cross-compilation tool chain) but run on a completely different target system (like the AT91RM9200-EK board with an ARM processor and no tool chain at all) freaks out the PC developer but is strictly in the comfort zone of the embedded developer.

Many embedded systems have pretty severe resource constraints when compared to your typical PC. The EK board has 32 megabytes of RAM and a processor delivering a claimed 200 million instructions per second (MIPS). The Dell 530 quad-core server in the basement (a.k.a. the vast Digital Aggregates corporate data center) on which I run the Diminuto tool chain has three gigabytes of RAM and its speed is several billion instructions per second (GIPS). My Nokia N810 internet tablet, more typical of hand-held consumer devices, has 128 megabytes of RAM, just four times that of the EK, with maybe a 400 MIPS processor. All of these are Linux systems, but clearly they are not quite the same Linux system.

Living within your means, RAM-wise, isn't just an issue for the kernel and any programs that are running at any one time. Although Linux requires a root file system, I've mentioned that many embedded devices have no persistent storage device. The kernel, any running programs, and the root file system in some kind of RAM disk, all share the available RAM. The kernel and the RAM disk are loaded into RAM at boot time, either from some read-only memory (ROM) or across a network using something like the Trivial File Transfer Protocol (TFTP). Reducing the memory footprint of the RAM disk turns out to be important too.

Or maybe not. Ridiculously large surface mounted flash memory devices with integrated controllers that emulate IDE disk drives are becoming more available, driven in part by Windows CE hand-held devices. The multi-gigabyte single chip disk drives that I have used come from the factory pre-formatted for the Windows FAT file system. Linux itself includes drivers for devices like Secure Digital (SD) flash cards and USB flash drives which emulate SCSI disk drives.

Persistent storage devices are becoming more common in all but the most cost sensitive or tiniest embedded systems. However, such solid state persistent storage devices tend to be very slow when compared to disk drives. Even when they are available, there is a tendency to keep as much as possible of the working set of "disk" blocks from these devices cached in memory.

It's easy to argue that the trend will be towards more and more RAM in embedded devices, just as there has been for servers, desktops, and laptops. But the trend may be in just the opposite direction for many embedded applications. Microcontrollers (which is what microprocessors are called when you don't know that they're there) are being used in physically smaller, and more price sensitive, devices. There is a strong competitive pressure to shave the manufacturing cost of a device that will be sold in the millions. Sometimes this means less RAM and eliminating a flash drive. Pennies count in such applications.

I've built both types of embedded Linux systems: those that had a lot of RAM and a large flash-based disk drive, and those that booted from ROM and ran completely in RAM. The former wasn't substantially different from the user's point of view from a PC-based system: multi-user, ran the SSH daemon, even ran an Apache web server and a Java SE JVM, and your files were still there in your home directory when the system was rebooted. The latter: single user mode, could only be accessed from a serial port, and every press of the reset button was a new day RAM disk-wise. It's easy to get spoiled with the former. I learned more from the latter. Diminuto marches right down the middle: the basic system runs completely resident in RAM, but supports the use of persistent file systems on SD cards and USB drives.
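For a system that runs completely resident in RAM, the budget described above can be put in rough numbers. The following is a minimal sketch, not a measurement of Diminuto: the function name ram_headroom_kb and the kernel and RAM disk sizes are hypothetical, chosen only to show the arithmetic against the EK's 32 megabytes.

```shell
# Rough RAM budget for a RAM-resident system; all sizes in kilobytes.
# The function name and the example sizes below are hypothetical.
ram_headroom_kb() {
    total_kb=$1      # physical RAM on the board
    kernel_kb=$2     # kernel image once loaded into RAM
    ramdisk_kb=$3    # root file system held in the RAM disk
    echo $(( total_kb - kernel_kb - ramdisk_kb ))
}

# A 32 MB board with an assumed 3 MB kernel and an assumed 8 MB RAM disk.
ram_headroom_kb $(( 32 * 1024 )) $(( 3 * 1024 )) $(( 8 * 1024 ))   # prints 21504
```

Whatever is left over is all that running programs, their stacks and heaps, and any cached "disk" blocks have to share.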

Here are some approaches to reducing the resource footprint of a Linux system.

Reduce the Size of the Kernel

You download the latest Linux kernel from www.kernel.org: all eight million lines of C code in twenty thousand source files. In the case of Diminuto, which uses an ARM-based processor, you run the command

make menuconfig ARCH=arm

and are delighted to discover that you can select the options AT91 and AT91RM9200-EK to get support for the devices in your system-on-a-chip. Then you build the kernel and discover to your horror that the resulting bootable image is enormous. True, the Linux kernel has direct support for the AT91RM9200-EK board. But if you just select that option, you will find that by default the configuration process has selected tons of other options and drivers you couldn't care less about, including a lot of stuff you don't even recognize, and has also left out a bunch of stuff you know you're going to need.

To generate exactly the kernel you need without blowing your memory budget, you must laboriously go through all the dozens of nested configuration menus, figure out whether you really need each item, and then either enable or disable it. This includes the seemingly hundreds of device drivers, including the ones that can be built as loadable modules (more on that later). This may take several iterations, some research, and some hard decisions ("Hmmmm, do I really need IPv6?"), but it's worth it. In the end you'll get what you need with a memory footprint you can live with. If you forget something, you can always generate another kernel and try it.

During this process you will be tempted to build every device driver as a loadable module. You will do this because this is exactly what you do for your PC-based Linux system. Loadable kernel modules and device drivers are not linked directly into the kernel image, but instead are loaded dynamically on demand from the file system. Building loadable modules for your embedded system doesn't actually hurt anything, as long as you realize it's mostly a waste of time. Devices other than what are on your board aren't likely to magically appear; the EK doesn't have a back plane into which you can easily plug in other boards. And you have pretty much a 100% chance of using the devices that are implemented on your board. Furthermore, unused loadable modules in a RAM disk file system take up RAM whether you are using them or not. My advice is to build all the kernel options and device drivers you know you are going to use directly into the kernel, and explicitly omit everything else.

There is an important exception to this rule: if you are writing your own device drivers for your board, you may want to use loadable modules for reasons of intellectual property and licensing. Conventional wisdom is that loadable modules are not contaminated by the GNU General Public License (GPL), whereas modules linked directly into the kernel are. I'm not a lawyer, and I'm not aware of any legal precedent having been set for this, but when I've written device drivers for my clients for closed, proprietary, custom hardware, I've built them as loadable modules.

In Diminuto, I built version 2.6.25.10 of the Linux kernel using the approach I just described, including all supported devices, directly into the kernel, and omitting everything else.
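One way to check how close a configuration comes to that goal is to count what ended up built in versus built as a module. The configuration step writes its choices to a .config file in the kernel source tree, where a built-in option appears as CONFIG_NAME=y, a loadable module as CONFIG_NAME=m, and a disabled option as a comment. The sketch below assumes that format; the function name config_summary is made up for illustration.

```shell
# Summarize a kernel .config file: options built into the kernel (=y)
# versus options built as loadable modules (=m).
# config_summary is a hypothetical helper, not part of the kernel build.
config_summary() {
    config_file=$1
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    built_in=$(grep -c '^CONFIG_[A-Za-z0-9_]*=y$' "$config_file" || true)
    modular=$(grep -c '^CONFIG_[A-Za-z0-9_]*=m$' "$config_file" || true)
    echo "built_in=$built_in modules=$modular"
}
```

Run as config_summary .config from the top of the kernel tree. A modules count of zero means everything selected is linked directly into the kernel image.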

There are also alternatives to the conventional Linux kernel. uClinux is a version of Linux that does not require an MMU. I've never used it. But it does admit the possibility of running Linux on microcontrollers that are in the traditional sphere of RTOSes like VxWorks. uClinux has been ported to many of the embedded microprocessors on which I've worked over the years. I think it's worth examining if that's the world in which you are working.

Reduce the Size of Binary Executables

A statically linked binary executable is a program which incorporates all of its code in a single monolithic image. If several programs are statically linked against the same library, they each contain unique copies of the same functions and subroutines that they reference in that library. Everyone gets their own copy of printf. There are sometimes reasons to do this (it may facilitate debugging), but it makes every binary executable a lot larger than it otherwise needs to be.

A shared library in Linux is like a Dynamic Link Library (DLL) in Windows (or so I'm told). It is a way for binary executables to share common code. One copy of the shared library is loaded into memory. When a binary executable which was originally dynamically linked against that shared library is run, the program loader fills in the references to the shared functions and subroutines inside the executable so that they point into the shared library. The MMU is used to create the illusion that everyone is using their own copy of the library, even though everyone is sharing the same code.

This is a good thing, generally, whether you have an embedded system with 32MB of RAM or a PC with 3GB of RAM. But it does take some care to manage the installation and maintenance of the shared library. Woe be to the person who runs a binary executable that was dynamically linked on the host system against a different version of the shared library from that which is resident on the target system.

It also requires some care in the construction of the shared library itself. A shared library that incorporates a bunch of functions and subroutines that aren't used by any binary executable is still wasting RAM. Shared libraries are only a win if the code in them is actually shared. This is why I chose to build Diminuto using uClibc, a version of the Standard C library with a reduced memory footprint. uClibc gets its reduction through a variety of means: by omitting some features not needed in most embedded applications, by reducing the capacity of some retained features, and by some code re-factoring. I used version 0.9.29 of uClibc for Diminuto.
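The trade-off between static and shared linking can be put in rough numbers. This is a back-of-the-envelope sketch under stated assumptions: the function names and every size are hypothetical, and it ignores the overhead that dynamic linking itself adds to each executable.

```shell
# Total code size in kilobytes for a set of programs, linked two ways.
# All names and sizes here are hypothetical illustrations.

# Static: every program carries its own copy of the library code it uses.
static_total_kb() {
    programs=$1; app_kb=$2; used_kb=$3
    echo $(( programs * (app_kb + used_kb) ))
}

# Shared: one copy of the whole library, used or not, plus each program.
shared_total_kb() {
    programs=$1; app_kb=$2; lib_kb=$3
    echo $(( programs * app_kb + lib_kb ))
}

# Ten 20 KB programs that each pull 50 KB out of a 300 KB library.
static_total_kb 10 20 50     # prints 700
shared_total_kb 10 20 300    # prints 500
```

With only two such programs the result flips: static linking costs 140 KB while the shared library costs 340 KB, because most of the library is resident but unused.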

I have also built embedded Linux systems using the full blown GNU standard C library, so your mileage may vary. If you are running completely memory resident, I recommend you at least look at uClibc to see if it meets your needs. If you have the extravagant pleasure of a "disk" on your target, you may consider using the GNU library. On my To Do list for Diminuto is to build a root file system using the full GNU library that boots from an EXT3 file system on the Secure Digital (SD) flash card supported by the EK. I built support for EXT3 and the SD driver into the kernel, so that same kernel should work for either system. I'd like to eventually run the ARM-version of Sun's Standard Edition (SE) of their Java Virtual Machine (JVM) using this platform. (I've done this in the past on embedded PowerPC platforms with zero issues.) I routinely use EXT3 file systems on USB drives for persistent storage with the current RAM-resident Diminuto system.

One thing I did not scrimp on was the C++ standard library and the POSIX Thread library. I knew I was going to port my embedded, real-time C++ toolkit, Desperado, to Diminuto, and I wanted unfettered access to the complete Standard Template Library (STL) and to pthreads. Recently, however, a reduced footprint version of the C++ standard library, uClibc++, has been developed that is worth a look. I haven't used it.

I've had no library-compatibility issues with porting either Desperado or my C-based Diminuto library to the AT91RM9200-EK and running their unit tests. I have had some weirdness with the GNU arm-linux-g++ compiler itself which I discuss on the Diminuto web page.

Reduce the Size of the Root File System

Much of the Linux root file system is taken up with scripts and utilities to manage its multi-user environment and to provide a rich set of tools for those multi-users. The entire huge System V init script infrastructure exists mainly to take a multi-user system up and down in an orderly manner. The typical embedded system, however, runs a very limited set of applications, may have no traditional user interface at all except for during initial development, all processes may run as root, and it is brought up and down strictly with the power switch. This requires a much simpler infrastructure that eliminates a lot of the stuff usually found in the root file system.

This is also one of the reasons I recommend using an EXT3 journaled file system if you have persistent storage. EXT3 is bit-compatible with EXT2, a staple of Linux systems for a long time, with the addition of a journal file. In fact, an EXT3 file system can be mounted as an EXT2 file system, in which case the system ignores the journal file. EXT3 is a robust approach for an embedded system where someone may hit the reset button at any time. I built Diminuto with support for EXT2, EXT3, and VFAT (the Linux driver for Windows FAT file systems, including FAT32, especially useful for USB drives that are traded between Windows and Linux systems).

PC developers will be horrified to know that most work is done on embedded Linux systems while the developer is logged in as root, but this will seem natural to the embedded developer. This is because nearly all of the work that needs to be done on the target system has to be done as root, while all the development work that needs to be done on the host system is done logged in as a normal unprivileged user. Also, a scrogged target system can typically be recovered by just hitting the reset button, with the resulting loss of everything you may have had on the RAM disk. Linux offers the ability to segregate processes and files on the target system by user ID, which does provide a great deal more protection. I recommend thinking about this if non-developers have access to the innards of your embedded Linux system, even if they don't have login access. For example, you will surely want to run any web server under a non-root account.

BusyBox is a very clever way of reducing both the RAM and disk footprints of a Linux system without much impact on functionality. BusyBox is a single monolithic program that implements the functionality of dozens of commonly used Linux commands. It achieves this significant resource reduction, even when using shared libraries, by yet more sharing of common code, excluding seldom used options, and by being less scalable than the equivalent full featured GNU utilities. BusyBox is a multi-call binary: if it is invoked through a soft link called ls, it behaves like the ls command.
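The multi-call idea is simple enough to sketch in a few lines. This is not BusyBox's code, which is written in C and examines argv[0]; it is a hypothetical shell function, dispatch, that shows the same technique of choosing behavior from the name a program was invoked under.

```shell
# Choose behavior from the name the program was invoked as.
# dispatch and its placeholder messages are hypothetical; a real multi-call
# binary would run the code for the matching command instead of echoing.
dispatch() {
    invoked_as=$(basename "$1")    # strip any leading directory, e.g. /bin/ls -> ls
    case "$invoked_as" in
        ls)  echo "would list files" ;;
        cat) echo "would concatenate files" ;;
        *)   echo "unknown applet: $invoked_as" ;;
    esac
}

dispatch /bin/ls    # prints: would list files
```

In a script the argument would be "$0", so the same file linked under several names behaves as several commands.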

I built Diminuto using version 1.11.1 of BusyBox. A quick tour through all of the bin directories on my PATH reveals that my root file system on Diminuto has all of five binary executables:

/sbin/ldconfig
/usr/bin/ldd
/usr/bin/gdbserver
/usr/local/bin/getubenv, and
/bin/busybox.

The dozens of other commands are just soft links to /bin/busybox. (And the first three of those binaries are just to expedite debugging.)
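Populating a directory with such links takes only a loop. The following is a minimal sketch rather than the procedure used for Diminuto: the function name link_applets and the staging directory in the usage note are assumptions made for illustration.

```shell
# Create one soft link per command name, all pointing at a single
# multi-call binary. link_applets is a hypothetical helper.
link_applets() {
    binary=$1      # link target as seen on the running system, e.g. /bin/busybox
    dest_dir=$2    # directory to populate with links
    shift 2
    for applet in "$@"; do
        # -f replaces an existing link so the function can be rerun safely
        ln -sf "$binary" "$dest_dir/$applet"
    done
}
```

For example, link_applets /bin/busybox ./rootfs/bin ls cat sh would create three links in a hypothetical ./rootfs/bin staging directory on the host. The links dangle on the host and only resolve once that tree is the target's root file system.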

The use of BusyBox is a huge win, even if you can't use all of its capabilities. I am currently using the Bourne-like shell provided by BusyBox. If your embedded system depends heavily on complex shell scripts, and if resources permit, consider loading your shell of choice (e.g. the Bourne Shell) on your embedded system. Scripting is a huge win productivity-wise over implementing the same functionality in C code, and the additional resources required by a full blown shell interpreter can be well worth it.

In future articles I'll discuss how I built the root file system for Diminuto, describe the adventure that was getting the 2.6 Linux kernel running on the AT91RM9200-EK board, and give you a tour of the running system.
