Thursday, August 16, 2007

What's Old Is New Again

Back in January I wrote an article on the Java Business Integration (JBI) specification, and my experience with a particular implementation of it, Apache's ServiceMix Enterprise Service Bus (ESB). One aspect of JBI really threw me for a loop back when I was in the trenches using it in a product development effort: the behavior of send versus sendSync.

JBI (and ServiceMix) offers two different mechanisms for sending a message across the ESB: send and sendSync. The send call is asynchronous, meaning it's fire-and-forget. Once the method returns successfully to your application, the only thing you really know is that your message exchange object has entered your delivery channel. This is true even if the message requires a response; the response will arrive later, asynchronously, on the same delivery channel.
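Here is a minimal sketch of the consumer side of an asynchronous send, assuming a JBI component that already has its ComponentContext; the service name is hypothetical, and in a real component the accept loop would live in its own thread rather than inline like this.

```java
import javax.jbi.component.ComponentContext;
import javax.jbi.messaging.DeliveryChannel;
import javax.jbi.messaging.ExchangeStatus;
import javax.jbi.messaging.InOnly;
import javax.jbi.messaging.MessageExchange;
import javax.jbi.messaging.MessageExchangeFactory;
import javax.jbi.messaging.NormalizedMessage;
import javax.xml.namespace.QName;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

public class AsynchronousSender {

    public void fireAndForget(ComponentContext context) throws Exception {
        DeliveryChannel channel = context.getDeliveryChannel();
        MessageExchangeFactory factory = channel.createExchangeFactory();
        InOnly exchange = factory.createInOnlyExchange();
        exchange.setService(new QName("urn:example", "SomeService")); // hypothetical
        NormalizedMessage in = exchange.createMessage();
        in.setContent(new StreamSource(new StringReader("<request/>")));
        exchange.setInMessage(in);
        channel.send(exchange); // returns as soon as the exchange is queued

        // The acknowledgement (Done or Error) arrives later, asynchronously,
        // on the same delivery channel, mixed in with any other traffic.
        MessageExchange answer = channel.accept();
        if (answer.getStatus() == ExchangeStatus.DONE) {
            // Delivered and completed... eventually.
        }
    }
}
```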

It's like dropping a letter in the mailbox. Although it will likely be delivered to the destination, there is no guarantee. Furthermore, if you drop many letters addressed to the same recipient into the same mailbox, there is no guarantee that they will arrive and be read in the order in which you mailed them. (This is the part that really threw me.)

If all you are doing is sending in product registration cards for the new desktop PC system you just bought, it's no big deal. But if you are mailing deposits and money transfers to your bank, order suddenly matters: if the transfers arrive and are processed before the deposits that fund them, some of the transfers may bounce. That's because the former example is stateless, while the latter example is stateful.

JBI provides a sendSync call which is synchronous, meaning your application blocks on the sendSync call until it is acknowledged by the recipient. The kind of acknowledgement depends on the message exchange pattern (MEP) you are using. For example, if you are sending an InOnly MEP, then all you get back is an indication of Done: you are notified that the recipient received your message and actively marked the message exchange as completed. If you are sending an InOut MEP, then you get back a message exchange containing a response from the recipient, and it is you who must then mark the message exchange as completed by sending a Done back to the recipient. You have guaranteed delivery, or at least you know before proceeding if it didn't work, and (more subtly) order is preserved among successive sendSyncs that you perform to the same recipient.
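Continuing the sketch above (channel and factory as before, service name still hypothetical), the synchronous InOut case looks roughly like this:

```java
// Synchronous request/response with an InOut MEP (consumer side).
InOut exchange = factory.createInOutExchange();
exchange.setService(new QName("urn:example", "SomeService")); // hypothetical
NormalizedMessage in = exchange.createMessage();
in.setContent(new StreamSource(new StringReader("<request/>")));
exchange.setInMessage(in);

// Blocks until the provider responds or the timeout expires.
if (channel.sendSync(exchange, 30 * 1000)) {
    NormalizedMessage out = exchange.getOutMessage(); // the provider's response
    // ... consume the response ...
    exchange.setStatus(ExchangeStatus.DONE); // now it is our turn to say Done
    channel.send(exchange);                  // deliver the Done to the provider
} else {
    // Timed out: at least we know it didn't work before proceeding.
}
```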

Synchronous

This sounds simple, but in a system of any complexity at all, it may not be pretty.

The most obvious design issue is that the original sender (the service consumer in JBI parlance) is blocked waiting for the recipient (the service provider) to process the message exchange. On a loaded system, the recipient may be busy processing a long queue of pending requests from many senders, so the sender may block for a long time. Since the sender may itself be acting as a service provider to other service consumers (that is, they are trying to send it requests and get a response), pending requests can back up in many components throughout the system. Wackiness may ensue. Even if the system isn't busy, handling some requests may require long-latency operations like persistence handling, database queries, or remote network accesses.

The other obvious design issue arises if there are circumstances in which the recipient may itself act as a service consumer of the original sender's service provider, that is, the recipient may as part of its processing make a request of the original sender. If both components use sendSync, the system deadlocks: the original sender is waiting on its sendSync as a consumer to the recipient as a provider, and the recipient as a consumer is waiting on yet another sendSync to the original sender as a provider. Neither sendSync will ever complete, or both will time out if they were coded with a timeout parameter.

This is not a new issue. Developers (like me) old enough to be taking their meals through a straw will recognize a similar issue in the remote procedure call (RPC) paradigm that was fashionable in the 1990s. In RPC, distributed components communicated with one another through function calls that were mapped by frameworks like CORBA, OSF DCE, or SunRPC (I've used 'em all) into network messages. Developers (like me) old enough to have one foot in the grave will remember dealing with this issue when writing bidirectional messaging frameworks using Berkeley sockets and TCP in the 1980s. I dimly recall similar issues arising when writing communications layers for PDP-11s using RS232 serial connections in the 1970s.

Geoff Towell, a colleague of mine in the JBI adventure, remarked that in his experience "systems using synchronous message passing aren't scalable." He also noted that "systems using asynchronous message passing can be difficult to write." In my experience, he was correct on both counts. The fix is the same whether you are using JBI, RPCs, sockets, or serial ports.

To ensure guaranteed delivery and preserve order, you use synchronous message passing: sendSync, an RPC call with a returned value, a TCP socket, or a serial protocol that requires a reply. But the response you get back merely indicates that the recipient received your request and queued it for later processing. It says nothing about when the recipient will actually get around to processing your request. When it does, the relative roles of the components reverse: the original recipient acts as a consumer and performs a sendSync with a new message exchange to the original sender, who is now acting as a provider and appropriately completes the new message exchange. Hence, message passing is synchronous, but message processing is asynchronous.
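A sketch of the provider side of this pattern in JBI terms might look like the following; the workQueue and the worker that eventually drains it are assumptions of mine, not part of the JBI API:

```java
// Provider side: synchronous message passing, asynchronous message processing.
BlockingQueue<NormalizedMessage> workQueue =
    new LinkedBlockingQueue<NormalizedMessage>(); // hypothetical work queue

MessageExchange request = channel.accept(); // a consumer's sendSync blocks on this
if ((request instanceof InOnly) && (request.getStatus() == ExchangeStatus.ACTIVE)) {
    workQueue.offer(request.getMessage("in")); // queue the payload for later
    request.setStatus(ExchangeStatus.DONE);    // means "received and queued", no more
    channel.send(request);                     // this is what unblocks the sendSync
}

// Later, a worker drains the queue, does the real work, and then reverses
// roles: it creates a brand new exchange and sendSyncs it back to the
// original sender, who now acts as the provider and completes it.
```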

Asynchronous

We did the same thing with RPCs: the successful return of the remote procedure call merely meant that the called component received the parameters, not that it actually did anything with them. When the called component completed processing the request, it would invoke a callback function in the original calling component, doing another RPC in the opposite direction to deliver the response.

This is why the design of the sender and recipient gets ugly: they may both have to handle multiple requests concurrently. They typically do this by implementing multiple concurrent state machines, each machine implemented as an object. For each request, whether originated by the sender or received by the recipient, a new state machine object is created. Many of these objects may exist simultaneously in both the sender and the recipient as many requests are asynchronously processed. The recipient maintains each state machine until it can send the response for the request that machine represents, at which point the machine transitions to its final state. The original sender likewise maintains its own state machine for each request until the corresponding response is received and processed; then that state machine also transitions to its final state.
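A minimal sketch of what I mean, with names entirely of my own invention: one small state machine object per request, and a map of them keyed by something unique to the exchange.

```java
// One state machine per in-flight request; all names here are hypothetical.
class RequestStateMachine {

    enum State { AWAITING_ACK, AWAITING_RESPONSE, COMPLETE }

    private State state = State.AWAITING_ACK;

    // Each event for this request is fed to its machine; the machine decides
    // what the event means given where it is in the conversation.
    public void onEvent(MessageExchange exchange) {
        switch (state) {
        case AWAITING_ACK:
            // Our request was acknowledged; now await the real answer.
            state = State.AWAITING_RESPONSE;
            break;
        case AWAITING_RESPONSE:
            // The response arrived; this conversation is over.
            state = State.COMPLETE;
            break;
        default:
            break;
        }
    }

    public boolean isComplete() { return state == State.COMPLETE; }
}

// Many requests in flight means many machines alive at once.
Map<String, RequestStateMachine> machines =
    new ConcurrentHashMap<String, RequestStateMachine>();
```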

(If you are into automata theory or formal languages, all of this will sound very familiar. The message exchange between the two components describes a protocol. Finite state machines and other automata are typically used to implement parsers for formal languages. Formal languages are described by formal grammars. The fact that you frequently use a state machine to implement a protocol is why protocols are often described in terms of a formal grammar. Such grammars are remarkably useful and should be in every developer's toolbox. But that is a topic for another article.)

There are a number of ways you might implement multiple concurrent state machines. The simplest is to have a separate thread for each request in both the sender and the recipient. This works well in systems in which the cost of context switching and maintaining a thread is zero. The fact that there are no such systems means this approach is seldom used.

(It can in fact get bad a lot faster than you might expect, since on some systems I have seen the cost of context switching grow as the square of the number of active threads. I wrote a marvelous white paper on this topic, which this blog article is too small to contain.)

You may have a fixed-size pool of threads in the recipient that service a common queue of incoming requests. I have seen this work well in both Java and C++ implementations in which the number of possible concurrent outstanding requests is small, the lifespan of each request is short, and concurrent pending requests are mostly independent. There are Java and C++ frameworks that provide this capability.
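A sketch of that approach using the standard java.util.concurrent classes; running and process are placeholders for whatever governs the component's lifecycle and whatever servicing a request actually means:

```java
// Fixed-size pool of workers servicing a common intake of requests.
ExecutorService pool = Executors.newFixedThreadPool(8); // pool size is a tuning knob

while (running) {
    final MessageExchange exchange = channel.accept(); // the common queue
    pool.execute(new Runnable() {
        public void run() {
            process(exchange); // placeholder for the actual request handling
        }
    });
}
```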

When I've run up against systems that have to handle 38,000 concurrent long-duration requests (and no, I didn't pick that number out of thin air), neither approach scales, and I resort to designing and coding an application-specific concurrent state machine implementation that runs inside a small number (like one) of threads. This is not as hard as it sounds.
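A sketch of the shape of that design, reusing the hypothetical RequestStateMachine from above: one event loop, one blocking point, and thousands of machines, none of which is ever allowed to block. (Correlating by exchange identifier works for events on the same exchange; responses arriving on a brand new exchange would need an application-level correlation identifier instead.)

```java
// One thread drives every in-flight request; the only blocking call is accept().
Map<String, RequestStateMachine> machines =
    new HashMap<String, RequestStateMachine>(); // no locking: one thread owns it

while (running) {
    MessageExchange event = channel.accept();    // the single blocking point
    String id = event.getExchangeId();           // correlate the event to its request
    RequestStateMachine machine = machines.get(id);
    if (machine == null) {
        machine = new RequestStateMachine();     // first event of a new request
        machines.put(id, machine);
    }
    machine.onEvent(event);                      // must transition quickly, never block
    if (machine.isComplete()) {
        machines.remove(id);                     // final state: retire the machine
    }
}
```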

(Dan Kegel wrote a really great article on this scalability issue in the context of server-side socket applications in systems of UNIX-like sensibilities; see The C10K Problem.)

My web services friends will no doubt be up in arms over this article, either because I'm suggesting using synchronous message passing, or because I'm suggesting using asynchronous message processing. Probably both. But my background for the past thirty years has been in building robust, scalable server-side real-time systems, and what I have described here is a design pattern I have found to work.

Update (2008-07-07)

I've recently been reading about continuations, which are a mechanism to, very roughly speaking, pick up a prior computation where it left off. It's funny: being the wizened old man that I am, I always thought of continuations as a form of checkpointing, from my time in the deep past with IBM mainframes and Cray supercomputers. It wasn't until recently that I realized that continuations serve essentially the same purpose for web servers as the state machine architecture I describe here and have implemented on several real-time systems over the years. For that matter, checkpoints served a similar purpose on mainframes and supercomputers.

I suspect that the motivation for each of these mechanisms was slightly different. Checkpoints existed because hardware was slow and expensive, it wasn't uncommon for the system to crash, and when it did you wanted to pick up work where it left off. The state machine architecture I describe here was mostly done for scalability, handling tens of thousands of simultaneous connections with just a few threads. Continuations seem to be motivated not just by robustness and scalability, but also by the stateless nature of RESTful HTTP operations.

Maybe as I read more it will become clearer to me that continuations are completely different. In the spirit of the Sapir-Whorf Hypothesis, I should probably learn a programming language or framework that supports continuations natively.
