Thu, 18 May 2006

Van Jacobson Stole Those Channel Ideas!

After Van Jacobson's linux.conf.au talk, some IBM Research people complained to me about all the fuss: this was done years ago by the Exokernel crowd (see http://www.stanford.edu/~engler/exo-tocs.pdf). The Sydney LCA organizers said to me "this was described at LCA in Perth by Peter Chubb" (see http://www.linux.org.au/conf/2004/abstracts.html#24). This one was a bit of a miss: VJ didn't want to move drivers out into userspace. Now, we get a post to lkml on Lazy Receive Processing (see http://www.cs.rice.edu/CS/Systems/LRP/).

Lockless datastructures aren't new. Moving stuff out to usermode isn't new (some of the LRP ideas are already in Linux). Zero-copy to userspace isn't new (I remember seeing a student project doing this on Linux on my first trip to Bangalore, called "zbufs" IIRC). So why are we suddenly now excited about these ideas? Is David S. Miller just slow?

There are two basic reasons. The weaker, more prosaic one is that Van had an implementation for Linux and, more importantly, a step-by-step plan for how to introduce this into Linux, showing improvements at each point. This isn't the kind of stuff academics write papers about, but it's critical if you're trying to change an OS in widespread use. This lack of revolution was a strong argument. It wasn't "if you start with our weirdo OS, you can make it fast by doing X", nor "if you rewrite all your applications to use our weirdo API, you can make it fast" (although that was the part many people concentrated on: the final stage where VJ exposed the API to userspace).

The greater, more profound reason was clearer if you were at the talk. This wasn't about a performance tweak, not about cool datastructures, and not about zero-copy (which it isn't). This was about fixing a longstanding design kludge in TCP/IP: the acknowledgements done by the OS on behalf of the application, in violation of the end-to-end principle. This, in turn, requires window information in every packet, which effectively says "Ack, but not really". Without this insight, you might be doing the right things, but for the wrong reasons: instead of a Grand Design, you have one or more Performance Hacks.

Just to nail this, let me quote the LRP paper (which I recommend reading):

The main difference between UDP and TCP processing in the LRP architecture is that receiver processing cannot be performed only in the context of a receive system call, due to the semantics of TCP. Because TCP is flow controlled, transmission of data is paced by the receiver via acknowledgments. Achieving high network utilization and throughput requires timely processing of incoming acknowledgments. If receiver processing were performed only in the context of receive system calls, then at most one TCP window of data could be transmitted between successive receive system calls, resulting in poor performance for many applications.

Which is exactly the opposite of what VJ is saying.
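
To put rough numbers on the limit the LRP authors describe in the passage quoted above, here is a back-of-the-envelope sketch. If acknowledgements only go out when the application calls receive, the sender can push at most one window of data per interval between calls. The function name, the 64 KiB window and the intervals are my own illustrative assumptions, not figures from the paper or from VJ's talk.

```python
def max_throughput_bytes_per_sec(window_bytes, recv_interval_sec):
    """Upper bound on sender throughput when ACKs are only processed
    inside receive calls: at most one window of data can be in flight
    between two successive calls."""
    if recv_interval_sec <= 0:
        raise ValueError("recv_interval_sec must be positive")
    return window_bytes / recv_interval_sec


if __name__ == "__main__":
    window = 64 * 1024  # assumed window size in bytes, for illustration only
    for interval_ms in (1, 10, 100):  # assumed gaps between receive calls
        bound = max_throughput_bytes_per_sec(window, interval_ms / 1000.0)
        print(f"recv every {interval_ms:>3} ms -> at most {bound * 8 / 1e6:.1f} Mbit/s")
```

The bound falls linearly as the application gets slower at calling receive. The LRP authors treat that as a performance problem to work around; in VJ's framing it is simply the receiver pacing the sender, which is what end-to-end flow control is supposed to do.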

And the Exokernel paper, also very interesting reading:

The real power of application-level networking is that it allows networking
software to be specialized for and integrated with important applications.

Which might be true (and I think it will prove true for message-passing à la RDMA), but it isn't the point which has got everyone excited today.

