The Tragedy of the RProxy

In 1999, I attended a wonderful talk by Dr. Andrew Tridgell at the first Australian Linux Conference. The talk was entitled "rsync in http", and in it, the already-legendary Open Source maestro described a method of transferring web pages, based on work he had done in his PhD thesis. In a nutshell, his thesis gave us the "rsync" program, which is the fastest way to send local files to a distant computer if the distant computer has an older, similar version of the same files. The distant machine sends a list of "signatures" of parts of the file, and the local one looks for matching signatures in the local file, and sends any parts of its file which don't match those signatures: in effect, only the differences are sent.
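To make the signature idea concrete, here is a minimal sketch in Python. It is not the rsync implementation: the function names, the 8-byte block size and the choice of Adler-32 plus SHA-256 as checksums are all assumptions made for illustration. Real rsync also uses a rolling checksum that is updated cheaply as the window slides, whereas this sketch recomputes the checksum at every offset.

```python
import hashlib
import zlib

BLOCK = 8  # illustrative block size (an assumption); real tools use far larger blocks


def signatures(old: bytes, block: int = BLOCK) -> dict:
    """Run by the side holding the old copy: checksum each full block of it."""
    sigs = {}
    for i in range(0, len(old) - block + 1, block):
        chunk = old[i:i + block]
        key = (zlib.adler32(chunk), hashlib.sha256(chunk).digest())
        sigs.setdefault(key, i // block)  # remember the first block with this signature
    return sigs


def delta(new: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Run by the side holding the new file: emit ("copy", block_index) or ("data", bytes)."""
    weak_seen = {weak for weak, _ in sigs}
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        match = None
        if len(chunk) == block:
            weak = zlib.adler32(chunk)
            # Only pay for the strong hash when the cheap checksum already matches.
            if weak in weak_seen:
                match = sigs.get((weak, hashlib.sha256(chunk).digest()))
        if match is None:
            literal.append(new[pos])  # no known block starts here: send this byte as-is
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))  # the other side already has this block
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Run by the side holding the old copy: rebuild the new file from the ops."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out += old[arg * block:(arg + 1) * block]
        else:
            out += arg
    return bytes(out)
```

Calling `patch(old, delta(new, signatures(old)))` reconstructs `new`, and only the `("data", ...)` entries carry file contents across the link; everything else is a short reference to a block the distant machine already holds.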

The talk was on his proposal to apply this same technique to the World Wide Web: your browser sends a request to the web server at the other end like normal, but you also send the signatures of any old copy of the same page you have lying around: the other end can choose to send you only the differences. If you don't have an old copy of the page, you can use any page you think might be a close match: if you guess wrong, at worst the web server will have to send you the entire file, which it would have done normally anyway.

Andrew had even written a small demonstration program, RProxy, and used it on his modem link at home for a few weeks. The result was an 85% reduction in web traffic. I sat in that lecture captivated by the possibilities. The Open Source Apache web server would be fairly simple to extend, giving support on a huge number of web sites. The Open Source Mozilla browser from Netscape could be similarly extended, giving Linux users a great way of saving bandwidth. The Open Source Squid proxy cache, used widely by ISPs, could be extended as well, meaning much lower costs for ISPs, as the proxy would use the protocol even if their users were still using old web browsers.

I was busy on other things, but the more I thought about it, the more I wanted to do it: document the protocol, write a software library for everyone in a convenient package, get it accepted by Squid and Apache, and later by the browsers, test on large-scale sites like the geek news site, and have it formally accepted as a web standard so everyone could implement it. This kind of "change the world" technology doesn't come up every day, and we were all excited.

But there was a problem. To be useful, this technology had to be ubiquitous: as many web servers on the Internet as possible had to use it. This technique, however, is potentially covered by a US patent dated December 1995. The company involved had approached Dr. Tridgell before, and they had reached an unwritten gentlemen's agreement that the company would take no legal action over the "rsync" program, which might infringe their patent. Andrew approached them again, but was unable to obtain the formal, wider exclusion that would be required for such a deployment into Internet infrastructure. Although their current business is not interested in using the technology for web pages, the patent gives them a monopoly on the idea; they don't even have to return emails (although generally, they have). Through the years of fairly fruitless off-and-on discussions, another patent holder has been discovered, further muddying the waters over this technique.

We cannot progress: neither patent holder wants to use the technology in this way, but we have no lever to convince them to allow us to do so. Given previous problems such as Fraunhofer deciding to charge for their audio patents once their MP3 technology became widespread, and the more recent JPEG image patent assertions, we cannot spread this technology while it remains under a cloud (and indeed, the World Wide Web Consortium will not accept such patent-encumbered standards). For over five years, this technology has been tantalizingly out of reach of all of us, because of a software patent.

Various people have argued that extending patents to cover software was a mistake: with copyright and trade secret protection already applying to the same work, granting patents seems wildly excessive. With over 10 million programmers in the world, software already has more researchers than any other field of technology, independent of software patents. All the common Internet protocols predate the patentability of software, hence their ubiquity, unimpeded by legal "yellow tape". Others have complained of the poor quality of the patents being granted, the landmine effect of so many software patents, their use as cheap replacements for genuine research, their use as anti-competitive weapons, their parasitic relationship to research which happens anyway as a normal part of producing products, the danger of their introduction in an area which has already spawned troublesome monopolies, the widespread examples of software success prior to its patentability, the raising of barriers to entry against smaller businesses, and the threat to the rise of competitive Open Source software.

I believe strongly in the economic and social benefits of communication, from the remote parts of India to the London offices of The Economist. With RProxy, we could reach further, faster and cheaper, and so we would all gain. I cannot see how we benefit from granting software patents, but I can see what they're costing us.

Free RProxy.

Rusty Russell
Last modified: Fri Aug 20 14:38:25 EST 2004