Tue, 22 Apr 2008

Austin, TX

Arrived for the virtualization mini-summit (alongside the Linux Foundation Collaboration Summit) the week before last, and stayed around because much of IBM's kvm work is done here. Much hacking, but I should have blogged about my travel plans sooner.

I leave on Friday for San Jose (on the "Nerd bird" I'm told) for the weekend before I fly back home, but if anyone wants to catch up, send mail...


[/self] permanent link

Mon, 07 Apr 2008

C inline functions not in headers

I just appreciated an interesting side-effect of slapping "inline" on static functions within .c files. You don't get a warning when they become unused.

This breaks my normal method for code cleanup (in this case, the tun driver). So unless you have evidence otherwise, plase trust the compiler to inline static functions appropriately and don't label them inline. (And remember: inline is the register keyword for the 21st century.)


[/tech] permanent link

Sat, 05 Apr 2008

Hard To Misuse Commentry

Since my blogfu doesn't extend to comments, I recommend the thoughtful comments found on my recent 'Hard to Misuse' posts at LWN: firstly 'How Do I Make This Hard to Misuse?' commentry and then 'What If I Don't Actually Like My Users?' commentry.
[/tech] permanent link

Tue, 01 Apr 2008

What If I Don't Actually Like My Users?

Here begins our descent into hell; if an interface manages to achieve negative scores on the Hard To Misuse List, your users may detect the dull red glow of malignancy rather than incompetence.

-1. Read the mailing list thread and you'll get it wrong.

If the first hit on Google when searching for the symptoms or how to use your interface leads to a convincing but incorrect answer, that puts your interface here.

-2. Read the implementation and you'll get it wrong.

This happens most often when the implementation being read is not the one you which ends up being used. Or maybe the implementation comes with test cases which all exercise the unnatural corners of the interface, which mislead instead of enlightening.

-3. Read the documentation and you'll get it wrong.

Here's my favorite (now fixed) example, from the glibc snprintf man page:

RETURN VALUE
       snprintf and vsnprintf do not write more than size bytes
       (including the trailing '\0'), and return -1 if the output was
       truncated due to this limit.

I was scanning the man page for the return value on overlength snprintfs; now I'd found it I stopped reading. But here was the next sentence:

       (Thus until glibc 2.0.6. Since glibc 2.1 these functions follow
       the C99 standard and return the number of characters (exclud-
       ing the trailing '\0') which would have been written to the
       final string if enough space had been available.)
-4. Follow common convention and you'll get it wrong.

The usual example here is fputs() and similar which take the context argument at the end instead of the start:

	int fputs(const char *s, FILE *stream);

But that doesn't quite get down here: the compiler will warn if you get the argument order backwards (or, if you prefer, forwards). So again I reach to the Linux Kernel, this time for the list macros:

	void list_add(struct list_head *new, struct list_head *head);

I now have this nailed into my brain, but for a long time I expected the 'head' (ie. the list I'm adding to) to be the first argument. Of course, this wouldn't be such a problem if list heads and list entries were not exactly the same type.

-5. Do it right and it will sometimes break at runtime.

Every C programmer knows that malloc returns NULL on error:

	p = malloc(bufsize);
	if (!p) {
		/* Phew!  We can handle this... */
		backout_nicely();
		exit(1);
	}

Except malloc may also return NULL on zero-length allocations: something you'll find out the hard way when your nice code which didn't special case 0-length allocations breaks horribly on someone else's machine.

-6. The name tells you how not to use it.

Sometimes we opt for changing behavior without changing a (now-inappropriate) name, knowing that existing users won't be broken by the new behaviour. But don't curse future users with a misleading name: if your project takes off, there will be far more of them than current users.

My example here is another Linux kernel one which bit me. I was writing a block (disk) driver: it gets passed a struct request which consists of a series of chunks. After servicing them, it calls end_request(). Only it turns out that (for historical reasons!) this only ends the first chunk. My block driver "worked", but it was doing about N^2/2 times the work it needed to do for an N-chunk request.

(I didn't find that, the maintainer reviewing my code did).

-7. The obvious use is wrong.

I've been coding in C for about 20 years, and about five years ago I spent an hour chasing a case where I'd done if (strcmp(arg, "foo")) instead of if (!strcmp(arg, "foo")). Now I religiously #define streq(a, b) (!strcmp((a),(b))) because I know I'm not as smart as I think I am.

Less "I'm obviously an idiot" is the behavior of strncpy() which truncates the destination string without adding a NUL terminator. Or char x[5] = "hello"; which the C standards committee thought would be an excellent trap for newcomers (and particularly stupid since there is a workaround if you really want an unterminated character array).

-8. The compiler will warn if you get it right.

The bind() socket library call comes to mind here: it takes a struct sockaddr but you always have to cast to use it, as you will never have a struct sockaddr, but instead a struct sockaddr_in or some other specific type. This one is almost excusable, although I'd expect better from modern code.

-9. The compiler/linker won't let you get it right.

This is hard to find in C, since the compiler will let you cast your way through almost anything. Listed here for completeness.

-10. It's impossible to get right.

Unlike the first category, this final category is neither a paragon nor unattainable. Some interfaces are so fundamentally flawed that they can't be used correctly. Perhaps it can fail in a way you have to know about but it doesn't return an error. Perhaps it returns an error but you can do nothing about it.

In the Linux kernel there used to be interfaces which assumed single-threading, and are now unsafe. Say you expose two functions called prepare() and and action() and expect the caller to do if (prepare()) action();. This is broken if action() relied on all the checks in prepare() passing, and now conditions can change between the two.

That's everything I know about interface design. Now, go and make your own mistakes so you can have wise things to say about it!


[/tech] permanent link

Sun, 30 Mar 2008

How Do I Make This Hard to Misuse?

It's useful to arm ourselves with a pithy phrase should we ever have to face an "it'll be easier to use!" argument. But once we've pointed to it, it's still not clear how to improve the difficulty of interface misuse.

So I've created a "best" to "worst" list: my hope is that by putting "hard to misuse" on one axis in our mental graphs, we can at least make informed decisions about tradeoffs like "hard to misuse" vs "optimal".

The Hard To Misuse Positive Score List

10. It's impossible to get wrong.

This ideal is represented by the dwim() (Do What I Mean) function, where misuse means the implementation has a bug. In real life this goal is only achievable by greatly restricting your definition of misuse. Even the dwim() function can be abused by not calling it at all.

9. The compiler/linker won't let you get it wrong.

As a C person, I like that the compiler reads all my code before it even gives me a chance to run any of it. We're so used to this we don't give it a second thought when the compiler barfs because we use the wrong type or don't provide enough arguments to a function. But we can go out of our way to use this: various project such as gcc and the Linux kernel have macros like BUILD_BUG_ON(cond) which can be implanted strategically to evoke compile errors (it evalates sizeof(char[1-2*!!(cond)]) which won't compile if cond is true).

I use this in the kernel's module_param(name, type, perm) macro to check that the read/write permissions for the module parameter are sane (a common mistake was to specify 644 instead of 0644).

8. The compiler will warn if you get it wrong.

This is weaker than breaking the compile, but in many cases easier to achieve. The classic of this school is the Linux kernel min() and max() macros, which use two GCC extensions: a statement expression which allows the whole statement to be treated by the caller as a single expression, and typeof which lets us declare a temporary variable of same type as another:

	/*
	 * min()/max() macros that also do
	 * strict type-checking.. See the
	 * "unnecessary" pointer comparison.
	 */
	#define min(x,y) ({ \
		typeof(x) _x = (x);	\
		typeof(y) _y = (y);	\
		(void) (&_x == &_y);	\
		_x < _y ? _x : _y; })

Since a common error in C is to compare signed vs unsigned types and expect a signed result, this macro insists that both types be identical.

7. The obvious use is (probably) the correct one.

Always make it easier to do the Right Thing than the Wrong Thing. So if you can't make the right thing easy, make the wrong thing hard! This is the "explicit args required for kmalloc" example again, but it usually means choosing defaults carefully and knowing the normal use for the function.

My example here is the standard Unix exit() and _exit(): the latter does not call any atexit() handlers and is usually not the right choice, so it's harder to find.

6. The name tells you how to use it.

Everyone knows a good name is invaluable. In the _exit() the underscore punches far above its one-character weight was a warning sign.

My example here is the strange reference counting mechanism used by the Linux Kernel module code: getting a reference count can fail, unlike almost all the rest of the kernel reference counts. Hence, the "get a reference count" function is called try_module_get(): those first four characters reflect the importance of the return code. Note that these days, the GCC "__attribute__((warn_unused_result))" can be used to promote this usage to a warning. I still like the name, though, because overuse of such things has lead to some warning fatigue...

5. Do it right or it will always break at runtime.

As soon as the misusing code is executed, it'll die horribly. Not all code paths are tested, but this will often catch cases where someone is writing new code using your interface. It's hard for the compiler to ensure that the user calls your "open" routine before your other routines, but an "assert()" can at least get you to this level.

4. Follow common convention and you'll get it right.

This is a corollary of "this simplest use is the correct one", and a very useful handhold on the way up this scale. In particular, C convention for argument order seems to have evolved down to three ordered rules:

  1. Context argument(s) go first. A context is something the user will do a series of different things to; a handle.
  2. Associated arguments are adjacent. An array and its length go together, as does a timestamp and its granularity. If you could see yourself making a structure out of some of the args, they should go together.
  3. Details go as late as possible. Flags for the function go at the end. Pointer and length pairs are passed in that order.

I've never gotten the argument order of the standard write() wrong, even though the fd and count could be interchanged:
	ssize_t write(int fd, const void *buf, size_t count);

There are also minor (but important!) conventions, such as memcpy's "destination before source", which you should use for any memcpy-like routines.

Like all rules, this one exists to be violated; but know you're doing so.

3. Read the documentation and you'll get it right.

People only read instructions after they've already tied themselves into a knot. Then they skim them for keywords and don't read your warnings. I don't give an example of this; if this is the best an interface can get do, it's in trouble.

2. Read the implementation and you'll get it right.

We've all done this. Reading the implementation can work for the simple questions (what unit is this argument in?), but leads to trouble for the subtler issues. The concept of "the" implementation is always problematic, and when the implementation is tightened or fixed we discover we didn't actually get it right, we just got it working.

In some cases, the implementation is a noop, which doesn't help.

1. Read the correct mailing list thread and you'll get it right.

The reason the some strange interface quirk exists might be for compatibility with some strange OS or compiler, weird corner case or even older versions of this codebase. In other words, historical reasons ("see, on the VAX we only had 6 characters for..."). You sometimes only find this when you send a patch to fix it and the original author yells at you.

Sometimes they add it to the FAQ. That does not increase the interface's score very much: please try harder.


[/tech] permanent link

Tue, 18 Mar 2008

APIs: "Easy to Use" vs "Hard to Misuse"

It's an elementary goal of API design to make something easy to use: easy for yourself, easy for yourself next year, easy for others. Let's take that as a given.

Many goals will conflict with "easy to use", but the subtlest is the requirement that an API be hard to misuse. Ease of use attracts users, but difficulty of misuse keeps them alive.

To make this concept crisp, I have two real life examples. The first is the safety catch on a gun. Hard to misuse beats easy to use.

The second example is the Linux kernel's kmalloc dynamic memory allocation function. It takes two arguments: a size and a flag. The most commonly used flag arguments are GFP_KERNEL and GFP_ATOMIC: I'll ignore the others for this example.

This flag indicates what the allocator should do when no memory is immediately available: should it wait (sleep) while memory is freed or swapped out (GFP_KERNEL), or should it return NULL immediately (GFP_ATOMIC). And this flag is entirely redundant: kmalloc() itself can figure out whether it is able to sleep or not. Implementing malloc() would be a no-brainer, and kernel coders generally like ease of use. So why don't we? [Correction:Jon Corbet points out that it's not entirely redundant in some configurations; we'd need to do a few lines extra work.]

Because atomic allocations should be avoided: they're drawing from a limited pool and more likely to fail or make other atomic allocations fail. By placing the burden of specifying this onto the author, we make atomic allocations easier to spot and thus harder to abuse.

And if we want to make our APIs harder to misuse we need to measure how an API scores, and that'll be the topic of the next post.


[/tech] permanent link

Wed, 12 Mar 2008

Bricklayer, not cathedral builder.

I'm always a little uncomfortable with "fuzzy" programming topics; much better to judge between two specific pieces of code. The big issues are important but it's hard to say something new on that topic which will help people code better. Most useful stuff has been said already.

Nonetheless, for my OLS keynote years ago I did have a point which I felt was underappreciated, and managed to rope it down to actual guidelines so the idea was of practical use. I'm going to revisit that topic in my next few blog posts, because unfortunately my OLS keynote was not recorded anywhere for me to simply point to, and there has been some maturing of these ideas since then.


[/tech] permanent link