Sun, 30 Mar 2008

How Do I Make This Hard to Misuse?

It's useful to arm ourselves with a pithy phrase should we ever have to face an "it'll be easier to use!" argument. But once we've pointed to it, it's still not clear how to make an interface harder to misuse.

So I've created a "best" to "worst" list: my hope is that by putting "hard to misuse" on one axis in our mental graphs, we can at least make informed decisions about tradeoffs like "hard to misuse" vs "optimal".

The Hard To Misuse Positive Score List

10. It's impossible to get wrong.

This ideal is represented by the dwim() (Do What I Mean) function, where misuse means the implementation has a bug. In real life this goal is only achievable by greatly restricting your definition of misuse. Even the dwim() function can be abused by not calling it at all.

9. The compiler/linker won't let you get it wrong.

As a C person, I like that the compiler reads all my code before it even gives me a chance to run any of it. We're so used to this we don't give it a second thought when the compiler barfs because we use the wrong type or don't provide enough arguments to a function. But we can go out of our way to use this: various projects such as gcc and the Linux kernel have macros like BUILD_BUG_ON(cond) which can be implanted strategically to evoke compile errors (it evaluates sizeof(char[1-2*!!(cond)]) which won't compile if cond is true).

I use this in the kernel's module_param(name, type, perm) macro to check that the read/write permissions for the module parameter are sane (a common mistake was to specify 644 instead of 0644).
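As a rough illustration of the trick, here is a minimal sketch of a permission check built on the sizeof expression quoted above. CHECKED_PERM() and good_perm() are hypothetical names invented for this example, and the range check is an assumption about what "sane" might mean; this is not the kernel's actual module_param() code.

```c
/* Sketch of the trick: a negative array size is a compile error. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical helper: yields perm, but only compiles for 0..0777. */
#define CHECKED_PERM(perm) \
	(BUILD_BUG_ON((perm) < 0 || (perm) > 0777), (perm))

/* Compiles: octal 0644 is within range. */
static int good_perm(void)
{
	return CHECKED_PERM(0644);
}

/*
 * Would not compile: decimal 644 is octal 01204, which exceeds 0777.
 *
 * static int bad_perm(void) { return CHECKED_PERM(644); }
 */
```

Note that the check only bites when the condition is a compile-time constant, which is the case for a literal permission value.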

8. The compiler will warn if you get it wrong.

This is weaker than breaking the compile, but in many cases easier to achieve. The classic of this school is the Linux kernel min() and max() macros, which use two GCC extensions: a statement expression which allows the whole statement to be treated by the caller as a single expression, and typeof which lets us declare a temporary variable of same type as another:

	/*
	 * min()/max() macros that also do
	 * strict type-checking.. See the
	 * "unnecessary" pointer comparison.
	 */
	#define min(x,y) ({ \
		typeof(x) _x = (x);	\
		typeof(y) _y = (y);	\
		(void) (&_x == &_y);	\
		_x < _y ? _x : _y; })

Since a common error in C is to compare signed vs unsigned types and expect a signed result, this macro insists that both types be identical.
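To see the kind of error this guards against, consider a sketch of a naive macro without the type check. The names naive_min() and surprising_min() are made up for illustration.

```c
/* Naive macro: no type check, so mixed signedness slips through. */
#define naive_min(x, y) ((x) < (y) ? (x) : (y))

static unsigned int surprising_min(void)
{
	int a = -1;
	unsigned int b = 1;

	/*
	 * The usual arithmetic conversions turn a into a huge unsigned
	 * value before the comparison, so b is picked as the "minimum".
	 */
	return naive_min(a, b);
}
```

Here surprising_min() returns 1 rather than anything resembling -1. With the strict macro, comparing the addresses of an int and an unsigned int makes GCC warn about comparing distinct pointer types, so the mismatch is flagged at compile time instead.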

7. The obvious use is (probably) the correct one.

Always make it easier to do the Right Thing than the Wrong Thing. So if you can't make the right thing easy, make the wrong thing hard! This is the "explicit args required for kmalloc" example again, but it usually means choosing defaults carefully and knowing the normal use for the function.

My example here is the standard Unix exit() and _exit(): the latter does not call any atexit() handlers and is usually not the right choice, so it's harder to find.
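The difference can be observed with a small sketch, assuming a POSIX system with fork() and pipe(). The helper handler_ran() and its reporting-through-a-pipe arrangement are invented for this illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int pipe_wr = -1;

/* Tell the parent that the atexit() handler ran. */
static void on_exit_handler(void)
{
	if (write(pipe_wr, "x", 1) != 1)
		_exit(1);
}

/* Returns 1 if the child's atexit() handler ran, 0 if not, -1 on error. */
static int handler_ran(int use_underscore_exit)
{
	int fds[2];
	char c;
	pid_t pid;
	ssize_t n;

	if (pipe(fds) != 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: register the handler, then leave one way or the other. */
		close(fds[0]);
		pipe_wr = fds[1];
		atexit(on_exit_handler);
		if (use_underscore_exit)
			_exit(0);	/* skips atexit() handlers */
		exit(0);		/* runs atexit() handlers */
	}
	/* Parent: read() returns 0 at EOF if the child wrote nothing. */
	close(fds[1]);
	n = read(fds[0], &c, 1);
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return n == 1;
}
```

Under these assumptions, handler_ran(0) should report that the handler ran and handler_ran(1) that it did not.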

6. The name tells you how to use it.

Everyone knows a good name is invaluable. In _exit(), the underscore punches far above its one-character weight: it's a warning sign.

My example here is the strange reference counting mechanism used by the Linux kernel module code: getting a reference count can fail, unlike almost all the rest of the kernel reference counts. Hence, the "get a reference count" function is called try_module_get(): those first four characters reflect the importance of the return code. Note that these days, the GCC "__attribute__((warn_unused_result))" can be used to promote this usage to a warning. I still like the name, though, because overuse of such things has led to some warning fatigue...
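A sketch of the same idea outside the kernel might look like the following. The struct widget type and the try_widget_get()/widget_put() functions are hypothetical, invented for this example; only the attribute itself is real GCC syntax.

```c
#include <stdbool.h>

struct widget {
	int refcount;
	bool dying;
};

/* Hypothetical API: "try_" in the name says the call can fail. */
__attribute__((warn_unused_result))
static bool try_widget_get(struct widget *w)
{
	if (w->dying)
		return false;	/* too late to take a reference */
	w->refcount++;
	return true;
}

/* Drop a reference previously taken with try_widget_get(). */
static void widget_put(struct widget *w)
{
	w->refcount--;
}
```

A caller who writes a bare "try_widget_get(w);" gets a compiler warning about the ignored result; the name does the same job for anyone reading the code.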

5. Do it right or it will always break at runtime.

As soon as the misusing code is executed, it'll die horribly. Not all code paths are tested, but this will often catch cases where someone is writing new code using your interface. It's hard for the compiler to ensure that the user calls your "open" routine before your other routines, but an "assert()" can at least get you to this level.
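As a minimal sketch of the "open first" case, assuming a hypothetical struct conn with conn_open() and conn_send() routines (none of these names come from a real library):

```c
#include <assert.h>
#include <stdbool.h>

struct conn {
	bool opened;
	int sent;
};

/* Must be called before any other conn_*() routine. */
static void conn_open(struct conn *c)
{
	c->opened = true;
	c->sent = 0;
}

/* Returns the running total of bytes "sent". */
static int conn_send(struct conn *c, int nbytes)
{
	/* Dies on the first call if the caller skipped conn_open(). */
	assert(c->opened && "conn_send() called before conn_open()");
	c->sent += nbytes;
	return c->sent;
}
```

The check costs one flag per handle, and it vanishes if the code is built with NDEBUG, so it only helps in builds where assertions are enabled.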

4. Follow common convention and you'll get it right.

This is a corollary of "the simplest use is the correct one", and a very useful handhold on the way up this scale. In particular, C convention for argument order seems to have evolved down to three ordered rules:

  1. Context argument(s) go first. A context is something the user will do a series of different things to; a handle.
  2. Associated arguments are adjacent. An array and its length go together, as does a timestamp and its granularity. If you could see yourself making a structure out of some of the args, they should go together.
  3. Details go as late as possible. Flags for the function go at the end. Pointer and length pairs are passed in that order.

I've never gotten the argument order of the standard write() wrong, even though the fd and count could be interchanged:
	ssize_t write(int fd, const void *buf, size_t count);

There are also minor (but important!) conventions, such as memcpy's "destination before source", which you should use for any memcpy-like routines.

Like all rules, this one exists to be violated; but know you're doing so.
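The ordering rules above can be sketched with a made-up buffer API. Everything here (struct ring, ring_write(), RING_TRUNCATE) is hypothetical and exists only to show the argument order: context first, pointer and length together in that order, flags last.

```c
#include <stddef.h>
#include <string.h>

struct ring {
	char data[64];
	size_t used;
};

#define RING_TRUNCATE 1u	/* copy what fits instead of failing */

/* Returns the number of bytes copied, or 0 if buf does not fit. */
static size_t ring_write(struct ring *r, const void *buf, size_t len,
			 unsigned int flags)
{
	size_t room = sizeof(r->data) - r->used;

	if (len > room) {
		if (!(flags & RING_TRUNCATE))
			return 0;
		len = room;
	}
	memcpy(r->data + r->used, buf, len);	/* destination before source */
	r->used += len;
	return len;
}
```

Someone who already knows write() can probably guess this signature without looking it up, which is the point of following the convention.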

3. Read the documentation and you'll get it right.

People only read instructions after they've already tied themselves into a knot. Then they skim them for keywords and don't read your warnings. I don't give an example of this; if this is the best an interface can do, it's in trouble.

2. Read the implementation and you'll get it right.

We've all done this. Reading the implementation can work for the simple questions (what unit is this argument in?), but leads to trouble for the subtler issues. The concept of "the" implementation is always problematic, and when the implementation is tightened or fixed we discover we didn't actually get it right, we just got it working.

In some cases, the implementation is a noop, which doesn't help.

1. Read the correct mailing list thread and you'll get it right.

The reason some strange interface quirk exists might be for compatibility with some strange OS or compiler, a weird corner case, or even older versions of this codebase. In other words, historical reasons ("see, on the VAX we only had 6 characters for..."). You sometimes only find this when you send a patch to fix it and the original author yells at you.

Sometimes they add it to the FAQ. That does not increase the interface's score very much: please try harder.
