Free Software programmer
rusty@rustcorp.com.au
This blog existed before my current employment, and obviously
reflects my own opinions and not theirs.
This work is licensed under a Creative Commons Attribution 2.1 Australia License.
Rusty's Bleeding Edge Page
Sun, 30 Mar 2008
It's useful to arm ourselves with a pithy phrase should we ever
have to face an "it'll be easier to use!" argument. But once we've
pointed to it, it's still not clear how to make an interface harder to
misuse.
So I've created a "best" to "worst" list: my hope is that by
putting "hard to misuse" on one axis in our mental graphs, we can at
least make informed decisions about tradeoffs like "hard to misuse" vs
"optimal".
The Hard To Misuse Positive Score List
- 10. It's impossible to get wrong.
-
This ideal is represented by the dwim() (Do What I Mean) function,
where misuse means the implementation has a bug. In real life this
goal is only achievable by greatly restricting your definition of
misuse. Even the dwim() function can be abused by not
calling it at all.
- 9. The compiler/linker won't let you get it wrong.
-
As a C person, I like that the compiler reads all my code before
it even gives me a chance to run any of it. We're so used to this we
don't give it a second thought when the compiler barfs because we use
the wrong type or don't provide enough arguments to a function. But
we can go out of our way to use this: various projects such as gcc and
the Linux kernel have macros like BUILD_BUG_ON(cond) which
can be implanted strategically to evoke compile errors (it evaluates
sizeof(char[1-2*!!(cond)]) which won't compile if
cond is true).
I use this in the kernel's module_param(name, type,
perm) macro to check that the read/write permissions for the
module parameter are sane (a common mistake was to specify
644 instead of 0644).
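As a rough sketch of the trick (not the kernel's actual definition): BUILD_BUG_ON below is built from the sizeof expression described above, while CHECKED_PERM, param_perm and perm_demo are hypothetical names invented for this illustration.

```c
#include <assert.h>

/* Breaks the build when cond is true: the array size becomes -1. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/*
 * Hypothetical helper: yields perm, but refuses to compile unless the
 * value fits in the nine rwxrwxrwx permission bits.
 */
#define CHECKED_PERM(perm) \
    (BUILD_BUG_ON((perm) < 0 || (perm) > 0777), (perm))

int param_perm(void)
{
    /* Writing decimal 644 here would fail to compile: 644 > 0777. */
    return CHECKED_PERM(0644);
}

void perm_demo(void)
{
    assert(param_perm() == 0644);
}
```

Because the check lives inside sizeof, it costs nothing at runtime; the only effect of a bad value is a failed build.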
- 8. The compiler will warn if you get it wrong.
-
This is weaker than breaking the compile, but in many cases easier
to achieve. The classic of this school is the Linux kernel min() and
max() macros, which use two GCC extensions: a statement expression
which allows the whole statement to be treated by the caller as a
single expression, and typeof which lets us declare a
temporary variable of same type as another:
/*
 * min()/max() macros that also do
 * strict type-checking.. See the
 * "unnecessary" pointer comparison.
 */
#define min(x,y) ({ \
        typeof(x) _x = (x); \
        typeof(y) _y = (y); \
        (void) (&_x == &_y); \
        _x < _y ? _x : _y; })
Since a common error in C is to compare signed vs unsigned types and
expect a signed result, this macro insists that both types be
identical.
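To see the check from the caller's side, here is a small sketch, assuming GCC or Clang (both GNU extensions are required). The macro is renamed min_checked and spelled with __typeof__ so that it also builds in strict ISO modes; smaller_int and min_demo are hypothetical helpers for illustration.

```c
#include <assert.h>

/* Same shape as the macro above, under a hypothetical name. */
#define min_checked(x, y) ({            \
    __typeof__(x) _x = (x);             \
    __typeof__(y) _y = (y);             \
    (void) (&_x == &_y);                \
    _x < _y ? _x : _y; })

/* Both arguments are int, so the pointer comparison is silent. */
int smaller_int(int a, int b)
{
    return min_checked(a, b);
}

void min_demo(void)
{
    assert(smaller_int(3, 7) == 3);
    assert(smaller_int(-1, 2) == -1);
    /*
     * min_checked(-1, 10u) would compare an int * with an unsigned *,
     * which is what makes the compiler complain about the mixed types.
     */
}
```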
- 7. The obvious use is (probably) the correct one.
-
Always make it easier to do the Right Thing than the Wrong Thing.
So if you can't make the right thing easy, make the wrong thing hard!
This is the "explicit args required for kmalloc" example again, but it
usually means choosing defaults carefully and knowing the normal use
for the function.
My example here is the standard Unix exit() and
_exit(): the latter does not call any atexit()
handlers and is usually not the right choice, so it's harder to find.
- 6. The name tells you how to use it.
-
Everyone knows a good name is invaluable. In the _exit() example
above, the underscore punches far above its one-character weight as a
warning sign.
My example here is the strange reference counting mechanism used
by the Linux Kernel module code: getting a reference count can
fail, unlike almost all the rest of the kernel reference
counts. Hence, the "get a reference count" function is called
try_module_get(): those first four characters reflect the
importance of the return code. Note that these days, the GCC
"__attribute__((warn_unused_result))" can be used to promote this
usage to a warning. I still like the name, though, because overuse of
such things has led to some warning fatigue...
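A minimal sketch of combining the two, assuming GCC or Clang for the attribute. struct owner, try_owner_get and owner_put are hypothetical stand-ins, not the kernel's module API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object that can refuse new references. */
struct owner {
    int refs;
    bool going_away;
};

/* The try_ prefix and the attribute both say: check the result. */
__attribute__((warn_unused_result))
bool try_owner_get(struct owner *o)
{
    if (o->going_away)
        return false;
    o->refs++;
    return true;
}

void owner_put(struct owner *o)
{
    assert(o->refs > 0);
    o->refs--;
}
```

A bare `try_owner_get(&o);` statement would draw a warning, while `if (!try_owner_get(&o)) return;` would not.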
- 5. Do it right or it will always break at runtime.
-
As soon as the misusing code is executed, it'll die horribly. Not
all code paths are tested, but this will often catch cases where
someone is writing new code using your interface. It's hard for the
compiler to ensure that the user calls your "open" routine before your
other routines, but an "assert()" can at least get you to this level.
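For instance, an open-before-use rule could be enforced roughly like this. The logger type and its functions are hypothetical, made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handle; callers are expected to zero it before use. */
struct logger {
    bool opened;
    int lines;
};

void logger_open(struct logger *lg)
{
    lg->opened = true;
    lg->lines = 0;
}

void logger_write(struct logger *lg, const char *msg)
{
    /* Dies on the first misuse rather than limping along. */
    assert(lg->opened && "logger_write() called before logger_open()");
    (void)msg;      /* a real logger would emit the message here */
    lg->lines++;
}
```

Note that assert() vanishes when NDEBUG is defined, so this only reaches level 5 in builds that keep assertions enabled.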
- 4. Follow common convention and you'll get it right.
-
This is a corollary of "the obvious use is the correct one", and
a very useful handhold on the way up this scale. In particular, C
convention for argument order seems to have evolved down to three
ordered rules:
-
Context argument(s) go first. A context is something the user
will do a series of different things to; a handle.
-
Associated arguments are adjacent. An array and its length go
together, as do a timestamp and its granularity. If you could see yourself
making a structure out of some of the args, they should go together.
-
Details go as late as possible. Flags for the function go at the end.
Pointer and length pairs are passed in that order.
I've never gotten the argument order of the standard
write() wrong, even though the fd and count could be
interchanged:
ssize_t write(int fd, const void *buf, size_t count);
There are also minor (but important!) conventions, such as
memcpy's "destination before source", which you should use for any
memcpy-like routines.
Like all rules, this one exists to be violated; but know you're doing so.
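Applied to a made-up routine, the three rules might look like the sketch below. struct outbuf, outbuf_append and OUTBUF_TRUNCATE are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer: the "context" argument. */
struct outbuf {
    unsigned char data[64];
    size_t used;
};

#define OUTBUF_TRUNCATE 0x1u    /* copy what fits instead of failing */

/*
 * Context first, then the (pointer, length) pair in that order, then
 * flags last. Returns the number of bytes appended.
 */
size_t outbuf_append(struct outbuf *ob, const void *buf, size_t len,
                     unsigned int flags)
{
    size_t space;

    assert(ob->used <= sizeof(ob->data));
    space = sizeof(ob->data) - ob->used;
    if (len > space) {
        if (!(flags & OUTBUF_TRUNCATE))
            return 0;
        len = space;
    }
    memcpy(ob->data + ob->used, buf, len);  /* destination before source */
    ob->used += len;
    return len;
}
```

The signature deliberately mirrors write(): handle, buffer, count, with the one extra detail tacked on the end.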
- 3. Read the documentation and you'll get it right.
-
People only read instructions after they've already tied
themselves into a knot. Then they skim them for keywords and don't
read your warnings. I don't give an example of this; if this is the
best an interface can do, it's in trouble.
- 2. Read the implementation and you'll get it right.
-
We've all done this. Reading the implementation can work for the
simple questions (what unit is this argument in?), but leads to
trouble for the subtler issues. The concept of "the" implementation
is always problematic, and when the implementation is tightened or
fixed we discover we didn't actually get it right, we just got it
working.
In some cases, the implementation is a noop, which doesn't help.
- 1. Read the correct mailing list thread and you'll get it right.
-
The reason some strange interface quirk exists might be for
compatibility with some strange OS or compiler, weird corner case or
even older versions of this codebase. In other words, historical
reasons ("see, on the VAX we only had 6 characters for..."). You
sometimes only find this when you send a patch to fix it and the
original author yells at you.
Sometimes they add it to the FAQ. That does not increase the
interface's score very much: please try harder.