Tue, 24 Oct 2006

Kernel coding irritations

The Linux kernel is one of the nicer pieces of complex code I've worked on, but it's not without ugliness. Consider this function from mm/memory.c (2.6.19-rc2):

int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		unsigned long start, int len, int write, int force,
		struct page **pages, struct vm_area_struct **vmas);

This function gets references to a number of user pages starting at address "start". Its features are as follows:

  • The "mm" arg is always "tsk->mm" in the callers; in fact, it doesn't make sense for them to be different. (Actually, it's not clear that fs/aio.c does this correctly.)
  • The "tsk" arg is always "current", the (global) macro identifying the current task. AFAICT it will work with other tasks, so it's not a fundamental constraint. [This is crap: DaveM points out that ptrace is the obvious case where tsk != current. See mm/memory.c's access_process_vm().]
  • It returns the number of pages it got, or -EFAULT, or -ENOMEM. This makes for some interesting error handling in the caller: if it runs out of memory or hits a bad address after getting N pages, it returns N. The caller presumably notes that not all the pages were retrieved, and releases the first N pages.
  • "len", contrary to what its name suggests, is the number of pages, not the length in bytes. This one bit me.
  • write and force are booleans, but the kernel doesn't use "bool". I've come to like bool for its documentation value.
  • Naturally, there's no documentation on this function.
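
To make the "len" and partial-return points concrete, here is a rough userspace sketch of the kind of wrapper a caller ends up writing. Nothing in it is kernel code: mock_get_user_pages(), pin_user_range(), pages_spanned(), the toy page pool, the 4 KiB page size and the cut-down "struct page" are all hypothetical stand-ins of mine, and the mock only imitates the return convention described above. It also uses "bool" for write and force, to show how that reads at the call site.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define SKETCH_NR_PAGES  8	/* toy address space: 8 pages from address 0 */

/* Hypothetical stand-in for the kernel's struct page. */
struct page { int refcount; };

static struct page sketch_pool[SKETCH_NR_PAGES];

/*
 * Mock with the return convention described above: the number of pages
 * obtained, or -EFAULT if not even the first page could be obtained.
 * Note that the second argument is a page count, not a byte length.
 */
static int mock_get_user_pages(unsigned long start, int nr_pages,
			       bool write, bool force, struct page **pages)
{
	unsigned long first = start / SKETCH_PAGE_SIZE;
	int i;

	(void)write;	/* unused by the mock */
	(void)force;
	for (i = 0; i < nr_pages; i++) {
		if (first + i >= SKETCH_NR_PAGES)
			return i ? i : -EFAULT;	/* partial count wins */
		sketch_pool[first + i].refcount++;
		pages[i] = &sketch_pool[first + i];
	}
	return i;
}

/* Number of pages touched by the byte range [start, start + bytes). */
static int pages_spanned(unsigned long start, unsigned long bytes)
{
	unsigned long offset = start % SKETCH_PAGE_SIZE;

	assert(bytes > 0);
	return (int)((offset + bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/*
 * All-or-nothing wrapper taking a byte length. On a partial result it
 * drops the references already taken and reports failure.
 */
static int pin_user_range(unsigned long start, unsigned long bytes,
			  bool write, struct page **pages)
{
	int want = pages_spanned(start, bytes);
	int got = mock_get_user_pages(start, want, write, false, pages);

	if (got < 0 || got == want)
		return got;
	while (got > 0)
		pages[--got]->refcount--;
	/* Simplification: a short count doesn't say why it was short. */
	return -EFAULT;
}
```

Calling pin_user_range(4095, 2, true, pages) asks for two pages, because the two bytes straddle a page boundary; passing the byte count straight through as "len" would have asked for two pages only by coincidence. The wrapper also has to guess at the error code, since a short count hides whether the cause was a bad address or a lack of memory.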
