[K42-discussion] memory leaks, and what I am gonna do
Orran Y Krieger
okrieg at us.ibm.com
Thu Jan 26 04:18:55 EST 2006
Posting this to the list; we have to get better at exposing our
internal hacking.
We have had various memory leaks, which we started looking at because of
some out-of-memory problems. We are tracking them down using the
LeakProof support in K42, and we are fixing a bunch of bugs. A pointer on
the wiki to how to use LeakProof: http://k42.ozlabs.org/Wiki/DebuggingK42
Marc, for leaking page descriptors, had an interesting result. If our
experiments are correct, we are leaking page descriptors but not the
equivalent number of page frames. That is, on each run we see 20 page
descriptors go away, but only a couple of pages of memory. So these
are somehow page descriptors that either do not represent real frames
or already point to existing page frames. Thoughts, Marc? I am
suspicious of the fork logic, but that's just because it scares me :-)
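
To make the descriptor-versus-frame mismatch concrete, here is a minimal
sketch of the check it implies: count how many distinct real frames a set
of leaked descriptors refers to. ToyPageDesc, frameAddr and distinctFrames
are hypothetical names for illustration, not K42 structures, and a zero
address standing for "no real frame" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a page descriptor; not the K42 structure.
struct ToyPageDesc {
    std::uintptr_t frameAddr;  // assumption: 0 means "no real frame behind it"
};

// Count how many distinct real frames a set of leaked descriptors refers to.
std::size_t distinctFrames(const std::vector<ToyPageDesc>& leaked) {
    std::set<std::uintptr_t> frames;
    for (const ToyPageDesc& d : leaked) {
        if (d.frameAddr != 0) {
            frames.insert(d.frameAddr);  // aliased frames collapse to one entry
        }
    }
    // Frameless or aliased descriptors can only lower the count.
    assert(frames.size() <= leaked.size());
    return frames.size();
}
```

If the leaked descriptors are frameless or alias each other, the result is
much smaller than the number of descriptors, which would match what the
experiments show.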
Before I work on plugging the above leaks, I think I am going to work a
bit on what is actually causing the current failure. The
problem is that we are running out of memory in the page allocator even
though lots of memory is available in the cache in the PM structures.
While most operations go through the cache, a few allocations go directly to
the page allocator. Examples are the allocation of the dispatcher
structures, some pinned multi-page operations, and some operations in the
Linux networking stack. Not only do we have to keep some memory in the
page allocator for these uses, but we also have to keep some contiguous
memory available for multi-page pinned structures. I think we also use
the page allocator as common infrastructure (behind the small memory
allocator), shared with what is available in applications, so some
allocations come from there. These allocations are very rare, but the
system panics if we can't satisfy them.
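
The contiguity requirement is the subtle part: having enough free pages in
total does not mean a multi-page request can be satisfied. A minimal
sketch, assuming a toy free map with one flag per page (countFree and
findContiguousRun are hypothetical helpers, not K42 interfaces):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy free map: freeMap[i] is true when page i is free (hypothetical model).
std::size_t countFree(const std::vector<bool>& freeMap) {
    std::size_t n = 0;
    for (bool isFree : freeMap) {
        if (isFree) ++n;
    }
    return n;
}

// Index of the first run of n contiguous free pages, or -1 if none exists.
long findContiguousRun(const std::vector<bool>& freeMap, std::size_t n) {
    assert(n > 0);
    std::size_t run = 0;  // length of the free run ending at page i
    for (std::size_t i = 0; i < freeMap.size(); ++i) {
        run = freeMap[i] ? run + 1 : 0;
        if (run == n) {
            return static_cast<long>(i + 1 - n);
        }
    }
    return -1;
}
```

With a fragmented map, countFree can report plenty of pages while
findContiguousRun still fails for a request of three or four pages, which
is the case where a pinned multi-page allocation cannot be satisfied.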
In retrospect, I did something stupid when first doing this work. The
caching (per-processor/PM) uses the same page allocator interfaces
as other operations. For locking hierarchy reasons, the page allocator
can't call back to the PM tree to flush pages, since a request may be
coming from the PM structure. I am first going to introduce a different
set of interfaces (or at least a flag) to say whether a request is from
the PM cache or not. If it's not, then for single-page allocations, the
page allocator will just redirect the request back to the PM. In that case
the page allocator is just being called for interface reasons, and we will
get a performance boost out of using the local cache. For multi-page
allocations, I will try to do an allocation, and if it fails (contiguous
memory not available), release locks and make a call to the PM structures
to flush back all the cache. For calls from the PM side, the page
allocator will return an error instead of asserting, and the PM will flush
back caches from other processors before trying again.
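
The plan above can be sketched as a toy model. Everything here is
hypothetical: Caller, Status, ToyPM and ToyPageAllocator are illustrative
names, not K42 interfaces. Locking is not modeled, free pages are just a
counter (so contiguity is ignored), the retry after the flush and the
fall-back for an empty cache are assumptions, and the PM-side retry across
processors is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model of the proposal; none of these names are K42's.
enum class Caller { PMCache, Other };
enum class Status { Ok, OutOfMemory };

// Stand-in for the PM cache: just a count of cached pages.
struct ToyPM {
    std::size_t cachedPages = 0;

    // Hand out one cached page; false if the cache is empty.
    bool takeOne() {
        if (cachedPages == 0) return false;
        --cachedPages;
        return true;
    }

    // Give every cached page back; returns how many were flushed.
    std::size_t flushAll() {
        std::size_t n = cachedPages;
        cachedPages = 0;
        return n;
    }
};

struct ToyPageAllocator {
    std::size_t freePages = 0;  // simplification: all free pages are contiguous
    ToyPM* pm = nullptr;

    bool tryAlloc(std::size_t n) {
        if (freePages < n) return false;
        freePages -= n;
        return true;
    }

    Status alloc(std::size_t n, Caller who) {
        assert(pm != nullptr && n > 0);
        if (who == Caller::PMCache) {
            // Never call back into the PM here; report failure instead of
            // asserting, so the PM can flush and retry on its side.
            return tryAlloc(n) ? Status::Ok : Status::OutOfMemory;
        }
        if (n == 1) {
            // Single page from a non-PM caller: redirect to the PM cache.
            if (pm->takeOne()) return Status::Ok;
            // Assumption: fall back to our own free pages if the cache is empty.
            return tryAlloc(1) ? Status::Ok : Status::OutOfMemory;
        }
        if (tryAlloc(n)) return Status::Ok;
        // A real allocator would release its locks before this call.
        freePages += pm->flushAll();
        return tryAlloc(n) ? Status::Ok : Status::OutOfMemory;  // assumed retry
    }
};
```

The point of the flag is that only the Caller::Other paths ever call into
the PM, so a request that originates in the PM can never re-enter it
through the page allocator.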
Comments welcome.
-- Orran