[K42-discussion] memory leaks, and what I am gonna do

Orran Y Krieger okrieg at us.ibm.com
Thu Jan 26 04:18:55 EST 2006


Posting to the list; we have to start getting better at exposing our 
internal hacking. 

We have had various memory leaks that we started looking at because of 
some out-of-memory problems.  We are tracking them down using the 
LeakProof support in K42, and we are fixing a bunch of bugs.  The wiki has 
a pointer on how to use LeakProof: http://k42.ozlabs.org/Wiki/DebuggingK42

Marc had an interesting result on leaking page descriptors.  If our 
experiments are correct, we are leaking page descriptors but not the 
equivalent number of page frames.  That is, on each run we see 20 page 
descriptors go away, but only a couple of pages of memory.  So these 
are somehow page descriptors that either do not represent real frames 
or point to already existing page frames.  Thoughts, Marc?  I am 
suspicious of the fork logic, but that's just because it scares me :-)

Before I work on plugging the above leaks, I think I am going to work a 
bit on what is actually causing the current failures.  The 
problem is that we are running out of memory in the page allocator even 
though lots of memory is available in the cache in the PM structures. 
While most operations go through the cache, a few allocates go directly to 
the page allocator.  Examples are the allocation of the dispatcher 
structures, some pinned multi-page operations, and some operations in the 
Linux networking stack.  Not only do we have to keep some memory in the 
page allocator for these uses, but we also have to keep some contiguous 
memory available for multi-page pinned structures.  I think we also use 
the page allocator as a common infrastructure (behind the small memory 
allocator) to what is available in applications, so some allocates come 
from there.  These allocates are very rare, but the system panics 
if we can't satisfy them. 
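
A minimal sketch of the failure mode described above, not K42 code.  The 
names (ToyMemory, allocDirect, cacheFrames) are hypothetical, and the model 
only counts frames; it ignores contiguity and locking.  It shows how a 
direct allocate can fail while plenty of free memory sits in the PM cache.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (hypothetical names): free frames are split between the
// page allocator and a PM-side cache.
struct ToyMemory {
    std::size_t allocatorFree;  // frames the page allocator can hand out
    std::size_t pmCached;       // free frames parked in the PM cache

    // The PM cache fills itself by taking frames from the page allocator.
    void cacheFrames(std::size_t frames) {
        assert(frames <= allocatorFree);
        allocatorFree -= frames;
        pmCached += frames;
    }

    // A request that bypasses the cache, like the dispatcher structures
    // or pinned multi-page allocates mentioned above.
    bool allocDirect(std::size_t frames) {
        if (frames > allocatorFree) {
            return false;  // the real system would panic at this point
        }
        allocatorFree -= frames;
        return true;
    }

    std::size_t totalFree() const { return allocatorFree + pmCached; }
};
```

With 102 free frames of which 100 have moved into the cache, a direct 
request for 4 frames fails even though totalFree() still reports 102.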

In retrospect, I did something stupid when first doing this work.  The 
caching (per-processor/PM) uses the same interfaces of the page allocator 
as other operations.  For locking hierarchy reasons, the page allocator 
can't call back to the PM tree to flush pages, since a request may be 
coming from the PM structure.  I am first going to introduce a different 
set of interfaces (or at least a flag) to say whether a request is from the 
PM cache or not.  If it's not, then for single-page allocates, the page 
allocator will just pass the request back to the PM.  In that case the page 
allocator is just being called for interface reasons, and we will get a 
performance boost out of using the local cache.  For multi-page allocates, 
I will try to do an allocate, and if it fails (contiguous memory not 
available), release locks and make a call to the PM structures to flush 
back all the cache.  For calls from the PM side, the page allocator will 
return an error instead of asserting, and the PM will flush back caches 
from other processors before trying again. 
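
The proposal above can be sketched roughly as follows.  This is a toy 
model under stated assumptions, not the K42 interfaces: MemoryModel, Caller, 
Status and the method names are all hypothetical, frames are only counted 
(no contiguity), and locks are reduced to a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Caller { PMCache, Other };    // the proposed "flag"
enum class Status { Ok, OutOfMemory };

struct MemoryModel {
    std::size_t allocatorFree;             // frames held by the page allocator
    std::vector<std::size_t> perCpuCache;  // frames cached in the PM, per processor

    // Return cached frames to the allocator, skipping processor exceptCpu.
    // Pass perCpuCache.size() to flush every processor.
    void flushCaches(std::size_t exceptCpu) {
        for (std::size_t cpu = 0; cpu < perCpuCache.size(); ++cpu) {
            if (cpu == exceptCpu) continue;
            allocatorFree += perCpuCache[cpu];
            perCpuCache[cpu] = 0;
        }
    }

    bool take(std::size_t frames) {
        if (frames > allocatorFree) return false;
        allocatorFree -= frames;
        return true;
    }

    // Page allocator entry point.
    Status allocate(std::size_t frames, Caller from, std::size_t cpu) {
        assert(cpu < perCpuCache.size());
        if (from == Caller::PMCache) {
            // Never call back into the PM here (lock hierarchy);
            // report an error instead of asserting.
            return take(frames) ? Status::Ok : Status::OutOfMemory;
        }
        if (frames == 1) return pmAllocateOne(cpu);  // hand back to the PM
        if (take(frames)) return Status::Ok;
        // Real code would release the allocator locks before this call.
        flushCaches(perCpuCache.size());
        return take(frames) ? Status::Ok : Status::OutOfMemory;
    }

    // PM side: local cache first, then the allocator; on error, flush the
    // other processors' caches and try once more.
    Status pmAllocateOne(std::size_t cpu) {
        if (perCpuCache[cpu] > 0) { --perCpuCache[cpu]; return Status::Ok; }
        if (allocate(1, Caller::PMCache, cpu) == Status::Ok) return Status::Ok;
        flushCaches(cpu);
        return allocate(1, Caller::PMCache, cpu);
    }
};
```

The point of the flag is that only requests not coming from the PM cache 
may trigger a flush from inside the allocator path; requests from the PM 
get an error back and the PM does the flushing itself.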

Comments welcome.
     -- Orran