[K42-discussion] memory leaks, and what I am gonna do

Orran Y Krieger okrieg at us.ibm.com
Fri Jan 27 07:55:51 EST 2006


Okay, to kick myself,... and to document where I am at for posterity...

The PMRoot structures are dynamically allocated, that is, all the 
structures used to represent FCMs and PMs.  They are dynamically allocated 
out of pageable memory, and hence may themselves be paged.  The PM for the 
kernel itself gets memory from the pinned page allocator directly, so that 
is not a problem.

While holding a lock in some routine in PMRoot, we may suffer a 
page fault that results in a request to the pinned page allocator.  So, 
even if a request to the pinned page allocator doesn't come directly from 
the PMRoot structures, it may come from them indirectly, with the lock 
still held.  Hence, an allocate in the pinned page allocator can never call 
the PMRoot with an operation that might acquire a lock, since that call 
can result in a recursive acquisition of the lock.  So, what I said below 
doesn't work.
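
To make the hazard concrete, here is a minimal single-threaded C++ sketch. 
Every name in it (ToyLock, PMRootSketch, pinnedAllocate, routineThatFaults) 
is hypothetical and not a K42 identifier, and the toy lock throws on 
re-acquisition where a real non-recursive lock would simply deadlock.

```cpp
#include <cassert>
#include <stdexcept>

// Toy non-recursive lock: throws on re-acquire instead of deadlocking.
// Hypothetical, for illustration only; not thread-safe.
class ToyLock {
    bool held_ = false;
public:
    void acquire() {
        if (held_) throw std::logic_error("recursive lock acquisition");
        held_ = true;
    }
    void release() { held_ = false; }
    bool held() const { return held_; }
};

// Stand-in for the PMRoot structures (assumed shape, not K42's).
struct PMRootSketch {
    ToyLock lock;
    // Operation the allocator might want to call to get pages back.
    void flushCaches() {
        lock.acquire();
        // ... cached pages would be returned here ...
        lock.release();
    }
};

// Stand-in for the pinned page allocator. callBackToPM models the
// unsafe design in which the allocator calls into PMRoot.
inline void pinnedAllocate(PMRootSketch& root, bool callBackToPM) {
    if (callBackToPM) root.flushCaches();  // may re-enter root.lock
}

// A PMRoot routine that takes a page fault on its own pageable data
// while holding the lock. Returns true if the lock was re-entered.
inline bool routineThatFaults(PMRootSketch& root, bool callBackToPM) {
    root.lock.acquire();
    bool recursed = false;
    try {
        // page fault -> fault handling -> pinned page allocator
        pinnedAllocate(root, callBackToPM);
    } catch (const std::logic_error&) {
        recursed = true;
    }
    root.lock.release();
    return recursed;
}
```

With the callback enabled, routineThatFaults reports the recursive 
acquisition; with it disabled, the fault path never touches the PMRoot lock.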

Arguably we should make the PM structures not pageable, but won't do that 
yet.  For now, my thought is to still have two interfaces, and a call from 
the PM side which returns an error will flush caches.  A call that isn't 
directly from the PM side, but may be indirectly from there, will kick 
paging, and will force it to push back pages more aggressively.
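
A rough C++ sketch of the two-interface idea above. The names (Caller, 
AllocResult, PinnedAllocatorSketch, PMSketch) are hypothetical, "kicking 
paging" is reduced to setting a flag, and returning a failure to the 
non-PM caller is an assumption, since the message does not say what that 
caller sees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch; none of these names are K42 identifiers.
enum class Caller { PMSide, Other };
enum class AllocResult { Ok, Failed };

struct PinnedAllocatorSketch {
    std::size_t freePages = 0;
    bool pagingKicked = false;  // stands in for waking the pager

    // Never calls into PMRoot, so it cannot re-enter a PMRoot lock.
    AllocResult allocate(std::size_t n, Caller who) {
        if (n <= freePages) {
            freePages -= n;
            return AllocResult::Ok;
        }
        if (who == Caller::Other) {
            // The request may come indirectly from PM code that holds
            // a lock, so only ask paging to push pages back harder.
            pagingKicked = true;
        }
        return AllocResult::Failed;  // assumption: report, do not assert
    }
};

struct PMSketch {
    std::size_t cachedPages = 0;

    // PM-side caller: on error, flush the cache and retry once.
    AllocResult allocateFromPM(PinnedAllocatorSketch& pa, std::size_t n) {
        if (pa.allocate(n, Caller::PMSide) == AllocResult::Ok)
            return AllocResult::Ok;
        pa.freePages += cachedPages;  // flush cached pages back
        cachedPages = 0;
        return pa.allocate(n, Caller::PMSide);
    }
};
```

The point of the split is that the flush is driven by the PM, which knows 
which locks it holds, and never by the allocator.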
          -- Orran



Orran Y Krieger/Watson/IBM at IBMUS 
Sent by: k42-discussion-bounces at ozlabs.org
01/25/2006 12:18 PM

To
Marc Auslander/Watson/Contr/IBM at IBMUS, Bryan S Rosenburg/Watson/IBM at IBMUS, 
Maria Butrico/Watson/IBM at IBMUS
cc
k42-discussion at ozlabs.org
Subject
[K42-discussion] memory leaks, and what I am gonna do







Posting to the list, since we have to start getting better at exposing our 
internal hacking.   

We have had various memory leaks that we started looking at because of 
some out-of-memory problems.  We started looking at this using the 
LeakProof support in K42, and we are fixing a bunch of bugs.  Pointer on 
the wiki to how to use LeakProof: http://k42.ozlabs.org/Wiki/DebuggingK42 

Marc, for leaking page descriptors, had an interesting result.  If our 
experiments are correct, we are leaking page descriptors but not the 
equivalent number of page frames.  That is, on each run we see 20 page 
descriptors go away, but only a couple of pages of memory.  So, these 
are somehow page descriptors that are either not representing real frames, 
or are pointing to page frames that already have descriptors.  Thoughts 
Marc?  I am suspicious of fork logic, but that's just because it scares me :-) 

Before I work on plugging the above leaks, I think I am going to work a 
bit on the problem that is actually causing the current problem.  The 
problem is that we are running out of memory in the page allocator even 
though lots of memory is available in the cache in the PM structures. 
While most operations go through the cache, a few allocates go directly to 
the page allocator.  Examples are the allocation of the dispatcher 
structures, some pinned multi-page operations, and some operations in the 
networking stack in Linux.  Not only do we have to keep some memory in the 
page allocator for these uses, but we also have to keep some contiguous 
memory available for multi-page pinned structures.   I think we also use 
the page allocator as a common infrastructure (behind the small memory 
allocator) to what is available in applications, so some allocates come 
from there.  The allocates of these are very rare, but the system panics 
if we can't satisfy them.   

In retrospect, I did something stupid when first doing this work.  The 
caching (per-processor/PM) uses the same interfaces of the page allocator 
as other operations.  For locking hierarchy reasons, the page allocator 
can't call back to the PM tree to flush pages, since a request may be 
coming from the PM structure.  I am first going to introduce a different 
set of interfaces (or at least a flag) to say whether a request is from the 
PM cache or not.  If it's not, then for single-page allocates, the page 
allocator will just pass the request back to the PM.  In that case the page 
allocator is just being called for interface reasons, and we will get a 
performance boost out of using the local cache.  For multi-page allocates, 
I will try to do an allocate, and if it fails (contiguous memory not 
available), release locks and make a call to the PM structures to flush 
back all the cache.  For calls from the PM side, the page allocator will 
return an error instead of asserting, and the PM will flush back caches from 
other processors before trying again.   
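
The plan in the paragraph above can be sketched in C++ roughly as follows. 
This shows the proposal as stated in this message; the follow-up at the 
top of the thread explains why the callback into the PM is unsafe. All 
names are hypothetical, lock release is only a comment, flushed pages are 
treated as contiguous for simplicity, and falling back to the allocator's 
own pool when the cache is empty is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-processor/PM page cache.
struct PMCacheSketch {
    std::size_t cached = 0;
    bool takeOne() {
        if (cached == 0) return false;
        --cached;
        return true;
    }
    std::size_t flushAll() {
        std::size_t n = cached;
        cached = 0;
        return n;
    }
};

enum class Status { Ok, Error };

// Hypothetical page allocator with a "from the PM cache?" flag.
struct PageAllocatorSketch {
    std::size_t contiguousFree = 0;
    PMCacheSketch* pm = nullptr;

    Status allocate(std::size_t pages, bool fromPMCache) {
        if (fromPMCache) {
            // PM side: return an error instead of asserting; the PM
            // flushes caches from other processors and tries again.
            if (pages > contiguousFree) return Status::Error;
            contiguousFree -= pages;
            return Status::Ok;
        }
        if (pages == 1 && pm->takeOne()) {
            // Single page from elsewhere: served by the PM cache.
            return Status::Ok;
        }
        if (pages > contiguousFree) {
            // Release locks here (not modeled), then flush the cache.
            contiguousFree += pm->flushAll();
        }
        if (pages > contiguousFree) return Status::Error;
        contiguousFree -= pages;
        return Status::Ok;
    }
};
```

The flag only selects which failure policy applies; both paths draw on the 
same pool of pages.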

Comments welcome. 
     -- Orran