[K42-discussion] memory leaks, and what I am gonna do
Orran Y Krieger
okrieg at us.ibm.com
Fri Jan 27 07:55:51 EST 2006
Okay, to kick myself, and to document where I am at for posterity:
The PMRoot structures are dynamically allocated, that is, all the
structures used to represent FCMs and PMs. They are dynamically allocated
out of pageable memory, and hence may themselves be paged. The PM for the
kernel itself gets memory from the pinned page allocator directly, so that
is not a problem.
We may, in some random routine in PMRoot, suffer a page fault while
holding a lock, and that fault may result in a request to the pinned page
allocator. So, even if a request to the pinned page allocator doesn't come
directly from the PMRoot structures, it may come from them indirectly,
with the lock held. Hence, an allocate to the pinned page allocator can
never call the PMRoot with an operation that might acquire a lock, since
that request can result in a recursive acquisition of the lock. So, what
I said below doesn't work.
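
To make the hazard concrete, here is a minimal sketch. None of these names
are K42's: ToyLock, ToyPMRoot and ToyPinnedAllocator are hypothetical
stand-ins, the page fault is simulated by a direct call, and the lock
reports re-entry instead of hanging the way a real non-recursive lock would.

```cpp
#include <cassert>

// Hypothetical non-recursive lock; tryAcquire() fails if already held.
struct ToyLock {
    bool held = false;
    bool tryAcquire() {
        if (held) return false;  // a real spin lock would deadlock here
        held = true;
        return true;
    }
    void release() { held = false; }
};

struct ToyPMRoot;

// Hypothetical pinned page allocator.
struct ToyPinnedAllocator {
    int freePages;
    bool callBackOnShortage;  // true models the design that is ruled out
    ToyPMRoot* pm = nullptr;
    ToyPinnedAllocator(int pages, bool callBack)
        : freePages(pages), callBackOnShortage(callBack) {}
    bool allocPage();
};

// Hypothetical stand-in for the pageable PM structures.
struct ToyPMRoot {
    ToyLock lock;
    ToyPinnedAllocator& alloc;
    int cachedPages;
    bool recursionDetected = false;
    ToyPMRoot(ToyPinnedAllocator& a, int cached)
        : alloc(a), cachedPages(cached) {}

    // Callback the allocator would like to use to reclaim cached pages.
    int flushCache() {
        if (!lock.tryAcquire()) {  // lock already held further up the stack
            recursionDetected = true;
            return 0;
        }
        int n = cachedPages;
        cachedPages = 0;
        lock.release();
        return n;
    }

    // "Some random routine" that touches pageable PM data under the lock.
    bool routineThatFaults() {
        bool got = lock.tryAcquire();
        assert(got);
        (void)got;
        bool ok = alloc.allocPage();  // simulated fault -> pinned allocate
        lock.release();
        return ok;
    }
};

bool ToyPinnedAllocator::allocPage() {
    if (freePages == 0 && callBackOnShortage && pm != nullptr)
        freePages += pm->flushCache();  // re-enters PM code under its lock
    if (freePages == 0) return false;
    --freePages;
    return true;
}
```

With callBackOnShortage set and no free pages, routineThatFaults() ends up
in flushCache() while the lock is still held, which is exactly the
recursive acquisition described above.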
Arguably we should make the PM structures not pageable, but I won't do
that yet. For now, my thought is to still have two interfaces. When a call
from the PM side returns an error, the PM will flush caches. A call that
isn't directly from the PM side, but may be indirectly from there, will
kick paging and force it to push back pages more aggressively.
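
A rough sketch of that two-interface idea, under stated assumptions: the
names (ToyPageAllocator, ToyPMCache, allocFromPM, allocOther, pagerKicked)
are made up for illustration and are not K42 interfaces, and having the
non-PM path return failure after kicking the pager is my assumption, since
the text above does not say whether that caller fails or waits.

```cpp
#include <cassert>
#include <cstddef>

enum class AllocStatus { Ok, OutOfMemory };

// Hypothetical page allocator exposing two entry points.
struct ToyPageAllocator {
    std::size_t freePages;
    bool pagerKicked = false;  // hint: push pages back more aggressively

    explicit ToyPageAllocator(std::size_t pages) : freePages(pages) {}

    // Interface 1: used only by the PM cache. It reports failure to the
    // caller instead of asserting, and never calls back into the PM.
    AllocStatus allocFromPM(std::size_t n) {
        if (freePages < n) return AllocStatus::OutOfMemory;
        freePages -= n;
        return AllocStatus::Ok;
    }

    // Interface 2: everyone else. The caller may be running indirectly on
    // behalf of PM code that holds a lock, so on shortage this only sets
    // a flag for the pager; it takes no PM locks.
    AllocStatus allocOther(std::size_t n) {
        if (freePages < n) {
            pagerKicked = true;
            return AllocStatus::OutOfMemory;  // assumption: fail, don't wait
        }
        freePages -= n;
        return AllocStatus::Ok;
    }

    void giveBack(std::size_t n) { freePages += n; }
};

// Hypothetical PM-side cache that reacts to an error by flushing.
struct ToyPMCache {
    ToyPageAllocator& alloc;
    std::size_t cached;
    ToyPMCache(ToyPageAllocator& a, std::size_t c) : alloc(a), cached(c) {}

    // Return every cached page to the allocator.
    void flush() {
        alloc.giveBack(cached);
        cached = 0;
    }

    // The PM knows which locks it holds, so it is the safe place to flush.
    AllocStatus allocWithRetry(std::size_t n) {
        if (alloc.allocFromPM(n) == AllocStatus::Ok) return AllocStatus::Ok;
        flush();
        return alloc.allocFromPM(n);
    }
};
```

The point of the split is that the flush decision lives on the PM side,
so the allocator never has to acquire a PM lock on anyone's behalf.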
-- Orran
Orran Y Krieger/Watson/IBM at IBMUS
Sent by: k42-discussion-bounces at ozlabs.org
01/25/2006 12:18 PM
To
Marc Auslander/Watson/Contr/IBM at IBMUS, Bryan S Rosenburg/Watson/IBM at IBMUS,
Maria Butrico/Watson/IBM at IBMUS
cc
k42-discussion at ozlabs.org
Subject
[K42-discussion] memory leaks, and what I am gonna do
Posting to the list; we have to start getting better at exposing our
internal hacking.
We have had various memory leaks that we started looking at because of
some out-of-memory problems. We started looking at this using the
LeakProof support in K42, and we are fixing a bunch of bugs. Pointer on
the wiki to how to use LeakProof: http://k42.ozlabs.org/Wiki/DebuggingK42
Marc, for leaking page descriptors, had an interesting result. If our
experiments are correct, we are leaking page descriptors but not the
equivalent number of page frames. That is, on each run we see 20 page
descriptors go away, but only a couple of pages of memory. So, these
are somehow page descriptors that are either not representing real frames,
or are already pointing to existing page frames. Thoughts, Marc? I am
suspicious of the fork logic, but that's just because it scares me :-)
Before I work on plugging the above leaks, I think I am going to work a
bit on the issue that is actually causing the current problem. The
issue is that we are running out of memory in the page allocator even
though lots of memory is available in the cache in the PM structures.
While most operations go through the cache, a few allocates go directly to
the page allocator. Examples are the allocation of the dispatcher
structures, some pinned multi-page operations, and some operations in the
networking stack in Linux. Not only do we have to keep some memory in the
page allocator for these uses, but we also have to keep some contiguous
memory available for multi-page pinned structures. I think we also use
the page allocator as common infrastructure (behind the small memory
allocator) with what is available in applications, so some allocates come
from there. These allocates are very rare, but the system panics
if we can't satisfy them.
In retrospect, I did something stupid when first doing this work. The
caching (per-processor/PM) uses the same interfaces of the page allocator
as other operations. For locking hierarchy reasons, the page allocator
can't call back to the PM tree to flush pages, since a request may be
coming from the PM structures. I am first going to introduce a different
set of interfaces (or at least a flag) to say if a request is from the PM
cache or not. If it's not, then for single page allocates, the page
allocator will just forward the request back to the PM. In that case the
page allocator is just being called for interface reasons, and we will get
a performance boost out of using the local cache. For multi-page allocates,
I will try to do an allocate, and if it fails (contiguous memory not
available) release locks and make a call to the PM structures to flush
back all the cache. For calls from the PM side, the page allocator will
return an error instead of asserting, and the PM will flush back caches
from other processors before trying again.
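
A sketch of the dispatch described in this message, as I read it. All
names (Origin, Result, PlanPM, PlanAllocator) are hypothetical, free memory
is modelled as a single contiguous run, and locking is reduced to a
comment, so treat it as an illustration of the control flow only.

```cpp
#include <cassert>
#include <cstddef>

enum class Origin { PMCache, Other };  // the proposed "who is asking" flag
enum class Result { Ok, Error };

// Hypothetical PM holding a cache of single pages.
struct PlanPM {
    std::size_t cachedPages;
    explicit PlanPM(std::size_t c) : cachedPages(c) {}

    // Serve one page from the local cache.
    Result allocOne() {
        if (cachedPages == 0) return Result::Error;
        --cachedPages;
        return Result::Ok;
    }

    // Hand every cached page back; returns how many were released.
    std::size_t flushAll() {
        std::size_t n = cachedPages;
        cachedPages = 0;
        return n;
    }
};

// Hypothetical page allocator implementing the three cases.
struct PlanAllocator {
    std::size_t freePages;  // simplification: one contiguous run
    PlanPM& pm;
    PlanAllocator(std::size_t f, PlanPM& p) : freePages(f), pm(p) {}

    Result take(std::size_t n) {
        if (freePages < n) return Result::Error;
        freePages -= n;
        return Result::Ok;
    }

    Result alloc(std::size_t n, Origin from) {
        // From the PM cache: return an error, never assert or call back.
        if (from == Origin::PMCache) return take(n);
        // Single page from elsewhere: redirect to the PM's local cache.
        if (n == 1) return pm.allocOne();
        // Multi-page from elsewhere: try, then flush the caches and retry.
        if (take(n) == Result::Ok) return Result::Ok;
        // Real code would release the allocator's locks before this call.
        freePages += pm.flushAll();
        return take(n);
    }
};
```

For example, with 1 free page and 4 cached pages, a 3-page request from
outside the PM fails on the first try, pulls the cached pages back, and
then succeeds.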
Comments welcome.
-- Orran