<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
Hi Marcelo,<br>
<br>
Marcelo Tosatti wrote:<br>
<blockquote type="cite" cite="mid20050629155445.GA3560@logos.cnet">
<pre wrap="">Hi Guillaume,
On Wed, Jun 29, 2005 at 11:32:19AM -0400, Guillaume Autran wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Benjamin Herrenschmidt wrote:
</pre>
<blockquote type="cite">
<pre wrap="">On Tue, 2005-06-28 at 09:42 -0400, Guillaume Autran wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Hi,
I happened to notice a race condition in the mmu_context code for the 8xx
with very few contexts (16 MMU contexts) and kernel preemption enabled. It
is hard to reproduce, as it shows up only when many processes are
created/destroyed and the system is doing a lot of IRQ processing.
In short, one process is trying to steal a context that is in the
process of being freed (mm->context == NO_CONTEXT) but not completely
freed (nr_free_contexts == 0).
The steal_context() function does not do anything and the process stays
in the loop forever.
Anyway, I got a patch that fixes this part. Does not seem to affect
scheduling latency at all.
Comments are appreciated.
</pre>
</blockquote>
<pre wrap="">Your patch seems to do a hell lot more than fixing this race ... What
about just calling preempt_disable() in destroy_context() instead ?
</pre>
</blockquote>
<pre wrap="">I'm still a bit confused with "kernel preemption". One thing for sure is
that disabling kernel preemption does indeed fix my problem.
So, my question is: if a task in the middle of being scheduled gets
preempted by an IRQ handler, where will this task restart execution?
Back at the beginning of schedule() or where it left off?
</pre>
</blockquote>
<pre wrap="">
Execution resumes exactly where it was interrupted.</pre>
</blockquote>
In that case, what happens when a higher-priority task steals the context
of the lower-priority task after get_mmu_context() but before
set_mmu_context()?<br>
Then, when the lower-priority task resumes, its context may no longer be
valid...<br>
Do I have this right?<br>
<br>
<blockquote type="cite" cite="mid20050629155445.GA3560@logos.cnet">
<blockquote type="cite">
<pre wrap="">The idea behind my patch was to get rid of that nr_free_contexts counter
that is (I think) redundant with the context_map.
</pre>
</blockquote>
<pre wrap="">
Apparently it's there to avoid the spinlock exactly on !FEW_CONTEXTS machines.
I suppose that what happens is that get_mmu_context() gets preempted after stealing
a context (so nr_free_contexts = 0), but before setting next_mmu_context to the
next entry
next_mmu_context = (ctx + 1) &amp; LAST_CONTEXT;
So if the now running higher-prio task calls switch_mm() (which is likely to happen)
it loops forever on atomic_dec_if_positive(&amp;nr_free_contexts), while steal_context()
sees "mm->context == CONTEXT".
I think that you should try a "preempt_disable()/preempt_enable()" pair at entry and
exit of get_mmu_context() - I suppose around destroy_context() is not enough (you
can try that also).
spinlock ends up calling preempt_disable().
</pre>
</blockquote>
I'm going to do it like this instead of my previous attempt:<br>
<br>
<pre wrap="">/* Set up new userspace context */
preempt_disable();
get_mmu_context(next);
set_context(next->context, next->pgd);
preempt_enable();
</pre>
<br>
To make sure we don't lose our context in between.<br>
<br>
<br>
<br>
Thanks.<br>
Guillaume.<br>
<br>
<pre class="moz-signature" cols="72">--
=======================================
Guillaume Autran
Senior Software Engineer
MRV Communications, Inc.
Tel: (978) 952-4932 office
E-mail: <a class="moz-txt-link-abbreviated" href="mailto:gautran@mrv.com">gautran@mrv.com</a>
======================================= </pre>
</body>
</html>