[PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save

Nicholas Piggin npiggin at gmail.com
Tue Nov 21 18:56:10 AEDT 2023


On Tue Nov 21, 2023 at 11:23 AM AEST, Timothy Pearson wrote:
>
>
> ----- Original Message -----
> > From: "Michael Ellerman" <mpe at ellerman.id.au>
> > To: "Timothy Pearson" <tpearson at raptorengineering.com>
> > Cc: "Jens Axboe" <axboe at kernel.dk>, "regressions" <regressions at lists.linux.dev>, "npiggin" <npiggin at gmail.com>,
> > "christophe leroy" <christophe.leroy at csgroup.eu>, "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>
> > Sent: Monday, November 20, 2023 5:39:52 PM
> > Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register  save
>
> > Timothy Pearson <tpearson at raptorengineering.com> writes:
> >> ----- Original Message -----
> >>> From: "Michael Ellerman" <mpe at ellerman.id.au>
> > ...
> >>> 
> >>> But we now have a new path, because io-uring can call copy_process() via
> >>> create_io_thread() from the signal handling path. That's OK if the signal is
> >>> handled as we return from a syscall, but it's not OK if the signal is handled
> >>> due to some other interrupt.
> >>> 
> >>> Which is:
> >>> 
> >>> interrupt_return_srr_user()
> >>>  interrupt_exit_user_prepare()
> >>>    interrupt_exit_user_prepare_main()
> >>>      do_notify_resume()
> >>>        get_signal()
> >>>          task_work_run()
> >>>            create_worker_cb()
> >>>              create_io_worker()
> >>>                copy_process()
> >>>                  dup_task_struct()
> >>>                    arch_dup_task_struct()
> >>>                      flush_all_to_thread()
> >>>                        save_all()
> >>>                          if (tsk->thread.regs->msr & MSR_FP)
> >>>                            save_fpu()
> >>>                            # fr0 is clobbered and potentially live in userspace
> >>> 
> >>> 
> >>> So tldr I think the corruption is only an issue since io-uring started doing
> >>> the clone via signal, which I think matches the observed timeline of this bug
> >>> appearing.
> >>
> >> I agree the corruption really only started showing up in earnest on
> >> io_uring clone-via-signal, as this was confirmed several times in the
> >> course of debugging.
> > 
> > Thanks.
> > 
> >> Note as well that I may very well have a wrong call order in the
> >> commit message, since I was relying on a couple of WARN_ON() macros I
> >> inserted to check for a similar (but not identical) condition and
> >> didn't spend much time getting new traces after identifying the root
> >> cause.
> > 
> > Yep no worries. I'll reword it to incorporate the full path from my mail.
> > 
> >> I went back and grabbed some real world system-wide stack traces, since I now
> >> know what to trigger on.  A typical example is:
> >>
> >> interrupt_return_srr_user()
> >>  interrupt_exit_user_prepare()
> >>   interrupt_exit_user_prepare_main()
> >>    schedule()
> >>     __schedule()
> >>      __switch_to()
> >>       giveup_all()
> >>        # tsk->thread.regs->msr MSR_FP is still set here
> >>        __giveup_fpu()
> >>         save_fpu()
> >>         # fr0 is clobbered and potentially live in userspace
> > 
> > fr0 is not live there.
> > 
> > __giveup_fpu() does roughly:
> > 
> >	msr = tsk->thread.regs->msr;
> >	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
> >	msr &= ~MSR_VSX;
> >	tsk->thread.regs->msr = msr;
> > 
> > ie. it clears the FP etc. bits from the task's MSR. That means the FP
> > state will be reloaded from the thread struct before the task is run again.
> > 
> > Also on that path we're switching to another task, so we'll be reloading
> > the other task's FP state before returning to userspace.
> > 
> > So I don't see any bug there.
>
> Yeah, you're right.  I was trying to get traces while doing something else, and didn't think that all the way through, sorry. :)  It's not going to be super easy to get a real trace (I was triggering the WARN_ON() from fr0 getting set to the FPSCR), so let's just assume it's mainly the path you manually found above and update the commit message accordingly.
>
> > There's only two places that call save_fpu() and skip the giveup logic,
> > which is save_all() and kvmppc_save_user_regs().
>
> Now that's interesting as well, since it might explain some issues I've seen for years on a specific QEMU workload.  Once this is backported to stable I'll need to get the kernels updated on those boxes and see if the issues disappear...
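
For reference, the giveup path described above looks roughly like the
following (a sketch from memory of arch/powerpc/kernel/process.c;
details such as the CPU feature check and the regs_set_return_msr()
helper may differ between kernel versions):

static void __giveup_fpu(struct task_struct *tsk)
{
	unsigned long msr;

	/* Dump the live FP/VSX registers into the thread struct */
	save_fpu(tsk);

	/*
	 * Clear FP/VSX from the user MSR so the state is reloaded
	 * from the thread struct before the task next runs userspace.
	 */
	msr = tsk->thread.regs->msr;
	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
	if (cpu_has_feature(CPU_FTR_VSX))
		msr &= ~MSR_VSX;
	regs_set_return_msr(tsk->thread.regs, msr);
}

That MSR clearing is what makes the __switch_to() path above safe even
though save_fpu() clobbers fr0/vs0.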

The KVM issue is actually slightly different. You need this (lightly
verified to solve the issue so far).

---

From c12fbed0e12207058282a2411da59b43b1f96c49 Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin at gmail.com>
Date: Tue, 21 Nov 2023 17:03:55 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Fix KVM_RUN ioctl clobbering FP/VEC
 registers

Before running a guest, the host process's FP/VEC registers are saved
away, just as they are on a context switch or when the kernel uses
those regs. The guest FP/VEC registers can then be loaded as needed.
The host process's registers are restored lazily when it next uses
FP/VEC.

KVM HV did not do this correctly. The registers are saved in the
thread struct, but the FP/VEC/VSX bits remain enabled in the user MSR,
leading the kernel to think the live registers are still valid even
after they have been clobbered by guest values. This results in the
host process seeing guest FP/VEC register values.

KVM must be invoked via ioctl on this path, and almost certainly that
means a C call through a wrapper function, so volatile registers should
not be live across the call; but that still leaves a real possibility
of non-volatile register corruption causing problems for the QEMU
process.
---
 arch/powerpc/kernel/process.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 392404688cec..9452a54d356c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1198,11 +1198,11 @@ void kvmppc_save_user_regs(void)
 
 	usermsr = current->thread.regs->msr;
 
+	/* Caller has enabled FP/VEC/VSX/TM in MSR */
 	if (usermsr & MSR_FP)
-		save_fpu(current);
-
+		__giveup_fpu(current);
 	if (usermsr & MSR_VEC)
-		save_altivec(current);
+		__giveup_altivec(current);
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 	if (usermsr & MSR_TM) {
-- 
2.42.0
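
For completeness, the other save_fpu() caller that skips the giveup
logic is save_all(), which is the path the $SUBJECT patch deals with.
Roughly (again a sketch from memory of arch/powerpc/kernel/process.c,
exact details may vary by kernel version):

static void save_all(struct task_struct *tsk)
{
	unsigned long usermsr;

	if (!tsk->thread.regs)
		return;

	usermsr = tsk->thread.regs->msr;

	if ((usermsr & msr_all_available) == 0)
		return;

	msr_check_and_set(msr_all_available);

	if (usermsr & MSR_FP)
		save_fpu(tsk);		/* saves FP/VSX, leaves MSR_FP set */

	if (usermsr & MSR_VEC)
		save_altivec(tsk);	/* saves VMX, leaves MSR_VEC set */

	if (usermsr & MSR_SPE)
		__giveup_spe(tsk);

	msr_check_and_clear(msr_all_available);
}

Leaving MSR_FP/MSR_VEC set is intentional there, since the forking task
keeps using its registers afterwards, which is why that path needs
save_fpu()/save_altivec() to stop clobbering fr0/vs0 rather than
switching to the giveup variants as done for KVM above.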


