[PATCH] powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()

Sachin Sant sachinp at linux.ibm.com
Wed Oct 18 03:59:52 AEDT 2023



> On 17-Oct-2023, at 5:45 PM, Michael Ellerman <mpe at ellerman.id.au> wrote:
> 
> Sachin reported a warning when running the inject-ra-err selftest:
> 
>  # selftests: powerpc/mce: inject-ra-err
>  Disabling lock debugging due to kernel taint
>  MCE: CPU19: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
>  MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
>  MCE: CPU19: Initiator CPU
>  MCE: CPU19: Unknown
>  ------------[ cut here ]------------
>  WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
>  CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G   M        E      6.6.0-rc3-00055-g9ed22ae6be81 #4
>  Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
>  ...
>  NIP radix__tlb_flush+0x160/0x180
>  LR  radix__tlb_flush+0x104/0x180
>  Call Trace:
>    radix__tlb_flush+0xf4/0x180 (unreliable)
>    tlb_finish_mmu+0x15c/0x1e0
>    exit_mmap+0x1a0/0x510
>    __mmput+0x60/0x1e0
>    exit_mm+0xdc/0x170
>    do_exit+0x2bc/0x5a0
>    do_group_exit+0x4c/0xc0
>    sys_exit_group+0x28/0x30
>    system_call_exception+0x138/0x330
>    system_call_vectored_common+0x15c/0x2ec
> 
> And bisected it to commit e43c0a0c3c28 ("powerpc/64s/radix: combine
> final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
> in radix__tlb_flush() if mm->context.copros is still elevated.
> 
> However it's possible for the copros count to be elevated if a process
> exits without first closing file descriptors that are associated with a
> copro, e.g. VAS.
> 
> If the process exits with a VAS file still open, the release callback
> is queued up for exit_task_work() via:
>  exit_files()
>    put_files_struct()
>      close_files()
>        filp_close()
>          fput()
> 
> And called via:
>  exit_task_work()
>    ____fput()
>      __fput()
>        file->f_op->release(inode, file)
>          coproc_release()
>            vas_user_win_ops->close_win()
>              vas_deallocate_window()
>                mm_context_remove_vas_window()
>                  mm_context_remove_copro()
> 
> But that happens after exit_mm() has already been called from do_exit()
> and triggered the warning.
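
The ordering problem can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: struct mm_model, model_exit_mm(), model_exit_task_work() and model_do_exit() are invented names. They only mirror the sequence described above, where do_exit() reaches the final TLB flush in exit_mm() before the deferred VAS release drops the copro count in exit_task_work().

```c
/*
 * Hypothetical userspace model of the exit ordering; none of these
 * names exist in the kernel.
 */
struct mm_model {
	int copros;        /* stands in for mm->context.copros */
	int vas_file_open; /* a VAS fd the process never closed */
};

/* Set when the check added by e43c0a0c3c28 would have warned. */
static int warned;

/* Final flush: exit_mm() -> ... -> tlb_finish_mmu() -> radix__tlb_flush(). */
static void model_exit_mm(struct mm_model *mm)
{
	if (mm->copros > 0)
		warned = 1;
}

/* Deferred ____fput() -> ... -> mm_context_remove_copro(). */
static void model_exit_task_work(struct mm_model *mm)
{
	if (mm->vas_file_open) {
		mm->vas_file_open = 0;
		mm->copros--;
	}
}

/* Returns 1 if the old warning would fire during this exit. */
int model_do_exit(struct mm_model *mm)
{
	warned = 0;
	model_exit_mm(mm);        /* exit_mm() runs first in do_exit() */
	/* exit_files() only queues the release work via fput() */
	model_exit_task_work(mm); /* the copro count drops only here */
	return warned;
}
```

With `{ .copros = 1, .vas_file_open = 1 }`, model_do_exit() returns 1 and leaves copros at 0: the count ends up correct, but it is still elevated at the moment the final flush runs.
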
> 
> Fix it by dropping the warning, and always calling __flush_all_mm().
> 
> In the normal case of no copros, that will result in a call to
> _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
> does.
> 
> If the copros count is elevated then it will cause a global flush, which
> should flush translations from any copros. Note that the process table
> entry was cleared in arch_exit_mmap(), so copros should not be able to
> fetch any new translations.
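
The flush choice after the fix can be sketched as below. This is a hypothetical model: model_final_flush() and enum model_flush are invented names. It captures only the two cases the description names, assuming the mm is active on the current CPU alone at this point, and it ignores any other inputs the real __flush_all_mm() may take into account.

```c
/*
 * Hypothetical model of the final flush after the fix: it always goes
 * through __flush_all_mm(), whose scope depends on the copro count.
 */
enum model_flush {
	MODEL_FLUSH_LOCAL,  /* _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) */
	MODEL_FLUSH_GLOBAL, /* global flush, also reaches copro translations */
};

enum model_flush model_final_flush(int copros)
{
	/* No warning any more: an elevated count is a legitimate state. */
	if (copros > 0)
		return MODEL_FLUSH_GLOBAL;
	return MODEL_FLUSH_LOCAL;
}
```

The no-copro path is unchanged from before the fix. The elevated case costs a global flush, and this is safe because the cleared process table entry prevents copros from fetching new translations afterwards.
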
> 
> Fixes: e43c0a0c3c28 ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
> Reported-by: Sachin Sant <sachinp at linux.ibm.com>
> Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> ---

Thanks for the fix. This fixes the reported problem.

Tested-by: Sachin Sant <sachinp at linux.ibm.com>

- Sachin


More information about the Linuxppc-dev mailing list