[PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()

Christophe Leroy christophe.leroy at csgroup.eu
Fri Jan 26 02:55:09 AEDT 2024


Hi Nathan,

On 06/03/2023 at 22:33, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl at linux.ibm.com>
> 
> The kernel can handle retrying RTAS function calls in response to
> -2/990x in the sys_rtas() handler instead of relaying the intermediate
> status to user space.

From this series we still have patches 5, 7 and 8 pending in
patchwork, see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=85747
and patch 8 no longer applies.

Are those 3 patches still relevant, or should they be discarded?

Thanks
Christophe


> 
> Justifications:
> 
> * Currently it's nondeterministic and quite variable in practice
>    whether a retry status is returned for any given invocation of
>    sys_rtas(). Therefore user space code cannot be expecting a retry
>    result without already being broken.
> 
> * This tends to significantly reduce the total number of system calls
>    issued by programs such as drmgr which make use of sys_rtas(),
>    improving the experience of tracing and debugging such
>    programs. This is the main motivation for me: I think this change
>    will make it easier for us to characterize current sys_rtas() use
>    cases as we move them to other interfaces over time.
> 
> * It reduces the number of opportunities for user space to leave
>    complex operations, such as those associated with DLPAR, incomplete
>    and difficult to recover.
> 
> * We can expect performance improvements for existing sys_rtas()
>    users, not only because of overall reduction in the number of system
>    calls issued, but also due to the better handling of -2/990x in the
>    kernel. For example, librtas still sleeps for 1ms on -2, which is
>    completely unnecessary.
> 
> Performance differences for PHB add and remove on a small P10 PowerVM
> partition are included below. For add, elapsed time is slightly
> reduced. For remove, there are more significant improvements: the
> number of context switches is reduced by an order of magnitude, and
> elapsed time is reduced by over half.
> 
> (- before, + after):
> 
>    Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):
> 
> -          1,847.58 msec task-clock                       #    0.135 CPUs utilized               ( +- 14.15% )
> -            10,867      cs                               #    9.800 K/sec                       ( +- 14.14% )
> +          1,901.15 msec task-clock                       #    0.148 CPUs utilized               ( +- 14.13% )
> +            10,451      cs                               #    9.158 K/sec                       ( +- 14.14% )
> 
> -         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
> +          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )
> 
>    Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):
> 
> -          1,473.75 msec task-clock                       #    0.092 CPUs utilized               ( +- 14.15% )
> -             2,652      cs                               #    3.000 K/sec                       ( +- 14.16% )
> +          1,444.55 msec task-clock                       #    0.221 CPUs utilized               ( +- 14.14% )
> +               104      cs                               #  119.957 /sec                        ( +- 14.63% )
> 
> -          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
> +           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )
> 
> Move the existing rtas_lock-guarded critical section in sys_rtas()
> into a conventional rtas_busy_delay()-based loop, returning to user
> space only when a final success or failure result is available.
> 
> Signed-off-by: Nathan Lynch <nathanl at linux.ibm.com>
> ---
>   arch/powerpc/kernel/rtas.c | 28 ++++++++++++++++------------
>   1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 47a2aa43d7d4..c330a22ccc70 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
>   /* We assume to be passed big endian arguments */
>   SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   {
> -	struct pin_cookie cookie;
>   	struct rtas_args args;
>   	unsigned long flags;
>   	char *buff_copy, *errbuf = NULL;
> @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   
>   	buff_copy = get_errorlog_buffer();
>   
> -	raw_spin_lock_irqsave(&rtas_lock, flags);
> -	cookie = lockdep_pin_lock(&rtas_lock);
> +	do {
> +		struct pin_cookie cookie;
>   
> -	rtas_args = args;
> -	do_enter_rtas(&rtas_args);
> -	args = rtas_args;
> +		raw_spin_lock_irqsave(&rtas_lock, flags);
> +		cookie = lockdep_pin_lock(&rtas_lock);
>   
> -	/* A -1 return code indicates that the last command couldn't
> -	   be completed due to a hardware error. */
> -	if (be32_to_cpu(args.rets[0]) == -1)
> -		errbuf = __fetch_rtas_last_error(buff_copy);
> +		rtas_args = args;
> +		do_enter_rtas(&rtas_args);
> +		args = rtas_args;
>   
> -	lockdep_unpin_lock(&rtas_lock, cookie);
> -	raw_spin_unlock_irqrestore(&rtas_lock, flags);
> +		/*
> +		 * Handle error record retrieval before releasing the lock.
> +		 */
> +		if (be32_to_cpu(args.rets[0]) == -1)
> +			errbuf = __fetch_rtas_last_error(buff_copy);
> +
> +		lockdep_unpin_lock(&rtas_lock, cookie);
> +		raw_spin_unlock_irqrestore(&rtas_lock, flags);
> +	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
>   
>   	if (buff_copy) {
>   		if (errbuf)
> 
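
For readers skimming the quoted patch, the retry semantics it relies on
can be sketched in plain user-space C. This is only an illustration
under stated assumptions, not the kernel's rtas_busy_delay(): every
sketch_* name is hypothetical, the "firmware" is a scripted list of
statuses, sleeping is simulated by adding up milliseconds, and the
sketch assumes that -2 is retried immediately while 990x (9900..9905)
suggests a wait of 10^x ms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Status values named in the commit message: -2 and 990x (9900..9905). */
#define SKETCH_BUSY          (-2)
#define SKETCH_EXT_DELAY_MIN 9900
#define SKETCH_EXT_DELAY_MAX 9905

/* True for the intermediate statuses that ask the caller to try again. */
static bool sketch_is_retry_status(int status)
{
	return status == SKETCH_BUSY ||
	       (status >= SKETCH_EXT_DELAY_MIN && status <= SKETCH_EXT_DELAY_MAX);
}

/* Suggested wait for 990x is 10^x ms; 0 for every other status, -2 included. */
static unsigned int sketch_ext_delay_ms(int status)
{
	unsigned int ms = 1;
	int order;

	if (status < SKETCH_EXT_DELAY_MIN || status > SKETCH_EXT_DELAY_MAX)
		return 0;
	for (order = status - SKETCH_EXT_DELAY_MIN; order > 0; order--)
		ms *= 10;
	return ms;
}

/* Hypothetical scripted "firmware": replays a fixed list of statuses. */
struct sketch_script {
	const int *statuses;
	size_t len;
	size_t pos;            /* number of scripted calls made so far */
	unsigned int slept_ms; /* total simulated sleep time */
};

static int sketch_script_call(struct sketch_script *s)
{
	/* Past the end of the script, report -1 (a final error status). */
	return s->pos < s->len ? s->statuses[s->pos++] : -1;
}

/* Repeat the call until a final status is available. */
static int sketch_call_until_final(struct sketch_script *s)
{
	int status;

	assert(s != NULL);
	do {
		status = sketch_script_call(s);
		/* Simulated sleep: only 990x waits; -2 retries immediately. */
		s->slept_ms += sketch_ext_delay_ms(status);
	} while (sketch_is_retry_status(status));

	return status;
}
```

With a script of -2, 9902, 0 the helper makes three calls, accounts for
100 ms of simulated waiting and returns 0, so the caller only ever sees
the final status. That is the property the patch aims to give
sys_rtas() callers.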

