[PATCH v5] powerpc/pseries/vas: Use usleep_range() to support HCALL delay

Nathan Lynch nathanl at linux.ibm.com
Fri Jan 12 04:27:11 AEDT 2024


Haren Myneni <haren at linux.ibm.com> writes:
> VAS allocate, modify and deallocate HCALLs returns
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
> delay and expects OS to reissue HCALL after that delay. But using
> msleep() will often sleep at least 20 msecs even though the
> hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.
>
> The open and close VAS window functions hold mutex and then issue
> these HCALLs. So these operations can take longer than the
> necessary when multiple threads issue open or close window APIs
> simultaneously, especially might affect the performance in the
> case of repeat open/close APIs for each compression request.
> On the large machine configuration which allows more simultaneous
> open/close windows (Ex: 240 cores provides 4800 VAS credits), the
> user can observe hung task traces in dmesg due to mutex contention
> around open/close HCAlls.

Is this because the workload queues enough tasks on the mutex to trigger
the hung task watchdog? With a threshold of 120 seconds, something on
the order of ~6000 tasks each taking 20ms or more to traverse this
critical section would cause the problem I think you're describing.

Presumably this change improves the situation, but the commit message
isn't explicit. Have you measured the "throughput" of window open/close
activity before and after? Anything that quantifies the improvement
would be welcome.

> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 71d52a670d95..79ffe8868c04 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -38,7 +38,27 @@ static long hcall_return_busy_check(long rc)
>  {
>  	/* Check if we are stalled for some time */
>  	if (H_IS_LONG_BUSY(rc)) {
> -		msleep(get_longbusy_msecs(rc));
> +		unsigned int ms;
> +		/*
> +		 * Allocate, Modify and Deallocate HCALLs returns
> +		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
> +		 * for the long delay. So the sleep time should always
> +		 * be either 1 or 10msecs, but in case if the HCALL
> +		 * returns the long delay > 10 msecs, clamp the sleep
> +		 * time to 10msecs.
> +		 */
> +		ms = clamp(get_longbusy_msecs(rc), 1, 10);
> +
> +		/*
> +		 * msleep() will often sleep at least 20 msecs even
> +		 * though the hypervisor suggests that the OS reissue
> +		 * HCALLs after 1 or 10msecs. Also the delay hint from
> +		 * the HCALL is just a suggestion. So OK to pause for
> +		 * less time than the hinted delay. Use usleep_range()
> +		 * to ensure we don't sleep much longer than actually
> +		 * needed.
> +		 */
> +		usleep_range(ms * 100, ms * USEC_PER_MSEC);

                usleep_range(ms * (USEC_PER_MSEC / 10), ms * USEC_PER_MSEC);

is probably what reviewers want to see when they ask you to use
USEC_PER_MSEC. I.e. both arguments to usleep_range() should be expressed
in terms of USEC_PER_MSEC.


More information about the Linuxppc-dev mailing list