[PATCH] powerpc/pseries: Add pool idle time at LPAR boot

Nathan Lynch nathanl at linux.ibm.com
Sat Apr 6 07:03:18 AEDT 2024


Shrikanth Hegde <sshegde at linux.ibm.com> writes:
> On 4/5/24 6:19 PM, Nathan Lynch wrote:
>> Shrikanth Hegde <sshegde at linux.ibm.com> writes:
>
> Hi Nathan, Thanks for reviewing this.
>
>>> When there are no options specified for lparstat, it is expected to
>>> give reports since LPAR(Logical Partition) boot. App is an indicator
>>> for available processor pool in an Shared Processor LPAR(SPLPAR). App is
>>> derived using pool_idle_time which is obtained using H_PIC call.
>> 
>> If "App" is short for "Available Procesoor Pool" then it should be
>> written "APP" and the it should be introduced and defined more clearly
>> than this.
>> 
>
> Ok.I reworded it for v2. 
>
> yes APP is Available Processor Pool. 
>
>> 
>>> The interval based reports show correct App value while since boot
>>> report shows very high App values. This happens because in that case app
>>> is obtained by dividing pool idle time by LPAR uptime. Since pool idle
>>> time is reported by the PowerVM hypervisor since its boot, it need not
>>> align with LPAR boot. This leads to large App values.
>>>
>>> To fix that export boot pool idle time in lparcfg and powerpc-utils will
>>> use this info to derive App as below for since boot reports.
>>>
>>> App = (pool idle time - boot pool idle time) / (uptime * timebase)
>>>
>>> Results:: Observe app values.
>>> ====================== Shared LPAR ================================
>>> lparstat
>>> System Configuration
>>> type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
>>>
>>> reboot
>>> stress-ng --cpu=$(nproc) -t 600
>>> sleep 600
>>> So in this case app is expected to close to 37-6=31.
>>>
>>> ====== 6.9-rc1 and lparstat 1.3.10  =============
>>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>>> 47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
>>>
>>> === With this patch and powerpc-utils patch to do the above equation ===
>>> %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
>>> ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
>>> 47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
>>> =====================================================================
>>>
>>> Note: physc, purr/idle purr being inaccurate is being handled in a
>>> separate patch in powerpc-utils tree.
>>>
>>> Signed-off-by: Shrikanth Hegde <sshegde at linux.ibm.com>
>>> ---
>>> Note:
>>>
>>> This patch needs to merged first in the kernel for the powerpc-utils
>>> patches to work. powerpc-utils patches will be posted to its mailing
>>> list and link would be found in the reply to this patch if available.
>>>
>>> arch/powerpc/platforms/pseries/lparcfg.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/lparcfg.c b/arch/powerpc/platforms/pseries/lparcfg.c
>>> index f73c4d1c26af..8df4e7c529d7 100644
>>> --- a/arch/powerpc/platforms/pseries/lparcfg.c
>>> +++ b/arch/powerpc/platforms/pseries/lparcfg.c
>>> @@ -184,6 +184,8 @@ static unsigned h_pic(unsigned long *pool_idle_time,
>>>  	return rc;
>>>  }
>>>
>>> +unsigned long boot_pool_idle_time;
>> 
>> Should be static, and u64. Better to use explicitly sized types for data
>> at the kernel-hypervisor boundary.
>
> Current usage of h_pic doesn't follow this either. Are you suggesting we change that 
> as well?

Yes pretty much. h_pic() as currently written and used has some
problems:

  static unsigned h_pic(unsigned long *pool_idle_time,
                        unsigned long *num_procs)
  {
          unsigned long rc;
          unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
  
          rc = plpar_hcall(H_PIC, retbuf);
  
          *pool_idle_time = retbuf[0];
          *num_procs = retbuf[1];
  
          return rc;
  }

* Coerces the return value of plpar_hcall() to unsigned -- hcall
  errors are negative.
* Assigns *pool_idle_time and *num_procs using uninitialized
  data when H_PIC is unsuccessful.
* Assigns the outparams unconditionally; would be nicer if it allowed
  callers to pass NULL so they don't have to provide dummy inputs that
  aren't even used, as in your change.
* Should follow Linux -errno return value convention in the absence
  of a need for the specific hcall status in its callers.

>>> @@ -801,6 +805,9 @@ static int __init lparcfg_init(void)
>>>  		printk(KERN_ERR "Failed to create powerpc/lparcfg\n");
>>>  		return -EIO;
>>>  	}
>>> +
>>> +	h_pic(&boot_pool_idle_time, &num_procs);
>> 
>> h_pic() can fail, leaving the out parameters uninitialized.
>
> Naveen pointed to me this a while ago, but I forgot. 
>
> Currently h_pic return value is not checked at all, either at boor or at runtime. 
> When it fails, should we re-try or just print a kernel debug? What would be expected 
> behavior? because if it fails, it would anyway result in wrong values of app even 
> if the variables are initialized to 0.

There's nothing in the spec for H_PIC that suggests retrying on failure.
I'm not really in favor of printing a message about it either; that
practice tends to result in log noise in non-PowerVM guests.

When H_PIC doesn't work the corresponding lines should not be emitted in
/proc/powerpc/lparcfg IMO.


More information about the Linuxppc-dev mailing list