Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Mon, 7 Feb 2022 11:39:54 -0500 (EST)
From: Mathieu Desnoyers <>
To: Peter Zijlstra <>
Cc: linux-kernel <>, 
	Thomas Gleixner <>, paulmck <>, 
	Boqun Feng <>, "H. Peter Anvin" <>, 
	Paul Turner <>, linux-api <>, 
	Christian Brauner <>, 
	Florian Weimer <>, 
	David Laight <>, carlos <>, 
	Peter Oskolkov <>, 
	libc-coord <>
Subject: Re: [RFC PATCH 3/3] rseq: extend struct rseq with numa node id

----- On Feb 6, 2022, at 4:52 PM, Peter Zijlstra wrote:

> On Thu, Feb 03, 2022 at 02:38:53PM -0500, Mathieu Desnoyers wrote:
>> +static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
>>  {
>> -	u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED;
>> +	u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED, node_id = 0;
>>  	/*
>>  	 * Reset cpu_id_start to its initial state (0).
>> @@ -124,6 +126,11 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
>>  	 */
>>  	if (put_user(cpu_id, &t->rseq->cpu_id))
>>  		return -EFAULT;
>> +	/*
>> +	 * Reset node_id to its initial state (0).
>> +	 */
>> +	if (put_user(node_id, &t->rseq->node_id))
>> +		return -EFAULT;
> Why 0 vs -1 ?

That's because we are adding this field as a replacement of 4 from the 12 bytes
of padding at the end of the original struct rseq uapi. I expect this padding to be
zero-initialized in glibc, but I don't think its initial value matters very much.

The initial value for cpu_id (-1) mattered  because that field was used to check whether
rseq was initialized. It should not matter as much for extended fields. Taking the
glibc userspace ABI into account, I would expect the following for node_id feature
discovery by applications:

- use __rseq_size (symbol exposed by glibc) to validate whether rseq is registered for
  the process.
- use getauxval(AT_RSEQ_FEATURE_SIZE) to know how many rseq fields are populated by the

So there are really only a few possible scenarios here:

- __rseq_size == 0: rseq registration unsuccessful
- else:
  - getauxval(AT_RSEQ_FEATURE_SIZE) == 0:
    - Kernel only implements the features of original rseq (last populated field is "flags",
      offsetofend(, flags) == 20).
  - else if getauxval(AT_RSEQ_FEATURE_SIZE) <= 32:
    - all exposed kernel features can be used, because the original rseq len was 32 bytes,
      so glibc always registers at least a 32-byte area.
  - else
    - only min(__rseq_size, getauxval(AT_RSEQ_FEATURE_SIZE)) features can be used.
      glibc is required to allocate a memory area large enough to hold all features,
      except when it allocates the original uapi 32 bytes.

So until we reach the 32 bytes limit of the original rseq structure, applications can directly
use getauxval() to use the additional features before glibc is made aware of them. From that
point on, glibc needs to allocate a larger memory area to hold those features.

I suspect that future glibc versions might want to expose a new "__rseq_feature_size"
symbol which contains the result of

    min(__rseq_size, getauxval(AT_RSEQ_FEATURE_SIZE) ? : offsetofend(, flags))

so applications don't have to do this computation on their own.

Considering that applications will likely check for rseq registration and feature availability
by checking offsetofend(, field) <= __rseq_feature_size, then I don't think the initialization
value matters much.



Mathieu Desnoyers
EfficiOS Inc.

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.