Date: Sat, 17 Sep 2022 13:51:44 +0200
From: Mathieu Desnoyers <>
To: Florian Weimer <>, Chris Kennelly <>
Cc:, "" <>,
 libc-alpha <>,
Subject: Re: Re: RSEQ symbols: __rseq_size, __rseq_flags vs

On 2022-09-16 23:32, Florian Weimer wrote:
> * Chris Kennelly:
>>> If the kernel does not currently overwrite the padding, glibc can do
>>> its own per-thread initialization there to support its malloc
>>> implementation (because the padding is undefined today from an
>>> application perspective).  That is, we would initialize these
>>> invisible vCPU IDs the same way we assign arenas today.  That would
>>> cover this specific malloc use case only, of course.
>> If a user program updates to a new kernel before glibc does, would it be
>> able to easily take advantage of it?
> No, as far as I understand it, there is presently no signaling from
> kernel to applications that bypasses the rseq area registration.  So
> the only thing you could do is to unregister and re-register with a
> compatible value.  And that is of course quite undefined and assumes
> that you can do this early enough during the life-time of each thread.
> But if we have the extension handshake, I'll expect us to backport it
> quite widely, after some time to verify that it works with CRIU etc.

I don't think this is what Chris is asking here.

I think the requirement here is to make sure that the extensibility
scheme we come up with allows us to extend struct rseq simply by
upgrading the kernel, without any need to upgrade glibc (that's indeed
a requirement of mine). That way, a new application running on a new
kernel can use a newly available extended field, even with an old glibc.

Let me give an example of what I think would be a *bad* way to do
things, just to show how we can shoot ourselves in the foot if we don't
consider the evolution of this ABI carefully.

Let's assume we expose a "rseq_feature_size" integer through
getauxval(). This allows the kernel to tell glibc how much memory is
required to hold all the rseq features. This is information that we
_need_ to expose from the kernel to glibc. Now if glibc decides to expose
each new feature through __rseq_flags bits (e.g. one bit per feature),
then for every new feature exposed by the kernel, glibc needs to know
the mapping from feature size to feature bit before it can expose that
feature to the rest of user-space. This goes against the requirement
that we should be able to extend rseq features by simply upgrading the
kernel, without also needing to upgrade glibc every time.
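To make the pitfall concrete, here is a minimal sketch of the
size-to-bit translation glibc would need under that scheme. The flag
names and the 24/28-byte feature sizes are hypothetical, chosen only
for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits an (old) glibc was built with. */
#define RSEQ_FLAG_FEATURE_A (1U << 0)
#define RSEQ_FLAG_FEATURE_B (1U << 1)

struct size_to_flag {
    size_t feature_end; /* hypothetical rseq feature size needed */
    uint32_t flag;      /* __rseq_flags bit glibc would publish  */
};

/* Table compiled into glibc: only features known at glibc build time. */
static const struct size_to_flag known_features[] = {
    { 24, RSEQ_FLAG_FEATURE_A },
    { 28, RSEQ_FLAG_FEATURE_B },
};

/* Translate the kernel's rseq_feature_size into __rseq_flags bits.
 * Any feature beyond the last table entry is silently dropped, so a
 * kernel upgrade alone cannot make it visible to applications. */
static uint32_t flags_from_feature_size(size_t kernel_feature_size)
{
    uint32_t flags = 0;
    size_t n = sizeof(known_features) / sizeof(known_features[0]);

    for (size_t i = 0; i < n; i++) {
        if (kernel_feature_size >= known_features[i].feature_end)
            flags |= known_features[i].flag;
    }
    return flags;
}
```

A kernel reporting 64 bytes gets the same flags as one reporting 28:
whatever lies beyond the table stays invisible until glibc itself is
upgraded.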

So considering that the kernel needs to let glibc know how much memory
to allocate for struct rseq, a getauxval() "rseq_feature_size" is
needed. One approach we could consider to allow extending rseq features
without upgrading glibc would be to expose an additional
"rseq_feature_flags" getauxval() entry, which glibc could then use to
populate its __rseq_flags symbol without prior knowledge of the
feature set. With a 32-bit flags word, this could accommodate 32
features before we need to expose an additional __rseq_flags2 symbol.
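Under that alternative, glibc would not interpret the bits at all. A
minimal sketch, assuming a hypothetical kernel-supplied 32-bit mask:
the mask is passed in as a parameter rather than read with
getauxval(), and the hypothetical rseq_flags_sketch variable stands in
for glibc's __rseq_flags:

```c
#include <stdint.h>

/* Stands in for glibc's __rseq_flags (hypothetical name, to avoid
 * clashing with the real symbol). */
static uint32_t rseq_flags_sketch;

/* glibc copies the kernel's mask verbatim: it does not need to know
 * what any individual bit means, so new bits become visible with a
 * kernel upgrade alone. */
static void publish_rseq_flags(uint32_t kernel_feature_flags)
{
    rseq_flags_sketch = kernel_feature_flags;
}

/* Application-side test for one feature bit. Bits >= 32 would need
 * a second word (the __rseq_flags2 case), so report them as absent. */
static int rseq_has_feature(unsigned int bit)
{
    return bit < 32 && ((rseq_flags_sketch >> bit) & 1U) != 0;
}
```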

Exposing feature flags from the kernel through getauxval() would have
the advantage of allowing the kernel to "disable" some features in the
future, e.g. if we want to deprecate a field. This comes with its own
complexity though: user-space could then no longer rely on all prior
feature fields being present whenever a given feature is present, which
makes the testing matrix more complex. I personally don't see a need to
deprecate rseq fields, but it might just be a lack of imagination on my
part.
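The testing-matrix concern can be illustrated with a hypothetical mask
in which bit N means "field N of struct rseq is populated". Once the
kernel may clear a bit to deprecate a field, checking only the newest
needed bit is no longer enough:

```c
#include <stdbool.h>
#include <stdint.h>

/* Correct check when fields may be deprecated: every field the
 * application needs must have its own bit set. */
static bool rseq_fields_present(uint32_t mask, uint32_t needed)
{
    return (mask & needed) == needed;
}

/* What an application might (wrongly) assume if fields were only ever
 * added: "the newest bit I need is set, so everything older is there
 * too". */
static bool rseq_fields_present_naive(uint32_t mask, unsigned int newest_bit)
{
    return newest_bit < 32 && ((mask >> newest_bit) & 1U) != 0;
}
```

With mask 0x5 (field 1 deprecated), the naive check on bit 2 succeeds
even though field 1 is missing, while the full-mask check for fields
0 to 2 (0x7) correctly fails.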

If we want to keep the kernel ABI as simple as we can, then we just
expose the rseq feature size (and required alignment), and don't expose
any rseq feature flags. This in turn means that glibc would have to
somehow expose the rseq feature size in its ABI. If glibc decides
against exposing this rseq feature size symbol, then it would be up to
the application to combine information about __rseq_size and
getauxval(rseq feature size) to figure out which fields are actually
populated. It would "work", but chances are that some users will get it
wrong. It seems simpler for a user to just do:

if (__rseq_feature_size >= offsetofend(struct rseq, vm_vcpu_id))

to validate whether a given field is indeed populated.
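A sketch of both variants, under an assumed layout (the hypothetical
struct rseq_sketch, modeled on struct rseq with the proposed
vm_vcpu_id appended; not the real UAPI), and assuming a field is
usable only if it ends within both the size glibc registered
(__rseq_size) and the size the kernel populates:

```c
#include <stddef.h>
#include <stdint.h>

/* offsetofend() is a Linux kernel-internal macro; user-space has to
 * define it itself. */
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Assumed layout for illustration only. */
struct rseq_sketch {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
    uint32_t flags;
    uint32_t vm_vcpu_id; /* proposed extension field */
} __attribute__((aligned(32)));

/* What the application must do itself if glibc exposes no feature
 * size: only the smaller of the registered size and the kernel's
 * feature size is actually populated. */
static size_t rseq_usable_size(size_t registered_size,
                               size_t kernel_feature_size)
{
    return registered_size < kernel_feature_size ? registered_size
                                                 : kernel_feature_size;
}

/* One comparison per field, however many fields are ever added. */
static int rseq_field_populated(size_t usable_size, size_t field_end)
{
    return usable_size >= field_end;
}
```

With __rseq_size = 32 and a kernel feature size of 20,
rseq_field_populated(rseq_usable_size(32, 20),
offsetofend(struct rseq_sketch, vm_vcpu_id)) returns 0; with a kernel
feature size of 28 it returns 1.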

The rseq feature size approach would scale to a very large number of
features. It would *not* allow deprecating fields after they are
published, but I see this as a gain in simplicity for users of the ABI,
even though we lose a knob as kernel developers.

I think it's important that we consider both the kernel and libc ABIs if
we want to make sure that we can extend the feature set without a
mandatory glibc upgrade getting in the way every time we add an rseq
feature.

Thoughts?



Mathieu Desnoyers
EfficiOS Inc.
