Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <67fce096-cedd-4a52-9f7a-d2882c767bff@linux.ibm.com>
Date: Mon, 14 Oct 2024 14:22:59 +0200
From: Stefan Liebler <stli@...ux.ibm.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] s390x: Don't allow br r0 in CRTJMP

On 11.10.24 14:30, Rich Felker wrote:
> On Fri, Oct 11, 2024 at 09:40:06AM +0200, Stefan Liebler wrote:
>> On 10.10.24 23:05, Rich Felker wrote:
>>> On Thu, Oct 10, 2024 at 03:02:44PM +0200, Stefan Liebler wrote:
>>>> When building musl with gcc 14, I've recognized that gcc has chosen
>>>> r0 for the branch-instruction. Therefore we don't jump, but keep
>>>> looping in ldso/dynlink.c:__dls3():
>>>> CRTJMP((void *)aux[AT_ENTRY], argv-1);
>>>> for(;;);
>>>>
>>>> This patch adjusts the inline assembly constraints and marks "pc" as
>>>> address, which disallows usage of r0.
>>>> ---
>>>>  arch/s390x/reloc.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/s390x/reloc.h b/arch/s390x/reloc.h
>>>> index 6e5c1fb8..38de9d9b 100644
>>>> --- a/arch/s390x/reloc.h
>>>> +++ b/arch/s390x/reloc.h
>>>> @@ -10,4 +10,4 @@
>>>>  #define REL_TPOFF       R_390_TLS_TPOFF
>>>>  
>>>>  #define CRTJMP(pc,sp) __asm__ __volatile__( \
>>>> -	"lgr %%r15,%1; br %0" : : "r"(pc), "r"(sp) : "memory" )
>>>> +	"lgr %%r15,%1; br %0" : : "a"(pc), "r"(sp) : "memory" )
>>>> -- 
>>>> 2.46.0
>>>
>>> What is especially problematic about r0 here? Does the encoding for br
>>> just use the bits that would be for r0 to encode some other jump form?
>>> Or is r0 cursed in some other way?
>>>
>>> (Patch is probably fine, but I would like to better understand the
>>> motivation.)
>>>
>>> Rich
>>>
>> Sure.
>>
>> According to the Principles of Operations (in case you don't have it,
>> please find the links here:
>> https://linux.mainframe.blog/zarchitecture-principles-of-operation/),
>> "br r0", which is branch on condition "bcr M1,R2" (RR instruction
>> format) with those values "bcr 15,0":
>> When all four mask bits are zeros or when the R2
>> field in the RR format contains zero, the branch
>> instruction is equivalent to a no-operation. When
>> all four mask bits are ones, that is, the mask
>> value is 15, the branch is unconditional unless
>> the R2 field in the RR format is zero.
>>
>> Furthermore this special "bcr 15,0" is used as serialization
>> or checkpoint-synchronization function.
>>
>> In each case, if r0 is used as branch target for "br", it does not jump
>> to the address in r0.
>>
>> I hope this helps?
> 
> Thanks! Looking at the GCC machine contraints docs for "a" as "address
> register", are there other contexts in which r0 is unsafe as an
> address where we might need to have the asm using "a" instead of "r"?
> I see we're using "Q" in atomics so that should probably be safe.
> __get_tp uses "=r" but that's probably okay?
> 
> Rich

Whenever a register is needed for address computation, r0 should not be
used. When the address is computed e.g. with D(X,B), but you don't need
both base- and index-register, then you can skip using the
index-register. The assembler then uses "0" (which would mean r0) as X
and the CPU does not use the content of r0 for computing the address.

"Q" (Memory reference without index register and with short
displacement) is fine for the "cs" instruction.
For "csg" instruction, "S" (Memory reference without index register but
with long displacement) would be the best fit. Using Q instead of S is
not a bug, but perhaps there is room for improvement if long instead of
short displacement can be used.

"r" in __get_tp is fine as "ear" instruction only moves the register
values from register to register.

Bye,
Stefan

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.