Message-ID: <20240723235911.GA10433@brightrain.aerifal.cx>
Date: Tue, 23 Jul 2024 19:59:11 -0400
From: Rich Felker <dalias@...c.org>
To: Alex Rønne Petersen <alex@...xrp.com>
Cc: musl@...ts.openwall.com
Subject: Re: Stack pointer is misaligned when invoking the musl
 dynamic linker directly to run a program without start files

On Wed, Jul 24, 2024 at 01:42:28AM +0200, Alex Rønne Petersen wrote:
> On Wed, Jul 24, 2024 at 1:07 AM Rich Felker <dalias@...c.org> wrote:
> >
> > On Wed, Jul 24, 2024 at 12:55:18AM +0200, Alex Rønne Petersen wrote:
> > > On Wed, Jul 24, 2024 at 12:46 AM Rich Felker <dalias@...c.org> wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 11:42:51PM +0200, Alex Rønne Petersen wrote:
> > > > > Hi,
> > > > >
> > > > > Repro:
> > > > >
> > > > >     $ cat test.s
> > > > >     .global _start
> > > > >     _start:
> > > > >     mov %rsp, %rdi
> > > > >     and $15, %rdi
> > > > >     call exit
> > > > >     $ musl-gcc test.s -nostartfiles
> > > > >     $ ./a.out; echo $?
> > > > >     0
> > > > >     $ /lib64/ld-linux-x86-64.so.2 ./a.out; echo $?
> > > > >     0
> > > > >     $ /lib/ld-musl-x86_64.so.1 ./a.out; echo $?
> > > > >     8
> > > > >     $ /lib/ld-musl-x86_64.so.1 --version
> > > > >     musl libc (x86_64)
> > > > >     Version 1.2.3
> > > > >
> > > > > I could well be missing something here, but at first glance, this
> > > > > *seems* like an ABI violation; the x86-64 psABI [0] states in §3.4.1
> > > > > that RSP is guaranteed to be 16-byte aligned on process entry. The
> > > > > same is true of many other architectures (though the amount obviously
> > > > > differs).
> > > > >
> > > > > I suppose it's debatable whether a program interpreter ought to be
> > > > > required to uphold the same guarantees as the kernel on process
> > > > > initialization?
> > > > >
> > > > > [0] https://gitlab.com/x86-psABIs/x86-64-ABI
> > > >
> > > > This is intentional. _start is not a C function subject to psABI
> > > > calling convention. It's an entry point with its own convention that
> > > > the stack pointer register point at the start of the ELF argument
> > > > packing, and that has a requirement to align the stack before calling
> > > > into C code.
> > >
> > > Can you elaborate on this point?
> > >
> > > To be clear on my end, I'm not suggesting that `_start` is a normal C
> > > function and should be subject to the calling convention. I'm
> > > specifically referring to §3.4.1 which deals with register state upon
> > > process initialization, i.e. when control is transferred to the ELF
> > > entry point by the kernel (or dynamic linker, in this case). This is
> > > the part I believe musl is in contravention of.
> >
> > OK, that's just not something we claim to conform to though. 3.4.1 is
> > documenting a contract between the kernel and the userspace runtime
> > (e_entry point of application or dynamic linker). Not the internal
> > contract between one part of musl (crt1) and another (ldso).
> 
> Yeah, that's the part I thought might be debatable.
> 
> The ABI is not clear on whether the program interpreter loading the
> program and transferring control to its entry point constitutes
> "process initialization". If I felt strongly about it, I would
> probably argue that it does as a practical matter, but given the level
> of precision usually seen in this particular document, I think it's
> also entirely reasonable to take the opposite position.
> 
> In any case, I think just having this officially on record as
> not-a-bug is fine. :)
> 
> >
> > The psABI document does not cover enough requirements to be able to
> > portably write your own custom replacement for crt1.o -- for example,
> > that it call __libc_start_main, or the contract for how
> > __libc_start_main is called. So if you want to do that, you need to be
> > aware of libc-specific requirements. And one of those, at least for
> > now, is tolerating that the stack pointer might not be aligned
> > (because it makes a lot more sense to just use the argument vector
> > in-place rather than rewriting the whole thing).
> 
> By the way, I'm not actually clear on *how* the stack pointer ends up
> misaligned vs what it is when given by the kernel. Just for my
> curiosity, would you be able to explain why?

Sure. On entry, the stack pointer points to an array of words that
look like:

    argc, argv0, argv1, argv2, ..., 0, env0, env1, ..., 0, [aux]

In the case of "/lib/ld-musl-x86_64.so.1 ./a.out", these look like
(from the kernel):

    2, &"/lib/ld-musl-x86_64.so.1", &"./a.out", 0, ...

In order to pass control to the application, ldso skips past the part
of the command line that belongs to ldso, decrementing argc by the
number of slots skipped, and writes the new argc into the last skipped
slot, overwriting whatever pointer was there before:

    2, 1, &"./a.out", 0, ...

Then it passes execution to the main program entry point with the
stack pointer pointing to the newly written argc slot. That slot's
address is whatever it was before, so it will be 8 mod 16 if an odd
number of slots were skipped. The only way this could be changed is by
shifting the whole array with memmove. Normally that wouldn't cost
much, but it would take a nontrivial amount of time if you had a
gigantic number of argument or environment slots.

Rich
