Message-ID: <64245dca-3c6e-3918-701c-dcf3f8e00783@bitwagon.com>
Date: Mon, 1 Jan 2018 19:15:50 -0800
From: John Reiser <jreiser@...wagon.com>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] Add comments to i386 assembly source

On 01/01/2018 13:49 UTC, Rich Felker wrote:
> On Mon, Jan 01, 2018 at 02:57:02PM -0800, John Reiser wrote:
>> There's a bug. clone() is a user-level function that can be used
>> independently of the musl internal implementation of threads.
>> Thus when clone() in musl/src/linux/clone.c calls
>>     return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls, ctid));
>> then the i386 implementation of __clone has no guarantee about
>> the value in %gs, and it is a bug to assume that (%gs >> 3)
>> fits in 8 bits.
>
> The ABI is that at function call or any time a signal could be
> received, %gs must always be a valid segment register value reflecting
> the current thread's thread pointer. If this is violated, the program
> has undefined behavior.

More than one segment descriptor can designate the same subset of the linear
address space. Duplicate the segment descriptor to a target selector
that is >= 256, and load %gs with the duplicate selector before calling clone().

>
>> The code in musl/src/thread/i386/clone.s wastes up to 12 bytes
>> when aligning the new stack, by aligning before [pre-]allocating
>> space for the one argument to the thread function.
>
> I suspect the initial value happens to be aligned anyway in which case
> reserving 16 bytes and aligning to 16 is the same as reserving 4 and
> aligning to 16. If you think it's not, I don't mind changing if you
> can do careful testing to make sure it doesn't introduce any bugs.

This is another bug! Consider the valid code:
    void **lo_stack = malloc(5 * sizeof(void *));
    /* malloc() guarantees 16-byte alignment of lo_stack */
    clone(func, &lo_stack[5], ...);
then __clone() does:
    and $-16,%ecx   /* &lo_stack[4] */
    sub $ 16,%ecx   /* &lo_stack[0] */
      ...
    mov %ecx,%esp   /* new thread: implicit action of __NR_clone system call */
    call *%eax      /* OUT-OF-BOUNDS: lo_stack[-1] = return address */
Thus, starting the thread function has scribbled outside the allocated area,
even though the lo_stack[] array can accommodate the call by the code I showed:
    lea -NBPW(arg2),%ecx   /* &lo_stack[4] */
    and $-16,%ecx          /* still &lo_stack[4] */
      ...
    mov %ecx,%esp   /* new thread: implicit action of __NR_clone system call */
    call *%eax      /* lo_stack[3] = return address */

The danger is not "new bugs", but rather revealing latent bugs that were
obscured by the less-strict old code. For instance, if the thread function
actually has two formal parameters, or if it uses va_arg() to reference
beyond the first actual argument, then running the optimal code
is more likely to notice.
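The pointer arithmetic in the two sequences above can be checked with a small C model. This is only a sketch under stated assumptions: addresses are modelled as 32-bit integers, the base address 0x1000 is a made-up 16-byte-aligned value standing in for what malloc() returns, and the function names are hypothetical helpers, not musl code.

```c
#include <assert.h>
#include <stdint.h>

#define NBPW 4u /* bytes per word on i386 */

/* Sequence quoted as the existing __clone: align first, then reserve 16. */
uint32_t sp_align_then_reserve(uint32_t top)
{
    uint32_t sp = top & ~UINT32_C(15); /* and $-16,%ecx */
    return sp - 16;                    /* sub $16,%ecx  */
}

/* Proposed sequence: reserve one word for the argument, then align. */
uint32_t sp_reserve_then_align(uint32_t top)
{
    uint32_t sp = top - NBPW;          /* lea -NBPW(arg2),%ecx */
    return sp & ~UINT32_C(15);         /* and $-16,%ecx        */
}

/* `call *%eax` pushes the return address one word below the stack pointer. */
uint32_t return_slot(uint32_t sp)
{
    return sp - NBPW;
}

/* Walks through the 5-word lo_stack example with a hypothetical base. */
void check_five_word_example(void)
{
    const uint32_t base = 0x1000;          /* assumed 16-byte aligned */
    const uint32_t top  = base + 5 * NBPW; /* &lo_stack[5] */

    uint32_t old_sp = sp_align_then_reserve(top);
    uint32_t new_sp = sp_reserve_then_align(top);

    assert(old_sp == base);                          /* &lo_stack[0] */
    assert(return_slot(old_sp) < base);              /* lo_stack[-1]: out of bounds */
    assert(new_sp == base + 4 * NBPW);               /* &lo_stack[4] */
    assert(return_slot(new_sp) == base + 3 * NBPW);  /* lo_stack[3]: in bounds */
}
```

Calling check_five_word_example() from a test program should pass silently if the model matches the description above. It says nothing about what the kernel or the rest of __clone does with the stack.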