musl - Re: Musl incompatibility with Docker and AWS's C5 class



Message-ID: <20180315145247.GE1436@brightrain.aerifal.cx>
Date: Thu, 15 Mar 2018 10:52:47 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Musl incompatibility with Docker and AWS's C5 class

On Thu, Mar 15, 2018 at 09:37:28AM -0400, Ryan Wilson-Perkin wrote:
> Hey musl-devs,
> 
> Yesterday we tested out the new C5 instance class that AWS offers using our
> Alpine-based images and discovered that we would get a segfault whenever we
> ran `npm install`. Tracing the code, it appeared to be happening due to the
> use of node's "process.setuid" and "process.setgid" commands, either of
> which would cause a segfault.
> 
> We're running Alpine containers inside Docker on EC2, and the smallest
> thing I can provide to reproduce this issue would be to run the following
> on a C5 EC2 instance:
> 
> docker run -it node:9-alpine sh -c "node -e 'process.setgid(0)'"
> 
> A core dump provided the following limited information:
> 
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> warning: Unexpected size of section `.reg-xstate/26' in core file.
> #0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
> 29 src/thread/x86_64/syscall_cp.s: No such file or directory.
> [Current thread is 1 (LWP 26)]
> (gdb) bt
> #0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
> #1 0x00007fd6161eecd8 in __syscall_cp_c (nr=202, u=<optimized out>,
> v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>,
> z=0) at src/thread/pthread_cancel.c:35
> #2 0x00007fd6161ee2f5 in __timedwait_cp (addr=addr@...ry=0x5612e9ebf820,
> val=val@...ry=-1, clk=clk@...ry=0, at=at@...ry=0x0,
> priv=<optimized out>) at src/thread/__timedwait.c:31
> #3 0x00007fd6161f0e2c in sem_timedwait (sem=0x5612e9ebf820, at=0x0) at
> src/thread/sem_timedwait.c:23
> #4 0x00007fd615d7a5a4 in uv_sem_wait () from /usr/lib/libuv.so.1
> #5 0x00005612e94dc00c in node::DebugSignalThreadMain(void*) ()
> #6 0x00007fd6161ef665 in start (p=0x7fd616424ab0) at
> src/thread/pthread_create.c:145
> #7 0x00007fd6161f13e4 in __clone () at src/thread/x86_64/clone.s:21
> Backtrace stopped: frame did not save the PC

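Changing uids/gids in a multithreaded process requires synchronizing
all the threads, which is done by sending each of them a signal. Based
on this backtrace, my guess is that the stack of at least one thread
is only barely large enough for its normal use. When the signal
arrives, the kernel overflows that stack while building the signal
frame and raises SIGSEGV for the process. This would also fit with the
crash showing up only on C5: its CPUs support AVX-512, which makes the
register state saved in each signal frame considerably larger (the gdb
warning about the size of `.reg-xstate' points the same way).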
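As a quick check on the stack-size side of this guess, the sketch below reports how much stack a thread created with default attributes actually gets from the running libc. It relies only on pthread_create and pthread_getattr_np, which both musl and glibc provide; the helper names are hypothetical, and it says nothing about threads that request an explicit stack size.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: stack size of the calling thread as reported by
 * pthread_getattr_np(), or 0 if the query fails. */
static size_t current_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return size;
}

/* Thread body: store this thread's stack size in *arg. */
static void *report_stack_size(void *arg)
{
    *(size_t *)arg = current_thread_stack_size();
    return NULL;
}

/* Hypothetical helper: spawn one thread with default attributes (NULL attr)
 * and return the stack size it was given, or 0 on failure. */
static size_t default_thread_stack_size(void)
{
    pthread_t t;
    size_t size = 0;

    if (pthread_create(&t, NULL, report_stack_size, &size) != 0)
        return 0;
    pthread_join(t, NULL);
    return size;
}
```

Printing default_thread_stack_size() from a small test program (built with -pthread) on the Alpine image should give a value roughly in line with musl's default, which is far smaller than the multi-megabyte default glibc typically derives from RLIMIT_STACK.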

One way to test whether this is the case, and to mitigate it if so:
LD_PRELOAD a library that calls pthread_setattr_default_np from a
constructor to set a larger default thread stack size. If that turns
out to be the problem, the Alpine node package should probably be
patched to use larger thread stacks. We may also increase musl's
default thread stack size somewhat (from 80k to 128k or so) in the
near future; if we do, that would likely be enough to solve your
problem.
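A minimal sketch of the preload library described above, assuming a libc that provides pthread_setattr_default_np (musl and glibc both do). The file name, function name, and the 256 KiB figure are hypothetical choices for illustration, not a recommended value.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size: 256 KiB, picked only for illustration; tune as needed. */
#define BIGSTACK_SIZE ((size_t)256 * 1024)

/* Runs before main() when this file is built as a shared object and loaded
 * with LD_PRELOAD. It raises the default stack size that pthread_create()
 * uses for threads created without an explicit stack size. */
__attribute__((constructor))
static void bigstack_init(void)
{
    pthread_attr_t attr;
    int err;

    if ((err = pthread_attr_init(&attr)) != 0) {
        fprintf(stderr, "bigstack: pthread_attr_init: %s\n", strerror(err));
        return;
    }

    /* Only the stack size is changed; the rest of attr keeps its defaults. */
    err = pthread_attr_setstacksize(&attr, BIGSTACK_SIZE);
    if (err == 0)
        err = pthread_setattr_default_np(&attr);
    if (err != 0)
        fprintf(stderr, "bigstack: could not set default stack size: %s\n",
                strerror(err));

    pthread_attr_destroy(&attr);
}
```

Something like `cc -shared -fPIC -pthread -o bigstack.so bigstack.c`, then running the reproducer with `LD_PRELOAD=/path/to/bigstack.so` set inside the container, should show whether the segfault goes away. Note that this only affects threads created without an explicit stack size; a thread whose creator passes its own size through pthread_attr_setstacksize is unaffected.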

Rich
