Message-ID: <20181110234259.GH5150@brightrain.aerifal.cx>
Date: Sat, 10 Nov 2018 18:42:59 -0500
From: Rich Felker <dalias@...c.org>
To: Sebastian Kemper <sebastian_ml@....net>
Cc: musl@...ts.openwall.com
Subject: Re: SIGSEGV related to threads since 1.1.20?

On Sun, Nov 11, 2018 at 12:31:45AM +0100, Sebastian Kemper wrote:
> Hello all,
> 
> I've got an issue with mariadb segfaulting. And apparently it has to do
> with the switch from musl 1.1.19 to 1.1.20.
> 
> First off, I'm not a programmer, so the info below might be warped a
> bit.
> 
> I maintain the mariadb package on OpenWrt. There was a report on the
> issues tracker about a segfault:
> https://github.com/openwrt/packages/issues/7230
> 
> I installed a current openwrt snapshot today, then installed
> mariadb-server. Afterwards I ran
> 
> mysql_install_db --force --basedir=/usr
> 
> to init the database. And then there was a segfault:
> 
> Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144829] do_page_fault(): sending SIGSEGV to mysqld for invalid write access to 00000000
> Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144839] epc = 77fc2058 in libc.so[77f4a000+93000]
> Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144863] ra  = 77fc1fa0 in libc.so[77f4a000+93000]
> 
> The messages look the same as in the report. Although the reporter used
> a different way to get to this result (he attempted to connect to the
> running server, whereas I tried to create a DB).
> 
> This is on an old dlink router (mips_24kc, ar71xx). The reporter used
> something else (mips32r2, mir3g).
> 
> I went and compiled mariadb with debug symbols and installed the
> unstripped binaries. Then I ran gdbserver on the mips device and
> connected to it from my laptop. When I ran the commands in gdb I got
> this output:
> 
> (gdb) c
> Continuing.
> 
> Thread 2 "mysqld" received signal SIGSEGV, Segmentation fault.
> __pthread_timedjoin_np (t=0x6bdced60, res=0x0, at=0x0) at src/thread/pthread_join.c:15
> 15                      if (state >= DT_DETACHED) a_crash();
> (gdb) bt
> #0  __pthread_timedjoin_np (t=0x6bdced60, res=0x0, at=0x0) at src/thread/pthread_join.c:15
> #1  0x006bf754 in handle_bootstrap_impl (thd=<optimized out>) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:950
> #2  0x006bfd58 in do_handle_bootstrap (thd=<optimized out>) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:1094
> #3  0x006bfdfc in handle_bootstrap (arg=0x1dc7448) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:1077
> #4  0x77fd10fc in start (p=0x77fd10fc <start+100>) at src/thread/pthread_create.c:147
> #5  0x77f6702c in __clone () at src/thread/mips/clone.s:32
> Backtrace stopped: frame did not save the PC
> 
> So apparently __pthread_timedjoin_np gets some NULL input and then the
> program segfaults. I reran this with a breakpoint on the function and it
> got called before the segfault and in these calls the args were not
> NULL.

This is an intentional trap for undefined behavior when the caller
attempts to join a detached thread or detach a thread that was not
joinable (already detached or already being joined by another thread).
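
For illustration only, here is a minimal sketch of the two usage patterns
that stay within defined behavior. It is not mariadb's code; worker,
run_joined and run_detached are hypothetical names made up for this
example. A joinable thread is joined exactly once and never detached; a
thread created detached is never joined or detached again.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker: hands its argument back as the exit value. */
static void *worker(void *arg)
{
    return arg;
}

/* Joinable thread: join it exactly once and never detach it. */
int run_joined(void)
{
    pthread_t t;
    void *res = NULL;

    if (pthread_create(&t, NULL, worker, (void *)(intptr_t)42) != 0)
        return -1;
    if (pthread_join(t, &res) != 0)
        return -1;
    /* After a successful join, t must not be joined or detached again. */
    return (int)(intptr_t)res;
}

/* Detached thread: created detached, so it is never joined and never
 * passed to pthread_detach() afterwards. */
int run_detached(void)
{
    pthread_attr_t attr;
    pthread_t t;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    rc = pthread_create(&t, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc == 0 ? 0 : -1;
}
```

Calling pthread_join() or pthread_detach() on the thread from
run_detached, or a second time on the thread from run_joined, is the
kind of misuse the trap is meant to catch.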

In the case of mariadb, it was reported as:

https://jira.mariadb.org/browse/MDEV-17200

and the corresponding Alpine Linux bug:

https://bugs.alpinelinux.org/issues/9407

The patch is available in Alpine Linux's aports repo:

https://git.alpinelinux.org/cgit/aports/tree/main/mariadb/fix-pthread-detach.patch

Rich
