Message-ID: <20181110233145.GA9199@darth.lan>
Date: Sun, 11 Nov 2018 00:31:45 +0100
From: Sebastian Kemper <sebastian_ml@....net>
To: musl@...ts.openwall.com
Subject: SIGSEGV related to threads since 1.1.20?

Hello all,

I have an issue with MariaDB segfaulting, and it apparently has to do
with the switch from musl 1.1.19 to 1.1.20.

First off, I'm not a programmer, so the info below might be a bit off.

I maintain the mariadb package on OpenWrt. There was a report on the
issue tracker about a segfault:
https://github.com/openwrt/packages/issues/7230

I installed a current OpenWrt snapshot today and then installed
mariadb-server. Afterwards I ran

mysql_install_db --force --basedir=/usr

to initialize the database, and it segfaulted:

Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144829] do_page_fault(): sending SIGSEGV to mysqld for invalid write access to 00000000
Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144839] epc = 77fc2058 in libc.so[77f4a000+93000]
Sat Nov 10 23:41:08 2018 kern.info kernel: [17053.144863] ra  = 77fc1fa0 in libc.so[77f4a000+93000]

The messages look the same as in the report, although the reporter got
there a different way (he tried to connect to the running server,
whereas I tried to create a database).

This is on an old D-Link router (mips_24kc, ar71xx). The reporter used
different hardware (mips32r2, mir3g).

I compiled mariadb with debug symbols and installed the unstripped
binaries. Then I ran gdbserver on the MIPS device and connected to it
from my laptop. When I ran the commands in gdb, I got this output:

(gdb) c
Continuing.

Thread 2 "mysqld" received signal SIGSEGV, Segmentation fault.
__pthread_timedjoin_np (t=0x6bdced60, res=0x0, at=0x0) at src/thread/pthread_join.c:15
15                      if (state >= DT_DETACHED) a_crash();
(gdb) bt
#0  __pthread_timedjoin_np (t=0x6bdced60, res=0x0, at=0x0) at src/thread/pthread_join.c:15
#1  0x006bf754 in handle_bootstrap_impl (thd=<optimized out>) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:950
#2  0x006bfd58 in do_handle_bootstrap (thd=<optimized out>) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:1094
#3  0x006bfdfc in handle_bootstrap (arg=0x1dc7448) at /home/sk/tmp/openwrt/build_dir/target-mips_24kc_musl/mariadb-10.2.17/sql/sql_parse.cc:1077
#4  0x77fd10fc in start (p=0x77fd10fc <start+100>) at src/thread/pthread_create.c:147
#5  0x77f6702c in __clone () at src/thread/mips/clone.s:32
Backtrace stopped: frame did not save the PC

So apparently __pthread_timedjoin_np gets some NULL arguments and then
the program segfaults. I reran this with a breakpoint on the function:
it was called before the segfault, and in those calls the arguments
were not NULL.
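A note on reading the backtrace. The faulting line is a deliberate trap and not a dereference of `res` or `at`. Line 15 of src/thread/pthread_join.c calls `a_crash()` when the target thread's state is `DT_DETACHED` or higher.

As far as I can tell from musl's source, a plain `pthread_join(t, NULL)` reaches `__pthread_timedjoin_np` with both `res` and `at` NULL. That would make `res=0x0, at=0x0` look normal here.

The kernel's "invalid write access to 00000000" would fit if `a_crash()` is implemented as a store to address 0 on this target. That is an assumption, not something checked here.

Below is a simplified model of the check. It is not musl's code, and the `model_*` names and the enum ordering are hypothetical, modelled on musl's internal `DT_*` values.

```c
#include <stdbool.h>

/* Simplified, HYPOTHETICAL model of the per-thread join state.
 * Ordering assumed from musl's internal DT_* values; not musl code. */
enum model_detach_state {
	MODEL_DT_EXITED = 0, /* thread finished; join returns immediately */
	MODEL_DT_EXITING,    /* thread is exiting; join waits */
	MODEL_DT_JOINABLE,   /* thread running and joinable; join waits */
	MODEL_DT_DETACHED,   /* thread detached; joining it is undefined */
};

/* Returns true when the check "state >= DT_DETACHED" at
 * pthread_join.c:15 would fire and call a_crash().
 * The res and at arguments play no part in this decision. */
static bool model_join_would_trap(enum model_detach_state state)
{
	return state >= MODEL_DT_DETACHED;
}
```

In this model only the thread's state decides whether the trap fires. NULL `res`/`at` arguments alone would not trigger it.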

Anyway, I checked OpenWrt's GitHub history to see what happened to musl
in the past months: on Sep 21, musl was upgraded from 1.1.19 to 1.1.20.
So I reverted this commit and compiled 1.1.19. I then downgraded musl on
the router on the fly. That caused some programs, like dropbear, to stop
working properly due to missing symbols. OK, expected.

But when I ran 

mysql_install_db --force --basedir=/usr

it completed without errors. Once I upgraded back to musl 1.1.20, I got
the segfault again.
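Here is one reading of the version difference. The backtrace stops at line 15 of pthread_join.c, where `a_crash()` is called for a target thread in a detached state.

1.1.19 completing while 1.1.20 crashes would be consistent with that check having changed between the two releases, if the thread being joined had been detached earlier. Neither the musl changelog nor MariaDB's `handle_bootstrap_impl` has been checked to confirm this.

For reference, here is a minimal sketch of the pattern that stays well-defined under POSIX on any libc: create the thread explicitly joinable and join it exactly once. `worker` and `run_and_join` are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical worker for illustration: returns its argument + 1. */
static void *worker(void *arg)
{
	intptr_t n = (intptr_t)arg;
	return (void *)(n + 1);
}

/* Hypothetical helper: starts an explicitly JOINABLE thread and joins it
 * once. Returns the worker's result, or -1 on any pthread error.
 * Joining a thread that was created detached or passed to
 * pthread_detach() is undefined behaviour, so this never does that. */
static intptr_t run_and_join(intptr_t n)
{
	pthread_attr_t attr;
	pthread_t tid;
	void *res = NULL;
	int state;

	if (pthread_attr_init(&attr) != 0)
		return -1;
	/* Request a joinable thread and confirm the attribute took effect. */
	if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE) != 0 ||
	    pthread_attr_getdetachstate(&attr, &state) != 0 ||
	    state != PTHREAD_CREATE_JOINABLE) {
		pthread_attr_destroy(&attr);
		return -1;
	}
	int rc = pthread_create(&tid, &attr, worker, (void *)n);
	pthread_attr_destroy(&attr);
	if (rc != 0)
		return -1;
	/* Safe: tid is joinable and has not been joined or detached yet. */
	if (pthread_join(tid, &res) != 0)
		return -1;
	return (intptr_t)res;
}
```

The detach state is checked on the attribute, before the thread exists. Once a thread has been detached, there is no portable way to query it and then decide to join.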

I was hoping that maybe you could take a look at this :)

Kind regards,
Seb
