Message-ID: <20240913152522.GA10433@brightrain.aerifal.cx>
Date: Fri, 13 Sep 2024 11:25:23 -0400
From: Rich Felker <dalias@...c.org>
To: Lukas Zeller <luz@...n44.ch>
Cc: musl@...ts.openwall.com
Subject: Re: SIGSEGV/stack overflow in pthread_create - race condition?

On Fri, Sep 13, 2024 at 01:30:00PM +0200, Lukas Zeller wrote:
> Hello list,
> 
> I hope this is the right place to post the following.
> 
> Using OpenWrt 22.03 with musl 1.2.3, *some* times, on *some* RPi devices (the faster, the more likely) I get the following:
> 
> > Thread 2 "debugtarget" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 4993.5022]
> > 0xb6ec42f0 in pk_parse_kite_request () from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> > (gdb) bt
> > #0  0xb6ec42f0 in pk_parse_kite_request ()
> >    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> > #1  0xb6ec457c in pk_parse_pagekite_response ()
> >    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> > #2  0xb6ec4b1c in pk_connect_ai ()
> >    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> > #3  0xb6ec8494 in pkm_reconnect_all ()
> >    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> > #4  0xb6ec79d4 in pkb_check_tunnels ()
> >    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> > #5  0xb6ec7b94 in pkb_run_blocker ()
> >    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> > #6  0xb6fd0af4 in start (p=0xb6adfd68) at src/thread/pthread_create.c:203
> > #7  0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #8  0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #9  0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #10 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #11 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #12 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #13 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #14 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #15 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #16 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #17 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #18 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #19 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #20 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #21 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #22 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #23 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #24 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #25 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > #26 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> > [... thousands of iterations ...]
> 
> Searching the internet I found that this is not specific to my
> setup, OpenWrt or libpagekite, but happens in different, otherwise
> completely unrelated setups, such as
> https://github.com/mikebrady/shairport-sync/issues/388 or
> https://github.com/void-linux/void-packages/issues/980.
> 
> I could not spot any conclusive findings - in the second example,
> apparently they just made the stack bigger to "solve" it, which
> indicates that maybe the race can come to a benign end eventually
> and unwind the stack before it explodes.

Why do you expect this is a race condition? The backtrace is not
sufficient to show it, but my default assumption would be that this
is simply a stack overflow in the application code, i.e. allocating
too much on the stack (in local variables with automatic storage).

You can increase the default stack size at link time with
-Wl,-z,stack-size=N where N is the size you want (the default is
128k, so increase from there), or make the program explicitly request
the amount of space it needs with the pthread attribute functions.
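
For the second option, a minimal sketch of requesting the stack size
explicitly might look like the following. The names worker,
spawn_with_stack and BIG_LOCAL, and the sizes used, are made up for
illustration and are not taken from libpagekite; the worker just
stands in for application code with a large local buffer.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical size: a 256 KiB automatic buffer, which is more than
 * a 128k default thread stack can hold. */
#define BIG_LOCAL (256 * 1024)

/* Illustrative worker: fills a large local array and reports the sum
 * of its bytes through the unsigned long that arg points to. */
static void *worker(void *arg)
{
    volatile unsigned char buf[BIG_LOCAL];
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < BIG_LOCAL; i++)
        buf[i] = 1;
    for (i = 0; i < BIG_LOCAL; i++)
        sum += buf[i];

    *(unsigned long *)arg = sum;
    return NULL;
}

/* Create a thread whose stack is at least stack_size bytes.
 * Returns 0 on success or the error number from the pthread call
 * that failed. */
static int spawn_with_stack(pthread_t *t, void *(*fn)(void *),
                            void *arg, size_t stack_size)
{
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);

    if (err)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (!err)
        err = pthread_create(t, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return err;
}
```

A caller would use something like spawn_with_stack(&t, worker, &sum,
512 * 1024) followed by pthread_join(t, NULL). The right size depends
on what the thread's call chain actually puts on the stack, so 512k
here is only a placeholder.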

Rich
