Message-Id: <516E797A-8CD1-461D-8CC6-025BE9CBAD06@plan44.ch>
Date: Fri, 13 Sep 2024 13:30:00 +0200
From: Lukas Zeller <luz@...n44.ch>
To: musl@...ts.openwall.com
Subject: SIGSEGV/stack overflow in pthread_create - race condition?

Hello list,

I hope this is the right place to post the following.

Using OpenWrt 22.03 with musl 1.2.3, I *sometimes* get the following on *some* RPi devices (the faster the device, the more likely):

> Thread 2 "debugtarget" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 4993.5022]
> 0xb6ec42f0 in pk_parse_kite_request () from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> (gdb) bt
> #0  0xb6ec42f0 in pk_parse_kite_request ()
>    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> #1  0xb6ec457c in pk_parse_pagekite_response ()
>    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> #2  0xb6ec4b1c in pk_connect_ai ()
>    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> #3  0xb6ec8494 in pkm_reconnect_all ()
>    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> #4  0xb6ec79d4 in pkb_check_tunnels ()
>    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> #5  0xb6ec7b94 in pkb_run_blocker ()
>    from /Volumes/CaseSens/openwrt-2/scripts/../staging_dir/target-arm_cortex-a7+neon-vfpv4_musl_eabi/root-bcm27xx/usr/lib/libpagekite.so.1
> #6  0xb6fd0af4 in start (p=0xb6adfd68) at src/thread/pthread_create.c:203
> #7  0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #8  0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #9  0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #10 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #11 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #12 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #13 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #14 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #15 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #16 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #17 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #18 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #19 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #20 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #21 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #22 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #23 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #24 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #25 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> #26 0xb6fcf22c in __clone () at src/thread/arm/clone.s:23
> [... thousands of iterations ...]

Searching the internet, I found that this is not specific to my setup, OpenWrt, or libpagekite, but also happens in otherwise completely unrelated setups, such as https://github.com/mikebrady/shairport-sync/issues/388 or https://github.com/void-linux/void-packages/issues/980.

I could not spot any conclusive findings - in the second example, apparently they just made the stack bigger to "solve" it, which indicates that maybe the race can come to a benign end eventually and unwind the stack before it explodes.
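For reference, the "make the stack bigger" workaround mentioned above would look roughly like the following wherever the thread is created. This is only a minimal sketch, assuming the code that calls pthread_create() can be changed; run_with_stack() and deep_worker() are hypothetical names for illustration and come from neither musl nor libpagekite. It uses the standard pthread_attr_setstacksize() call and does nothing about a possible race, it merely gives the thread more room.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: touches a large local buffer so that it needs
 * noticeably more stack than a small default thread stack provides. */
static void *deep_worker(void *arg)
{
    volatile char buf[256 * 1024];
    size_t i;

    for (i = 0; i < sizeof buf; i += 4096)
        buf[i] = 1;              /* touch one byte per 4 KiB step */
    *(int *)arg = 1;             /* report completion to the caller */
    return NULL;
}

/* Hypothetical helper: run fn(arg) in a new thread with an explicit
 * stack size and wait for it to finish. Returns 0 on success,
 * otherwise the error number from the failing pthread call. */
static int run_with_stack(void *(*fn)(void *), void *arg, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Fails (typically with EINVAL) if stack_bytes is below PTHREAD_STACK_MIN. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}
```

Calling run_with_stack(deep_worker, &done, 1024 * 1024) with an int done = 0 should return 0 and leave done set to 1. The 1 MiB value is an arbitrary choice for the sketch, not a recommendation.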

Since I am aware that musl 1.2.3 is not the current version, I applied the changes made to pthread_create() between 1.2.3 and current master, which amount to a single commit, "d64148a - fix potential unsynchronized access to killlock state at thread exit". Applying this did not make any difference.

Any ideas on how to start digging deeper? I guess I'm out of my depth here, as I am familiar neither with musl internals nor with pagekitec's (which I would need in order to hack a workaround).

Thanks in advance!

Lukas

--
Lukas Zeller, plan44.ch
luz@...n44.ch - https://plan44.ch




