Message-ID: <3201c36ee287e6d38e0f3805440a507de8fb52bf.camel@postmarketos.org>
Date: Thu, 30 May 2024 12:17:59 +0200
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Crash in kill(..., SIGHUP) when using SA_ONSTACK

Hi Rich, thanks a lot for your reply.

On Wed, 29-05-2024 at 09:15 -0400, Rich Felker wrote:
> On Wed, May 29, 2024 at 02:04:25PM +0200, Pablo Correa Gomez wrote:
> > Hi everybody,
> >
> > I am responsible for musl CI in GNOME's GLib, and we have recently
> > bumped into a crash that I have been unable to resolve.
> >
> > https://gitlab.gnome.org/GNOME/glib/-/commit/137db219a7266300ffde1aa75d781284fb0807cb
> > introduced in GLib an alternate stack by setting the signal action
> > SA_ONSTACK if available. However, the tests that were introduced, and
> > that pass in most other libc's (there's CI for a lot more than just
> > glibc and musl) crash in my alpine linux edge installation with SIGSEGV
> > (stack trace below) while doing: kill (getpid(), SIGHUP)
> >
> > I have verified that not adding SA_ONSTACK fixes the crash. Would
> > anybody have some pointers of what could possibly be going wrong? If
> > anybody is really interested, the public issue is
> > https://gitlab.gnome.org/GNOME/glib/-/issues/3315
> >
> > Stack trace
> > ------------
> >
> > Thread 1 "unix" received signal SIGSEGV, Segmentation fault.
> > 0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at ../arch/x86_64/syscall_arch.h:21
> > warning: 21 ./arch/x86_64/syscall_arch.h: No such file or directory
> > (gdb) bt
> > #0  0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at ../arch/x86_64/syscall_arch.h:21
> > #1  kill (pid=17483, sig=sig@...ry=1) at src/signal/kill.c:6
> > #2  0x0000555555556e96 in test_signal (signum=signum@...ry=1) at .../glib/tests/unix.c:534
> > #3  0x0000555555557200 in test_signal_alternate_stack (signal=1) at .../glib/tests/unix.c:590
> > #4  0x00007ffff7e8f364 in test_case_run (path=<optimized out>, test_run_name=0x55555555d3f0 "/glib-unix/sighup/alternate-stack", tc=0x55555555db60) at ../glib/gtestutils.c:2988
> > #5  g_test_run_suite_internal (suite=suite@...ry=0x55555555da70, path=path@...ry=0x0) at ../glib/gtestutils.c:3090
> > #6  0x00007ffff7e8f2db in g_test_run_suite_internal (suite=suite@...ry=0x7ffff7ffee20, path=path@...ry=0x0) at .../glib/gtestutils.c:3109
> > #7  0x00007ffff7e8f2db in g_test_run_suite_internal (suite=suite@...ry=0x7ffff7ffede0, path=path@...ry=0x0) at .../glib/gtestutils.c:3109
> > #8  0x00007ffff7e8f86a in g_test_run_suite (suite=suite@...ry=0x7ffff7ffede0) at ../glib/gtestutils.c:3189
> > #9  0x00007ffff7e8f8ea in g_test_run () at ../glib/gtestutils.c:2275
> > #10 0x00005555555561f7 in main (argc=<optimized out>, argv=<optimized out>) at ../glib/tests/unix.c:910
>
> Can you get a disassembly and register dump at the point of crash?

(gdb) layout asm
   0x7ffff7fa96f9 <kill+7>       movslq %esi,%rsi
   0x7ffff7fa96fc <kill+10>      mov    $0x3e,%eax
   0x7ffff7fa9701 <kill+15>      syscall
  >0x7ffff7fa9703 <kill+17>      mov    %rax,%rdi
   0x7ffff7fa9706 <kill+20>      call   0x7ffff7f7afb7 <__syscall_ret>
   0x7ffff7fa970b <kill+25>      add    $0x8,%rsp
   0x7ffff7fa970f <kill+29>      ret
   0x7ffff7fa9710 <killpg>       test   %edi,%edi
   0x7ffff7fa9712 <killpg+2>     js     0x7ffff7fa971b <killpg+11>
   0x7ffff7fa9714 <killpg+4>     neg    %edi
   0x7ffff7fa9716 <killpg+6>     jmp    0x7ffff7fa96f2 <kill>
   0x7ffff7fa971b <killpg+11>    sub    $0x8,%rsp
   0x7ffff7fa971f <killpg+15>    call   0x7ffff7f78bae <__errno_location>
   0x7ffff7fa9724 <killpg+20>    movl   $0x16,(%rax)
   0x7ffff7fa972a <killpg+26>    mov    $0xffffffff,%eax
   0x7ffff7fa972f <killpg+31>    add    $0x8,%rsp
   0x7ffff7fa9733 <killpg+35>    ret
   0x7ffff7fa9734 <psiginfo>     mov    (%rdi),%edi
   0x7ffff7fa9736 <psiginfo+2>   jmp    0x7ffff7fa973b <psignal>
   0x7ffff7fa973b <psignal>      push   %r15
   0x7ffff7fa973d <psignal+2>    push   %r14
   0x7ffff7fa973f <psignal+4>    push   %r13
   0x7ffff7fa9741 <psignal+6>    lea    0x51938(%rip),%r13        # 0x7ffff7ffb080 <__stderr_FILE>
   0x7ffff7fa9748 <psignal+13>   push   %r12
   0x7ffff7fa974a <psignal+15>   xor    %r12d,%r12d
   0x7ffff7fa974d <psignal+18>   push   %rbp
   0x7ffff7fa974e <psignal+19>   push   %rbx
   0x7ffff7fa974f <psignal+20>   mov    %rsi,%rbx
   0x7ffff7fa9752 <psignal+23>   sub    $0x18,%rsp
   0x7ffff7fa9756 <psignal+27>   call   0x7ffff7fb5780 <strsignal>

(gdb) info registers
rax            0x0                 0
rbx            0x7ffff7f55c30      140737353440304
rcx            0x7ffff7fa9703      140737353783043
rdx            0x0                 0
rsi            0x1                 1
rdi            0x525e              21086
rbp            0x1                 0x1
rsp            0x7fffffffd5d0      0x7fffffffd5d0
r8             0x0                 0
r9             0x80                128
r10            0x8                 8
r11            0x202               514
r12            0x7ffff7ffdb5c      140737354128220
r13            0x1                 1
r14            0x7fffffffd6d0      140737488344784
r15            0x7fffffffd6f0      140737488344816
rip            0x7ffff7fa9703      0x7ffff7fa9703 <kill+17>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7ffff7ffdb28      140737354128168
gs_base        0x0                 0

Does this tell you anything?

> I'm not sure if the crashing code is running on the signal stack or
> main stack, but here's a thought: is it possible the CI machines are
> running on a cpu/kernel with some monster AVX512 or whatever extension
> enabled with register file that doesn't fit in MINSIGSTKSZ?

That might be the case. It would explain why I could not reproduce it on
the 9-year-old laptop I was using last month, but can reproduce it now
on a new machine with a 13th Gen Intel(R) Core(TM) i7-1360P.

> If so, using sysconf(_SC_MINSIGSTKSZ) (conditional on _SC_MINSIGSTKSZ
> being defined) to allocate the alt stack should mitigate the problem.
> If doing this, it should probably be allocated by mmap or malloc, since
> in principle it could be too large for the caller's stack.

I'll forward this to the maintainers; let's see if we can come up with a
solution. Thanks a lot for your feedback!

> It's also possible that the kernel may have some weird behavior
> deciding if the task is already "running on the alt stack" when the
> alt stack is embedded in the normal stack like this. Just getting rid
> of that might be worth trying. If so, whether the problem manifests
> could be subject to timing of signal delivery (although I would not
> expect that for synchronously generated signals like here).
>
> Rich
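
For reference, a minimal sketch of what Rich's suggestion could look like,
assuming a Linux system with musl or glibc. It is not the GLib code. The
names altstack_size, deliver_on_altstack and on_signal are hypothetical,
and the 64 KiB of headroom is an arbitrary choice. The alt stack is sized
with sysconf(_SC_MINSIGSTKSZ) when that name is defined, allocated with
mmap instead of living inside the caller's stack frame, and the handler
records whether it really ran on that mapping.

```c
#define _GNU_SOURCE /* exposes MAP_ANONYMOUS and sigaltstack declarations */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static char *alt_base;                     /* start of the alternate stack */
static size_t alt_size;                    /* its size in bytes */
static volatile sig_atomic_t on_alt = -1;  /* written by the handler */

/* Hypothetical helper: choose the alt stack size at run time. */
size_t altstack_size(void)
{
    long sz = -1;
#ifdef _SC_MINSIGSTKSZ
    /* Kernel-reported minimum; grows with the CPU's extended register state. */
    sz = sysconf(_SC_MINSIGSTKSZ);
#endif
    if (sz < (long)MINSIGSTKSZ)
        sz = (long)MINSIGSTKSZ;
    /* Assumption: 64 KiB of extra room for the handler's own frames. */
    return (size_t)sz + 64 * 1024;
}

/* Record whether the handler's local variable lies inside the alt stack. */
static void on_signal(int signo)
{
    char probe;
    uintptr_t p = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)alt_base;

    (void)signo;
    on_alt = (p >= lo && p < lo + alt_size);
}

/* Returns 1 if the handler ran on the alt stack, 0 if it ran elsewhere,
 * and -1 if setup failed or the handler did not run. */
int deliver_on_altstack(int signo)
{
    struct sigaction sa, old_sa;
    stack_t ss, old_ss;
    int result = -1;

    alt_size = altstack_size();
    alt_base = mmap(NULL, alt_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (alt_base == MAP_FAILED)
        return -1;

    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) != 0)
        goto unmap;

    sa.sa_handler = on_signal;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) != 0)
        goto restore_stack;

    on_alt = -1;
    /* Same call as the failing test: signal ourselves synchronously. */
    if (kill(getpid(), signo) == 0)
        result = on_alt;

    sigaction(signo, &old_sa, NULL);
restore_stack:
    sigaltstack(&old_ss, NULL);
unmap:
    munmap(alt_base, alt_size);
    return result;
}
```

Calling deliver_on_altstack(SIGHUP) from a single-threaded program with
SIGHUP unblocked is expected to return 1. Using malloc instead of mmap
would work equally well for the allocation; the point is only that the
size is decided at run time and the memory is not on the caller's stack.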