Message-ID: <349f4e17-8027-c521-eeb3-aa69e8f2b5a4@landley.net>
Date: Fri, 16 Feb 2024 19:48:27 -0600
From: Rob Landley <rob@...dley.net>
To: toybox <toybox@...ts.landley.net>, musl <musl@...ts.openwall.com>
Subject: Not sure how to debug this one.

While grinding away at release prep, I hit a WEIRD one. The qemu-system-sh4 target got broken by commit 3e0e8c687eee (PID 1 exits trying to run the init script), which is the commit that changed the stdout buffering type. It's not the kernel: if I use the last release kernel with the new root filesystem I see the problem, and a newly built kernel from today's git with last release's initramfs.cpio.gz boots to a shell prompt.

The actual _problem_ is that sigsetjmp() is faulting (in sh.c, function run_command()), for NO OBVIOUS REASON. Calling memset() to zero the struct before the sigsetjmp() works fine, but the sigsetjmp() call (built against musl-libc) never returns. Not siglongjmp, _sigsetjmp_. Which means it's failing somewhere in:

https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s

And I dunno how to stick a printf into superh assembly code.

The sigjmp_buf lives on the stack, but I confirmed it's 8-byte aligned and not even straddling a page boundary. I can access variables I stick before and after it, so it can't be some kind of "fault due to guard page" weirdness? (I suppose the optimizer may be invalidating that test; I could try adding "volatile"...)

While debugging I made the problem GO AWAY more than once by sticking printf()s and similar into the code, but that's not FIXING it. Adding another sigjmp_buf declaration and call to sigsetjmp() right at the start of the function works fine (although the original one, where it is now, still fails).

I confirmed that sigsetjmp() is annotated returns_twice in musl (but even if it _wasn't_, problems would show up when you did a longjmp; it wouldn't manifest as the first call to setjmp never returning).
sigsetjmp() isn't making it to the line after the call on the first pass through, even if I move it outside the if().

I confirmed it happens with both gcc 11.2 (musl 1.2.4?) and the older gcc 9.4 toolchain (musl 1.2.3? I think? It would be nice if musl actually had a way to identify the version of installed library binaries.)

I do not currently have a superh build of gdbserver and a corresponding host gdb that understands foreign binaries, and the last time I built gdb as part of a cross compiler was many moons ago. I'm open to suggestions; this one's funky.

(The problem with configuring the kernel to produce core dumps and comparing against the readelf -d output is that it's running as PID 1. Um... maybe kgdb? I still think I'd need to build a host cross-gdb to connect to it, though...)

It would be really nice if somebody who understood the assembly could spot something...

Rob