Message-ID: <349f4e17-8027-c521-eeb3-aa69e8f2b5a4@landley.net>
Date: Fri, 16 Feb 2024 19:48:27 -0600
From: Rob Landley <rob@...dley.net>
To: toybox <toybox@...ts.landley.net>, musl <musl@...ts.openwall.com>
Subject: Not sure how to debug this one.

While grinding away at release prep, I hit a WEIRD one. The qemu-system-sh4
target got broken by commit 3e0e8c687eee (PID 1 exits trying to run the init
script), which is the commit that changed the stdout buffering type.

It's not the kernel: the last release's kernel with the new root filesystem
shows the problem, while a kernel newly built from today's git with the last
release's initramfs.cpio.gz boots to a shell prompt.

The actual _problem_ is that sigsetjmp() is faulting (in sh.c function
run_command()), for NO OBVIOUS REASON. Calling memset() to zero the struct
before the sigsetjmp() works fine, but the sigsetjmp() call (built against
musl-libc) never returns.

Not siglongjmp, _sigsetjmp_. Which means it's failing somewhere in:

https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s

And I dunno how to stick a printf into superh assembly code.

The sigjmp_buf lives on the stack, but I confirmed it's 8 byte aligned, and not
even straddling a page boundary. I can access variables I stick before and after
it, so it can't be some kind of "fault due to guard page" weirdness? (I suppose
the optimizer may be invalidating that test, I could try adding "volatile"...)

While debugging I made the problem GO AWAY more than once by sticking printf()s
and similar into the code, but that's not FIXING it. Adding another sigjmp_buf
declaration and call to sigsetjmp() right at the start of the function works
fine (although the original one, in its current place, still fails). I confirmed
that sigsetjmp() is annotated returns_twice in musl (but even if it _wasn't_,
problems would show up when you did a longjmp; it wouldn't manifest as the first
call to setjmp never returning). This isn't making it to the line after the
function on the first pass through, even if I move it outside the if().

I confirmed it happens with both gcc 11.2 (musl 1.2.4?) and the older gcc 9.4
toolchain (musl 1.2.3? I think? It would be nice if musl actually had a way to
identify the version of installed library binaries).

I do not currently have a superh build of gdbserver and corresponding host gdb
that understands foreign binaries, and the last time I built gdb as part of a
cross compiler was many moons ago.

I'm open to suggestions, this one's funky. (The problem with trying to configure
the kernel to produce core dumps and compare against the readelf -d output is
it's running as PID 1. Um... maybe kgdb? Still think I need to build a host
cross-gdb to connect to it though...)

It would be really nice if somebody who understood the assembly could spot
something...

Rob
