Message-ID: <20150427213603.GA23866@brightrain.aerifal.cx>
Date: Mon, 27 Apr 2015 17:36:03 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: yuri.nunami@...wc.com, sumpei.kawasaki@...wc.com
Subject: musl sh2 support

Recently nsz and I have been looking at the state of the sh port and
noticed that the gusa soft atomics, which Bobby Bingham (original port
author) and I assumed would be sufficient for anything pre-sh4a,
actually don't work on pre-sh3 targets. This is explained on the GCC
bug-tracker threads here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50457

but the TL;DR is that gusa works by setting an invalid stack pointer
as a sentinel to the kernel whereas sh1/sh2 exception-handling
requires a valid stack pointer.

This issue may also affect __unmapself which runs momentarily (roughly
1-2 cycles in userspace) without a valid stack pointer. For non-SMP
configurations I suspect it should suffice for __unmapself to just set
the stack pointer to point at some global data for the kernel to use
momentarily during exceptions. Alternatively the first thread to call
__unmapself could transform into a reaper that never exits but unmaps
future detached exiting threads; this could even be a decent default
C-only implementation of __unmapself for archs/ABIs that can't handle
threads unmapping their own stacks.

Anyway, back to atomics. GCC introduced a new soft-tcb atomic model
that works like the old gusa but stores a flag (for the kernel to
inspect) indicating that an atomic sequence is in progress at a fixed
offset from the thread-pointer register, GBR. This offset has to be
aligned to 4 and in the range 0 to 1020. I can't find any
documentation on a default/ABI-accepted location for this flag,
though.

The offsets that would be possible for musl to use immediately are 0
and 4. These offsets are used by glibc to store the DTV pointer and a
pointer to the full thread structure; on musl they're unused but kept
to maintain the same TLS ABI used by the toolchain. So we could use
either of these, but the ABI would not be compatible with glibc, which
might be irrelevant since glibc will probably never support sh1/sh2.

The other option is to use offset 8 by putting a TLS (.tdata section)
object in crt1.o to reserve the very first slot of application-owned
TLS for soft-tcb atomic use. Actual application TLS would then begin
at offset 12.

Offset -8 or -12 would be even better (sticking the flag in the end of
struct __pthread) but the GBR-relative addressing modes used don't
seem to support negative offsets.

In addition to the question of what to do with atomics, there's a
question of whether we need full runtime selection for the atomic
method at all. I've been told (but I'm not clear whether it's right)
that sh1/sh2(/sh2a?) have a different kernel syscall ABI, and since
they're nommu, it wouldn't be possible (or at least not efficiently)
to run normal dynamic-linked ELF binaries (where syscall ABI wouldn't
matter as long as you have the right libc.so installed on the system
you're running on) for sh3+ on sh1/2. So it might make sense to treat
sh1/sh2 as a separate arch for musl's purposes. But if this arch will
possibly have SMP implementations (e.g. running on sh4a or new tech)
then soft-tcb atomics will not suffice and it might need its own
method of runtime-atomic-selection to get a working atomic cas.

Ideas?

Rich
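
Editorial sketch (not part of the original message): the offset
constraint stated above for the soft-tcb flag (aligned to 4, in the
range 0 to 1020, no negative values) can be written as a compile-time
check. The macro name SOFT_TCB_OFFSET_OK is hypothetical and is not
taken from musl or GCC; only the numeric limits and the candidate
offsets come from the message.

```c
#include <assert.h>   /* static_assert (C11) */

/*
 * Hypothetical helper, not a musl or GCC identifier.
 * True when `off` meets the constraint stated in the message:
 * a multiple of 4 in the range 0..1020 (negative offsets rejected).
 */
#define SOFT_TCB_OFFSET_OK(off) \
    ((off) >= 0 && (off) <= 1020 && (off) % 4 == 0)

/* Candidates discussed in the message: 0 and 4 (unused on musl), 8 (first TLS slot). */
static_assert(SOFT_TCB_OFFSET_OK(0), "offset 0 should be usable");
static_assert(SOFT_TCB_OFFSET_OK(4), "offset 4 should be usable");
static_assert(SOFT_TCB_OFFSET_OK(8), "offset 8 should be usable");

/* Offsets at the end of struct __pthread would be negative, hence rejected. */
static_assert(!SOFT_TCB_OFFSET_OK(-8), "offset -8 is not addressable");
static_assert(!SOFT_TCB_OFFSET_OK(-12), "offset -12 is not addressable");
```

The limits are consistent with an unsigned 8-bit displacement scaled
by 4, which would explain both the 1020 ceiling and the lack of
negative offsets; treat that reading as an assumption rather than
something the message states.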