musl - musl sh2 support

Message-ID: <20150427213603.GA23866@brightrain.aerifal.cx>
Date: Mon, 27 Apr 2015 17:36:03 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: yuri.nunami@...wc.com, sumpei.kawasaki@...wc.com
Subject: musl sh2 support

Recently nsz and I have been looking at the state of the sh port and
noticed that the gusa soft atomics, which Bobby Bingham (original port
author) and I assumed would be sufficient for anything pre-sh4a,
actually don't work on pre-sh3 targets. This is explained in the GCC
bug tracker thread here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50457

but the TL;DR is that gusa works by setting an invalid stack pointer
as a sentinel to the kernel whereas sh1/sh2 exception-handling
requires a valid stack pointer. This issue may also affect __unmapself
which runs momentarily (roughly 1-2 cycles in userspace) without a
valid stack pointer. For non-SMP configurations I suspect it should
suffice for __unmapself to just set the stack pointer to point at some
global data for the kernel to use momentarily during exceptions.
Alternatively the first thread to call __unmapself could transform
into a reaper that never exits but unmaps future detached exiting
threads; this could even be a decent default C-only implementation of
__unmapself for archs/ABIs that can't handle threads unmapping their
own stacks.

Anyway, back to atomics. GCC introduced a new soft-tcb atomic model
that works like the old gusa but stores a flag (for the kernel to
inspect) indicating that an atomic sequence is in progress at a fixed
offset from the thread-pointer register, GBR. This offset has to be
aligned to 4 and in the range 0 to 1020. I can't find any
documentation on a default/ABI-accepted location for this flag,
though. The offsets that would be possible for musl to use immediately
are 0 and 4. These offsets are used by glibc to store the DTV pointer
and a pointer to the full thread structure; on musl they're unused but
kept to maintain the same TLS ABI used by the toolchain. So we could
use either of these, but the ABI would not be compatible with glibc,
which might be irrelevant since glibc will probably never support
sh1/sh2.

The other option is to use offset 8 by putting a TLS (.tdata section)
object in crt1.o to reserve the very first slot of application-owned
TLS for soft-tcb atomic use. Actual application TLS would then begin
at offset 12.
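
To make the offset-8 layout concrete, here is a minimal sketch of what
the start of the GBR-relative area would look like under that scheme.
The struct and field names are hypothetical and exist only for
illustration; 32-bit fields stand in for sh pointers and words so that
the offsets come out the same on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the memory starting at the thread pointer (GBR).
 * uint32_t is used for every slot because pointers are 4 bytes on sh. */
struct sh_tls_head_model {
    uint32_t dtv_slot;    /* offset 0: DTV pointer on glibc, unused on musl */
    uint32_t self_slot;   /* offset 4: thread-structure pointer on glibc, unused on musl */
    uint32_t atomic_flag; /* offset 8: slot reserved by a .tdata object in crt1.o */
    uint32_t app_tls[1];  /* offset 12: first word of actual application TLS */
};
```

If I'm reading the GCC documentation right, the compiler side of this
would be selected with something like
-matomic-model=soft-tcb,gbr-offset=8.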

Offset -8 or -12 would be even better (sticking the flag at the end of
struct __pthread) but the GBR-relative addressing modes used don't
seem to support negative offsets.
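
A small sketch of why the range is what it is, assuming (as I
understand the encoding) that the GBR-relative longword moves such as
mov.l @(disp,GBR),r0 carry an unsigned 8-bit displacement scaled by 4.
The helper name is hypothetical.

```c
/* Hypothetical helper: map a byte offset from GBR to the 8-bit
 * displacement field of a longword GBR-relative move.
 * Returns -1 if the offset cannot be encoded. */
static int gbr_disp_encode(long byte_offset)
{
    if (byte_offset < 0)        /* field is unsigned: no negative offsets */
        return -1;
    if (byte_offset % 4 != 0)   /* longword access: must be 4-byte aligned */
        return -1;
    if (byte_offset > 255 * 4)  /* 8-bit field scaled by 4: maximum is 1020 */
        return -1;
    return (int)(byte_offset / 4);
}
```

Under that assumption offsets 0, 4 and 8 are encodable, while -8 and
-12 are rejected.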

In addition to the question of what to do with atomics, there's a
question of whether we need full runtime selection for the atomic
method at all. I've been told (though I'm not sure it's accurate)
that sh1/sh2 (and possibly sh2a) have a different kernel syscall ABI
from sh3+. Normally the syscall ABI wouldn't matter for dynamic-linked
ELF binaries as long as the right libc.so is installed on the system
you're running on, but since sh1/sh2 are nommu, running normal
dynamic-linked sh3+ binaries on them wouldn't be possible (or at least
not efficient) anyway. So it might make sense to treat
sh1/sh2 as a separate arch for musl's purposes. But if this arch will
possibly have SMP implementations (e.g. running on sh4a or new tech)
then soft-tcb atomics will not suffice and it might need its own
method of runtime-atomic-selection to get a working atomic cas.
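
For discussion, here is a rough sketch of what runtime selection
could look like: pick a cas implementation once at startup and call it
through a function pointer. Everything here is hypothetical, including
the names and the cpu_has_llsc flag (which would have to come from
something like auxv or a kernel interface). Both implementations are
host-side stand-ins, not real sh sequences.

```c
/* cas returning the value previously stored at *p */
typedef int (*cas_fn)(volatile int *p, int expected, int desired);

/* Stand-in for a gusa/soft-tcb sequence. On real hardware this is only
 * atomic with respect to interrupts on a uniprocessor; this plain C
 * version is NOT atomic and only models the semantics. */
static int cas_soft_standin(volatile int *p, int expected, int desired)
{
    int old = *p;
    if (old == expected)
        *p = desired;
    return old;
}

/* Stand-in for a hardware method (e.g. LL/SC on sh4a), using the
 * GCC/Clang __atomic builtin so it runs on any host. */
static int cas_hw_standin(volatile int *p, int expected, int desired)
{
    /* On failure the builtin writes the observed value into 'expected';
     * on success 'expected' already equals the old value. */
    __atomic_compare_exchange_n(p, &expected, desired, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}

/* Hypothetical selector, run once at startup. */
static cas_fn select_cas(int cpu_has_llsc)
{
    return cpu_has_llsc ? cas_hw_standin : cas_soft_standin;
}
```

The cost is an indirect call on every cas, which is part of why it's
worth asking whether runtime selection is needed at all.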

Ideas?

Rich