Message-ID: <20141118191545.GA17522@brightrain.aerifal.cx>
Date: Tue, 18 Nov 2014 14:15:45 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: [PATCH] ARM atomics overhaul, try 2

Here's a new version of the ARM atomics overhaul patch which I'm much
happier with. Whereas the old version imposed a heavy address
computation in the caller at each point where an atomic was used, the
new version achieves a light computed jump inside the callee, using an
idiom of the form:

        ldr ip,1f
        ldr ip,[pc,ip]
        add pc,pc,ip
1:      .word relativeptr-1b

When relativeptr contains zero, as at program startup, the code
continues with the instruction after the .word directive (a dummy
version that's safe to use before initialization). Later, relativeptr
is filled with the difference between the address of the desired
version and the address of this dummy code.
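
To make the dispatch idea concrete, here is a rough C model of it. This is only a sketch, not code from the patch: every name in it (barrier_dummy, barrier_v7, select_v7, call_barrier) is hypothetical, and it assumes function pointers survive a round trip through uintptr_t, which holds on mainstream compilers but is not guaranteed by ISO C. The asm version relies on the ARM-state rule that reading pc yields the address of the current instruction plus 8, which is why a zero offset lands on the instruction right after the .word.

```c
#include <stdint.h>

typedef int (*barrier_fn)(void);

/* Hypothetical stand-ins: a dummy that is safe before initialization,
 * and an optimized version chosen later at runtime. */
int barrier_dummy(void) { return 0; }
int barrier_v7(void)    { return 7; }

/* Offset from the dummy to the selected version; zero at startup. */
static uintptr_t relativeptr;

/* Model of the computed jump: dummy address plus the stored offset.
 * With a zero offset this simply calls the dummy. */
int call_barrier(void)
{
	barrier_fn f = (barrier_fn)((uintptr_t)barrier_dummy + relativeptr);
	return f();
}

/* Model of initialization: store the difference between the desired
 * version's address and the dummy's address. Unsigned arithmetic wraps,
 * so this works whichever function sits at the higher address. */
void select_v7(void)
{
	relativeptr = (uintptr_t)barrier_v7 - (uintptr_t)barrier_dummy;
}
```

Calling call_barrier() before select_v7() reaches the dummy; afterwards the same call site reaches the selected version, with no address computation needed in the caller.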

As before, v7+ is the most highly optimized, with special versions of
the various atomics using ldrex/strex directly to avoid a nested cas
loop. For atomics, compile-time v6 builds are not significantly better
than baseline (v4t) builds, although the thread-pointer load is
optimized with a hard-coded instruction. I could make v6 builds use
the same inline asm as v7+, with "bl __a_barrier" in place of
"dmb ish", but I'm not sure how much of a win this would be, if any.
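
For readers unfamiliar with the "nested cas loop" problem, the following is a minimal illustration in portable C11, not the code from the patch. The helper name fetch_add_via_cas is hypothetical. It builds fetch-and-add on top of compare-and-swap; where the cas primitive is itself an ldrex/strex retry loop, this outer loop wraps that inner one, which is what the dedicated v7+ versions avoid.

```c
#include <stdatomic.h>

/* Hypothetical helper: fetch-and-add built on compare-and-swap.
 * Returns the value held before the addition. */
int fetch_add_via_cas(_Atomic int *p, int v)
{
	int old = atomic_load(p);
	/* Outer retry loop. On failure the strong compare-exchange writes
	 * the value it observed into old, so the next attempt uses it.
	 * On ldrex/strex targets the strong form typically contains its
	 * own retry loop, giving a loop within a loop. */
	while (!atomic_compare_exchange_strong(p, &old, old + v))
		;
	return old;
}
```

A version written directly with ldrex/strex can do the load, add, and conditional store in a single retry loop.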

Comments? If no problems are noticed right away I'll probably commit
this soon as a basis for any future work on improving it, since I
think it's already reasonably good (and much better than what we had).

Rich
View attachment "arm_atomics_overhaul_try2.diff" of type "text/plain" (10131 bytes)