Message-ID: <20160110122139.GF2016@debian>
Date: Sun, 10 Jan 2016 13:21:39 +0100
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: atomic.h cleanup

Hi all,

The development roadmap on the musl wiki lists the ominous item "atomic.h cleanup" for 1.2.0. I assume you mean some sort of simplification and unification. I noticed that the RISC archs contain rather liberal amounts of inline assembly for the atomic operations, and I have always been taught that as soon as you start copying code, you are probably doing it wrong.

So, first thing I'd do: add a new file, let's call it atomic_debruijn.h. It contains an implementation of a_ctz() and a_ctz_64() based on a De Bruijn sequence. That way, all the architectures currently implementing a_ctz() in this manner can just include that file, and a lot of duplicate code goes out the window.

Second thing: we can reduce the inline assembly footprint and the amount of duplicate code by adding a new file, let's call it atomic_llsc.h, that implements a_cas(), a_cas_p(), a_swap(), a_fetch_add(), a_inc(), a_dec(), a_and() and a_or() in terms of new functions that each arch would have to define, namely:

static inline void a_presync(void)
    Execute any barrier needed before attempting an atomic operation, like "dmb ish" for ARM or "sync" for PPC.

static inline void a_postsync(void)
    Execute any barrier needed afterwards, like "isync" for PPC or, again, "dmb ish" for ARM.

static inline int a_ll(volatile int *p)
    Perform an LL (load-linked) on the given pointer and return the value there. This would be "lwarx" for PPC or "ldrex" for ARM.

static inline int a_sc(volatile int *p, int v)
    Perform an SC (store-conditional) of the given value to the given pointer. Return zero iff the store failed.

static inline void *a_ll_p(volatile void *p)
    Same as a_ll(), but with machine words instead of int, if that makes a difference.

static inline int a_sc_p(volatile void *p, void *v)
    Same as a_sc(), but with machine words.

With these functions we can implement, e.g., CAS as:

    static inline int a_cas(volatile int *p, int t, int s)
    {
        int v;
        a_presync();
        do {
            v = a_ll(p);
            if (v != t) break;    /* mismatch: give up without storing */
        } while (!a_sc(p, s));    /* SC failed: retry from the LL */
        a_postsync();
        return v;
    }

Add some #ifdefs to only activate the pointer variants where they're needed (i.e. on 64-bit), and Bob's your uncle.

The only hardship would be in implementing a_sc(), but that can be solved using a feature often referenced but rarely seen in the wild: asm goto. It works like this: if the arch's SC instruction reports success or failure in a flag, and the CPU can branch on that flag (unlike, say, MicroBlaze, which can only branch on comparisons), then you encode the branch in the assembly snippet but let the compiler handle the targets for you. Since in all cases we want to branch on failure, that's what the assembly should do. For instance, for PowerPC:

    static inline int a_sc(volatile int *p, int x)
    {
        /* stwcx. sets CR0[EQ] on success; bne- jumps to fail otherwise */
        __asm__ goto ("stwcx. %0, 0, %1\n\tbne- %l2"
                      : /* asm goto: no outputs */
                      : "r"(x), "r"(p)
                      : "cc", "memory"
                      : fail);
        return 1;
    fail:
        return 0;
    }

I have already checked the compiler output for such a design, but I have never run it, for lack of hardware. Anyway, this code lets the compiler redirect the conditional branch on failure straight to the top of the loop in a_cas(). Since the return value isn't used otherwise, the values 1 and 0 never appear in the generated assembly.

What do you say to this design?

Ciao,
Markus
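A minimal sketch of what the proposed atomic_debruijn.h might contain. It uses the widely published 32-bit De Bruijn constant 0x077CB531 and its position table; the names a_ctz_32() and a_ctz_64() are hypothetical, and building the 64-bit version from two 32-bit lookups (rather than a separate 64-bit table) is an assumption made to keep the sketch short.

```c
#include <assert.h>  /* only for quick self-checks */
#include <stdint.h>

/* Hypothetical atomic_debruijn.h: count trailing zeros without a ctz
 * instruction. As with a hardware ctz, the result is undefined for 0. */

static inline int a_ctz_32(uint32_t x)
{
    /* Entry k holds the shift i for which the top 5 bits of
     * (0x077CB531 << i) equal k. */
    static const unsigned char debruijn32[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    /* x & -x isolates the lowest set bit (1 << ctz(x)), so the multiply
     * is a left shift of the constant; its top 5 bits index the table. */
    return debruijn32[(uint32_t)((x & -x) * 0x077CB531u) >> 27];
}

static inline int a_ctz_64(uint64_t x)
{
    /* Assumption: reuse the 32-bit table on each half. */
    uint32_t lo = (uint32_t)x;
    if (lo)
        return a_ctz_32(lo);
    return 32 + a_ctz_32((uint32_t)(x >> 32));
}
```

Because x & -x is a power of two, each of the 32 possible bit positions produces a distinct 5-bit window at the top of the product; that uniqueness is exactly the property that makes 0x077CB531 a De Bruijn constant, and it is why a single 32-entry table suffices.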
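A rough sketch of how atomic_llsc.h could layer a_cas(), a_swap(), a_fetch_add(), a_inc(), a_dec(), a_and() and a_or() over per-arch a_presync()/a_postsync()/a_ll()/a_sc() hooks. So that it compiles and runs on any GCC or Clang target, the hooks here are hypothetical stand-ins, not a real port: a_sc() is emulated with __atomic_compare_exchange_n() against the value seen by the last a_ll() (which, unlike real LL/SC, cannot notice an A-B-A change), and the barriers are plain sequentially consistent fences. The pointer variants (a_cas_p() and friends) are left out.

```c
#include <assert.h>  /* only for quick self-checks */

/* ---- Per-arch hooks: hypothetical portable stand-ins ----
 * A real port would use e.g. lwarx/stwcx. (PPC) or ldrex/strex (ARM).
 * Here SC is emulated by a CAS against the value seen by the last LL,
 * which (unlike real LL/SC) cannot detect an A-B-A change in between. */
static _Thread_local int ll_seen;   /* value observed by the last a_ll() */

static inline void a_presync(void)  { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
static inline void a_postsync(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }

static inline int a_ll(volatile int *p)
{
    return ll_seen = __atomic_load_n(p, __ATOMIC_RELAXED);
}

/* Nonzero if the store happened, zero if it failed and must be retried. */
static inline int a_sc(volatile int *p, int v)
{
    int expected = ll_seen;
    return __atomic_compare_exchange_n(p, &expected, v, 0,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* ---- Generic layer: what atomic_llsc.h itself would contain ---- */
static inline int a_cas(volatile int *p, int t, int s)
{
    int v;
    a_presync();
    do {
        v = a_ll(p);
        if (v != t) break;      /* mismatch: return without storing */
    } while (!a_sc(p, s));      /* store failed: try again */
    a_postsync();
    return v;
}

static inline int a_swap(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, v));
    a_postsync();
    return old;
}

static inline int a_fetch_add(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, (int)((unsigned)old + (unsigned)v))); /* wrap, no signed UB */
    a_postsync();
    return old;
}

static inline void a_inc(volatile int *p) { a_fetch_add(p, 1); }
static inline void a_dec(volatile int *p) { a_fetch_add(p, -1); }

static inline void a_and(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, old & v));
    a_postsync();
}

static inline void a_or(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, old | v));
    a_postsync();
}
```

Every operation has the same shape (LL, compute the new value, retry while SC fails), which is what would let one shared header serve all LL/SC archs: a port would only replace the four hooks at the top.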