Message-ID: <20160110122139.GF2016@debian>
Date: Sun, 10 Jan 2016 13:21:39 +0100
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: atomic.h cleanup

Hi all,

The development roadmap on the musl wiki lists the ominous point
"atomic.h cleanup" for 1.2.0.

I assume you mean a sort of simplification and unification. I noticed
that the RISC archs use rather liberal amounts of inline assembly for
the atomic operations, and I have always been taught that as soon as
you start copying code, you are probably doing it wrong.

So first thing I'd do: add a new file, let's call it atomic_debruijn.h.
It contains an implementation of a_ctz() and a_ctz_64() based on a
de Bruijn sequence. That way, all the architectures currently
implementing a_ctz() in this manner can just include that file, and a
lot of duplicate code goes out the window.
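
For illustration, here is a minimal sketch of what such a shared file
could contain. It is not taken from the musl tree: the helper name
a_ctz_32() is hypothetical, and the constant and table are the widely
published 32-bit de Bruijn multiplier 0x077CB531 with its lookup table.
As with the usual ctz contract, the result for x == 0 is not meaningful.

```c
#include <stdint.h>

/* Hypothetical helper: count trailing zeros of a nonzero 32-bit value.
 * (x & -x) isolates the lowest set bit; multiplying by the de Bruijn
 * constant moves a unique 5-bit pattern into the top bits, which then
 * indexes the table. */
static inline int a_ctz_32(uint32_t x)
{
    static const char debruijn32[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    return debruijn32[((x & -x) * 0x077CB531u) >> 27];
}

/* 64-bit variant composed from the 32-bit one, so that no 64-bit
 * multiply is needed on 32-bit targets. */
static inline int a_ctz_64(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    if (lo)
        return a_ctz_32(lo);
    return 32 + a_ctz_32((uint32_t)(x >> 32));
}
```

An arch that has a native count-trailing-zeros instruction would simply
not include the file and keep its own definition.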

Second thing: We can reduce the inline assembly footprint and the amount
of duplicate code by adding a new file, let's call it atomic_llsc.h,
that implements a_cas(), a_cas_p(), a_swap(), a_fetch_add(), a_inc(),
a_dec(), a_and() and a_or() in terms of new functions that would have to
be defined, namely:

static inline void a_presync(void) - execute any barrier needed before
attempting an atomic operation, like "dmb ish" for ARM, or "sync" for
PPC.

static inline void a_postsync(void) - execute any barrier needed
afterwards, like "isync" for PPC, or, again, "dmb ish" for ARM.

static inline int a_ll(volatile int*) - perform an LL on the given
pointer and return the value there. This would be "lwarx" for PPC, or
"ldrex" for ARM.

static inline int a_sc(volatile int*, int) - perform an SC on the given
pointer with the given value. Return zero iff that failed.

static inline void* a_ll_p(volatile void*) - same as a_ll(), but with
machine words instead of int, if the two differ in size.

static inline int a_sc_p(volatile void*, void*) - same as a_sc(), but
with machine words.


With these functions, we can implement e.g. CAS as:

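static inline int a_cas(volatile int *p, int t, int s)
{
    int v;
    a_presync();    /* barrier before the LL/SC sequence */
    do {
        v = a_ll(p);
        if (v != t)
            break;  /* value differs: no store attempted */
    } while (!a_sc(p, s));
    a_postsync();   /* barrier after the LL/SC sequence */
    return v;
}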

Add some #ifdefs to only activate the pointer variations if they're
needed (i.e. if we're on 64 bits) and Bob's your uncle.
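
The other operations follow the same pattern as a_cas() above. Below is
a rough sketch of how that generic layer could look. Note the loud
assumption: the four primitives here are NOT real LL/SC. They are
host-side stand-ins emulated with the GCC/Clang __atomic builtins and a
thread-local "reservation" (the variable a_ll_seen is hypothetical), so
that the generic code can be compiled and exercised on a machine without
LL/SC. Unlike real LL/SC, this emulation is subject to ABA; an actual
port would replace the stand-ins with the arch's instructions.

```c
/* --- stand-ins for the per-arch primitives (emulation, see above) --- */
static __thread int a_ll_seen;  /* value observed by the last a_ll() */

static inline void a_presync(void)  { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
static inline void a_postsync(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }

static inline int a_ll(volatile int *p)
{
    a_ll_seen = __atomic_load_n(p, __ATOMIC_RELAXED);
    return a_ll_seen;
}

/* Returns nonzero on success, zero iff the store failed. */
static inline int a_sc(volatile int *p, int v)
{
    int expected = a_ll_seen;
    return __atomic_compare_exchange_n(p, &expected, v, 0,
                                       __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* --- generic layer, written only in terms of the primitives --- */
static inline int a_swap(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, v));
    a_postsync();
    return old;
}

static inline int a_fetch_add(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, (int)((unsigned)old + (unsigned)v)));
    a_postsync();
    return old;
}

static inline void a_and(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, old & v));
    a_postsync();
}

static inline void a_or(volatile int *p, int v)
{
    int old;
    a_presync();
    do old = a_ll(p);
    while (!a_sc(p, old | v));
    a_postsync();
}

static inline void a_inc(volatile int *p) { a_fetch_add(p, 1); }
static inline void a_dec(volatile int *p) { a_fetch_add(p, -1); }
```

With this split, an arch header would only need to supply the four
primitives (plus the pointer variants where needed) before including
the generic file.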

The only hardship would be in implementing a_sc(), but that can be
solved by using a feature often referenced but rarely seen in the wild:
ASM goto. It works like this: if the arch's SC instruction reports
success or failure in a flag and the CPU can branch on that flag
(unlike, say, microblaze, which can only branch on comparisons), then
you encode the jump in the assembly snippet but let the compiler handle
the targets for you. Since we want to jump on failure in all cases,
that's what the assembly should do. For instance, for PowerPC:

static inline int a_sc(volatile int* p, int x)
{
    __asm__ goto ("stwcx. %0, 0, %1\n\t"
                  "bne- %l2"
                  : /* no outputs */
                  : "r"(x), "r"(p)
                  : "cc", "memory"
                  : fail);
    return 1;
fail:
    return 0;
}

I have already looked at the compiler output for such a design, but I
have never run it, for lack of hardware.

Anyway, this code makes it possible for the compiler to redirect the
conditional jump on failure to the top of the loop in a_cas(). Since the
return value isn't used otherwise, the values 1 and 0 never appear in
the generated assembly.

What do you say to this design?

Ciao,
Markus
