Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPfzE3aoD4mpO9RrV-enuXxkCvMPY_7rEE6e9w8NuX-ntEqtqA@mail.gmail.com>
Date: Thu, 11 Jul 2013 17:10:41 +1200
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Cc: Rich Felker <dalias@...ifal.cx>
Subject: Re: Thinking about release

> I can't see any obvious reason why this shouldn't work, although the
> assembler as it stands makes pretty heavy use of all the registers,
> and I can't immediately see how to rework it to free up 2 more (I can
> free up 1 by dropping the attempted preload). Given my (lack of)
> skills with ARM assembler, I'm not sure I'll be able to look too
> deeply into either of these options, but I'll have a go at the inline
> ASM version to force 8*4byte loads to see if it improves things.

I've given it a bit of a go, and at first it appears to be working
(although I don't exactly have a comprehensive test suite, so this is
very preliminary). Anyone with some more ARM assembler experience is
welcome to chip in with a comment.

I also managed to mess up my last set of benchmarking - I'd indicated
that I got 65 vs 95 vs 105, however I'd stuffed up the fact that the
first call would have poor cache performance. Once I corrected that
the results have become more like 65(naive) vs 105(typedef) vs
113(asm).

Using the below code, it becomes 65(naive), 113(inline asm), 113(full
asm). So the inline is able to do perform as we'd expect. Assuming
that it is technically correct (which is probably the biggest
question).

#define SS (8 * 4)
#define ALIGN (SS - 1)
void * noinline my_asm_memcpy(void * restrict dest, const void *
restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (((uintptr_t)d & ALIGN) != ((uintptr_t)s & ALIGN))
        goto misaligned;

    for (; ((uintptr_t)d & ALIGN) && n; n--) *d++ = *s++;
    if (n) {
        for (; n>=SS; n-= SS) {
                __asm__("ldmia %0, {r4-r11}"
                                : "=r" (s)
                                : "0" (s)
                                : "r4", "r5", "r6", "r7", "r8", "r9",
"r10", "r11");
                s+=SS;
                __asm__("stmia %0, {r4-r11}"
                                : "=r" (d)
                                :"0" (d));
                d+=SS;
        }

misaligned:
        for (; n; n--) *d++ = *s++;
    }
    return dest;
}

Regards,
Andre

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.