Message-ID: <CAPfzE3ZTxynUeJjq7KWijZGhsV==NymW4vqLhnQbEYCXRxVf-g@mail.gmail.com>
Date: Wed, 10 Jul 2013 09:28:21 +1200
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Thinking about release

Hi Rich,

> I think that's a reasonable place to begin. I do mildly question the
> relevance of memmove to performance, so if we end up having to do a
> lot of review or changes to get the asm committed, it might make sense
> to leave memmove for later.

I wasn't too sure about memmove, but I've seen a reasonable amount of
code that just uses memmove as standard (rather than memcpy) to avoid
any possibility of overlapping regions. Not a great policy, but
still. I'm fine with dropping it at this stage.
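For context on why some code defaults to memmove: the only thing memmove
must do beyond memcpy is handle overlap, usually by picking the copy
direction. The following is a minimal byte-at-a-time sketch of that idea,
not musl's implementation; the name sketch_memmove is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical overlap-safe copy, byte-at-a-time for clarity. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dest;

    /* Compare as integers: relational comparison of unrelated pointers
       is not defined by ISO C, but libcs commonly do it this way. */
    if ((uintptr_t)d < (uintptr_t)s) {
        /* dest starts before src: a forward copy never reads a byte
           it has already overwritten. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dest starts after src: copy backwards so the overlapping
           tail of src is read before it is overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dest;
}
```

With "abcdefgh", moving 6 bytes from offset 0 to offset 2 gives "ababcdef",
where a naive forward copy would smear "abab..." across the buffer.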

> At first glance, this looks like a clear improvement, but have you
> compared it to much more naive optimizations? My _general_ experience
> with optimized memcpy asm that's complex like this and that goes out
> of its way to deal explicitly with cache lines and such is that it's
> no faster than just naively moving large blocks at a time. Of course
> this may or may not be the case for ARM, but I'd like to know if
> you've done any tests.
>
> The basic principle in my mind here is that a complex solution is not
> necessarily wrong if it's a big win in other ways, but that a complex
> solution which is at most 1-2% faster than a much simpler solution is
> probably not the best choice.

Certainly, if there were a more straightforward C implementation that
achieved similar results, that would be superior. However, the existing
musl C memcpy code is already optimised to some degree (doing 32-bit
rather than 8-bit copies), and I've found it difficult to convince gcc
to use the load-multiple and store-multiple instructions from C code
without resorting to pretty horrible C. It may still be preferable to
the assembler, though. I haven't benchmarked this yet; I'll see if I
can come up with something.
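As a rough illustration of the "naively moving large blocks" approach
discussed above, here is a minimal C sketch that copies 16 bytes per
iteration as four 32-bit words when source and destination share
alignment. It is not musl's memcpy; sketch_memcpy and word_t are
hypothetical names. Whether gcc turns the unrolled loop into ldm/stm on
ARM depends on compiler version and flags, so check the -S output rather
than assume it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef __GNUC__
/* GCC/Clang extension: lets 32-bit accesses alias any object type. */
typedef uint32_t __attribute__((__may_alias__)) word_t;
#else
typedef uint32_t word_t;
#endif

void *sketch_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies only work when both pointers share alignment mod 4;
       otherwise everything is handled by the byte loop at the end. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
        /* Copy leading bytes until both pointers are 4-byte aligned. */
        for (; ((uintptr_t)d & 3) && n; n--)
            *d++ = *s++;

        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;

        /* Main loop: 16 bytes per iteration as four word moves, the
           pattern a compiler might combine into load/store-multiple. */
        for (; n >= 16; n -= 16, dw += 4, sw += 4) {
            word_t w0 = sw[0], w1 = sw[1], w2 = sw[2], w3 = sw[3];
            dw[0] = w0; dw[1] = w1; dw[2] = w2; dw[3] = w3;
        }
        /* Remaining whole words. */
        for (; n >= 4; n -= 4)
            *dw++ = *sw++;

        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Tail bytes, or the whole buffer in the mismatched-alignment case. */
    while (n--)
        *d++ = *s++;
    return dest;
}
```

A benchmark against the current C code and the asm version would then show
whether the extra complexity of explicit cache-line handling buys more
than a percent or two.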

> It's an open question whether it's better to sync something like this
> with an 'upstream' or adapt it to musl coding conventions. Generally
> musl uses explicit instructions rather than pseudo-instructions/macros
> for prologue and epilogue, and does not use named labels.

Given that most of the other systems do some form of compile-time
optimisation (which we're trying to avoid), and that these are not
functions that see a lot of code churn, I don't think it would be too
bad to adapt it to musl's style. I haven't really done that so far.

>> Does anyone have any comments on the suitability of this code, or what
>
> If nothing else, it fails to be armv4 compatible. Fixing that should
> not be hard, but it would require a bit of an audit. The return
> sequences are the obvious issue, but there may be other instructions
> in use that are not available on armv4 or maybe not even on armv5...?

Rob Landley mentioned a while ago that armv4 has issues with EABI.
Is armv4 a definite lower bound for musl support, as opposed to
armv4t or armv5?

Regards,
Andre
