Message-ID: <CAPfzE3bLXQnybVyJXHYUzjJ4Dj3KVphRaooVyBXO3qQqWO1TbQ@mail.gmail.com>
Date: Sun, 3 Mar 2013 09:33:54 +1300
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: ARM optimisations

On 3 March 2013 00:34, Szabolcs Nagy <nsz@...t70.net> wrote:
> * Rob Landley <rob@...dley.net> [2013-03-01 22:33:19 -0600]:
>> I'd actually say that armv5 is probably the one to optimize for,
>> because it's somewhere over 80% of the installed base of arm systems
>> and generally provides an additional 25% speedup from armv4 to armv5.
>> Anything lower than that can use C, anything newer than that can
>> benefit from an armv5 version vs C.
> ...
>> I believe armv6 was mostly just SMP extensions, so not worth
>> optimizing memcpy for. armv7 is nice but not ubiquitous the way
>> armv5 is, and armv7 brings with it the "thumb2" instruction set
>> which means you'd need 2 versions depending on what target you
>> wanted to compile for...
>
> a quick research shows that
>
> glibc has ifdefs for armv5te and armv4t optimizations
> http://sourceware.org/git/?p=glibc.git;a=blob;f=ports/sysdeps/arm/memcpy.S
>
> linaro has armv7 optimized version
> http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/src/linaro-a9/memcpy.S
>
> olibc (the bionic one not the openbsd one) has armv7+neon optimized memcpy
> https://github.com/olibc/olibc/blob/master/libc/arch-arm/bionic/memcpy.S

The bionic code uses a couple of preprocessor tricks to combine the
ARMv4 and ARMv5 code, specifically around the use of PLD and CALIGN.
Since (I assume) bionic is built for one specific CPU, chosen at
compile time, these are relatively easy to do. However, I got the
impression (and may be mistaken) that we were trying to avoid
compile-time CPU detection in favour of run-time CPU detection. If
that is the case, you would need two separate implementations
(possibly with some code sharing), and I thought the resulting
code-size bloat wouldn't be worth it. This is especially true of ARM
NEON/v7, which is essentially completely different, so you'd end up
with somewhere between a 300% and 500% code size increase on ARM to
support all three platforms (based on the current implementation
going from 1k to 1.5k when I used the ASM optimised version).
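
To make the two approaches concrete, here is a rough C sketch (not
taken from bionic, glibc or musl). The PLD() macro shows the
compile-time style: it only expands to a prefetch hint when the build
targets ARMv5 or later. select_memcpy() shows the run-time style,
where every variant has to be present in the binary. The feature
bits, the function names and the use of __ARM_ARCH and
__builtin_prefetch (GCC/Clang) are assumptions for illustration, and
the "optimised" variants are stand-ins rather than real assembly.

```c
#include <stddef.h>
#include <string.h>

/* Compile-time selection: PLD() is a prefetch hint only when the build
 * targets ARMv5 or later; on any other target it expands to nothing. */
#if defined(__ARM_ARCH) && __ARM_ARCH >= 5
#define PLD(p) __builtin_prefetch(p)
#else
#define PLD(p) ((void)0)
#endif

/* Hypothetical feature bits. A real libc would have to derive these
 * from the kernel or the CPU at startup; that part is not shown. */
enum { CPU_V5 = 1 << 0, CPU_NEON = 1 << 1 };

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Plain C fallback for anything older than ARMv5. */
static void *memcpy_generic(void *d, const void *s, size_t n)
{
    unsigned char *dp = d;
    const unsigned char *sp = s;
    while (n--)
        *dp++ = *sp++;
    return d;
}

/* Stand-in for an ARMv5 assembly version that would use PLD. */
static void *memcpy_v5(void *d, const void *s, size_t n)
{
    PLD(s);
    return memcpy(d, s, n);
}

/* Stand-in for a separate NEON/ARMv7 assembly version. */
static void *memcpy_neon(void *d, const void *s, size_t n)
{
    return memcpy(d, s, n);
}

/* Run-time selection: all three variants are linked in, and one is
 * picked from the feature mask. This is the source of the size cost. */
static memcpy_fn select_memcpy(unsigned features)
{
    if (features & CPU_NEON)
        return memcpy_neon;
    if (features & CPU_V5)
        return memcpy_v5;
    return memcpy_generic;
}
```

A caller would resolve the pointer once, for example
memcpy_fn f = select_memcpy(CPU_V5); and then call f(dst, src, len).
With the compile-time approach only one of the three bodies would end
up in the library.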

Having said all that, I do tend to agree that the ARMv4 platforms are
relatively archaic, and simply not having an optimised version for
them could be an acceptable alternative. ARMv5t is probably still too
popular to ignore.

Regards,
Andre
