Message-ID: <CAPfzE3bLXQnybVyJXHYUzjJ4Dj3KVphRaooVyBXO3qQqWO1TbQ@mail.gmail.com>
Date: Sun, 3 Mar 2013 09:33:54 +1300
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: ARM optimisations

On 3 March 2013 00:34, Szabolcs Nagy <nsz@...t70.net> wrote:
> * Rob Landley <rob@...dley.net> [2013-03-01 22:33:19 -0600]:
>> I'd actually say that armv5 is probably the one to optimize for,
>> because it's somewhere over 80% of the installed base of arm systems
>> and generally provides an additional 25% speedup from armv4 to armv5.
>> Anything lower than that can use C, anything newer than that can
>> benefit from an armv5 version vs C.
> ...
>> I believe armv6 was mostly just SMP extensions, so not worth
>> optimizing memcpy for. armv7 is nice but not ubiquitous the way
>> armv5 is, and armv7 brings with it the "thumb2" instruction set
>> which means you'd need 2 versions depending on what target you
>> wanted to compile for...
>
> a quick research shows that
>
> glibc has ifdefs for armv5te and armv4t optimizations
> http://sourceware.org/git/?p=glibc.git;a=blob;f=ports/sysdeps/arm/memcpy.S
>
> linaro has armv7 optimized version
> http://bazaar.launchpad.net/~linaro-toolchain-dev/cortex-strings/trunk/view/head:/src/linaro-a9/memcpy.S
>
> olibc (the bionic one not the openbsd one) has armv7+neon optimized memcpy
> https://github.com/olibc/olibc/blob/master/libc/arch-arm/bionic/memcpy.S

The bionic code uses a couple of pre-processor tricks to combine the
ARMv4 & ARMv5 code, specifically around the PLD and CALIGN
instructions. Since (I assume) bionic is built at compile time for a
specific CPU, it is relatively easy to do this. However, I got the
impression (and may be mistaken) that we were trying to avoid
compile-time CPU detection in favour of run-time CPU detection.
If that is the case, then you would need two separate implementations
(possibly with some code sharing), and I thought that the overall
code-size bloat this would bring wouldn't be worth it. This is
especially true of ARM NEON/v7, as it is essentially completely
different, so you'd end up with somewhere between a 300% & 500% code
size increase on ARM to support all three platforms (based on the
current implementation going from 1k to 1.5k when I used the ASM
optimised version).

Having said all that, I do tend to agree that the ARMv4 platforms are
relatively archaic, and simply not having an optimised version for
them could be an acceptable alternative. ARMv5t is probably still too
popular to ignore.

Regards,
Andre
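
As a rough illustration of the compile-time versus run-time trade-off discussed in the message, here is a minimal C sketch. It is not code from musl, bionic, or glibc. The names `memcpy_generic`, `memcpy_armv5`, `memcpy_static`, `memcpy_dynamic`, `pick_memcpy`, `cpu_is_armv5_or_later` and `HAVE_ARMV5` are all hypothetical, the "ARMv5" variant is only a stand-in for hand-written assembly, and the compile-time test assumes a compiler that defines the ACLE macro `__ARM_ARCH`.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Portable fallback: a plain byte loop, roughly what "use C" means. */
static void *memcpy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for an ARMv5 assembly routine (e.g. one using
 * PLD). It just calls the generic loop so the sketch runs anywhere. */
static void *memcpy_armv5(void *dst, const void *src, size_t n)
{
    return memcpy_generic(dst, src, n);
}

/* Assumption: the compiler provides the ACLE macro __ARM_ARCH. */
#if defined(__ARM_ARCH) && __ARM_ARCH >= 5
#define HAVE_ARMV5 1
#else
#define HAVE_ARMV5 0
#endif

/* Compile-time selection: exactly one variant is referenced, so a
 * library built this way only needs to carry that one. */
#if HAVE_ARMV5
#define memcpy_static memcpy_armv5
#else
#define memcpy_static memcpy_generic
#endif

/* Run-time selection: both variants must be linked in, which is the
 * code-size cost raised in the message. */
static memcpy_fn pick_memcpy(int have_armv5)
{
    memcpy_fn impl = have_armv5 ? memcpy_armv5 : memcpy_generic;

    assert(impl != NULL);
    return impl;
}

/* Hypothetical CPU probe. A real one would ask the kernel or the CPU;
 * here it simply reuses the build-time answer. */
static int cpu_is_armv5_or_later(void)
{
    return HAVE_ARMV5;
}

/* Resolves the implementation on first use, then calls through the
 * cached function pointer. */
static void *memcpy_dynamic(void *dst, const void *src, size_t n)
{
    static memcpy_fn impl;

    if (!impl)
        impl = pick_memcpy(cpu_is_armv5_or_later());
    return impl(dst, src, n);
}
```

Calling `memcpy_static(dst, src, n)` costs nothing beyond the chosen routine, while `memcpy_dynamic(dst, src, n)` adds an indirect call and keeps every variant in the binary. Each further variant (for example a NEON one) would add another function to the run-time table.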