musl - Re: Re: [PATCH] Added ARM optimised memcpy implementation


Message-ID: <CAPfzE3ZA5i1Ag75CaZRTCa_pvf9-=7kPmLhE6RZt+ZucD4N46w@mail.gmail.com>
Date: Sun, 3 Mar 2013 09:28:23 +1300
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Re: [PATCH] Added ARM optimised memcpy implementation

On 1 March 2013 20:26, Szabolcs Nagy <nsz@...t70.net> wrote:
> * nwmcsween@...il.com <nwmcsween@...il.com> [2013-02-28 19:14:28 -0800]:
>> Hmm, what does this do that builtins cannot? What I'm asking is: why is this more performant than preload + word-at-a-time?
>>
>
> musl uses naive memcpy if src and dst are not congruent (src%4 != dst%4)
>
> the android asm takes care of that by fetching 32 bytes
> from src into registers and dumping them into dst with
> appropriate shifts
>
> and in the congruent case 32-byte alignment is used (cacheline aligned)
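
For anyone following along, the shift trick for the non-congruent case
can be sketched in C roughly as below. This is only an illustration of
the idea, not the code from the patch: copy_incongruent is a made-up
name, it assumes a little-endian target, and it works one word at a
time rather than 32 bytes at a time. The congruent case is not handled
specially here and simply falls through to the byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): copy n bytes when src and dst
 * differ in alignment modulo 4. Assumes a little-endian target. */
void *copy_incongruent(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy single bytes until dst is word aligned. */
    while (n && ((uintptr_t)d & 3)) { *d++ = *s++; n--; }

    unsigned off = (unsigned)((uintptr_t)s & 3);
    if (off && n >= 8) {
        unsigned rs = 8 * off, ls = 32 - rs;
        uint32_t lo = 0, hi, out;

        /* Gather the bytes up to the next aligned src word into the
         * high end of lo, as if the whole aligned word had been loaded. */
        for (unsigned i = 0; i < 4 - off; i++)
            lo |= (uint32_t)s[i] << (8 * (off + i));
        s += 4 - off;                 /* s is now word aligned */

        while (n >= 8) {              /* keeps the aligned load inside src */
            memcpy(&hi, s, 4);        /* aligned word load */
            out = (lo >> rs) | (hi << ls);
            memcpy(d, &out, 4);       /* aligned word store */
            lo = hi;
            s += 4; d += 4; n -= 4;
        }
        s -= 4 - off;                 /* back to the first uncopied byte */
    }

    /* Tail bytes (and the whole copy when nothing above applied). */
    while (n--) *d++ = *s++;
    return dst;
}
```

Both the loads and the stores in the inner loop are word aligned; the
shifts make up for the difference in offset between src and dst.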

It also takes advantage of the ARM load/store multiple instructions
(ldm/stm), allowing it to move up to 8 32-bit words with a single
instruction.
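
As a rough C picture of what that inner loop amounts to (a sketch
only; copy_blocks32 is a made-up name, and a compiler is free to emit
something other than ldm/stm for it): each iteration loads eight words
into locals and then stores them, which is the shape a single ldmia
followed by a single stmia with eight registers each would have in
assembly. Both pointers are assumed to be word aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): copy `blocks` 32-byte blocks
 * between word-aligned, non-overlapping buffers. */
void copy_blocks32(uint32_t *dst, const uint32_t *src, size_t blocks)
{
    while (blocks--) {
        /* Eight loads: one load-multiple in the assembly version. */
        uint32_t r0 = src[0], r1 = src[1], r2 = src[2], r3 = src[3];
        uint32_t r4 = src[4], r5 = src[5], r6 = src[6], r7 = src[7];

        /* Eight stores: one store-multiple in the assembly version. */
        dst[0] = r0; dst[1] = r1; dst[2] = r2; dst[3] = r3;
        dst[4] = r4; dst[5] = r5; dst[6] = r6; dst[7] = r7;

        src += 8;
        dst += 8;
    }
}
```

Eight words is 32 bytes, which matches the 32-byte alignment used in
the congruent case quoted above.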

Regards,
Andre
