Message-ID: <CAPfzE3ZA5i1Ag75CaZRTCa_pvf9-=7kPmLhE6RZt+ZucD4N46w@mail.gmail.com>
Date: Sun, 3 Mar 2013 09:28:23 +1300
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Re: [PATCH] Added ARM optimised memcpy implementation

On 1 March 2013 20:26, Szabolcs Nagy <nsz@...t70.net> wrote:
> * nwmcsween@...il.com <nwmcsween@...il.com> [2013-02-28 19:14:28 -0800]:
>> Hmm what does this do that builtins cannot? What I'm asking is why is
>> this more performant than preload + word-at-a-time.
>>
>
> musl uses naive memcpy if src and dst are not congruent (src%4 != dst%4)
>
> the android asm takes care of that by fetching 32 bytes
> from src into registers and dumping it into dst with
> appropriate shifts
>
> and in the congruent case 32byte alignment is used (cacheline aligned)

It also takes advantage of the ARM load/store multiple instructions
(LDM/STM), allowing it to move up to eight 32-bit words with a single
instruction.

Regards,
Andre
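
For readers following the thread, the shift-based copy discussed above can be sketched in portable C. This is only an illustration under stated assumptions, not the Android assembly and not musl's code: the name shift_memcpy is hypothetical, a little-endian target is assumed, and it moves one 32-bit word per step where the assembly would move up to eight registers at a time with LDM/STM and align to 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch (not musl or Android code). Assumes little-endian. */
void *shift_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: copy single bytes until dst is 4-byte aligned. */
    while (n && ((uintptr_t)d & 3)) {
        *d++ = *s++;
        n--;
    }
    assert(n == 0 || ((uintptr_t)d & 3) == 0);

    unsigned off = (unsigned)((uintptr_t)s & 3);
    if (off == 0) {
        /* Congruent case (src%4 == dst%4): plain word-at-a-time copy. */
        for (; n >= 4; n -= 4, d += 4, s += 4) {
            uint32_t w;
            memcpy(&w, s, 4);
            memcpy(d, &w, 4);
        }
    } else if (n >= 8) {
        /* Non-congruent case: read aligned words from src and merge
         * neighbouring words with shifts to form each aligned dst word. */
        unsigned rs = 8 * off, ls = 32 - rs;
        uint32_t lo = 0;

        /* Build the first partial word bytewise so no byte before src
         * is ever read. */
        for (unsigned i = 0; i < 4 - off; i++)
            lo |= (uint32_t)s[i] << (8 * (off + i));

        const unsigned char *sa = s + (4 - off); /* next aligned src word */
        while (n >= 8) {
            uint32_t hi, w;
            memcpy(&hi, sa, 4);            /* aligned 32-bit load */
            w = (lo >> rs) | (hi << ls);   /* little-endian merge */
            memcpy(d, &w, 4);              /* aligned 32-bit store */
            lo = hi;
            sa += 4; d += 4; s += 4; n -= 4;
        }
    }

    /* Tail: remaining bytes. */
    while (n) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The loop bound n >= 8 is deliberately conservative: it guarantees that each aligned load stays inside the source buffer, at the cost of leaving a few extra bytes for the bytewise tail.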