Message-ID: <CAPfzE3a8hRbpcmD55D-y9nwaJn6YaD7BA9dhxM7OkwpnHeEc5w@mail.gmail.com>
Date: Fri, 12 Jul 2013 15:36:42 +1200
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Thinking about release

> I was unable to measure any difference in performance of your version
> with the prefetch hack versus simply:
>
>        __asm__ __volatile__(
>        "ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
>        "stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
>        : "+r"(d), "+r"(s) :
>        : "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "memory");

What kind of machine were you using? I see a drop from 115MB/s to
105MB/s when I remove the prefetch, even using the code that you
suggested. This is on an Atmel AT91SAM9G45 (ARM926EJ-S @ 400MHz). I'm
assuming this is some subtlety of how the cache operates?

Sticking the ldrhi back in brings the speed back, i.e.:

       __asm__ __volatile__(
       "ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
       "ldrhi r12, [%1]\n"
       "stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
       : "+r"(d), "+r"(s) :
       : "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "r12", "memory");

Regards,
Andre
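For readers without an ARM926 board at hand, here is a rough portable C sketch of the loop structure being compared above. It is an illustration under stated assumptions, not the code from this thread: the function name copy_words and its touch_next flag are hypothetical. Each iteration moves eight 32-bit words (the 32 bytes held by a4, v1-v7), and when touch_next is set it reads the first word of the next source block before storing the current one, in the spirit of the "ldrhi r12, [%1]" line. The guess that this helps is only that starting the next cache line fill earlier overlaps it with the stores. The "hi" condition presumably comes from a comparison in the surrounding loop that is not quoted; the sketch assumes it means "more data remains". Unlike the asm, a C compiler may still reorder the ordinary loads and stores around the volatile read, so this only approximates the instruction ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable analogue of the ldmia/stmia loop: copy nwords
 * 32-bit words, eight at a time. If touch_next is nonzero, read the first
 * word of the next source block before storing the current block.
 * Only the volatile read is guaranteed to happen; its ordering relative
 * to the ordinary loads and stores is not. */
static void copy_words(uint32_t *restrict d, const uint32_t *restrict s,
                       size_t nwords, int touch_next)
{
    while (nwords >= 8) {
        /* Like "ldmia %1!": load eight words, then advance the source */
        uint32_t w0 = s[0], w1 = s[1], w2 = s[2], w3 = s[3];
        uint32_t w4 = s[4], w5 = s[5], w6 = s[6], w7 = s[7];
        s += 8;

        /* Like the conditional "ldrhi": only touch s[0] when more words
         * remain, so the read stays inside the source buffer. */
        if (touch_next && nwords > 8)
            (void)*(const volatile uint32_t *)s;

        /* Like "stmia %0!": store eight words, then advance the dest */
        d[0] = w0; d[1] = w1; d[2] = w2; d[3] = w3;
        d[4] = w4; d[5] = w5; d[6] = w6; d[7] = w7;
        d += 8;
        nwords -= 8;
    }
    /* Tail: copy the remaining 0..7 words one at a time */
    while (nwords--)
        *d++ = *s++;
}
```

Both variants copy the same data; whether the extra read changes throughput depends on the cache and memory system, which is exactly what the 115MB/s vs 105MB/s measurement above is probing.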