Message-ID: <CAPLrYERP1DO=rfneK+m3dMo+E+AcJrzbjXAry8-r3EZzLqPKiQ@mail.gmail.com>
Date: Thu, 9 Apr 2015 08:50:24 +0200
From: Daniel Cegiełka <daniel.cegielka@...il.com>
To: musl@...ts.openwall.com
Cc: John Mudd <johnbmudd@...il.com>
Subject: Re: musl perf, 20% slower than native build?

2015-04-08 22:59 GMT+02:00 Paul Schutte <sjpschutte@...il.com>:
> Hi Daniel,
>
> Pardon my stupidity, but with what did you replace the memcpy?

I use a memcpy better suited to my CPU. memcpy latency was very
important for me because it had a big impact on the total latency (in
my code). I suspect that most of the latency problems have their cause
in musl's memcpy.

This is quite a complex topic, because the optimal memcpy code depends
on how large the copied blocks of memory are. Sometimes SSE2 will be
faster and sometimes AVX2, but heavily optimized code is not portable
(e.g. AVX2), and that is a problem. Fast memcpy implementations
usually use CPUID to choose the right code path, but such code is
bloated and ugly.

Daniel

> Regards
> Paul
>
> On Wed, Apr 8, 2015 at 9:28 PM, Daniel Cegiełka <daniel.cegielka@...il.com>
> wrote:
>>
>> 2015-04-08 21:10 GMT+02:00 John Mudd <johnbmudd@...il.com>:
>>
>> > Here's output from perf record/report for libc. This looks consistent
>> > with the 5% longer run time.
>> >
>> > native:
>> >   2.20%  python  libc-2.19.so  [.] __memcpy_ssse3
>> >
>> > musl:
>> >   4.74%  python  libc.so       [.] memcpy
>>
>> I was able to get a twofold speed-up (in my code) just by replacing
>> memcpy in musl.
>>
>> Daniel