Message-ID: <20130731032315.GA221@brightrain.aerifal.cx>
Date: Tue, 30 Jul 2013 23:23:15 -0400
From: Rich Felker <dalias@...ifal.cx>
To: Harald Becker <ralda@....de>
Cc: musl@...ts.openwall.com
Subject: Re: ARM memcpy post-0.9.12-release thread

On Wed, Jul 31, 2013 at 05:13:47AM +0200, Harald Becker wrote:
> Hi Rich !
>
> 30-07-2013 22:26 Rich Felker <dalias@...ifal.cx>:
>
> > Some rough times (128k copy repeated 10000 times):
> >
> > Aligned case:
> > Current C code: 1.2s
> > My best-attempt C code: 0.75s
> > My best-attempt inline asm: 0.57s
> > Bionic asm: 0.63s
> > Bionic asm without prefetch: 0.57s
> >
> > Misaligned case:
> > Current C code: 4.7s
> > My best-attempt inline asm: 2.9s
> > Bionic asm: 1.1s
>
> I like to throw in a question, as my cent to this topic:
>
> Does modern C Compiler not try to align all data types? So
> following this path in most cases aligned data structures are
> used and copying them around usually hit the aligned case. The

Yes but these are small anyway and the compiler will be generating
inline code to copy them with ldmia/stmia.

> misaligned case happens mostly due to working with strings, and
> those are usually short. Can't we consider other misaligned cases
> violation of the programmer or code generator? If so, I would
> prefer the best-attempt inline asm versions of code or even
> best attempt C code over arch specific asm versions ... and add

Part of the problem discussed on #musl was that I was having to be
really careful with "best attempt C" since GCC will _generate_ calls
to memcpy for some code, even when -ffreestanding is used. The folks
on #gcc claim this is not a bug. So, if compilers deem themselves at
liberty to make this kind of transformation, any C implementation of
memcpy that's not intentionally crippled (e.g. using volatile temps
and 20x slower than it should be) is a time-bomb that might blow up on
us with the next GCC version... This makes asm (either inline or
standalone) a lot more appealing for memcpy than it otherwise would
be.

> a warning for performance lose on misaligned data in
> documentation, with giving a rough percentage of this lose.

You'd prefer video processing being 4 to 5 times slower? Video
typically consists of single-byte samples (planar YUV) and operations
like cropping to a non-multiple-of-4 size, motion compensation, etc.
all involve misaligned memcpy. Same goes for image transformations in
gimp, image blitting in web browsers (not necessarily aligned to
multiple-of-4 boundaries unless you're using 32bpp), etc...

Rich
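
A minimal sketch of the compiler hazard described in the message above. It is
not code from musl or from this thread, and the names naive_copy,
volatile_copy, and check_copies are hypothetical. The assumption is that an
optimizing GCC may recognize the plain byte loop as a memcpy idiom and replace
it with a call to memcpy, which would recurse into itself if the function were
actually named memcpy. The volatile variant illustrates the deliberately
crippled alternative the message mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name. A plain byte loop like this is what an optimizer may
 * rewrite into a call to memcpy; inside a function that IS memcpy, that
 * rewrite would make the function call itself. */
void *naive_copy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Hypothetical name. Volatile accesses must be performed one by one as
 * written, so the loop cannot be turned into a library call, at the cost
 * of blocking wider loads and stores as well. */
void *volatile_copy(void *restrict dest, const void *restrict src, size_t n)
{
    volatile unsigned char *d = dest;
    const volatile unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Copy between deliberately misaligned offsets and compare with the source. */
void check_copies(void)
{
    unsigned char src[64], a[64] = {0}, b[64] = {0};
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);
    naive_copy(a + 1, src + 3, 50);
    volatile_copy(b + 1, src + 3, 50);
    assert(memcmp(a + 1, src + 3, 50) == 0);
    assert(memcmp(b + 1, src + 3, 50) == 0);
}
```

With GCC, building with -fno-tree-loop-distribute-patterns is one commonly
used way to suppress this idiom replacement. Whether a given GCC version
performs the replacement at a given optimization level is best checked
against the generated assembly.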