Message-ID: <20130731061858.07c30257@ralda.gmx.de>
Date: Wed, 31 Jul 2013 06:18:58 +0200
From: Harald Becker <ralda@....de>
To: Rich Felker <dalias@...ifal.cx>
Cc: musl@...ts.openwall.com
Subject: Re: ARM memcpy post-0.9.12-release thread

Hi Rich !

30-07-2013 23:23 Rich Felker <dalias@...ifal.cx>:

> > misaligned case happens mostly due to working with strings,
> > and those are usually short. Can't we consider other
> > misaligned cases violation of the programmer or code
> > generator? If so, I would prefer the best-attempt inline asm
> > versions of code or even best attempt C code over arch
> > specific asm versions ... and add
>
> Part of the problem discussed on #musl was that I was having to
> be really careful with "best attempt C" since GCC will
> _generate_ calls to memcpy for some code, even when
> -ffreestanding is used. The folks on #gcc claim this is not a
> bug. So, if compilers deem themselves at liberty to make this
> kind of transformation, any C implementation of memcpy that's
> not intentionally crippled (e.g. using volatile temps and 20x
> slower than it should be) is a time-bomb that might blow up on
> us with the next GCC version...

I have never dealt with the details of this type of gcc code
generation, but doesn't this only happen for small copies and
structure copies? Structure copies should usually be aligned. So
if they are aligned, the simpler version saves code space.

> This makes asm (either inline or standalone) a lot more
> appealing for memcpy than it otherwise would be.

Optimization is always a matter of trade-offs, which I consider
the hard part of the job ... :(

> > a warning for performance loss on misaligned data in
> > documentation, with giving a rough percentage of this loss.
>
> You'd prefer video processing being 4 to 5 times slower?

No, definitely not, but video processing is one of the cases I
consider a candidate for optimized processing. So such projects
should include an optimized version of the low-level processing
functions (including memcpy, but not only that; a candidate for a
library of optimized functions?).

> Video typically consists of single-byte samples (planar YUV) and
> operations like cropping to a non-multiple-of-4 size, motion
> compensation, etc. all involve misaligned memcpy. Same goes for
> image transformations in gimp, image blitting in web browsers
> (not necessarily aligned to multiple-of-4 boundaries unless
> you're using 32bpp), etc...

You are quite right, but the programmer should know about this
and consider using appropriate functions. You can write the
parts of the code that need the speed so that they call optimized
functions, in a way that usually does not conflict with the calls
gcc inserts on its own. So these self-inserted calls usually hit
the aligned case, or else the programmer did not behave well (not
the compiler).

--
Harald
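
For illustration, here is a minimal sketch of the cropping case mentioned in the quoted text, showing why the per-row memcpy ends up misaligned with one byte per sample. The helper names crop_plane and row_misalignment and the plane layout are assumptions made up for this example. They do not come from musl or from any project named in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a w x h window whose top-left corner is (x, y)
 * out of an 8-bit plane (e.g. the Y plane of planar YUV) whose rows are
 * `stride` bytes apart. The output rows are packed, w bytes each. */
static void crop_plane(uint8_t *dst, const uint8_t *src, size_t stride,
                       size_t x, size_t y, size_t w, size_t h)
{
    assert(x + w <= stride); /* the window must fit inside a source row */
    for (size_t row = 0; row < h; row++) {
        /* Source offset is (y + row) * stride + x and destination offset is
         * row * w. With single-byte samples neither is a multiple of 4
         * unless x, w and stride all happen to be. */
        memcpy(dst + row * w, src + (y + row) * stride + x, w);
    }
}

/* Hypothetical helper: offset of window row `row` from the start of the
 * plane, modulo 4. A nonzero result means the source of that row's memcpy
 * is not 4-byte aligned (assuming the plane itself starts aligned). */
static size_t row_misalignment(size_t stride, size_t x, size_t y, size_t row)
{
    return ((y + row) * stride + x) % 4;
}
```

With stride 8 and a 3-byte-wide window at x = 1, every source row starts at an offset of 1 modulo 4, and the packed destination rows start at offsets 0, 3, 6 and so on. The memcpy calls in such code are therefore the misaligned case, and they are written by the programmer rather than inserted by the compiler.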