Message-ID: <52077217.9070004@gentoo.org>
Date: Sun, 11 Aug 2013 13:14:31 +0200
From: Luca Barbato <lu_zero@...too.org>
To: musl@...ts.openwall.com
Subject: Re: Optimized C memcpy [updated]

On 11/08/13 10:13, Rich Felker wrote:
>> Unfortunately this case seems to be compiling to a call to memcpy on
>> powerpc (but nowhere else I found). So I may need to drop the special
>> case for 64-bit alignment. I wish there was some source for knowledge
>> of the cases that can trigger gcc's stupidity, though...
>
> It turns out mips at certain optimization levels is also generating a
> memcpy for the structure assignments. I think I just need to drop all
> of the structure-assignment tricks and use a mildly unrolled loop with
> uint32_t units for the aligned case. This gives much worse performance
> on ARM, where gcc fails to generate the proper ldmia/stmia without the
> struct, but we have asm we can use for ARM anyway. On other archs, the
> struct copy code does not even seem to help. The simple integer loop
> works just as well.
>
> I'll do some more experimenting and probably commit the ARM asm soon,
> followed by the C code once I get some better feedback on how it
> performs on real machines.

What about sprinkling volatile here and there?

lu
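
For readers following the thread, here is a minimal sketch of what "a mildly unrolled loop with uint32_t units for the aligned case" might look like. It is not the code under discussion and not musl's memcpy: the name copy_aligned_words, the 4x unroll factor, and the u32 typedef are assumptions made for illustration, and the may_alias attribute is a GCC/Clang extension. The sketch assumes both pointers are 4-byte aligned and the buffers do not overlap. Whether a particular compiler and optimization level turns such a loop back into a memcpy call is the open question in the quoted text, so the generated assembly would still need checking per target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension (assumption): lets u32 accesses alias any object type. */
typedef uint32_t __attribute__((__may_alias__)) u32;

/* Hypothetical helper for illustration only.
 * Copies n bytes from src to dest, assuming both are 4-byte aligned
 * and the two regions do not overlap. Returns dest. */
void *copy_aligned_words(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* The aligned-case precondition. */
    assert((uintptr_t)d % 4 == 0 && (uintptr_t)s % 4 == 0);

    /* Main loop: four 32-bit words (16 bytes) per iteration.
     * All loads are done before the stores. */
    for (; n >= 16; n -= 16, d += 16, s += 16) {
        u32 w0 = ((const u32 *)s)[0];
        u32 w1 = ((const u32 *)s)[1];
        u32 w2 = ((const u32 *)s)[2];
        u32 w3 = ((const u32 *)s)[3];
        ((u32 *)d)[0] = w0;
        ((u32 *)d)[1] = w1;
        ((u32 *)d)[2] = w2;
        ((u32 *)d)[3] = w3;
    }

    /* Remaining whole words, one at a time. */
    for (; n >= 4; n -= 4, d += 4, s += 4)
        *(u32 *)d = *(const u32 *)s;

    /* Trailing 0 to 3 bytes. */
    while (n--)
        *d++ = *s++;

    return dest;
}
```

Compiling this with -S at the optimization levels of interest and looking for a call to memcpy in the output is one way to see how a given gcc version treats it.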