Message-ID: <501C5D22.1000405@barfooze.de>
Date: Sat, 04 Aug 2012 01:22:10 +0200
From: John Spencer <maillist-musl@...fooze.de>
To: musl@...ts.openwall.com
Subject: Re: Re: musl libc, memcpy

i've set up a performance test ( https://github.com/rofl0r/memcpy-test )

these are the average results for i386 (100 runs on big sizes, 10000 on
smaller ones)

                     asm version    current c-version
size:        3         172 ticks            199 ticks
size:        4         167 ticks            167 ticks
size:        5         197 ticks            186 ticks
size:        8         187 ticks            186 ticks
size:       15         195 ticks            196 ticks
size:       16         186 ticks            185 ticks
size:       23         202 ticks            199 ticks
size:       24         193 ticks            188 ticks
size:       25         205 ticks            212 ticks
size:       31         199 ticks            198 ticks
size:       32         195 ticks            192 ticks
size:       33         204 ticks            192 ticks
size:       63         213 ticks            255 ticks
size:       64         219 ticks            226 ticks
size:       65         208 ticks            238 ticks
size:       95         220 ticks            247 ticks
size:       96         214 ticks            239 ticks
size:       97         217 ticks            243 ticks
size:      127         233 ticks            261 ticks
size:      128         225 ticks            254 ticks
size:      129         229 ticks            266 ticks
size:      159         242 ticks            279 ticks
size:      160         235 ticks            268 ticks
size:      161         238 ticks            273 ticks
size:      191         255 ticks            288 ticks
size:      192         264 ticks            288 ticks
size:      193         248 ticks            287 ticks
size:      255         279 ticks            323 ticks
size:      256         266 ticks            313 ticks
size:      257         269 ticks            319 ticks
size:      383         332 ticks            391 ticks
size:      384         308 ticks            370 ticks
size:      385         307 ticks            384 ticks
size:      511         345 ticks            439 ticks
size:      512         315 ticks            434 ticks
size:      513         318 ticks            439 ticks
size:      767         370 ticks            571 ticks
size:      768         330 ticks            555 ticks
size:      769         334 ticks            566 ticks
size:     1023         382 ticks            740 ticks
size:     1024         349 ticks            727 ticks
size:     1025         358 ticks            694 ticks
size:     1535         423 ticks            936 ticks
size:     1536         393 ticks            930 ticks
size:     1537         400 ticks            929 ticks
size:     2048         448 ticks           1176 ticks
size:     4096         822 ticks           2404 ticks
size:     8192        3136 ticks           8310 ticks
size:    16384        6481 ticks           9780 ticks
size:    32768       11645 ticks          19060 ticks
size:    65536       29700 ticks          52051 ticks
size:   131072      307029 ticks         310875 ticks
size:   262144      608502 ticks         617698 ticks
size:   524288     1222116 ticks        1244987 ticks
size:  1048576     2500207 ticks        2712991 ticks
size:  2097152     5279016 ticks        5566665 ticks
size:  4194304    10586333 ticks       10849110 ticks
size:  8388608    21961730 ticks       22473953 ticks
size: 16777216    45966254 ticks       47159258 ticks
size: 33554432    92434464 ticks       95873868 ticks
size: 67108864   189858530 ticks      190456107 ticks

it looks as if the asm version is up to twice as fast, depending on the
size of data copied.

now waiting for the x86_64 version (if you could provide a working 64bit
rdtsc inline asm function, i'll gladly take that as well)

someone on ##asm suggested that movaps with xmm regs was fastest in his
tests. would be interesting to test such a version as well.

On 08/01/2012 08:19 AM, Rich Felker wrote:
> On Wed, Aug 01, 2012 at 01:40:11AM -0400, Rich Felker wrote:
>> On Wed, Aug 01, 2012 at 12:27:22AM -0400, Rich Felker wrote:
>>> I'm attaching a (possibly buggy; not heavily tested) rep-movsd-based
>>> version. I'd be interested in hearing how it performs.
>> And here is the attachment...
> And here's a version that might be faster; reportedly, rep movsd works
> better when the destination address is aligned.
>
> Rich
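
A minimal sketch of the kind of 64-bit rdtsc helper asked for above, assuming GCC or Clang extended inline asm. The names rdtsc64 and avg_memcpy_ticks are hypothetical and are not taken from memcpy-test; the averaging loop only illustrates the "N runs per size" idea and is not the harness that produced the table. On non-x86 targets the sketch falls back to clock_gettime, so the unit there is nanoseconds rather than ticks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__i386__) || defined(__x86_64__)
/* RDTSC puts the low 32 bits in EAX and the high 32 bits in EDX on both
 * i386 and x86_64. Combining the halves by hand avoids the "=A"
 * constraint, which only means the EDX:EAX pair on i386. */
static inline uint64_t rdtsc64(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
#else
#include <time.h>
/* Assumed fallback for other targets: monotonic clock in nanoseconds. */
static inline uint64_t rdtsc64(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical helper: average cost of one memcpy of n bytes over `runs`
 * calls. Returns 0 when runs is 0. */
static uint64_t avg_memcpy_ticks(void *dst, const void *src, size_t n,
                                 unsigned runs)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < runs; i++) {
        uint64_t start = rdtsc64();
        memcpy(dst, src, n);
        /* Compiler barrier so the copy is neither dropped nor moved
         * past the second counter read. */
        __asm__ __volatile__("" : : "r"(dst) : "memory");
        total += rdtsc64() - start;
    }
    return runs ? total / runs : 0;
}
```

Calling avg_memcpy_ticks(dst, src, 4096, 10000) would give one number comparable in spirit to a table row. rdtsc is not a serializing instruction, so individual readings can be skewed by out-of-order execution; averaging over many runs, as done in the test above, reduces that noise but does not remove it.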