Message-ID: <501C5D22.1000405@barfooze.de>
Date: Sat, 04 Aug 2012 01:22:10 +0200
From: John Spencer <maillist-musl@...fooze.de>
To: musl@...ts.openwall.com
Subject: Re: Re: musl libc, memcpy
i've set up a performance test ( https://github.com/rofl0r/memcpy-test )
these are the average results for i386 (100 runs on big sizes, 10000 on
smaller ones):
    size     asm version   current c-version
 (bytes)         (ticks)             (ticks)
       3             172                 199
       4             167                 167
       5             197                 186
       8             187                 186
      15             195                 196
      16             186                 185
      23             202                 199
      24             193                 188
      25             205                 212
      31             199                 198
      32             195                 192
      33             204                 192
      63             213                 255
      64             219                 226
      65             208                 238
      95             220                 247
      96             214                 239
      97             217                 243
     127             233                 261
     128             225                 254
     129             229                 266
     159             242                 279
     160             235                 268
     161             238                 273
     191             255                 288
     192             264                 288
     193             248                 287
     255             279                 323
     256             266                 313
     257             269                 319
     383             332                 391
     384             308                 370
     385             307                 384
     511             345                 439
     512             315                 434
     513             318                 439
     767             370                 571
     768             330                 555
     769             334                 566
    1023             382                 740
    1024             349                 727
    1025             358                 694
    1535             423                 936
    1536             393                 930
    1537             400                 929
    2048             448                1176
    4096             822                2404
    8192            3136                8310
   16384            6481                9780
   32768           11645               19060
   65536           29700               52051
  131072          307029              310875
  262144          608502              617698
  524288         1222116             1244987
 1048576         2500207             2712991
 2097152         5279016             5566665
 4194304        10586333            10849110
 8388608        21961730            22473953
16777216        45966254            47159258
33554432        92434464            95873868
67108864       189858530           190456107
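it looks as if the asm version is up to nearly three times as fast
(822 vs. 2404 ticks at 4096 bytes), depending on the size of data copied.
the gap is widest between roughly 1 KB and 8 KB and shrinks to under 10%
from 128 KB upward.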
now waiting for the x86_64 version (if you could provide a working 64bit
rdtsc inline asm function, i'll gladly take that as well)
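for reference, a minimal sketch of an rdtsc wrapper that should build for both i386 and x86_64 with gcc or clang. it is not taken from the memcpy-test repository; the names rdtsc() and time_memcpy_ticks() are made up for illustration. the usual pitfall is the "=A" constraint, which names the edx:eax pair only on i386, so this version reads the two halves separately with "=a" and "=d".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the time stamp counter. RDTSC always returns the low half in
 * EAX and the high half in EDX, in both 32-bit and 64-bit mode, so the
 * halves are captured separately and combined by hand. The "memory"
 * clobber only stops the compiler from moving loads and stores across
 * the read; RDTSC itself is not a serializing instruction. */
static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: ticks spent in one memcpy call of n bytes. */
static inline uint64_t time_memcpy_ticks(void *dst, const void *src, size_t n)
{
    uint64_t start = rdtsc();
    memcpy(dst, src, n);
    uint64_t end = rdtsc();
    return end - start;
}
```

a single measurement like this is noisy, which is why the numbers above are averages over many runs.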
someone on ##asm suggested that movaps with xmm regs was fastest in his
tests.
would be interesting to test such a version as well.
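a rough sketch of what such a version could look like, using the SSE intrinsics _mm_load_ps and _mm_store_ps, which are the aligned 16-byte load/store and normally compile to movaps. this is an assumption about what was meant on ##asm, not the code that person tested. the function name copy_movaps() is hypothetical, and both pointers must be 16-byte aligned or the aligned accesses will fault.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Copy n bytes in 16-byte blocks through an xmm register.
 * Precondition (assumed, not checked): dst and src are 16-byte aligned.
 * Any tail shorter than 16 bytes is copied byte by byte. */
static void *copy_movaps(void *dst, const void *src, size_t n)
{
    float *d = dst;
    const float *s = src;
    size_t blocks = n / 16;

    for (size_t i = 0; i < blocks; i++) {
        __m128 v = _mm_load_ps(s + 4 * i);   /* aligned 16-byte load  */
        _mm_store_ps(d + 4 * i, v);          /* aligned 16-byte store */
    }

    unsigned char *dt = (unsigned char *)dst + blocks * 16;
    const unsigned char *st = (const unsigned char *)src + blocks * 16;
    for (size_t i = 0; i < n % 16; i++)
        dt[i] = st[i];

    return dst;
}
```

a general-purpose memcpy would additionally need a path for unaligned pointers, and on i386 the build needs SSE enabled (for example -msse with gcc).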
On 08/01/2012 08:19 AM, Rich Felker wrote:
> On Wed, Aug 01, 2012 at 01:40:11AM -0400, Rich Felker wrote:
>> On Wed, Aug 01, 2012 at 12:27:22AM -0400, Rich Felker wrote:
>>> I'm attaching a (possibly buggy; not heavily tested) rep-movsd-based
>>> version. I'd be interested in hearing how it performs.
>> And here is the attachment...
> And here's a version that might be faster; reportedly, rep movsd works
> better when the destination address is aligned.
>
> Rich
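the attachments are not reproduced here. purely as an illustration of the idea in the quoted text, and not Rich's actual code, a rep-movsd copy that first aligns the destination could be sketched as below. the name copy_rep_movsd() is hypothetical; the sketch assumes gcc or clang on i386 or x86_64 and relies on the direction flag being clear, as the ABI requires at function entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes: byte-copy until dst is 4-byte aligned, move the bulk
 * with "rep movsl" (AT&T spelling of rep movsd), then byte-copy the
 * remaining 0-3 bytes. */
static void *copy_rep_movsd(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: advance until the destination address is 4-byte aligned. */
    while (n && ((uintptr_t)d & 3)) {
        *d++ = *s++;
        n--;
    }

    /* Bulk: edi/rdi = dst, esi/rsi = src, ecx/rcx = dword count.
     * The instruction advances both pointers and zeroes the count. */
    size_t words = n / 4;
    __asm__ __volatile__("rep movsl"
                         : "+D"(d), "+S"(s), "+c"(words)
                         :
                         : "memory");

    /* Tail: whatever is left after the last full dword. */
    n &= 3;
    while (n--)
        *d++ = *s++;

    return dst;
}
```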