Message-ID: <CAK1hOcOaN4SnpO2jMGib3tFEf+c8=Tu8Nwi2YnOhzefpSSqTng@mail.gmail.com>
Date: Tue, 17 Feb 2015 17:51:11 +0100
From: Denys Vlasenko <vda.linux@...glemail.com>
To: Rich Felker <dalias@...c.org>
Cc: musl <musl@...ts.openwall.com>
Subject: Re: [PATCH] x86_64/memset: use "small block" code for blocks up to 30 bytes long
On Tue, Feb 17, 2015 at 5:12 PM, Rich Felker <dalias@...c.org> wrote:
> On Tue, Feb 17, 2015 at 02:08:52PM +0100, Denys Vlasenko wrote:
>> >> Please see attached file.
>> >
>> > I tried it and it's ~1 cycle slower for at least sizes 16-30;
>> > presumably we're seeing the cost of the extra compare/branch at these
>> > sizes but not at others. What does your timing test show?
>>
>> See below.
>> First column - result of my2.s
>> Second column - result of vda1.s
>>
>> Basically, the "rep stosq" code path got a bit faster, while
>> small memsets stayed the same.
>
> Can you post your test program for me to try out? Here's what I've
> been using, attached.
With your program I see similar results:
...
size    50: min=10,   avg=10      min=10,   avg=10
size    52: min=10,   avg=10      min=10,   avg=10
size    54: min=10,   avg=11      min=10,   avg=11
size    56: min=10,   avg=11      min=10,   avg=11
size    58: min=10,   avg=11      min=10,   avg=10
size    60: min=10,   avg=10      min=10,   avg=12
size    62: min=10,   avg=10      min=10,   avg=11
size    64: min=18,   avg=18      min=18,   avg=22
size    96: min=17,   avg=17      min=18,   avg=18
size   128: min=31,   avg=32      min=32,   avg=32
size   160: min=35,   avg=37      min=33,   avg=37
size   192: min=40,   avg=40      min=36,   avg=37
size   224: min=43,   avg=43      min=40,   avg=40
size   256: min=44,   avg=47      min=43,   avg=43
size   288: min=47,   avg=48      min=46,   avg=47
size   320: min=50,   avg=52      min=52,   avg=52
size   352: min=53,   avg=54      min=52,   avg=60
size   384: min=56,   avg=57      min=55,   avg=57
size   416: min=59,   avg=60      min=62,   avg=63
size   448: min=63,   avg=65      min=66,   avg=66
size   480: min=66,   avg=71      min=69,   avg=69
size   512: min=73,   avg=74      min=73,   avg=76
size  1024: min=127,  avg=129     min=127,  avg=129
size  2048: min=221,  avg=236     min=221,  avg=236
size  4096: min=425,  avg=444     min=424,  avg=450
size  8192: min=831,  avg=881     min=830,  avg=883
size 16384: min=1644, avg=1717    min=1643, avg=1748
My test program is attached. I build it with:
  gcc -O2 -Wall memset-cycles.c FOO.s
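
For readers without the attachment, here is a minimal sketch of how min/avg figures of this kind might be collected. It is not the attached program: the names read_ticks, struct timing and time_memset are hypothetical, and the use of rdtsc (with a clock_gettime fallback on non-x86 targets) is an assumption about the general approach. The counts it reports include the overhead of reading the counter itself, so only differences between implementations are meaningful.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: read a monotonically increasing tick counter.
 * On x86 this is the time stamp counter; elsewhere we fall back to
 * nanoseconds from CLOCK_MONOTONIC (an assumption for portability). */
static inline uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical result type: minimum and average ticks per call. */
struct timing {
    uint64_t min;
    uint64_t avg;
};

/* Call fn(buf, 0x55, size) `runs` times and record the minimum and
 * the (truncated) average tick count of a single call. */
static struct timing time_memset(void *(*fn)(void *, int, size_t),
                                 void *buf, size_t size, unsigned runs)
{
    struct timing t = { UINT64_MAX, 0 };
    uint64_t total = 0;

    assert(runs > 0);
    for (unsigned i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        fn(buf, 0x55, size);
        /* Compiler barrier so the store to buf is not optimized away
         * or moved past the second counter read. */
        __asm__ __volatile__("" : : "r"(buf) : "memory");
        uint64_t ticks = read_ticks() - start;

        if (ticks < t.min)
            t.min = ticks;
        total += ticks;
    }
    t.avg = total / runs;
    return t;
}
```

A caller would pass the implementation under test as fn, for example time_memset(memset, buf, 64, 1000), and print t.min and t.avg for each size. Reporting the minimum alongside the average helps separate the steady-state cost from outliers caused by interrupts or cache misses.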
View attachment "t.c" of type "text/x-csrc" (3388 bytes)