musl - Re: Re: [PATCH] x86_64/memset: simple optimizations

Message-ID: <20150210205458.GL23507@brightrain.aerifal.cx>
Date: Tue, 10 Feb 2015 15:54:58 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re: [PATCH] x86_64/memset: simple optimizations

On Tue, Feb 10, 2015 at 09:52:54PM +0100, Denys Vlasenko wrote:
> On Tue, Feb 10, 2015 at 9:43 PM, Rich Felker <dalias@...ifal.cx> wrote:
> > On Tue, Feb 10, 2015 at 09:27:17PM +0100, Denys Vlasenko wrote:
> >> On Sat, Feb 7, 2015 at 2:06 PM, Rich Felker <dalias@...ifal.cx> wrote:
> >> /* libc has incredibly messy way of doing this,
> >>  * typically requiring -lrt. We just skip all this mess */
> >> static void get_mono(struct timespec *ts)
> >> {
> >>         syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts);
> >> }
> >
> > FWIW, this is a bad idea; you get syscall overhead in your
> > measurements. If you just use clock_gettime (the function) you'll get
> > vdso results (no syscall).
> 
> I repeat memset 32 times between reading timestamps.
> Thus, even with the "small" 20kb memset test
> there are 640kb of writes to L1. This is big enough
> to make the overhead insignificant.

Yes, I agree it's probably okay the way you've structured the test
here; that's why I mentioned it as a "FWIW" rather than an objection
to the results. It was more an aside remark about how this technique
could be problematic in the future. Sorry for not being clear.

Rich
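
For illustration only, here is a minimal sketch of the timing loop
discussed above, calling clock_gettime (the function) instead of the
raw syscall. It is not the benchmark from the thread: the helper names
ts_diff_ns and time_memset_ns are hypothetical, and the empty asm
statement is a GCC/Clang-specific compiler barrier added as an
assumption so the repeated stores are not optimized away. Whether the
call is actually served from the vdso depends on the libc, kernel, and
architecture in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: returns (b - a) in nanoseconds. */
static int64_t ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
        return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
             + (int64_t)(b->tv_nsec - a->tv_nsec);
}

/* Hypothetical helper: times `reps` back-to-back memset calls over
 * buf[0..len) and returns the elapsed nanoseconds, or -1 if the
 * clock could not be read. */
static int64_t time_memset_ns(void *buf, size_t len, int reps)
{
        struct timespec t0, t1;

        assert(reps > 0);

        /* Library call rather than syscall(__NR_clock_gettime, ...). */
        if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
                return -1;

        for (int i = 0; i < reps; i++) {
                memset(buf, i, len);
                /* Compiler barrier (GCC/Clang extension): forces the
                 * stores to be treated as observable. */
                __asm__ volatile("" : : "r"(buf) : "memory");
        }

        if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
                return -1;

        return ts_diff_ns(&t0, &t1);
}
```

With a 20kb buffer and reps set to 32, one call covers the 640kb of
writes mentioned in the quoted message, so the clock is read only
twice per 640kb written.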
