musl - Re: crypt_blowfish integration, optimization



Message-ID: <20120810171803.GB29839@openwall.com>
Date: Fri, 10 Aug 2012 21:18:03 +0400
From: Solar Designer <solar@...nwall.com>
To: musl@...ts.openwall.com
Subject: Re: crypt_blowfish integration, optimization

On Thu, Aug 09, 2012 at 06:32:59PM -0400, Rich Felker wrote:
> On Fri, Aug 10, 2012 at 02:21:03AM +0400, Solar Designer wrote:
> > Hmm, for me "gcc -Q -O2 --help=optimizers" and ditto for -O3 both show
> > "disabled" for -funroll-loops.  Why was the loop unrolled for you?
> 
> Not sure. I've found -Q --help=optimizers completely unreliable in the
> past though. It only reports minimal differences between -Os, -O2, and
> -O3, and trying to start with -O3 and reproduce -Os by just changing
> the options that are different does not give effects even remotely
> similar to -Os.

Frankly, this matches my experience.  OK, -Q --help=optimizers is
unreliable.  But is -O3 supposed to include -funroll-loops now?  Does
it?  Or was the loop unrolled for some other reason?  I think we need
to understand this.

> > As discussed, the problem with avoiding such hand-unrolls is that the
> > compiler doesn't know just which loops are most important to unroll.
> 
> My experience has been that it tends to make good decisions overall,

Yes, good decisions overall - as measured by the geometric mean or
median of the performance change across many functions (I wrote a script
called relbench that reports such measurements for JtR builds) - but
sometimes poor decisions for individual performance-critical functions.
So hand-unrolling in those special cases helps.

> and that if somebody is using -Os, they really want smallest size, not
> performance.

Maybe, however:

So far, -Os has often provided good performance as well, on par with -O2.
IIRC, in the relbench tests mentioned above, it was 92% of -O2 on gcc 4.6
on x86_64 for the geometric mean across about 150 separate benchmark
results, but in some cases -Os code was actually faster than -O2.

So someone using -Os may want nearly optimal code that is also slightly
smaller.  If for some function we get a more than ~8% hit with -Os vs.
-O3 (or whatever does the unrolling), this means that the function could
use some hand-optimization to fix that.

Alexander
