Message-ID: <20120809031639.GM27715@brightrain.aerifal.cx>
Date: Wed, 8 Aug 2012 23:16:40 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: crypt* files in crypt directory

On Wed, Aug 08, 2012 at 04:08:10PM -0700, Isaac Dunham wrote:
> On Wed, 8 Aug 2012 17:48:55 -0400
> Rich Felker <dalias@...ifal.cx> wrote:
> 
> > > > Maybe you could support -DFAST_CRYPT or the like.  It could enable
> > > > forced inlining and manual unrolls in crypt_blowfish.c.
> ...
> > Unless there's a really compelling reason to do so, I'd like to avoid
> > having multiple alternative versions of the same code in a codebase.
> > It makes it so there's more combinations you have to test to be sure
> > the code works and doesn't have regressions.
> > 
> > As it stands, the code I posted with the manual unrolling removed
> > performs _better_ than the manually unrolled code with gcc 4 on x86_64
> > when optimized for speed, and it's 33% smaller when optimized for
> > size.
> 
> Per your own tests?
> I say this because the test previously mentioned shows the
> opposite:

OK, I misread the units as c=cycles and s=?? instead of c=crypts and
s=sec. But of course that reading doesn't make sense.

> > > The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on
> > > FX-8120 I get 490 c/s for the original code, 450 c/s for your code
> > > without inlining/unrolling, and somehow only 430 c/s with
> > > -finline-functions -funroll-loops.  
> 
> That is:
> Raw      %speed  version
> 490 c/s  100%    original
> 450 c/s  92%     Rich's version
> 430 c/s  88%     Rich's version, unrolled by compiler
> Higher is faster.
> I.e., unrolling actually slows your version down further.
> 
> GCC 3/x86 is getting 80% with Rich's version, optimized.
> 
> Also, how much "bloat" does Solar Designer's proposal (unroll inside
> BF_body) add?

In source terms, it's even worse than either version: it requires
completely duplicating the whole function (once unrolled, once
straight). I have no idea how much binary bloat it adds; anybody care
to try it? My principal hesitation to even go there is that it (1)
makes for really ugly source bloat, and (2) perhaps cuts the binary
size savings in half or worse, making the savings marginal and
arguably no longer worth the cost of carrying 2 copies of the same
code in the source.
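
To make the duplication concern concrete, here is a minimal sketch of
what carrying both variants looks like. It is not the crypt_blowfish.c
code: toy_round, mix_rolled, mix_unrolled and mix are hypothetical
names, the round function is a stand-in rather than Blowfish, and
FAST_CRYPT is used only as the selector macro suggested earlier in the
thread.

```c
#include <stdint.h>

#define ROUNDS 16

/* Toy round function: a stand-in for illustration, NOT Blowfish. */
static uint32_t toy_round(uint32_t x, uint32_t k)
{
    return ((x << 5) | (x >> 27)) ^ (x + k);
}

/* Straight version: the round appears once and runs in a loop. */
static uint32_t mix_rolled(uint32_t x, const uint32_t key[ROUNDS])
{
    for (int i = 0; i < ROUNDS; i++)
        x = toy_round(x, key[i]);
    return x;
}

/* Manually unrolled version: same logic, second copy to maintain. */
#define R(i) x = toy_round(x, key[i])
static uint32_t mix_unrolled(uint32_t x, const uint32_t key[ROUNDS])
{
    R(0);  R(1);  R(2);  R(3);
    R(4);  R(5);  R(6);  R(7);
    R(8);  R(9);  R(10); R(11);
    R(12); R(13); R(14); R(15);
    return x;
}
#undef R

/* Hypothetical build-time switch: each setting is a separate
 * configuration that has to be tested for regressions. */
#ifdef FAST_CRYPT
#define mix mix_unrolled
#else
#define mix mix_rolled
#endif
```

Both functions must keep producing identical output for every input,
so any change to the round logic has to be checked against both
builds, which is the extra testing burden described above.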

Rich
