Message-ID: <4F05FE4F.2030808@hushmail.com>
Date: Thu, 05 Jan 2012 20:47:27 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: gcc versions

On 01/05/2012 07:13 PM, Solar Designer wrote:
> On Thu, Jan 05, 2012 at 10:01:59PM +0400, Solar Designer wrote:
>> gcc version: gcc (gcc version 3.4.6)
>> Best paras:
>>    raw-MD4: 1  (13852K c/s)
>>  crypt-MD5: 3  (9808 c/s)
>>   raw-SHA1: 1  (6500K c/s)
>
> BTW, right now your tree picks PARA 1 for this version of gcc, so we get:
>
> Benchmarking: FreeBSD MD5 [SSE2i 4x]... DONE
> Raw:    4400 c/s real, 4400 c/s virtual
>
> which is half the speed of 1.7.9's original code (with the same gcc).

Yes, the git tree is not ready in this respect; more #elif clauses are
needed. But that still would not address the issue of AMD CPUs wanting
totally different figures. As for the 32-bit figures, actual 32-bit
systems seem to show different best PARA values than a cross-compile on
a Core2. I haven't had time to do any systematic tests, though.
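To illustrate the kind of #elif chain that is missing, here is a minimal
sketch of compile-time PARA selection keyed on the gcc version. The macro
MD5_PARA_SKETCH and the helper md5_para_sketch() are hypothetical names,
not what the jumbo tree uses. Only the gcc < 4.0 clause reflects this
thread (fall back to the plain 32/64 X2 code); the other values are
placeholders to be filled in from testpara runs. Note that it does nothing
about AMD vs. Intel differences, which a compiler-version check cannot see.

```c
/* Hypothetical sketch: choose an MD5 SSE2 PARA value at compile time
 * from the compiler version. 0 means "don't use the SSE2 intrinsics
 * code; use the plain 32/64 X2 implementation instead". */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 4
/* gcc < 4.0: X2 is the safer bet (see the 3.4.6 figures in this thread) */
#define MD5_PARA_SKETCH 0
#elif defined(__GNUC__) && !defined(__clang__) && __GNUC__ == 4 && __GNUC_MINOR__ < 5
/* placeholder, not a measured value: fill in from testpara for gcc 4.0-4.4 */
#define MD5_PARA_SKETCH 2
#else
/* placeholder default for newer gcc and other compilers */
#define MD5_PARA_SKETCH 1
#endif

/* Expose the compile-time choice so it can be reported at runtime. */
int md5_para_sketch(void)
{
	return MD5_PARA_SKETCH;
}
```

Each new gcc release (or a per-vendor split) would just add another
#elif clause above the #else.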

> ...and I was wrong about 10000+ c/s - not with this old gcc.  Here's
> what I actually get with 3.4.6:

I'm not sure what you're referring to with 10000+ c/s.

> Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
> Raw:    8998 c/s real, 8998 c/s virtual
>
> So PARA 3 is in fact a bit faster for this old gcc here, but I am not
> comfortable relying on that - it might be a lot slower e.g. on AMD CPUs.
> The code is clearly so far from optimal that its performance is likely
> unstable across different CPUs.  X2 is a safer bet for gcc < 4.0.

Yes. I still think we may need to offer an optional tuning step to the
user, e.g. having the testpara target actually output a para.h file that
survives make clean but gets overwritten by another make testpara.

> As to MD4 and SHA-1, here's what -jumbo-5 picks with gcc 3.4.6:
>
> Benchmarking: Raw MD4 [SSE2i 12x]... DONE
> Raw:    13100K c/s real, 13100K c/s virtual
>
> Benchmarking: Raw SHA-1 [SSE2i 4x]... DONE
> Raw:    4930K c/s real, 4930K c/s virtual
>
> I don't know why the 6500K vs. 4930K discrepancy.

I thought best.c mostly presents a raw measure of crypt_all() (and this
is in fact what we want here). Vector key buffer setup has a LOT of
impact on the fast formats, which is why I've been concentrating on
optimising some formats' set_key() lately. My laptop shows 44000K c/s
for raw-MD4 in testpara, while the real format benchmark gets just 24500K.
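The gap between those two raw-MD4 figures can be expressed as
per-candidate time. A minimal sketch, assuming a simple additive model in
which the full benchmark costs the crypt_all() time measured by testpara
plus a fixed per-candidate overhead (key setup and the rest); the
function names are hypothetical.

```c
/* Hypothetical helpers, assuming
 *   full cost per candidate = crypt_all() cost + fixed overhead. */

/* Extra nanoseconds spent per candidate outside crypt_all(). */
double overhead_ns(double crypt_only_cps, double full_cps)
{
	return 1e9 / full_cps - 1e9 / crypt_only_cps;
}

/* Fraction of the full benchmark's time spent on that overhead;
 * algebraically equal to 1 - full_cps / crypt_only_cps. */
double overhead_fraction(double crypt_only_cps, double full_cps)
{
	return overhead_ns(crypt_only_cps, full_cps) / (1e9 / full_cps);
}
```

For 44000K vs. 24500K c/s this gives about 18.1 ns of overhead per
candidate (roughly 22.7 ns vs. 40.8 ns), i.e. about 44% of the real
benchmark's time. The same calculation on the 6500K vs. 4930K raw-SHA1
figures gives about 24%, which would be consistent with key setup being
the cause, but that is an inference from the model, not a measurement.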

BTW, at one point I did both raw-MD5 and crypt-MD5 in testpara and they
did not always come up with the same best PARA. Doh! I dropped the raw
test just as a convenient way to ignore this problem...

magnum
