Message-ID: <4F05FE4F.2030808@hushmail.com>
Date: Thu, 05 Jan 2012 20:47:27 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: gcc versions

On 01/05/2012 07:13 PM, Solar Designer wrote:
> On Thu, Jan 05, 2012 at 10:01:59PM +0400, Solar Designer wrote:
>> gcc version: gcc (gcc version 3.4.6)
>> Best paras:
>> raw-MD4: 1 (13852K c/s)
>> crypt-MD5: 3 (9808 c/s)
>> raw-SHA1: 1 (6500K c/s)
>
> BTW, right now your tree picks PARA 1 for this version of gcc, so we get:
>
> Benchmarking: FreeBSD MD5 [SSE2i 4x]... DONE
> Raw: 4400 c/s real, 4400 c/s virtual
>
> which is twice slower than 1.7.9's original code (with the same gcc).

Yes, the git tree is not ready in this respect; more #elif clauses are
needed. But this still would not address the issue of AMD wanting totally
different figures. And for the 32-bit figures, actual 32-bit systems seem
to show other best paras than a cross compile on a Core2. I haven't had
time to do any systematic tests though.

> ...and I was wrong about 10000+ c/s - not with this old gcc. Here's
> what I actually get with 3.4.6:

I'm not sure what you refer to about 10000+ c/s.

> Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
> Raw: 8998 c/s real, 8998 c/s virtual
>
> So PARA 3 is in fact a bit faster for this old gcc here, but I am not
> comfortable relying on that - it might be a lot slower e.g. on AMD CPUs.
> The code is clearly so far from optimal that its performance is likely
> unstable across different CPUs. X2 is a safer bet for gcc < 4.0.

Yes. I still think we may need to offer an optional tuning option to the
user. Like having this testpara target actually output a para.h file that
survives make clean but gets overwritten by another make testpara.

> As to MD4 and SHA-1, here's what -jumbo-5 picks with gcc 3.4.6:
>
> Benchmarking: Raw MD4 [SSE2i 12x]... DONE
> Raw: 13100K c/s real, 13100K c/s virtual
>
> Benchmarking: Raw SHA-1 [SSE2i 4x]... DONE
> Raw: 4930K c/s real, 4930K c/s virtual
>
> I don't know why the 6500K vs. 4930K discrepancy.

I thought best.c mostly presents a raw measure of crypt_all() (and this is
in fact desired here). Vector key buffer setup has LOTS of impact on the
fast formats, and this is why I've been concentrating on optimising some
formats' set_key() lately. My laptop presents 44000K c/s for raw-MD4 in
testpara while the real format benchmark gets just 24500K.

BTW, at one point I did both raw-MD5 and crypt-MD5 in testpara and they
did not always come up with the same figure. Doh! I dropped the raw test
just as a convenient way to ignore this problem...

magnum
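
A rough illustration of the "#elif clauses" and the optional para.h override discussed in this message. This is a sketch under assumptions, not code from the tree: the EXAMPLE_* macro names, the EXAMPLE_HAVE_PARA_H switch, the helper functions and the fallback value of 1 are all hypothetical placeholders. The gcc 3.4.x values are the best paras quoted above for a single machine, and the thread itself doubts they would hold on AMD CPUs or on real 32-bit systems.

```c
#include <assert.h>

/* Hypothetical switch: a tuning target could write para.h and the build
 * could define EXAMPLE_HAVE_PARA_H so that the tuned values take priority.
 * The generated file is assumed to define all three EXAMPLE_*_PARA macros. */
#ifdef EXAMPLE_HAVE_PARA_H
#include "para.h"
#endif

/* Fold the gcc version into one comparable number, e.g. 3.4.6 -> 30406. */
#if defined(__GNUC__)
#define EXAMPLE_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define EXAMPLE_GCC_VERSION 0
#endif

/* Only pick compiled-in defaults if no tuned values were provided. */
#ifndef EXAMPLE_MD5_PARA
#if EXAMPLE_GCC_VERSION >= 30400 && EXAMPLE_GCC_VERSION < 30500
/* gcc 3.4.x: best paras reported in this thread for one machine. */
#define EXAMPLE_MD4_PARA  1
#define EXAMPLE_MD5_PARA  3
#define EXAMPLE_SHA1_PARA 1
#else
/* Placeholder only: no measured figures for other compilers are implied. */
#define EXAMPLE_MD4_PARA  1
#define EXAMPLE_MD5_PARA  1
#define EXAMPLE_SHA1_PARA 1
#endif
#endif

/* Small accessors so the selection can be inspected at run time. */
int example_md4_para(void)  { return EXAMPLE_MD4_PARA; }
int example_md5_para(void)  { return EXAMPLE_MD5_PARA; }
int example_sha1_para(void) { return EXAMPLE_SHA1_PARA; }

/* Sanity check: every selected para must be at least 1. */
void example_check_para(void)
{
	assert(example_md4_para() >= 1);
	assert(example_md5_para() >= 1);
	assert(example_sha1_para() >= 1);
}
```

Putting the generated-file check first means a user's own tuning run would override whatever the compiler-version table guesses, which is the point of making the tuning optional.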