Message-ID: <CABh=JREW+Ouk_RyVnHqpNG=2_Y+oqoJ9ivF1=OEw8tE6p_5cpg@mail.gmail.com>
Date: Thu, 15 Mar 2012 11:46:09 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP (was: RAR format finally proper)
Hello Alexander,
> That's as expected. In fact, you're lucky that the FX is not slower in
> your case.
>
> The 6-core (and 8-core for FX-81x0) is actually 3-module (or
> 4-module, respectively), where each module has two sets of register
> files (like with Intel's Hyperthreading) and two sets of integer ALUs
> and AGUs (that's a new thing compared to Intel's Hyperthreading), but
> only a shared set of other execution units (including vector). So when
> you primarily use the vector ops, the CPU is effectively 3-core (or
> 4-core for FX-81x0) with SMT.
>
>
> http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#Bulldozer_core_.28module.29
>
>
Should have read that before I spent EUR 250 on a new CPU and motherboard.
I feel pissed now, eheh.
In fact it was a bit faster for MD5 and MD4, a bit slower for SHA1 and
almost the same speed for DES-based hashes.
> Yeah, I was planning to try that in JtR as well, but didn't get around
> to it yet. It's good news that this worked well for you.
>
Actually, the quoted improvement percentage was not correct. I did some more
improvements (e.g. using the SSSE3 byte shuffle, PSHUFB, to speed up the byte
order reversals in SHA1, and optimizing the early checks a bit). What I got
after those:
MD5 single hash: 128M c/s with SSE2 -> 181M c/s with XOP
NTLM single hash: 114M c/s with SSE2 -> 162M c/s with XOP
SHA1 single hash: 39M c/s with SSE2 -> 63M c/s with XOP
(on PhenomII X4 it used to be around 42M c/s)
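
As a quick check of the relative gains implied by the figures above, here is a
minimal C sketch. The helper name gain_percent is made up for illustration; the
inputs are just the c/s figures listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage gain when going from `before` to `after` (same unit, e.g. M c/s). */
static double gain_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Print the SSE2 -> XOP gains for the three single-hash figures. */
static void print_gains(void)
{
    printf("MD5:  %.1f%%\n", gain_percent(128.0, 181.0));
    printf("NTLM: %.1f%%\n", gain_percent(114.0, 162.0));
    printf("SHA1: %.1f%%\n", gain_percent(39.0, 63.0));
}
```

That works out to roughly 41% for MD5, 42% for NTLM and 62% for SHA1.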
All those numbers are single-hash. This skips the slow bitmap lookups, and I
also do an early check several steps in advance (but no MD5/MD4 step reversals,
as they are incompatible with my design).
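
To illustrate the byte order reversal mentioned above for SHA1: the sketch
below swaps the bytes of four 32-bit words at once, first with SSE2 shifts and
ORs, then with a single PSHUFB (_mm_shuffle_epi8, an SSSE3 instruction). This
is not the code from my cracker; the function names are made up for this
example, and the target attribute and __builtin_cpu_supports are GCC/Clang
specifics assumed here so that it builds without extra flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (PSHUFB) */

/* SSE2 only: two shift/OR rounds per vector. */
static __m128i bswap32_sse2(__m128i v)
{
    /* Swap the two bytes inside every 16-bit half. */
    v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
    /* Swap the two 16-bit halves inside every 32-bit word. */
    return _mm_or_si128(_mm_slli_epi32(v, 16), _mm_srli_epi32(v, 16));
}

/* SSSE3: one PSHUFB with a constant index vector.
   The target attribute is GCC/Clang specific. */
static __attribute__((target("ssse3"))) __m128i bswap32_ssse3(__m128i v)
{
    /* Result byte i takes source byte idx[i]; arguments run from byte 15 to 0. */
    const __m128i idx = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                     4, 5, 6, 7, 0, 1, 2, 3);
    return _mm_shuffle_epi8(v, idx);
}

/* Byte-swap four 32-bit words in place; falls back to SSE2 when the
   CPU does not report SSSE3 support. */
static void bswap32x4(uint32_t w[4], int want_ssse3)
{
    assert(w != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)w);
    if (want_ssse3 && __builtin_cpu_supports("ssse3"))
        v = bswap32_ssse3(v);
    else
        v = bswap32_sse2(v);
    _mm_storeu_si128((__m128i *)w, v);
}
```

The shuffle variant replaces six shift/OR operations per vector with one
instruction plus a constant that can stay in a register across the loop.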
I noticed multihash MD5 improved a lot with the new CPU - I get about 120M
c/s as compared to ~65M c/s on PhenomII X4.
> They're Roman's, not mine; my role was to choose the versions producing
> more optimal code (considering register pressure and parallelism).
>
> That's puzzling indeed. XOP does provide some decent speedup over AVX
> for bitslice DES in JtR when benchmarked on the same Bulldozer CPU. You
> can see the numbers here: http://openwall.info/wiki/john/benchmarks
>
> 18527K / 14247K 128/128 BS XOP-16
> vs.
> 16442K / 12792K 128/128 BS AVX-16
>
> 4700K / 4418K 128/128 BS XOP-16
> vs.
> 3951K / 3786K 128/128 BS AVX-16
>
> These are for "FX-8120 o/c 3.6 GHz + turbo". These were
> user-contributed numbers. Somehow I am getting numbers similar to the
> above on my FX-8120 without any overclocking (well, maybe only slightly
> lower numbers). I'll do more benchmarking with different clock rates
> and update the wiki later. BTW, Core i7-2600 at stock clock rate
> (3.4 GHz + turbo) is faster at these despite of only having AVX:
>
> 22773K / 18284K 128/128 BS AVX-16 (8 threads)
> 5802K / 5491K 128/128 BS AVX-16 (non-OpenMP)
>
Well, I tried using the 128-bit xmm instructions, not the ymm versions (which
would require a lot of rework). Speed with both Kwan's sboxes with SSE2 and
Roman's with XOP is about 5M c/s (6 threads). I believe the problem is
somewhere else (perhaps my key/block setup is to blame).
> FX-8120 has slight advantage over Core i7-2600 at ALU-heavy code such as
> bcrypt, though: approx. 5500 c/s vs. 4800 c/s for 8 threads. These
> correspond to approx. 5.8x vs. 5.1x increase over single thread speed.
>
I don't support bcrypt yet :(
> What specific speeds did you get, and for what hash types?
>
> For LM hashes, the key setup may easily eat up more time than the DES
> encryption step does.
>
For LM hashes I get some improvement over Kwan's sboxes (42M c/s vs 35M
c/s). I think my bitslice key/block setup is to blame for this. I use some
loops with _mm_movemask_epi8 which at first glance look neat and fast;
however, PMOVMSKB itself is not that fast. I also suspect I get a lot of
register spills because my code is not quite optimal.
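
For anyone wondering what such a loop looks like, here is a minimal sketch of
the general idea, not my actual setup code: it turns 16 input bytes into 8
bit planes of 16 bits each, one PMOVMSKB per plane. The names bitslice16 and
bitslice16_ref are made up for this example, and the layout (bit i of plane b
is bit b of byte i) is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_movemask_epi8, _mm_add_epi8 */

/* planes[b] gets bit i set iff bit b of in[i] is set. */
static void bitslice16(const uint8_t in[16], uint16_t planes[8])
{
    assert(in != NULL && planes != NULL);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    for (int bit = 7; bit >= 0; bit--) {
        /* PMOVMSKB collects the top bit of each of the 16 bytes. */
        planes[bit] = (uint16_t)_mm_movemask_epi8(v);
        /* v + v shifts every byte left by one, exposing the next bit. */
        v = _mm_add_epi8(v, v);
    }
}

/* Plain scalar reference with the same output layout. */
static void bitslice16_ref(const uint8_t in[16], uint16_t planes[8])
{
    for (int bit = 0; bit < 8; bit++) {
        uint16_t p = 0;
        for (int i = 0; i < 16; i++)
            p |= (uint16_t)(((in[i] >> bit) & 1u) << i);
        planes[bit] = p;
    }
}
```

Each iteration moves a value from a vector register to a general-purpose
register and depends on the previous add, so the loop is only as fast as that
chain allows, however compact it looks in the source.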
Regards.