Message-ID: <20130420232753.GA863@openwall.com>
Date: Sun, 21 Apr 2013 03:27:53 +0400
From: Solar Designer <solar@...nwall.com>
To: Tavis Ormandy <taviso@...xchg8b.com>
Cc: john-dev@...ts.openwall.com
Subject: Re: minor raw-sha1-ng pull request
Tavis, magnum -
On Fri, Apr 19, 2013 at 06:25:50PM -0700, Tavis Ormandy wrote:
> Thanks for the explanation Magnum, I get similar results! I can restructure
> cmp_all so it's also omp safe, I sent you a pull request for that. It gets
> another 2000K c/s on my machine.
Thanks!
The attached patch replaces the heavy "#pragma omp atomic" with a much
lighter OpenMP reduction for the bitwise OR. I've checked the OpenMP 2.5
spec (from 2005) - bitwise OR was already supported in the reduction
clause, so I think we're good in terms of portability.
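
For illustration, here is a minimal sketch of the reduction approach. It is
not the attached patch: any_match(), candidates and target are hypothetical
names standing in for a cmp_all()-style check.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a cmp_all()-style check: returns nonzero if
 * any of the count candidate words equals target.
 */
int any_match(const uint32_t *candidates, int count, uint32_t target)
{
    int result = 0;
    int i;

    /*
     * With reduction(|:result), each thread ORs into its own private copy
     * of result (initialized to 0, the identity for |).  The private
     * copies are combined into the shared variable once, at the end of
     * the loop, instead of synchronizing on every iteration as
     * "#pragma omp atomic" would.
     */
#ifdef _OPENMP
#pragma omp parallel for reduction(|:result)
#endif
    for (i = 0; i < count; i++)
        result |= (candidates[i] == target);

    return result;
}
```

Built without OpenMP support, the pragma is compiled out and the loop simply
runs serially with the same result.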
Also, I get better speeds at high thread counts when OMP_SCALE is much
larger - not the current 32, but 1024 or even 10240. With 32, there's a
performance regression when going from 4 to 8 threads on an FX-8120. With
1024, there's a slight speedup. With 10240, it's roughly 50M vs. 60M c/s
for 4 vs. 8 threads. All of these numbers are quite low, though, given
that 1 thread does 29M, and 2 threads do 44M. Unfortunately, this is as
expected for a fast hash like this being parallelized at this level.
We'll deal with this separately, with parallelization at a higher level.
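
To put the numbers above in perspective, here is a rough sketch of the
arithmetic. Both helpers are hypothetical. batch_size() assumes the usual
convention where the number of keys per crypt_all() call is the format's
base count multiplied by the thread count and by OMP_SCALE, so a larger
OMP_SCALE means fewer parallel-region entries per candidate tested.

```c
/*
 * Hypothetical helper: candidates processed per crypt_all() call, assuming
 * the batch is scaled by both the thread count and OMP_SCALE.
 */
int batch_size(int base_keys, int threads, int omp_scale)
{
    return base_keys * threads * omp_scale;
}

/*
 * Parallel efficiency: the measured speedup over one thread, divided by
 * the number of threads.  1.0 would be perfect linear scaling.
 */
double parallel_efficiency(double rate_n, double rate_1, int threads)
{
    return (rate_n / rate_1) / threads;
}
```

Taking 29M c/s for 1 thread as the baseline, parallel_efficiency() gives
about 76% for 2 threads (44M), about 43% for 4 threads (50M), and about 26%
for 8 threads (60M).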
rotateright() and rotateleft() should probably be dropped. Only
rotateleft() is used, and not in a performance-critical place.
Moreover, it is probably slower than what gcc would generate on its own
(it uses the rol %cl,reg form of the instruction, whereas gcc would use
one with immediate shift count).
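
As a sketch of the alternative, a rotate written in plain C might look like
the following. rol32() is a hypothetical name, not something in the tree.
gcc generally recognizes this shift-and-OR idiom as a rotate, and when the
count is a compile-time constant after inlining it can use the immediate
form of the instruction.

```c
#include <stdint.h>

/*
 * Rotate x left by n bits.  Masking the counts keeps both shifts in the
 * range 0..31, so n == 0 (or a multiple of 32) is well defined.
 */
static inline uint32_t rol32(uint32_t x, unsigned int n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```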
Thanks again,
Alexander
View attachment "john-rawSHA1_ng_fmt-omp-reduction.diff" of type "text/plain" (758 bytes)