Message-ID: <20150908084725.GA10914@openwall.com>
Date: Tue, 8 Sep 2015 11:47:25 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

Lei,

On Tue, Sep 08, 2015 at 03:04:57PM +0800, Lei Zhang wrote:
> On Sep 2, 2015, at 11:20 PM, Solar Designer <solar@...nwall.com> wrote:
> >
> > Lei, will you test/benchmark on NEON and AltiVec once magnum commits the
> > fixes, please?
>
> On AltiVec (4xOMP):

Is this 4 threads likely across different CPU cores? That's no good.
What we need for benchmarking is the maximum number of threads supported
in hardware on a certain number of CPU cores (on 1 core is OK if you
can't reliably use the entire machine's cores). So on POWER8 I guess
you'll run 8 threads all locked to one physical CPU core. You should be
able to do that with OpenMP env vars (affinity).

Please also run non-OpenMP benchmarks (thus, using 1 thread on 1 core
only) for reference.

> [before]
> pbkdf2-sha1: 35840 c/s real, 8982 c/s virtual
> pbkdf2-sha256: 14194 c/s real, 3566 c/s virtual
> pbkdf2-sha512: 5944 c/s real, 1489 c/s virtual
>
> [after]
> pbkdf2-sha1: 36141 c/s real, 9057 c/s virtual
> pbkdf2-sha256: 14336 c/s real, 3592 c/s virtual
> pbkdf2-sha512: 5936 c/s real, 1498 c/s virtual

Thanks, but why are you testing these 3 hash types? I think we made
relevant changes to SHA-1 (optimized H using vcmov() as discussed in
this thread), MD5 (ditto, using my newly found expression for I), and
MD4 (ditto, realizing that G is the same as SHA-2 Maj). We also revised
how vcmov() is emulated and what we do when it is emulated, but this
should not affect AltiVec and NEON because those have non-emulated
vcmov(). We also adjusted SHA-256's interleaving factor on XOP, but
that's just XOP. There should be no change to SHA-256 and SHA-512 on
AltiVec and NEON.

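For illustration, the remark that MD4's G is the same as SHA-2 Maj can be checked in scalar C. The sketch below is not taken from the John the Ripper tree: the vcmov() argument order, the function names, and the particular select-based expression for Maj are all assumptions made for this example, and the expression actually committed may differ. It compares the textbook majority form with a form built from one XOR and one bitwise select.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: bits of a where c is 1, bits of b where c is 0.
 * The name and argument order are assumptions for this sketch. */
static uint32_t vcmov(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & c) | (b & ~c);
}

/* Textbook majority (MD4 G, SHA-2 Maj): three ANDs and two ORs. */
static uint32_t maj_plain(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Majority as one XOR plus one select: where x and y agree the result
 * is y, and where they differ z casts the deciding vote. */
static uint32_t maj_select(uint32_t x, uint32_t y, uint32_t z)
{
    return vcmov(z, y, x ^ y);
}

/* The low 8 bits of these constants enumerate all 8 combinations of
 * (x, y, z) bits, so one comparison covers the whole truth table.
 * Returns 1 if the two forms agree. */
static int maj_forms_agree(void)
{
    const uint32_t x = 0xF0, y = 0xCC, z = 0xAA;
    return maj_plain(x, y, z) == maj_select(x, y, z);
}
```

Calling maj_forms_agree() from a test harness confirms the identity. On hardware with a native select instruction, the second form needs two operations instead of five, which is the kind of saving being discussed.
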
> On NEON (2xOMP):
>
> [before]
> pbkdf2-sha1: 578 c/s real, 289 c/s virtual
> pbkdf2-sha256: 276 c/s real, 138 c/s virtual
> pbkdf2-sha512: 125 c/s real, 62.7 c/s virtual
>
> [after]
> pbkdf2-sha1: 501 c/s real, 250 c/s virtual
> pbkdf2-sha256: 276 c/s real, 138 c/s virtual
> pbkdf2-sha512: 125 c/s real, 62.7 c/s virtual
>
> There's no significant change on Altivec,

OK, but you need to run 8 threads/core benchmarks.

> while SHA1 somehow gets slower on NEON.

It might need higher interleaving factor now. You haven't even tried
introducing interleaving for these archs, have you? (I don't recall.)
I think AltiVec probably won't need interleaving if we target modern
POWER chips with multiple hardware threads per core, but NEON will.

Also, as I suggested in the "MD5 on XOP, NEON, AltiVec" thread:

"[...] we'll need to revise MD5_I in simd-intrinsics.c to use [...] the
obvious expression with OR-NOT on NEON and AltiVec (IIRC, those archs
have OR-NOT, which might be lower latency than select)."

I think you should do that before benchmarking and before tuning of the
interleaving factors for MD5.

Thanks again,

Alexander
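
As a rough illustration of the MD5_I point, the scalar C sketch below compares the OR-NOT form of MD5's I function, y ^ (x | ~z) as defined in RFC 1321, with one possible select-based form. The helper names orn() and bitsel() are hypothetical, and the select-based expression is only an assumption about what such a form could look like, not a quote from simd-intrinsics.c. On NEON the OR-NOT step would correspond to a single vornq_u32() per vector.

```c
#include <assert.h>
#include <stdint.h>

/* OR-NOT: a | ~b. Hypothetical helper standing in for a native
 * OR-NOT instruction (e.g. what NEON's vornq_u32 does per lane). */
static uint32_t orn(uint32_t a, uint32_t b)
{
    return a | ~b;
}

/* Bitwise select: bits of a where c is 1, bits of b where c is 0.
 * Hypothetical helper standing in for a native select. */
static uint32_t bitsel(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & c) | (b & ~c);
}

/* MD5 I with OR-NOT: one OR-NOT and one XOR. */
static uint32_t md5_i_orn(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ orn(x, z);
}

/* MD5 I with select: picking x where z is set and all-ones elsewhere
 * yields x | ~z, which is then XORed with y. Needs an all-ones mask. */
static uint32_t md5_i_select(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ bitsel(x, 0xFFFFFFFFu, z);
}

/* The low 8 bits of the constants enumerate the full truth table;
 * the upper bits repeat the all-zero input case.
 * Returns 1 if the two forms agree. */
static int md5_i_forms_agree(void)
{
    const uint32_t x = 0xF0, y = 0xCC, z = 0xAA;
    return md5_i_orn(x, y, z) == md5_i_select(x, y, z);
}
```

Both forms take two operations, so the choice between them on a given architecture comes down to instruction latency and the cost of keeping the all-ones mask in a register, which is why benchmarking after the change matters.
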