Message-Id: <3465D09F-8C78-4120-8C7A-6A9E75712E9E@gmail.com>
Date: Wed, 9 Sep 2015 23:43:41 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Sep 8, 2015, at 4:47 PM, Solar Designer <solar@...nwall.com> wrote:
>
> Lei,
>
> On Tue, Sep 08, 2015 at 03:04:57PM +0800, Lei Zhang wrote:
>> On Sep 2, 2015, at 11:20 PM, Solar Designer <solar@...nwall.com> wrote:
>>>
>>> Lei, will you test/benchmark on NEON and AltiVec once magnum commits the
>>> fixes, please?
>>
>> On AltiVec (4xOMP):
>
> Is this 4 threads likely across different CPU cores?

I think so. The benchmark results fluctuated too badly when I used the maximum number of hardware threads, so I switched to a small number of threads, without binding them to a specific core though.

> What we need for benchmarking is the maximum number of threads supported
> in hardware on a certain number of CPU cores (on 1 core is OK if you
> can't reliably use the entire machine's cores). So on POWER8 I guess
> you'll run 8 threads all locked to one physical CPU core. You should be
> able to do that with OpenMP env vars (affinity).

I'll post the updated results later.

>> On NEON (2xOMP):
>>
>> [before]
>> pbkdf2-sha1: 578 c/s real, 289 c/s virtual
>> pbkdf2-sha256: 276 c/s real, 138 c/s virtual
>> pbkdf2-sha512: 125 c/s real, 62.7 c/s virtual
>>
>> [after]
>> pbkdf2-sha1: 501 c/s real, 250 c/s virtual
>> pbkdf2-sha256: 276 c/s real, 138 c/s virtual
>> pbkdf2-sha512: 125 c/s real, 62.7 c/s virtual
>>
>> There's no significant change on Altivec,
>
> OK, but you need to run 8 threads/core benchmarks.

Why? Our ZedBoard has only two cores.

>> while SHA1 somehow gets slower on NEON.
>
> It might need higher interleaving factor now. You haven't even tried
> introducing interleaving for these archs, have you? (I don't recall.)

No, I haven't. I'll put this on my todo list.

> Also, as I suggested in the "MD5 on XOP, NEON, AltiVec" thread:
>
> "[...] we'll need to revise MD5_I in simd-intrinsics.c to use [...]
> the obvious expression with OR-NOT on NEON and AltiVec (IIRC, those
> archs have OR-NOT, which might be lower latency than select)."

I just checked the manuals. NEON does support OR-NOT, but AltiVec seems to only support NOT-OR (~(a|b)). So perhaps only NEON can benefit from this optimization.

Lei
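
As a rough illustration of the OR-NOT point above, here is a minimal scalar C sketch. It is not the actual simd-intrinsics.c code. MD5's round-4 function is I(x, y, z) = y ^ (x | ~z) (RFC 1321). The helpers orn() and nor() are hypothetical scalar stand-ins for a single OR-NOT or NOT-OR vector instruction, and md5_i_orn(), md5_i_nor() and md5_i_selfcheck() are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one OR-NOT instruction: a | ~b */
static inline uint32_t orn(uint32_t a, uint32_t b)
{
    return a | ~b;
}

/* Hypothetical stand-in for one NOT-OR instruction: ~(a | b) */
static inline uint32_t nor(uint32_t a, uint32_t b)
{
    return ~(a | b);
}

/* I(x, y, z) = y ^ (x | ~z) with OR-NOT: two operations (orn, xor) */
static inline uint32_t md5_i_orn(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ orn(x, z);
}

/*
 * Same function when only NOT-OR is available: ~z has to be formed
 * first as nor(z, z), so it takes three operations (nor, or, xor).
 */
static inline uint32_t md5_i_nor(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | nor(z, z));
}

/* Checks that both forms agree on a few inputs; returns 1 on success. */
static int md5_i_selfcheck(void)
{
    static const uint32_t v[] = {
        0x00000000u, 0xffffffffu, 0x12345678u, 0x9abcdef0u, 0x0f0f0f0fu
    };
    const unsigned n = sizeof(v) / sizeof(v[0]);

    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (unsigned k = 0; k < n; k++)
                assert(md5_i_orn(v[i], v[j], v[k]) ==
                       md5_i_nor(v[i], v[j], v[k]));
    return 1;
}
```

In this sketch the OR-NOT form needs two operations, while the NOT-OR form spends an extra one on complementing z, which is consistent with the guess that only NEON would gain from the change.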