Message-ID: <20150530002433.GA10980@openwall.com>
Date: Sat, 30 May 2015 03:24:33 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA-256

On Sat, May 30, 2015 at 01:59:46AM +0200, magnum wrote:
> On 2015-05-29 20:13, Alain Espinosa wrote:
> >Hand-crafted AVX2 assembly code done for "normal" SHA256. Performance
> >in a core i5-4670 3.4GHz, single thread:
> >
> >- 23.7 millions keys per second. 87% faster than the bitslice one
> >with AVX2 intrinsics.
>
> Alain, Solar,
>
> The bitslice track is very interesting, but on a side note: What's the
> main cause for this huge difference between normal SHA256 implemented in
> assembly versus intrinsics? Perhaps the optimizer makes some poor
> choices? Could we learn something from analyzing compiled intrinsics and
> tweak the source a little?

Perhaps.

> OTOH I think the JtR implementation of SHA256 is a lot faster than 12.5M
> keys/s - benchmarking on well (i7-4770K 3.5GHz) shows over 19M. But we
> might not compare apples to apples.

i5-4670 is documented to have max turbo at 3.8 GHz, i7-4770K has it at
3.9 GHz (confirmed by my own testing).  When comparing single thread
speeds on otherwise idle CPUs we get:

(23.7/3.8) / (19.3/3.9) = 1.26

so Alain's assembly code is 26% faster than our intrinsics.

Aleksey recently reported a 22% speedup for raw-sha256 relative to
what's committed in jumbo, using his john-devkit to generate a
C+intrinsics replacement raw-sha256 format.  This is on i7-3770.  He did
not report the specific speed figures yet, just the percentage.

Alexander
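
The clock-normalized comparison in the message can be reproduced with a short Python sketch. It assumes, as the message does, that each single-thread benchmark ran at the CPU's maximum turbo clock on an otherwise idle machine. The function and variable names are hypothetical, chosen only for illustration.

```python
def keys_per_ghz(mkeys_per_s, turbo_ghz):
    """Millions of keys per second, per GHz of (assumed) max turbo clock."""
    return mkeys_per_s / turbo_ghz

# Figures quoted in the thread:
# Alain's AVX2 assembly on i5-4670 (max turbo 3.8 GHz): 23.7M keys/s
# JtR intrinsics on i7-4770K (max turbo 3.9 GHz): 19.3M keys/s
asm_per_ghz = keys_per_ghz(23.7, 3.8)
intrinsics_per_ghz = keys_per_ghz(19.3, 3.9)

ratio = asm_per_ghz / intrinsics_per_ghz
print(f"assembly vs. intrinsics, per clock: {ratio:.2f}x")
```

If either CPU did not actually sustain its maximum turbo clock during the run, the ratio would shift accordingly, so this is only as good as that assumption.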