Message-ID: <20150530002433.GA10980@openwall.com>
Date: Sat, 30 May 2015 03:24:33 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA-256

On Sat, May 30, 2015 at 01:59:46AM +0200, magnum wrote:
> On 2015-05-29 20:13, Alain Espinosa wrote:
> >Hand-crafted AVX2 assembly code done for "normal" SHA256. Performance
> >in a core i5-4670 3.4GHz, single thread:
> >
> >- 23.7 millions keys per second. 87% faster than the bitslice one
> >with AVX2 intrinsics.
> 
> Alain, Solar,
> 
> The bitslice track is very interesting, but on a side note: What's the 
> main cause for this huge difference between normal SHA256 implemented in 
> assembly versus intrinsics? Perhaps the optimizer makes some poor 
> choices? Could we learn something from analyzing compiled intrinsics and 
> tweak the source a little?

Perhaps.

> OTOH I think the JtR implementation of SHA256 is a lot faster than 12.5M 
> keys/s - benchmarking on well (i7-4770K 3.5GHz) shows over 19M, but we 
> might not be comparing apples to apples.

The i5-4670 is documented to have a max turbo of 3.8 GHz; the i7-4770K
has 3.9 GHz (confirmed by my own testing).  Comparing single-thread
speeds on otherwise idle CPUs, normalized by those clock rates, we get:

(23.7/3.8) / (19.3/3.9) = 1.26

so Alain's assembly code is 26% faster per clock than our intrinsics.
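
For anyone who wants to redo this with other figures, here is a minimal
sketch of the same normalization.  It assumes each CPU actually ran at
its max single-core turbo clock during the benchmark; the function names
are made up for illustration and are not part of JtR.

```python
def per_ghz(mkeys_per_s, turbo_ghz):
    """Throughput per GHz of clock: millions of keys/s divided by GHz."""
    return mkeys_per_s / turbo_ghz


def normalized_speedup(a_speed, a_ghz, b_speed, b_ghz):
    """Ratio of clock-normalized throughput of implementation A over B."""
    return per_ghz(a_speed, a_ghz) / per_ghz(b_speed, b_ghz)


# Figures from this thread: assembly on i5-4670 (assumed 3.8 GHz turbo)
# vs. JtR intrinsics on i7-4770K (assumed 3.9 GHz turbo).
ratio = normalized_speedup(23.7, 3.8, 19.3, 3.9)
print(f"{ratio:.2f}")  # prints 1.26
```

If either CPU was not at max turbo (e.g., other cores were busy), the
clock rates passed in would need adjusting accordingly.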

Aleksey recently reported a 22% speedup for raw-sha256 relative to
what's committed in jumbo, using his john-devkit to generate a
C+intrinsics replacement raw-sha256 format.  This is on i7-3770.
He has not reported the specific speed figures yet, just the percentage.

Alexander
