john-dev - Re: interleaving on GPUs

Message-ID: <20150824024410.GB20884@openwall.com>
Date: Mon, 24 Aug 2015 05:44:10 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaving on GPUs

On Sun, Aug 23, 2015 at 11:19:08PM +0200, magnum wrote:
> On 2015-08-23 23:10, magnum wrote:
> >On 2015-08-23 23:05, magnum wrote:
> >>On 2015-08-23 07:08, Solar Designer wrote:
> >>>Note that they explicitly mention "processing several data items
> >>>concurrently per thread".  So it appears that when targeting Kepler, up
> >>>to 2x interleaving at OpenCL kernel source level could make sense.
> >>
> >>Shouldn't simply using vectorized code (eg. using uint2) result in just
> >>the interleaving we want (on nvidia)?

With my current understanding of the extent to which we're stuck with
the pure SIMT model, yes, uint2 should be similar to 2x interleaving.
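
To make the comparison concrete, here is a rough sketch in plain C (not
actual JtR kernel code) of what "2x interleaving" means at source level,
using one MD4 round-1 step as the example.  The u32x2 struct is a
hypothetical stand-in for OpenCL's uint2, and the function names are made
up for illustration.  The point is only that the two lanes form
independent dependency chains, which the compiler and hardware are then
free to overlap within a single work-item.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by s bits (0 < s < 32). */
static inline uint32_t rotl32(uint32_t x, unsigned s)
{
    return (x << s) | (x >> (32 - s));
}

/* MD4 round-1 boolean function F, in its usual optimized form:
 * (x & y) | (~x & z) == z ^ (x & (y ^ z)). */
static inline uint32_t md4_f(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* Scalar version: one MD4 round-1 step for a single data item. */
static inline uint32_t md4_step1(uint32_t a, uint32_t b, uint32_t c,
                                 uint32_t d, uint32_t m, unsigned s)
{
    return rotl32(a + md4_f(b, c, d) + m, s);
}

/* Hypothetical stand-in for OpenCL's uint2: two unrelated data items
 * carried by the same thread (work-item). */
typedef struct { uint32_t x, y; } u32x2;

/* 2x interleaved version: the same step applied to both items.  The
 * two statements do not depend on each other, so their instructions
 * can be scheduled back to back to hide latency. */
static inline u32x2 md4_step1_x2(u32x2 a, u32x2 b, u32x2 c,
                                 u32x2 d, u32x2 m, unsigned s)
{
    u32x2 r;
    r.x = rotl32(a.x + md4_f(b.x, c.x, d.x) + m.x, s);
    r.y = rotl32(a.y + md4_f(b.y, c.y, d.y) + m.y, s);
    return r;
}
```

In OpenCL C the second form would normally be written once with uint2
operands and left to the compiler to expand per component, which is why
the two approaches are expected to behave similarly.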

> >I tried PBKDF2-HMAC MD4, MD5 and SHA-1 but they all lost some performance.
> 
> The loss I saw might have been because my laptop Kepler is too slow so 
> auto-tune doesn't let it run optimally.

How much did they lose on your laptop?

> Here's super's Titan:
> 
> $ ../run/john -test -dev=5 -form:pbkdf2-hmac-md4-opencl
> Device 5: GeForce GTX TITAN
> Benchmarking: PBKDF2-HMAC-MD4-opencl [PBKDF2-MD4 OpenCL]... DONE
> Speed for cost 1 (iterations) of 1000
> Raw:	2933K c/s real, 2892K c/s virtual
> 
> $ ../run/john -test -dev=5 -form:pbkdf2-hmac-md4-opencl -force-vec=2
> Device 5: GeForce GTX TITAN
> Benchmarking: PBKDF2-HMAC-MD4-opencl [PBKDF2-MD4 OpenCL 2x]... DONE
> Speed for cost 1 (iterations) of 1000
> Raw:	3302K c/s real, 3201K c/s virtual
> 
> $ ../run/john -test -dev=5 -form:pbkdf2-hmac-md5-opencl
> Device 5: GeForce GTX TITAN
> Benchmarking: PBKDF2-HMAC-MD5-opencl [PBKDF2-MD5 OpenCL]... DONE
> Speed for cost 1 (iterations) of 1000
> Raw:	1906K c/s real, 1872K c/s virtual
> 
> $ ../run/john -test -dev=5 -form:pbkdf2-hmac-md5-opencl -force-vec=2
> Device 5: GeForce GTX TITAN
> Benchmarking: PBKDF2-HMAC-MD5-opencl [PBKDF2-MD5 OpenCL 2x]... DONE
> Speed for cost 1 (iterations) of 1000
> Raw:	2199K c/s real, 2169K c/s virtual
> 
> $ ../run/john -test -dev=5 -form:pbkdf2-hmac-sha1-opencl
> Device 5: GeForce GTX TITAN
> Benchmarking: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL]... DONE
> Speed for cost 1 (iterations) of 1000
> Raw:	864804 c/s real, 859488 c/s virtual
> 
> $ ../run/john -test -dev=5 -form:pbkdf2-hmac-sha1-opencl -force-vec=2
> Device 5: GeForce GTX TITAN
> Benchmarking: PBKDF2-HMAC-SHA1-opencl [PBKDF2-SHA1 OpenCL 2x]... DONE
> Speed for cost 1 (iterations) of 1000
> Raw:	718202 c/s real, 703742 c/s virtual
> 
> So there is indeed a speedup for MD4 and MD5 but not for SHA-1 in this case.

Cool!  If the loss on your laptop for MD4 and MD5 is less than the gain
on TITAN, then can we make this the default?
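
For reference, the relative changes in the TITAN numbers quoted above
work out to roughly +12.6% for MD4, +15.4% for MD5, and -17.0% for SHA-1
("real" c/s, -force-vec=2 versus default).  A small C helper along these
lines could be used to run the same comparison on the laptop figures; the
function names are made up, and the constants are simply copied from the
benchmark output above.

```c
#include <stdio.h>

/* Percentage change in speed when going from the default build
 * (base) to the -force-vec=2 build (vec2); positive means a gain. */
static double pct_change(double base, double vec2)
{
    return (vec2 / base - 1.0) * 100.0;
}

/* Print the change for the three GTX TITAN runs quoted in this
 * message, using the "real" c/s column. */
static void print_titan_changes(void)
{
    printf("PBKDF2-HMAC-MD4:  %+.1f%%\n", pct_change(2933e3, 3302e3));
    printf("PBKDF2-HMAC-MD5:  %+.1f%%\n", pct_change(1906e3, 2199e3));
    printf("PBKDF2-HMAC-SHA1: %+.1f%%\n", pct_change(864804.0, 718202.0));
}
```

Calling print_titan_changes() from a small driver prints the three
percentages; feeding pct_change() the laptop speeds would give the
numbers needed to weigh the loss there against the gain here.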

-force-vec=2 doesn't appear to affect md5crypt-opencl.  Why not?  Does
it require some per-format support?

Alexander
