john-dev - Re: interleaving on GPUs

Message-ID: <a38d646288473fc3b4e113c1e8346437@smtp.hushmail.com>
Date: Sun, 23 Aug 2015 23:10:24 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaving on GPUs

On 2015-08-23 23:05, magnum wrote:
> On 2015-08-23 07:08, Solar Designer wrote:
>> I just read this about NVIDIA's Kepler (such as the old GTX TITAN that
>> we have in super):
>>
>> http://docs.nvidia.com/cuda/kepler-tuning-guide/index.html#device-utilization-and-occupancy
>>
>>
>> "Also note that Kepler GPUs can utilize ILP in place of
>> thread/warp-level parallelism (TLP) more readily than Fermi GPUs can.
>> Furthermore, some degree of ILP in conjunction with TLP is required by
>> Kepler GPUs in order to approach peak single-precision performance,
>> since SMX's warp scheduler issues one or two independent instructions
>> from each of four warps per clock.  ILP can be increased by means of, for
>> example, processing several data items concurrently per thread or
>> unrolling loops in the device code, though note that either of these
>> approaches may also increase register pressure."
>>
>> Note that they explicitly mention "processing several data items
>> concurrently per thread".  So it appears that when targeting Kepler, up
>> to 2x interleaving at OpenCL kernel source level could make sense.
>
> Shouldn't simply using vectorized code (e.g. using uint2) result in just
> the interleaving we want (on NVIDIA)? I tested this with some of our
> formats that can optionally run vectorized, but they don't seem to gain
> from --force-vector=2.

BTW here's a list of such formats:

$ git grep -l v_width *fmt*c
opencl_encfs_fmt_plug.c
opencl_krb5pa-sha1_fmt_plug.c
opencl_ntlmv2_fmt_plug.c
opencl_office2007_fmt_plug.c
opencl_office2010_fmt_plug.c
opencl_office2013_fmt_plug.c
opencl_pbkdf2_hmac_md4_fmt_plug.c
opencl_pbkdf2_hmac_md5_fmt_plug.c
opencl_pbkdf2_hmac_sha1_fmt_plug.c
opencl_rakp_fmt_plug.c
opencl_sha1crypt_fmt_plug.c
opencl_wpapsk_fmt_plug.c

I tried PBKDF2-HMAC MD4, MD5 and SHA-1 with --force-vector=2, but all
three lost some performance.
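
For reference, here is a minimal sketch in plain C of what "processing
several data items concurrently per thread" means at source level. It is
not taken from any of the formats listed above: toy_round() is a made-up
stand-in for a real hash round, and all names are hypothetical. The scalar
version is one long dependency chain, while the 2x version carries two
independent chains in the same loop body, so a scheduler that can issue
two independent instructions has something to pair up. Using uint2 in an
OpenCL kernel should express roughly the same thing in vector form.

```c
#include <assert.h>
#include <stdint.h>

/* Toy mixing round: a hypothetical stand-in, NOT MD4/MD5/SHA-1. */
uint32_t toy_round(uint32_t a, uint32_t k)
{
    a += k;
    a ^= a >> 15;
    a *= 0x9E3779B1u;   /* odd multiplier, arithmetic wraps mod 2^32 */
    return a;
}

/* One data item: every round depends on the previous one (no ILP). */
uint32_t toy_hash(uint32_t seed, unsigned rounds)
{
    uint32_t a = seed;
    for (unsigned i = 0; i < rounds; i++)
        a = toy_round(a, i);
    return a;
}

/* 2x interleaved: two independent chains advance in the same loop body.
 * This costs roughly twice the live registers of the scalar version. */
void toy_hash_x2(const uint32_t seed[2], uint32_t out[2], unsigned rounds)
{
    uint32_t a0 = seed[0];
    uint32_t a1 = seed[1];
    for (unsigned i = 0; i < rounds; i++) {
        a0 = toy_round(a0, i);  /* chain 0 */
        a1 = toy_round(a1, i);  /* chain 1, independent of chain 0 */
    }
    out[0] = a0;
    out[1] = a1;
}

/* Self-check: the interleaved result must equal two scalar runs. */
int interleaved_matches_scalar(uint32_t s0, uint32_t s1, unsigned rounds)
{
    const uint32_t seed[2] = { s0, s1 };
    uint32_t out[2];
    toy_hash_x2(seed, out, rounds);
    return out[0] == toy_hash(s0, rounds) && out[1] == toy_hash(s1, rounds);
}
```

Both versions compute the same values; only the instruction mix visible
to the scheduler changes. Whether that helps depends on how much register
pressure the extra chain adds, which may be why the formats above did not
gain.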

magnum
