Message-ID: <CA+E3k90MAtR-1S=YP+QScbLPqeg=69vpHFq9ywvqnAro-8SapQ@mail.gmail.com>
Date: Mon, 29 Sep 2014 20:53:25 -0800
From: Royce Williams <royce@...ho.org>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: NVIDIA GTX 970 (Maxwell 2 / GM204) opencl benchmarks

On Mon, Sep 29, 2014 at 10:07 AM, magnum <john.magnum@...hmail.com> wrote:

> On 2014-09-29 06:44, Royce Williams wrote:
>
>> Sayantan said that GTX 970 benchmarks might be useful for john-dev, so
>> here are opencl-specific benchmarks for the EVGA model 04G-P4-0972-KR,
>> non-overclocked.
>>
>
> Yes, thanks!
>
>> I wrote a quick wrapper to only run --test for the *-opencl formats
>>
>
> Maybe you are not aware that you can use --format=opencl which will do
> just that. You can also use --format=cuda or do both with --format=gpu. You
> can even use a wildcard, as in --format=wpapsk* for testing all variants of
> that format, or --format=*crypt for all formats that end in "crypt".


I was definitely not aware.  I really reinvented the wheel on that one.
Thanks for the clue.
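
For reference, a minimal sketch of the selectors magnum lists, written as a
dry run that only prints the command lines instead of launching john. The
function name print_benchmark_cmds is made up for illustration; the
../run/john path and the --test and --format options are the ones used
elsewhere in this message. The quoting is a precaution so that the shell
never gets a chance to expand the "*" wildcards itself.

```shell
# Dry run: print one benchmark command per --format selector, without
# executing john. The selectors are the ones mentioned in the quoted text.
print_benchmark_cmds() {
    for sel in opencl cuda gpu 'wpapsk*' '*crypt'; do
        # Single quotes in the printed command keep '*' away from the shell.
        printf '%s\n' "../run/john --test '--format=$sel'"
    done
}
```

Calling print_benchmark_cmds prints five lines, one per selector; any of
them can be pasted into a shell in the src directory to run that benchmark.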


>> The test system's current CPU (Sempron 145, sorry) is woefully
>> underpowered for any CPU-fed formats.  A better chip is on its way,
>> and I can test again if needed.  The JtR build is from a Sep 27
>> download bleeding-jumbo; I can run a different version if needed.
>> NVIDIA drivers are 343.22.
>>
>
> I'd much appreciate a benchmark before/after trying this change: In
> src/opencl/wpapsk_kernel.cl (and many others, but let's try this first),
> we have this:
>
>         #if gpu_amd(DEVICE_INFO)
>         #define USE_BITSELECT
>         #endif
>
> If you delete the #if and #endif so we define it for nvidia too, does it
> affect performance significantly? Note that after editing, you need to run
> "make" to have the change "activated" even though the kernel is built at
> run-time.


$ diff -u opencl/wpapsk_kernel.cl-dist opencl/wpapsk_kernel.cl
--- opencl/wpapsk_kernel.cl-dist        2014-09-26 17:52:50.000000000 -0800
+++ opencl/wpapsk_kernel.cl     2014-09-29 19:47:34.150062765 -0800
@@ -19,9 +19,7 @@
 #define SCALAR
 #endif

-#if gpu_amd(DEVICE_INFO)
 #define USE_BITSELECT
-#endif

 /* Workaround for problem seen with 9600GT */
 #if gpu_nvidia(DEVICE_INFO)
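
The same edit could be scripted rather than made by hand. What follows is
only a sketch under stated assumptions: the helper name unguard_bitselect
is hypothetical, and it assumes the guard appears exactly as in the diff
above, three consecutive lines with no trailing whitespace. Any other
#if/#endif pair is left alone.

```shell
# Print the file named in $1 with every
#     #if gpu_amd(DEVICE_INFO)
#     #define USE_BITSELECT
#     #endif
# triple reduced to the bare #define. All other lines pass through as-is.
unguard_bitselect() {
    awk '
        { line[NR] = $0 }
        END {
            i = 1
            while (i <= NR) {
                if (line[i] == "#if gpu_amd(DEVICE_INFO)" &&
                    line[i + 1] == "#define USE_BITSELECT" &&
                    line[i + 2] == "#endif") {
                    print line[i + 1]   # keep only the #define
                    i += 3
                } else {
                    print line[i]
                    i++
                }
            }
        }
    ' "$1"
}
```

Usage would be along the lines of
"unguard_bitselect opencl/wpapsk_kernel.cl-dist > opencl/wpapsk_kernel.cl",
followed by a diff -u against the -dist copy to confirm that only the two
guard lines went away.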


Just in case, I did a 'make clean; make -s'.  It doesn't look to have made
much of a difference:

$ ../run/john --test --format=wpapsk-opencl
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem, 28 bytes cmem[3]
ptxas info    : Compiling entry function 'wpapsk_final_md5' for 'sm_52'
ptxas info    : Function properties for wpapsk_final_md5
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 71 registers, 332 bytes cmem[0]
ptxas info    : Compiling entry function 'wpapsk_loop' for 'sm_52'
ptxas info    : Function properties for wpapsk_loop
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 70 registers, 324 bytes cmem[0]
ptxas info    : Compiling entry function 'wpapsk_pass2' for 'sm_52'
ptxas info    : Function properties for wpapsk_pass2
ptxas         .     64 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 48 registers, 328 bytes cmem[0]
ptxas info    : Compiling entry function 'wpapsk_init' for 'sm_52'
ptxas info    : Function properties for wpapsk_init
ptxas         .     64 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 45 registers, 332 bytes cmem[0]
ptxas info    : Compiling entry function 'wpapsk_final_sha1' for 'sm_52'
ptxas info    : Function properties for wpapsk_final_sha1
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 72 registers, 332 bytes cmem[0]
Local worksize (LWS) 64, global worksize (GWS) 65536
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL]... DONE
Raw:    134663 c/s real, 135591 c/s virtual

$ ../run/john --test --format=wpapsk-opencl
Device 0: GeForce GTX 970
Local worksize (LWS) 64, global worksize (GWS) 262144
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL]... DONE
Raw:    139438 c/s real, 138700 c/s virtual

$ ../run/john --test --format=wpapsk-opencl
Device 0: GeForce GTX 970
Local worksize (LWS) 64, global worksize (GWS) 131072
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL]... DONE
Raw:    137248 c/s real, 136533 c/s virtual

$ ../run/john --test --format=wpapsk-opencl
Device 0: GeForce GTX 970
Local worksize (LWS) 64, global worksize (GWS) 262144
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL]... DONE
Raw:    138700 c/s real, 138700 c/s virtual
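
To put a number on the run-to-run noise in the four runs above, here is a
rough sketch that averages the "real" c/s figures and reports their spread.
The function name summarize_runs is made up for illustration. It only looks
at "Raw:" lines, and this message contains no pre-change baseline, so it
measures variation between runs, not the effect of the change. Note also
that the auto-tuned GWS differed between runs.

```shell
# Read john --test output on stdin and summarize the "Raw:" real c/s values:
# number of runs, mean, min, max, and (max - min) as a percentage of the mean.
summarize_runs() {
    awk '
        $1 == "Raw:" {
            v = $2 + 0
            sum += v
            n++
            if (n == 1 || v < min) min = v
            if (n == 1 || v > max) max = v
        }
        END {
            if (n > 0) {
                mean = sum / n
                printf "runs=%d mean=%.0f min=%d max=%d spread=%.1f%%\n",
                       n, mean, min, max, 100 * (max - min) / mean
            }
        }
    '
}
```

Fed the four runs above, this reports a mean of 137512 c/s with a spread of
about 3.5% between the slowest and fastest run, so a difference smaller than
that between two builds can't be told apart from noise on this system.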


I did a crude pass to do the same for these:

$ grep 'define USE_BITSELECT' *
7z_kernel.cl:#define USE_BITSELECT
cryptmd5_kernel.cl:#define USE_BITSELECT
gpg_kernel.cl:#define USE_BITSELECT
keyring_kernel.cl:#define USE_BITSELECT
krb5pa-md5_kernel.cl:#define USE_BITSELECT
md4_kernel.cl:#define USE_BITSELECT
md5_kernel.cl:#define USE_BITSELECT
msha_kernel.cl:#define USE_BITSELECT
ntlmv2_kernel.cl:#define USE_BITSELECT
o5logon_kernel.cl:#define USE_BITSELECT
office2007_kernel.cl:#define USE_BITSELECT
office2010_kernel.cl:#define USE_BITSELECT
pbkdf2_hmac_sha1_kernel.cl:#define USE_BITSELECT
pwsafe_kernel.cl:#define USE_BITSELECT
rakp_kernel.cl:#define USE_BITSELECT
rar_kernel.cl:#define USE_BITSELECT
sha1_kernel.cl:#define USE_BITSELECT
wpapsk_kernel.cl:#define USE_BITSELECT
wpapsk_kernel.cl-dist:#define USE_BITSELECT


From the tests, ntlmv2-opencl, pwsafe-opencl and rakp-opencl seem a bit
better.  The others look about the same.

$ for format in `cat ../bitselect-formats.list`; do echo $format;
../run/john --test --format=${format}-opencl; done
7z
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'sevenzip' for 'sm_52'
ptxas info    : Function properties for sevenzip
ptxas         .     208 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 100 registers, 332 bytes cmem[0]
ptxas info    : Function properties for sha256_process
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
Local worksize (LWS) 64, Global worksize (GWS) 8192
Benchmarking: 7z-opencl, 7-Zip [SHA256 AES OPENCL]... DONE
Raw:    1087 c/s real, 1090 c/s virtual

cryptmd5
Unknown ciphertext format name requested
gpg
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'gpg' for 'sm_52'
ptxas info    : Function properties for gpg
ptxas         .     2744 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 48 registers, 332 bytes cmem[0]
Local worksize (LWS) 64, Global worksize (GWS) 8192
Benchmarking: gpg-opencl, OpenPGP / GnuPG Secret Key [SHA1 OpenCL]... DONE
Raw:    240941 c/s real, 240941 c/s virtual

keyring
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'keyring' for 'sm_52'
ptxas info    : Function properties for keyring
ptxas         .     376 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 111 registers, 332 bytes cmem[0]
ptxas info    : Function properties for sha256_process
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
Local worksize (LWS) 64, Global worksize (GWS) 8192
Benchmarking: keyring-opencl, GNOME Keyring [SHA256 OpenCL AES]... FAILED
(cmp_all(1))

krb5pa-md5
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem, 360 bytes cmem[3]
ptxas info    : Compiling entry function 'krb5pa_md5_nthash' for 'sm_52'
ptxas info    : Function properties for krb5pa_md5_nthash
ptxas         .     64 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 27 registers, 332 bytes cmem[0]
ptxas info    : Compiling entry function 'krb5pa_md5_final' for 'sm_52'
ptxas info    : Function properties for krb5pa_md5_final
ptxas         .     256 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 179 registers, 332 bytes cmem[0]
Local worksize (LWS) 32, Global worksize (GWS) 8388608
Benchmarking: krb5pa-md5-opencl, Kerberos 5 AS-REQ Pre-Auth etype 23 [MD4
HMAC-MD5 RC4 OpenCL]... DONE
Many salts:     15679K c/s real, 15679K c/s virtual
Only one salt:  12905K c/s real, 12905K c/s virtual

md4
Unknown ciphertext format name requested
md5
Unknown ciphertext format name requested
msha
Unknown ciphertext format name requested

ntlmv2
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem, 104 bytes cmem[3]
ptxas info    : Compiling entry function 'ntlmv2_nthash' for 'sm_52'
ptxas info    : Function properties for ntlmv2_nthash
ptxas         .     64 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 27 registers, 332 bytes cmem[0]
ptxas info    : Compiling entry function 'ntlmv2_final' for 'sm_52'
ptxas info    : Function properties for ntlmv2_final
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 48 registers, 332 bytes cmem[0]
Local worksize (LWS) 32, Global worksize (GWS) 4194304
Benchmarking: ntlmv2-opencl, NTLMv2 C/R [MD4 HMAC-MD5 OpenCL]... DONE
Many salts:     187318K c/s real, 191027K c/s virtual
Only one salt:  47482K c/s real, 47482K c/s virtual

o5logon
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'o5logon_kernel' for 'sm_52'
ptxas info    : Function properties for o5logon_kernel
ptxas         .     64 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 31 registers, 336 bytes cmem[0]
Local worksize (LWS) 32, global worksize (GWS) 524288
Benchmarking: o5logon-opencl, Oracle O5LOGON protocol [SHA1 OpenCL AES
32/64]... DONE
Raw:    2184K c/s real, 2221K c/s virtual

office2007
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'HashLoop' for 'sm_52'
ptxas info    : Function properties for HashLoop
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 38 registers, 324 bytes cmem[0]
ptxas info    : Compiling entry function 'Generate2007key' for 'sm_52'
ptxas info    : Function properties for Generate2007key
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 41 registers, 328 bytes cmem[0]
ptxas info    : Compiling entry function 'GenerateSHA1pwhash' for 'sm_52'
ptxas info    : Function properties for GenerateSHA1pwhash
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 36 registers, 336 bytes cmem[0]
Local worksize (LWS) 64, Global worksize (GWS) 262144
Benchmarking: office2007-opencl, MS Office 2007 (50,000 iterations) [SHA1
OpenCL AES]... DONE
Raw:    42281 c/s real, 42349 c/s virtual

office2010
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem, 20 bytes cmem[3]
ptxas info    : Compiling entry function 'HashLoop' for 'sm_52'
ptxas info    : Function properties for HashLoop
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 38 registers, 324 bytes cmem[0]
ptxas info    : Compiling entry function 'Generate2010key' for 'sm_52'
ptxas info    : Function properties for Generate2010key
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 54 registers, 332 bytes cmem[0]
ptxas info    : Compiling entry function 'GenerateSHA1pwhash' for 'sm_52'
ptxas info    : Function properties for GenerateSHA1pwhash
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 36 registers, 336 bytes cmem[0]
Local worksize (LWS) 64, Global worksize (GWS) 131072
Benchmarking: office2010-opencl, MS Office 2010 (100,000 iterations) [SHA1
OpenCL AES]... DONE
Raw:    21452 c/s real, 21522 c/s virtual

pbkdf2
Unknown ciphertext format name requested

pwsafe
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'pwsafe_iter' for 'sm_52'
ptxas info    : Function properties for pwsafe_iter
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 38 registers, 324 bytes cmem[0]
ptxas info    : Compiling entry function 'pwsafe_check' for 'sm_52'
ptxas info    : Function properties for pwsafe_check
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 18 registers, 332 bytes cmem[0]
ptxas info    : Compiling entry function 'pwsafe_init' for 'sm_52'
ptxas info    : Function properties for pwsafe_init
ptxas         .     128 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 94 registers, 328 bytes cmem[0]
Local worksize (LWS) 512, global worksize (GWS) 65536
Benchmarking: pwsafe-opencl, Password Safe [SHA256 OpenCL]... DONE
Many salts:     472331 c/s real, 472331 c/s virtual
Only one salt:  468114 c/s real, 468114 c/s virtual

rakp
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'rakp_kernel' for 'sm_52'
ptxas info    : Function properties for rakp_kernel
ptxas         .     64 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 81 registers, 336 bytes cmem[0]
Local worksize (LWS) 128, global worksize (GWS) 524288
Benchmarking: RAKP-opencl, IPMI 2.0 RAKP (RMCP+) [HMAC-SHA1 OpenCL]... DONE
Many salts:     71303K c/s real, 72023K c/s virtual
Only one salt:  34260K c/s real, 34603K c/s virtual

rar
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Device 0: GeForce GTX 970
Build log:
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'RarHashLoop' for 'sm_52'
ptxas info    : Function properties for RarHashLoop
ptxas         .     3584 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 58 registers, 344 bytes cmem[0]
ptxas info    : Compiling entry function 'RarFinal' for 'sm_52'
ptxas info    : Function properties for RarFinal
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 35 registers, 332 bytes cmem[0]
ptxas info    : Compiling entry function 'RarInit' for 'sm_52'
ptxas info    : Function properties for RarInit
ptxas         .     0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads
ptxas info    : Used 12 registers, 328 bytes cmem[0]
Local worksize (LWS) 64, global worksize (GWS) 4096
Benchmarking: rar-opencl, RAR3 (length 5) [SHA1 OpenCL AES]... DONE
Raw:    15753 c/s real, 15753 c/s virtual

sha1
Unknown ciphertext format name requested

wpapsk
Device 0: GeForce GTX 970
Local worksize (LWS) 64, global worksize (GWS) 262144
Benchmarking: wpapsk-opencl, WPA/WPA2 PSK [PBKDF2-SHA1 OpenCL]... DONE
Raw:    137970 c/s real, 138700 c/s virtual
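
Since the build logs bury the actual numbers, a sketch like the following
could boil a log such as the one above down to one line per result. The
function name summarize_log and the tab-separated output layout are
assumptions made for illustration. It takes the format name from each
"Benchmarking:" line, so a description wrapped onto the next line by the
mailer doesn't matter, and it reports a FAILED self-test instead of a speed.

```shell
# Read john --test output on stdin and print tab-separated summary lines:
#     <format> <TAB> <label> <TAB> <real c/s>    for each speed line
#     <format> <TAB> FAILED                      for a failed self-test
summarize_log() {
    awk '
        $1 == "Benchmarking:" {
            name = $2
            sub(/,$/, "", name)          # drop the comma after the name
        }
        name != "" && /FAILED/ {
            printf "%s\tFAILED\n", name
            name = ""
            next
        }
        name != "" && / c\/s real,/ {
            label = $0
            sub(/:.*/, "", label)        # "Raw", "Many salts", ...
            rate = $0
            sub(/^[^:]*:[ \t]*/, "", rate)
            sub(/ c\/s real.*/, "", rate)
            printf "%s\t%s\t%s\n", name, label, rate
        }
    '
}
```

Piping the loop's output through summarize_log would give lines such as
"wpapsk-opencl<TAB>Raw<TAB>137970", which are easier to compare side by
side with a run made before the change.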


Royce
