john-dev - Re: PHC: yescrypt on GPU

Message-ID: <CAKGDhHVM8ARh-9AVZkK3e-2JBT+2BYjwWY3h51ZXSRG9iw3Hwg@mail.gmail.com>
Date: Sun, 19 Jul 2015 14:20:10 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: yescrypt on GPU

I optimized yescrypt-opencl (on a GTX 960M) by copying one table to private memory.
Before (already with some optimizations):

none@...e ~/Desktop/r/run $ GWS=1024 ./john --test --format=yescrypt-opencl
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost
4 (t) of 0, cost 5 (g) of 0
Many salts:     247 c/s real, 247 c/s virtual
Only one salt:  247 c/s real, 247 c/s virtual

now:

none@...e ~/Desktop/r/src $  m;r;GWS=1024 ./john --test --format=yescrypt-opencl

Make process completed.
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost
4 (t) of 0, cost 5 (g) of 0
Many salts:     409 c/s real, 407 c/s virtual
Only one salt:  409 c/s real, 407 c/s virtual
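
For reference, the gain between the two runs above can be worked out from the "Many salts" real figures (247 c/s before, 409 c/s now). This is only a small arithmetic sketch; the helper name `speedup` is made up for illustration and is not part of John the Ripper.

```python
def speedup(before_cps, after_cps):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return after_cps / before_cps

# Figures copied from the two GWS=1024 benchmark runs above.
before_cps = 247
after_cps = 409

ratio = speedup(before_cps, after_cps)
print(f"speedup: {ratio:.2f}x, i.e. {(ratio - 1) * 100:.0f}% more c/s")
```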

But if I want autotune to run benchmarks for GWS=256, 512 and 1024, I need to
set only a quarter of the needed memory in autotune
(I'm getting CL_MEM_OBJECT_ALLOCATION_FAILURE for GWS=2048):

none@...e ~/Desktop/r/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44
memory per hash : 2.10 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         159 c/s         159 rounds/s    1.608s per crypt_all()!
gws:       512         161 c/s         161 rounds/s    3.176s per crypt_all()+
gws:      1024         145 c/s         145 rounds/s    7.029s per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost
4 (t) of 0, cost 5 (g) of 0
Many salts:     355 c/s real, 358 c/s virtual
Only one salt:  358 c/s real, 358 c/s virtual

If I set all of the needed memory:

none@...e ~/Desktop/r/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44
memory per hash : 2.10 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         158 c/s         158 rounds/s    1.612s per crypt_all()!
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost
4 (t) of 0, cost 5 (g) of 0
Many salts:     230 c/s real, 230 c/s virtual
Only one salt:  237 c/s real, 237 c/s virtual
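
To put the two autotune runs above in perspective, here is a rough sketch of the total buffer size implied by "memory per hash : 2.10 MB" at each GWS, both in full and at a quarter. It assumes the total is simply GWS times the per-hash figure; the function name `total_mb` is hypothetical, and the post does not say how much memory the GTX 960M has or which device limit triggers CL_MEM_OBJECT_ALLOCATION_FAILURE.

```python
MEM_PER_HASH_MB = 2.10  # from the "memory per hash : 2.10 MB" line in the logs

def total_mb(gws, fraction=1.0):
    """Total memory in MB for `gws` hashes, optionally scaled by `fraction`.

    Assumption: memory grows linearly with GWS (one 2.10 MB region per hash).
    """
    return gws * MEM_PER_HASH_MB * fraction

for gws in (256, 512, 1024, 2048):
    full = total_mb(gws)
    quarter = total_mb(gws, 0.25)
    print(f"GWS={gws:5d}  full: {full:7.1f} MB  quarter: {quarter:7.1f} MB")
```

Under that assumption, GWS=2048 would need about 4300 MB in full, and GWS=1024 about 2150 MB.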


The other thing is that the benchmarks estimate the speed improperly:

none@...e ~/Desktop/r/run $ GWS=1024 ./john --test --format=yescrypt-opencl
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost
4 (t) of 0, cost 5 (g) of 0
Many salts:     407 c/s real, 407 c/s virtual
Only one salt:  409 c/s real, 409 c/s virtual

none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=yescrypt-opencl
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost
4 (t) of 0, cost 5 (g) of 0
Many salts:     358 c/s real, 358 c/s virtual
Only one salt:  358 c/s real, 360 c/s virtual


none@...e ~/Desktop/r/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44
memory per hash : 2.10 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         159 c/s         159 rounds/s    1.608s per crypt_all()!
gws:       512         161 c/s         161 rounds/s    3.176s per crypt_all()+
gws:      1024         145 c/s         145 rounds/s    7.029s per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost
4 (t) of 0, cost 5 (g) of 0
Many salts:     355 c/s real, 358 c/s virtual
Only one salt:  358 c/s real, 358 c/s virtual
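
The mismatch can be quantified from the logs above. The sketch below recomputes the autotune c/s as GWS divided by the time per crypt_all() (an assumption, but it reproduces the printed 159, 161 and 145 after truncation) and compares it with the "Many salts" real figures from the GWS=512 and GWS=1024 runs. The names `autotune_secs`, `benchmark_cps` and `autotune_cps` are made up for illustration.

```python
# GWS -> seconds per crypt_all(), copied from the autotune output above.
autotune_secs = {256: 1.608, 512: 3.176, 1024: 7.029}

# GWS -> "Many salts" real c/s from the runs with GWS set by hand.
benchmark_cps = {512: 358, 1024: 407}

def autotune_cps(gws, secs):
    """Candidates per second, assuming one crypt_all() call processes `gws` keys."""
    return gws / secs

for gws, secs in autotune_secs.items():
    estimate = autotune_cps(gws, secs)
    line = f"GWS={gws:5d}  autotune: {estimate:6.1f} c/s"
    if gws in benchmark_cps:
        measured = benchmark_cps[gws]
        line += f"  benchmark: {measured} c/s  ratio: {measured / estimate:.2f}"
    print(line)
```

With these numbers the final benchmark is more than twice the autotune estimate at GWS=512 and larger still at GWS=1024, and autotune picks 512 although 1024 benchmarks faster.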