Message-ID: <CAKGDhHVM8ARh-9AVZkK3e-2JBT+2BYjwWY3h51ZXSRG9iw3Hwg@mail.gmail.com>
Date: Sun, 19 Jul 2015 14:20:10 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: yescrypt on GPU

I optimized yescrypt-opencl (on the 960M) by copying one table to private memory.

Before (with some optimizations):

none@...e ~/Desktop/r/run $ GWS=1024 ./john --test --format=yescrypt-opencl
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost 4 (t) of 0, cost 5 (g) of 0
Many salts:     247 c/s real, 247 c/s virtual
Only one salt:  247 c/s real, 247 c/s virtual

Now:

none@...e ~/Desktop/r/src $ m;r;GWS=1024 ./john --test --format=yescrypt-opencl
Make process completed.
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost 4 (t) of 0, cost 5 (g) of 0
Many salts:     409 c/s real, 407 c/s virtual
Only one salt:  409 c/s real, 407 c/s virtual

However, if I want to run benchmarks for GWS=256, 512 and 1024, I need to set a quarter of the needed memory in autotune (I'm getting CL_MEM_OBJECT_ALLOCATION_FAILURE for GWS=2048):

none@...e ~/Desktop/r/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44
memory per hash : 2.10 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         159 c/s         159 rounds/s   1.608s per crypt_all()!
gws:       512         161 c/s         161 rounds/s   3.176s per crypt_all()+
gws:      1024         145 c/s         145 rounds/s   7.029s per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost 4 (t) of 0, cost 5 (g) of 0
Many salts:     355 c/s real, 358 c/s virtual
Only one salt:  358 c/s real, 358 c/s virtual

If I set all of the needed memory:

none@...e ~/Desktop/r/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44
memory per hash : 2.10 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         158 c/s         158 rounds/s   1.612s per crypt_all()!
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost 4 (t) of 0, cost 5 (g) of 0
Many salts:     230 c/s real, 230 c/s virtual
Only one salt:  237 c/s real, 237 c/s virtual

The other thing is that the benchmarks estimate the speed incorrectly:

none@...e ~/Desktop/r/run $ GWS=1024 ./john --test --format=yescrypt-opencl
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost 4 (t) of 0, cost 5 (g) of 0
Many salts:     407 c/s real, 407 c/s virtual
Only one salt:  409 c/s real, 409 c/s virtual

none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=yescrypt-opencl
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
memory per hash : 2.10 MB
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost 4 (t) of 0, cost 5 (g) of 0
Many salts:     358 c/s real, 358 c/s virtual
Only one salt:  358 c/s real, 360 c/s virtual

none@...e ~/Desktop/r/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44
memory per hash : 2.10 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         159 c/s         159 rounds/s   1.608s per crypt_all()!
gws:       512         161 c/s         161 rounds/s   3.176s per crypt_all()+
gws:      1024         145 c/s         145 rounds/s   7.029s per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 8, cost 3 (p) of 11, cost 4 (t) of 0, cost 5 (g) of 0
Many salts:     355 c/s real, 358 c/s virtual
Only one salt:  358 c/s real, 358 c/s virtual
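The figures in the logs can be cross-checked with a little arithmetic. The sketch below is not part of john; it is a hypothetical helper that assumes the autotune "c/s" column is simply GWS divided by the time of one crypt_all() call (the logged numbers are consistent with that, but the message does not state it), and that total buffer size scales as GWS times the logged "memory per hash". All constants are copied from the output in this message.

```python
# Figures copied from the autotune log: (GWS, seconds per crypt_all()).
AUTOTUNE = [(256, 1.608), (512, 3.176), (1024, 7.029)]

# "Many salts ... real" speeds from the fixed-GWS --test runs (c/s).
BENCHMARK = {512: 358, 1024: 407}

# "memory per hash" line from the log, in MB.
MB_PER_HASH = 2.10


def autotune_rate(gws, seconds):
    """Candidates per second if one crypt_all() call handles `gws` candidates.

    Assumption: this is how the autotune c/s column is derived.
    """
    return gws / seconds


def total_memory_mb(gws):
    """Memory for all work-items, assuming a linear GWS * per-hash footprint."""
    return gws * MB_PER_HASH


def report():
    """Return (gws, autotune c/s, benchmark c/s or None, ratio or None, MB)."""
    rows = []
    for gws, seconds in AUTOTUNE:
        rate = autotune_rate(gws, seconds)
        bench = BENCHMARK.get(gws)
        ratio = bench / rate if bench is not None else None
        rows.append((gws, int(rate), bench, ratio, total_memory_mb(gws)))
    return rows


if __name__ == "__main__":
    for gws, rate, bench, ratio, mem in report():
        extra = "" if bench is None else f", --test: {bench} c/s ({ratio:.2f}x)"
        print(f"GWS={gws}: autotune {rate} c/s, {mem:.1f} MB{extra}")
    print(f"GWS=2048 would need {total_memory_mb(2048):.1f} MB")
```

Under these assumptions the truncated rates come out as 159, 161 and 145 c/s, matching the autotune column, while the fixed-GWS --test runs report roughly 2.2 times (GWS=512) and 2.8 times (GWS=1024) more. The linear memory estimate gives about 4300.8 MB for GWS=2048; the message does not say how much memory the device has, so this only shows the scale of the allocation that failed.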