Message-ID: <CAKGDhHX9_zJGoYpP4c22tG3O4fXjsyOkCPYFXxCexOeUgUAH8w@mail.gmail.com>
Date: Sat, 25 Jul 2015 22:56:42 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: PHC: Lyra2 vs yescrypt benchmarks 2

Lyra2 (real c/s)
CPU on well - 3808
GeForce GTX 960M - 506
AMD Radeon HD 7900 Series - 2438
GeForce GTX TITAN - 1625
memory: 1.5 MB

yescrypt (real c/s)
CPU on well - 4736
GeForce GTX 960M - 416
AMD Radeon HD 7900 Series - 930
GeForce GTX TITAN - 1107
memory: 1.5 MB

I tested using my modified bench.c. I tried various LWS values for the
AMD Radeon HD 7900 Series and the GeForce GTX TITAN, and only LWS=64 for
the GeForce GTX 960M. I set my get_default_workgroup() to return 64 and
set the other LWS values manually.

Output (not everything):

Lyra2 CPU

a@...l:~/m/run$ ./john --test --format=lyra2
Will run 8 OpenMP threads
Benchmarking: Lyra2 [Blake2 AVX2]... (8xOMP) DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 3808 c/s real, 476 c/s virtual

Lyra2 AMD

[a@...er run]$ ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 436 c/s 436 rounds/s 586.434ms per crypt_all()!
gws: 512 832 c/s 832 rounds/s 615.005ms per crypt_all()+
gws: 1024 1477 c/s 1477 rounds/s 693.232ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 1077 c/s real, 204800 c/s virtual

[a@...er run]$ LWS=32 ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 769 c/s 769 rounds/s 332.717ms per crypt_all()!
gws: 512 1476 c/s 1476 rounds/s 346.780ms per crypt_all()+
gws: 1024 2335 c/s 2335 rounds/s 438.494ms per crypt_all()+
Local worksize (LWS) 32, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 1896 c/s real, 204800 c/s virtual

[a@...er run]$ LWS=16 ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 1013 c/s 1013 rounds/s 252.703ms per crypt_all()!
gws: 512 1992 c/s 1992 rounds/s 256.924ms per crypt_all()+
gws: 1024 2707 c/s 2707 rounds/s 378.248ms per crypt_all()+
Local worksize (LWS) 16, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 2275 c/s real, 307200 c/s virtual

[a@...er run]$ LWS=8 ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 1140 c/s 1140 rounds/s 224.520ms per crypt_all()!
gws: 512 2161 c/s 2161 rounds/s 236.819ms per crypt_all()+
gws: 1024 2905 c/s 2905 rounds/s 352.445ms per crypt_all()+
Local worksize (LWS) 8, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 2438 c/s real, 307200 c/s virtual

Lyra2 TITAN

[a@...er run]$ ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 272 c/s 272 rounds/s 941.174ms per crypt_all()!
gws: 512 551 c/s 551 rounds/s 927.770ms per crypt_all()!
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 550 c/s real, 550 c/s virtual

[a@...er run]$ LWS=32 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 275 c/s 275 rounds/s 930.401ms per crypt_all()!
gws: 512 561 c/s 561 rounds/s 911.203ms per crypt_all()!
Local worksize (LWS) 32, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 559 c/s real, 559 c/s virtual

[a@...er run]$ LWS=16 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 475 c/s 475 rounds/s 538.159ms per crypt_all()!
gws: 512 947 c/s 947 rounds/s 540.294ms per crypt_all()+
Local worksize (LWS) 16, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 939 c/s real, 930 c/s virtual

[a@...er run]$ LWS=8 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 722 c/s 722 rounds/s 354.516ms per crypt_all()!
gws: 512 1231 c/s 1231 rounds/s 415.724ms per crypt_all()+
Local worksize (LWS) 8, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 1219 c/s real, 1219 c/s virtual

[a@...er run]$ LWS=4 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 894 c/s 894 rounds/s 286.152ms per crypt_all()!
gws: 512 1629 c/s 1629 rounds/s 314.183ms per crypt_all()+
Local worksize (LWS) 4, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 1625 c/s real, 1638 c/s virtual

Lyra2 960m

none@...e ~/Desktop/rrr/run $ ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws: 256 491 c/s 491 rounds/s 521.098ms per crypt_all()!
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw: 506 c/s real, 506 c/s virtual

yescrypt CPU

a@...l:~/m/run$ ./john --test --format=yescrypt
Will run 8 OpenMP threads
Benchmarking: yescrypt [Salsa20/8 AVX]... (8xOMP) DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4 (t) of 0, cost 5 (g) of 0
Raw: 4736 c/s real, 592 c/s virtual

yescrypt AMD

[a@...er run]$ GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4 (t) of 0, cost 5 (g) of 0
Raw: 636 c/s real, 102400 c/s virtual

[a@...er run]$ LWS=32 GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 32, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4 (t) of 0, cost 5 (g) of 0
Raw: 875 c/s real, 102400 c/s virtual

[a@...er run]$ LWS=16 GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 16, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4 (t) of 0, cost 5 (g) of 0
Raw: 922 c/s real, 102400 c/s virtual

[a@...er run]$
[a@...er run]$ LWS=8 GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 8, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4 (t) of 0, cost 5 (g) of 0
Raw: 930 c/s real, 102400 c/s virtual

yescrypt TITAN

[a@...er run]$ ./john --test --format=yescrypt-opencl --v=4 --dev=5
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws: 256 230 c/s 230 rounds/s 1.110s per crypt_all()!
gws: 512 470 c/s 470 rounds/s 1.088s per crypt_all()!
gws: 1024 828 c/s 828 rounds/s 1.236s per crypt_all()+
gws: 2048 1016 c/s 1016 rounds/s 2.015s per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4 (t) of 0, cost 5 (g) of 0
Raw: 1107 c/s real, 1113 c/s virtual

yescrypt 960m

none@...e ~/Desktop/rrr/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient, development use only)]...
Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__ -DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21 -D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws: 256 317 c/s 317 rounds/s 807.041ms per crypt_all()!
gws: 512 415 c/s 415 rounds/s 1.232s per crypt_all()+
gws: 1024 406 c/s 406 rounds/s 2.515s per crypt_all()
gws: 2048 405 c/s 405 rounds/s 5.050s per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4 (t) of 0, cost 5 (g) of 0
Raw: 416 c/s real, 416 c/s virtual
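
For comparing runs like these without reading the logs by hand, here is a minimal Python sketch that pulls the LWS, GWS and real c/s out of benchmark output and picks the fastest run. It is only an illustration: parse_runs() and best_run() are hypothetical helper names, not part of John the Ripper, and the regular expressions assume the "Local worksize (LWS) ..." and "Raw: ..." lines look exactly as they do in this message. The sample text is copied from the Lyra2 AMD runs.

```python
import re

# Assumed line formats, taken from the output quoted in this message.
SIZES = re.compile(r"Local worksize \(LWS\) (\d+), global worksize \(GWS\) (\d+)")
RAW = re.compile(r"Raw:\s+(\d+) c/s real")


def parse_runs(log):
    """Return one (lws, gws, real_cps) tuple per benchmark run in the log."""
    runs = []
    sizes = None
    for line in log.splitlines():
        m = SIZES.search(line)
        if m:
            # Remember the work sizes until the matching Raw line shows up.
            sizes = (int(m.group(1)), int(m.group(2)))
            continue
        m = RAW.search(line)
        if m and sizes is not None:
            runs.append((sizes[0], sizes[1], int(m.group(1))))
            sizes = None
    return runs


def best_run(runs):
    """Return the run with the highest real c/s."""
    return max(runs, key=lambda run: run[2])


# Lyra2 on the AMD Radeon HD 7900 Series, copied from the output above.
SAMPLE = """\
Local worksize (LWS) 64, global worksize (GWS) 1024
Raw: 1077 c/s real, 204800 c/s virtual
Local worksize (LWS) 32, global worksize (GWS) 1024
Raw: 1896 c/s real, 204800 c/s virtual
Local worksize (LWS) 16, global worksize (GWS) 1024
Raw: 2275 c/s real, 307200 c/s virtual
Local worksize (LWS) 8, global worksize (GWS) 1024
Raw: 2438 c/s real, 307200 c/s virtual
"""

if __name__ == "__main__":
    for lws, gws, cps in parse_runs(SAMPLE):
        print(f"LWS={lws:<3} GWS={gws:<5} {cps} c/s")
    print("best:", best_run(parse_runs(SAMPLE)))
```

Runs without a "Local worksize" line (such as the CPU benchmarks) are skipped by this sketch, since it only records a Raw line that follows a work-size line.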