john-dev - PHC: Lyra2 vs yescrypt benchmarks 2

Message-ID: <CAKGDhHX9_zJGoYpP4c22tG3O4fXjsyOkCPYFXxCexOeUgUAH8w@mail.gmail.com>
Date: Sat, 25 Jul 2015 22:56:42 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: PHC: Lyra2 vs yescrypt benchmarks 2

Lyra2 (c/s real)

CPU on well - 3808
GeForce GTX 960M - 506
AMD Radeon HD 7900 Series - 2438
GeForce GTX TITAN - 1625
memory per hash: 1.5 MB

yescrypt (c/s real)

CPU on well - 4736
GeForce GTX 960M - 416
AMD Radeon HD 7900 Series - 930
GeForce GTX TITAN - 1107
memory per hash: 1.5 MB

I tested using my modified bench.c. On the AMD Radeon HD 7900 Series
and the GeForce GTX TITAN I tried several LWS values; on the GeForce
GTX 960M I tried only LWS=64. I set my get_default_workgroup() to
return 64 and set other LWS values manually with the LWS environment
variable.


Output (abridged):

Lyra2 CPU

a@...l:~/m/run$ ./john --test --format=lyra2
Will run 8 OpenMP threads
Benchmarking: Lyra2 [Blake2 AVX2]... (8xOMP) DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    3808 c/s real, 476 c/s virtual


Lyra2 AMD

[a@...er run]$ ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         436 c/s         436 rounds/s 586.434ms per crypt_all()!
gws:       512         832 c/s         832 rounds/s 615.005ms per crypt_all()+
gws:      1024        1477 c/s        1477 rounds/s 693.232ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1077 c/s real, 204800 c/s virtual

[a@...er run]$ LWS=32 ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         769 c/s         769 rounds/s 332.717ms per crypt_all()!
gws:       512        1476 c/s        1476 rounds/s 346.780ms per crypt_all()+
gws:      1024        2335 c/s        2335 rounds/s 438.494ms per crypt_all()+
Local worksize (LWS) 32, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1896 c/s real, 204800 c/s virtual

[a@...er run]$ LWS=16 ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256        1013 c/s        1013 rounds/s 252.703ms per crypt_all()!
gws:       512        1992 c/s        1992 rounds/s 256.924ms per crypt_all()+
gws:      1024        2707 c/s        2707 rounds/s 378.248ms per crypt_all()+
Local worksize (LWS) 16, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    2275 c/s real, 307200 c/s virtual

[a@...er run]$ LWS=8 ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256        1140 c/s        1140 rounds/s 224.520ms per crypt_all()!
gws:       512        2161 c/s        2161 rounds/s 236.819ms per crypt_all()+
gws:      1024        2905 c/s        2905 rounds/s 352.445ms per crypt_all()+
Local worksize (LWS) 8, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    2438 c/s real, 307200 c/s virtual

Lyra2 TITAN

[a@...er run]$ ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         272 c/s         272 rounds/s 941.174ms per crypt_all()!
gws:       512         551 c/s         551 rounds/s 927.770ms per crypt_all()!
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    550 c/s real, 550 c/s virtual


[a@...er run]$ LWS=32 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         275 c/s         275 rounds/s 930.401ms per crypt_all()!
gws:       512         561 c/s         561 rounds/s 911.203ms per crypt_all()!
Local worksize (LWS) 32, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    559 c/s real, 559 c/s virtual

[a@...er run]$ LWS=16 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         475 c/s         475 rounds/s 538.159ms per crypt_all()!
gws:       512         947 c/s         947 rounds/s 540.294ms per crypt_all()+
Local worksize (LWS) 16, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    939 c/s real, 930 c/s virtual

[a@...er run]$ LWS=8 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         722 c/s         722 rounds/s 354.516ms per crypt_all()!
gws:       512        1231 c/s        1231 rounds/s 415.724ms per crypt_all()+
Local worksize (LWS) 8, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1219 c/s real, 1219 c/s virtual

[a@...er run]$ LWS=4 ./john --test --format=lyra2-opencl --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         894 c/s         894 rounds/s 286.152ms per crypt_all()!
gws:       512        1629 c/s        1629 rounds/s 314.183ms per crypt_all()+
Local worksize (LWS) 4, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1625 c/s real, 1638 c/s virtual


Lyra2 960M

none@...e ~/Desktop/rrr/run $ ./john --test --format=lyra2-opencl --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.50 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         491 c/s         491 rounds/s 521.098ms per crypt_all()!
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    506 c/s real, 506 c/s virtual


yescrypt CPU

a@...l:~/m/run$ ./john --test --format=yescrypt
Will run 8 OpenMP threads
Benchmarking: yescrypt [Salsa20/8 AVX]... (8xOMP) DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    4736 c/s real, 592 c/s virtual


yescrypt AMD

[a@...er run]$ GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    636 c/s real, 102400 c/s virtual

[a@...er run]$ LWS=32 GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 32, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    875 c/s real, 102400 c/s virtual

[a@...er run]$ LWS=16 GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 16, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    922 c/s real, 102400 c/s virtual

[a@...er run]$ LWS=8 GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 8, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    930 c/s real, 102400 c/s virtual

yescrypt TITAN

[a@...er run]$ ./john --test --format=yescrypt-opencl --v=4 --dev=5
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         230 c/s         230 rounds/s    1.110s per crypt_all()!
gws:       512         470 c/s         470 rounds/s    1.088s per crypt_all()!
gws:      1024         828 c/s         828 rounds/s    1.236s per crypt_all()+
gws:      2048        1016 c/s        1016 rounds/s    2.015s per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    1107 c/s real, 1113 c/s virtual


yescrypt 960M

none@...e ~/Desktop/rrr/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         317 c/s         317 rounds/s 807.041ms per crypt_all()!
gws:       512         415 c/s         415 rounds/s    1.232s per crypt_all()+
gws:      1024         406 c/s         406 rounds/s    2.515s per crypt_all()
gws:      2048         405 c/s         405 rounds/s    5.050s per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    416 c/s real, 416 c/s virtual