john-dev - PHC: my yescrypt and lyra2 benchmarks

Message-ID: <CAKGDhHVa6jEORdsr-h8Vog0J=q+cDaFZ9q0atqsFhYTOmSFL0A@mail.gmail.com>
Date: Wed, 22 Jul 2015 03:49:14 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: PHC: my yescrypt and lyra2 benchmarks

Hi, this is in reference to http://www.openwall.com/lists/john-dev/2015/07/05/9

I couldn't tune POMELO to ~4330 c/s on well: I was getting 1932 c/s for
m_cost=8 and 8364 c/s for m_cost=7, so I postponed it.

Parallel doesn't support cost parameters such as t_cost and m_cost.

I benchmarked only my implementations of Lyra2 and yescrypt
(though it may be possible to make yescrypt faster).

Lyra2 (c/s)

well - 4264
GeForce GTX 960M - 522
AMD Radeon HD 7900 Series - 3385
GeForce GTX TITAN - 1735

yescrypt (c/s)

well - 4688
GeForce GTX 960M - 206
AMD Radeon HD 7900 Series - 319
GeForce GTX TITAN - 326
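
To put the summary numbers side by side, here is a minimal Python sketch that expresses each GPU result as a fraction of the result on well (the CPU run with 8 OpenMP threads). The speeds are copied from the two lists above; the dictionary layout and the helper name relative_speed are only illustrative and not part of john.

```python
# Summary speeds in c/s, copied from the two lists above.
RESULTS = {
    "Lyra2": {
        "well": 4264,
        "GeForce GTX 960M": 522,
        "AMD Radeon HD 7900 Series": 3385,
        "GeForce GTX TITAN": 1735,
    },
    "yescrypt": {
        "well": 4688,
        "GeForce GTX 960M": 206,
        "AMD Radeon HD 7900 Series": 319,
        "GeForce GTX TITAN": 326,
    },
}


def relative_speed(results, baseline="well"):
    """Return {scheme: {device: speed / baseline speed}} for non-baseline devices."""
    out = {}
    for scheme, speeds in results.items():
        base = speeds[baseline]
        out[scheme] = {
            dev: speed / base for dev, speed in speeds.items() if dev != baseline
        }
    return out


if __name__ == "__main__":
    for scheme, ratios in relative_speed(RESULTS).items():
        for dev, ratio in ratios.items():
            print(f"{scheme:8s} {dev:26s} {ratio:5.2f}x of well")
```

With these numbers, no GPU reaches the CPU speed for either scheme, and the gap is much larger for yescrypt than for Lyra2.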

I tested using my modified bench.c. For lyra2 I added a
--skip-self-test option, because I only edited the costs by hand in a
hash that had been generated for different costs, so the self-test would
not pass. I tested various LWS values on the AMD Radeon HD 7900 Series
and the GeForce GTX TITAN, and only LWS=64 on the GeForce GTX 960M. My
get_default_workgroup() returns 64, and I set other LWS values manually.
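
The manual LWS sweep could be scripted roughly as follows. This is only a sketch: it assumes a built ./john in the run directory and relies on the LWS environment variable and the options that appear in the commands below (--skip-self-test exists only in my modified tree). The function names are made up for illustration.

```python
import os
import subprocess


def sweep_commands(fmt, lws_values, extra_args=()):
    """Yield (env_overrides, argv) pairs, one benchmark invocation per LWS value."""
    for lws in lws_values:
        argv = ["./john", "--test", f"--format={fmt}", "--v=4", *extra_args]
        yield {"LWS": str(lws)}, argv


def run_sweep(fmt, lws_values, extra_args=(), cwd="."):
    """Run each benchmark and return {lws: captured output}.

    Requires an executable ./john inside cwd; nothing is parsed here.
    """
    outputs = {}
    for env_overrides, argv in sweep_commands(fmt, lws_values, extra_args):
        env = dict(os.environ, **env_overrides)
        proc = subprocess.run(argv, env=env, cwd=cwd, capture_output=True, text=True)
        outputs[int(env_overrides["LWS"])] = proc.stdout + proc.stderr
    return outputs


if __name__ == "__main__":
    # Print the planned invocations without running anything.
    for env_overrides, argv in sweep_commands(
        "lyra2-opencl", [64, 32, 16, 8], ["--skip-self-test"]
    ):
        print(f"LWS={env_overrides['LWS']}", " ".join(argv))
```

Keeping command construction separate from execution makes it easy to check the planned runs before starting a sweep that takes several minutes per device.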


Output (partial):

lyra2

a@...l:~/m/run$ ./john --test --format=lyra2 --skip-self-test
Will run 8 OpenMP threads
Benchmarking: Lyra2 [Blake2 AVX2]... (8xOMP) DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    4264 c/s real, 534 c/s virtual


960m:

none@...e ~/Desktop/r/run $ ./john --test --format=lyra2-opencl --v=4
--skip-self-test
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         370 c/s         370 rounds/s 691.268ms per crypt_all()!
gws:       512         522 c/s         522 rounds/s 979.697ms per crypt_all()+
Max local worksize 384, Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    522 c/s real, 522 c/s virtual

none@...e ~/Desktop/r/run $ GWS=1024 ./john --test
--format=lyra2-opencl --v=4 --skip-self-test
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Max local worksize 384, OpenCL error (CL_OUT_OF_RESOURCES) in file
(opencl_lyra2_fmt_plug.c) at line (655) - (failed in reading data
back)

AMD----------------------------------------------------------------------------

[a@...er run]$ ./john --test --format=lyra2-opencl --skip-self-test --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         586 c/s         586 rounds/s 436.436ms per crypt_all()!
gws:       512        1144 c/s        1144 rounds/s 447.520ms per crypt_all()+
gws:      1024        2050 c/s        2050 rounds/s 499.336ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024

DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1735 c/s real, 204800 c/s virtual

[a@...er run]$ GWS=2048 ./john --test --format=lyra2-opencl
--skip-self-test --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.45 MB
OpenCL error (CL_INVALID_BUFFER_SIZE) in file
(opencl_lyra2_fmt_plug.c) at line (148) - (Error creating device
buffer)
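
A possible reading of the CL_INVALID_BUFFER_SIZE error above is that the total memory requested grows linearly with GWS. The sketch below is an assumption, not a diagnosis: it takes one Lyra2 matrix cell to be a 96-byte block (12 64-bit words), which reproduces the reported 1.45 MB per hash for m=62 and c=256, and multiplies by GWS. Whether the format allocates this as a single buffer, and what limit the device imposes, is not shown in the output.

```python
# Assumption: each Lyra2 matrix cell is a 96-byte block (12 x 64-bit words),
# so one hash needs rows * columns * 96 bytes.
BLOCK_BYTES = 12 * 8


def lyra2_bytes_per_hash(m_rows, n_cols, block_bytes=BLOCK_BYTES):
    """Memory for one hash: m_rows rows of n_cols blocks each."""
    return m_rows * n_cols * block_bytes


def total_mib(bytes_per_hash, gws):
    """Total memory in MiB if every one of gws work-items has its own matrix."""
    return bytes_per_hash * gws / 2**20


if __name__ == "__main__":
    per_hash = lyra2_bytes_per_hash(62, 256)  # cost 2 (m) = 62, cost 3 (c) = 256
    print(f"per hash: {per_hash / 2**20:.2f} MiB")
    for gws in (512, 1024, 2048):
        print(f"GWS={gws}: {total_mib(per_hash, gws):.0f} MiB")
```

Under these assumptions GWS=1024 needs 1488 MiB and GWS=2048 needs 2976 MiB, so doubling GWS from the auto-tuned value roughly doubles the request to almost 3 GiB.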

[a@...er run]$ LWS=32 ./john --test --format=lyra2-opencl --skip-self-test --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         910 c/s         910 rounds/s 281.291ms per crypt_all()!
gws:       512        1790 c/s        1790 rounds/s 285.934ms per crypt_all()+
gws:      1024        3050 c/s        3050 rounds/s 335.677ms per crypt_all()+
Local worksize (LWS) 32, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    2844 c/s real, 153600 c/s virtual

[a@...er run]$ LWS=16 ./john --test --format=lyra2-opencl --skip-self-test --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256        1117 c/s        1117 rounds/s 229.006ms per crypt_all()!
gws:       512        2247 c/s        2247 rounds/s 227.776ms per crypt_all()!
gws:      1024        3501 c/s        3501 rounds/s 292.412ms per crypt_all()+
Local worksize (LWS) 16, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    3357 c/s real, 68266 c/s virtual

[a@...er run]$ LWS=8 ./john --test --format=lyra2-opencl --skip-self-test --v=4
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256        1202 c/s        1202 rounds/s 212.892ms per crypt_all()!
gws:       512        2362 c/s        2362 rounds/s 216.759ms per crypt_all()+
gws:      1024        3644 c/s        3644 rounds/s 281.006ms per crypt_all()+
Local worksize (LWS) 8, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    3385 c/s real, 68266 c/s virtual

nvidia----------------------------------------------------

[a@...er run]$ ./john --test --format=lyra2-opencl --skip-self-test
--v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         285 c/s         285 rounds/s 897.446ms per crypt_all()!
gws:       512         575 c/s         575 rounds/s 890.154ms per crypt_all()!
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    575 c/s real, 572 c/s virtual

[a@...er run]$ GWS=1024 ./john --test --format=lyra2-opencl
--skip-self-test --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    541 c/s real, 544 c/s virtual

[a@...er run]$ LWS=32 ./john --test --format=lyra2-opencl
--skip-self-test --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         284 c/s         284 rounds/s 899.936ms per crypt_all()!
gws:       512         580 c/s         580 rounds/s 881.939ms per crypt_all()!
Local worksize (LWS) 32, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    578 c/s real, 578 c/s virtual

[a@...er run]$ LWS=16 ./john --test --format=lyra2-opencl
--skip-self-test --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         490 c/s         490 rounds/s 522.369ms per crypt_all()!
gws:       512         977 c/s         977 rounds/s 523.742ms per crypt_all()+
Local worksize (LWS) 16, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    975 c/s real, 975 c/s virtual

[a@...er run]$ LWS=8 ./john --test --format=lyra2-opencl
--skip-self-test --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         744 c/s         744 rounds/s 343.665ms per crypt_all()!
gws:       512        1308 c/s        1308 rounds/s 391.434ms per crypt_all()+
Local worksize (LWS) 8, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1301 c/s real, 1312 c/s virtual

[a@...er run]$ LWS=4 ./john --test --format=lyra2-opencl
--skip-self-test --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         922 c/s         922 rounds/s 277.531ms per crypt_all()!
gws:       512        1721 c/s        1721 rounds/s 297.414ms per crypt_all()+
Local worksize (LWS) 4, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1735 c/s real, 1721 c/s virtual

[a@...er run]$ LWS=2 ./john --test --format=lyra2-opencl
--skip-self-test --v=4 --dev=5
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
memory per hash : 1.45 MB
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256        1143 c/s        1143 rounds/s 223.793ms per crypt_all()!
gws:       512        1196 c/s        1196 rounds/s 427.747ms per crypt_all()+
Local worksize (LWS) 2, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 62, cost 3 (c) of 256, cost 4 (p) of 1
Raw:    1209 c/s real, 1219 c/s virtual

yescrypt

a@...l:~/m/run$ ./john --test --format=yescrypt
Will run 8 OpenMP threads
Benchmarking: yescrypt [Salsa20/8 AVX]... (8xOMP) DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 7, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    3904 c/s real, 488 c/s virtual

a@...l:~/m/run$ ./john --test --format=yescrypt                  //r-2
Will run 8 OpenMP threads
Benchmarking: yescrypt [Salsa20/8 AVX]... (8xOMP) DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    4688 c/s real, 586 c/s virtual

a@...l:~/m/run$ ./john --test --format=yescrypt
Will run 8 OpenMP threads
Benchmarking: yescrypt [Salsa20/8 AVX]... (8xOMP) DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    4736 c/s real, 592 c/s virtual

960m

none@...e ~/Desktop/r/run $ ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         213 c/s         213 rounds/s    1.199s per crypt_all()!
gws:       512         210 c/s         210 rounds/s    2.437s per crypt_all()
gws:      1024         188 c/s         188 rounds/s    5.433s per crypt_all()
gws:      2048         182 c/s         182 rounds/s   11.225s per crypt_all()
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    206 c/s real, 206 c/s virtual
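
The c/s column in the GWS tuning lines appears to be just GWS divided by the time per crypt_all(). This relation is inferred from the numbers in the run above, not taken from the john source; the sketch checks it against the four tuning lines of the GTX 960M yescrypt run, assuming the printed value is truncated to an integer.

```python
def candidates_per_second(gws, seconds_per_crypt_all):
    """Throughput if one crypt_all() call processes gws candidates."""
    return gws / seconds_per_crypt_all


# (gws, seconds per crypt_all(), reported c/s) from the GTX 960M yescrypt run.
ROWS = [
    (256, 1.199, 213),
    (512, 2.437, 210),
    (1024, 5.433, 188),
    (2048, 11.225, 182),
]

if __name__ == "__main__":
    for gws, secs, reported in ROWS:
        est = candidates_per_second(gws, secs)
        print(f"gws={gws:5d}  estimated {est:7.1f} c/s  reported {reported} c/s")
```

Read this way, the 960M run shows the time per crypt_all() growing faster than GWS, which is why the smallest GWS wins on that device.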

AMD

[a@...er run]$ GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    319 c/s real, 102400 c/s virtual

[a@...er run]$ LWS=128 GWS=1024 ./john --test --format=yescrypt-opencl --v=4
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 1.51 MB
Local worksize (LWS) 128, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    259 c/s real, 51200 c/s virtual

nvidia

[a@...er run]$ ./john --test --format=yescrypt-opencl --v=4 --dev=5
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         107 c/s         107 rounds/s    2.384s per crypt_all()!
gws:       512         193 c/s         193 rounds/s    2.639s per crypt_all()+
gws:      1024         255 c/s         255 rounds/s    4.005s per crypt_all()+
gws:      2048         303 c/s         303 rounds/s    6.753s per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    305 c/s real, 305 c/s virtual

[a@...er run]$ LWS=32 ./john --test --format=yescrypt-opencl --v=4 --dev=5
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         122 c/s         122 rounds/s    2.087s per crypt_all()!
gws:       512         197 c/s         197 rounds/s    2.591s per crypt_all()+
gws:      1024         288 c/s         288 rounds/s    3.544s per crypt_all()+
gws:      2048         325 c/s         325 rounds/s    6.289s per crypt_all()+
Local worksize (LWS) 32, global worksize (GWS) 2048
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    326 c/s real, 326 c/s virtual

[a@...er run]$ LWS=16 ./john --test --format=yescrypt-opencl --v=4 --dev=5
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         171 c/s         171 rounds/s    1.496s per crypt_all()!
gws:       512         267 c/s         267 rounds/s    1.912s per crypt_all()+
gws:      1024         314 c/s         314 rounds/s    3.256s per crypt_all()+
gws:      2048         240 c/s         240 rounds/s    8.504s per crypt_all()
Local worksize (LWS) 16, global worksize (GWS) 1024
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    315 c/s real, 314 c/s virtual

[a@...er run]$ LWS=8 ./john --test --format=yescrypt-opencl --v=4 --dev=5
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256         234 c/s         234 rounds/s    1.089s per crypt_all()!
gws:       512         303 c/s         303 rounds/s    1.687s per crypt_all()+
gws:      1024         252 c/s         252 rounds/s    4.051s per crypt_all()
gws:      2048         289 c/s         289 rounds/s    7.069s per crypt_all()
Local worksize (LWS) 8, global worksize (GWS) 512
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    302 c/s real, 304 c/s virtual

[a@...er run]$ LWS=128 ./john --test --format=yescrypt-opencl --v=4 --dev=5
Benchmarking: yescrypt-opencl [Salsa20/8 OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=65554 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=32 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=125 -DHASH_SIZE=44 -DKEY_SIZE=125
memory per hash : 1.51 MB
Calculating best global worksize (GWS); max. 100s total for crypt_all()
gws:       256          75 c/s          75 rounds/s    3.381s per crypt_all()!
gws:       512         139 c/s         139 rounds/s    3.671s per crypt_all()+
gws:      1024         253 c/s         253 rounds/s    4.039s per crypt_all()+
gws:      2048         261 c/s         261 rounds/s    7.842s per crypt_all()+
Local worksize (LWS) 128, global worksize (GWS) 2048
DONE
Speed for cost 1 (N) of 2048, cost 2 (r) of 6, cost 3 (p) of 1, cost 4
(t) of 0, cost 5 (g) of 0
Raw:    263 c/s real, 263 c/s virtual
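
To compare many runs like the ones above without reading every log, the LWS/GWS line and the final Raw line can be pulled out with two regular expressions. This is a minimal sketch that assumes the line formats shown in this output; the function names are illustrative, and the sample text is an excerpt of three GeForce GTX TITAN lyra2-opencl runs.

```python
import re

RAW_RE = re.compile(r"Raw:\s+(\d+) c/s real, (\d+) c/s virtual")
SIZE_RE = re.compile(r"Local worksize \(LWS\) (\d+), global worksize \(GWS\) (\d+)")


def parse_runs(text):
    """Return one dict (lws, gws, real) per benchmark run that printed a Raw line."""
    runs = []
    current = None
    for line in text.splitlines():
        m = SIZE_RE.search(line)
        if m:
            current = {"lws": int(m.group(1)), "gws": int(m.group(2))}
            continue
        m = RAW_RE.search(line)
        if m:
            # CPU runs print no worksize line, so lws/gws stay None for them.
            run = dict(current or {"lws": None, "gws": None})
            run["real"] = int(m.group(1))
            runs.append(run)
            current = None
    return runs


def best_run(runs):
    """Pick the run with the highest real c/s."""
    return max(runs, key=lambda r: r["real"])


SAMPLE = """\
Local worksize (LWS) 8, global worksize (GWS) 512
DONE
Raw:    1301 c/s real, 1312 c/s virtual
Local worksize (LWS) 4, global worksize (GWS) 512
DONE
Raw:    1735 c/s real, 1721 c/s virtual
Local worksize (LWS) 2, global worksize (GWS) 512
DONE
Raw:    1209 c/s real, 1219 c/s virtual
"""

if __name__ == "__main__":
    print(best_run(parse_runs(SAMPLE)))
```

Runs that abort with an OpenCL error print no Raw line and are simply skipped by this parser.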