Message-Id: <8039EE63-F47E-43FB-94E2-DF929735C338@m.patpro.net>
Date: Thu, 23 Mar 2017 21:04:39 +0100
From: Patrick Proniewski <p+password@...atpro.net>
To: john-users@...ts.openwall.com
Subject: GPU performance
Hello,
I'm very new to GPU cracking. I've only used hashcat a few times, on a Windows PC with an old Radeon.
Now I have a dedicated Linux PC at work with an Nvidia GeForce GTX 1080. I've compiled john on Ubuntu 16.x LTS, following doc/INSTALL-UBUNTU. I ran a simple benchmark comparing john and hashcat, and I'm quite surprised by the results:
> patpro@...cracker:~$ ./john/run/john -test -format=raw-sha1
> Benchmarking: Raw-SHA1 [SHA1 256/256 AVX2 8x]... DONE
> Raw: 26527K c/s real, 26527K c/s virtual
>
> patpro@...cracker:~$ ./john/run/john -test -format=raw-sha1-opencl
> Device 1: GeForce GTX 1080
> Benchmarking: Raw-SHA1-opencl [SHA1 OpenCL]... Build log:
> ptxas info : 0 bytes gmem
> ptxas info : Compiling entry function 'sha1' for 'sm_61'
> ptxas info : Function properties for sha1
> ptxas . 64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info : Used 28 registers, 16388 bytes smem, 400 bytes cmem[0], 28 bytes cmem[2]
> DONE
> Raw: 87208K c/s real, 87208K c/s virtual
On one hand, I find it surprising that raw-sha1-opencl on the GTX 1080 is only 3.3 times faster than raw-sha1 on the 2.1 GHz Xeon (E5-2620 v4).
On the other hand, hashcat got a very nice 8191.7 MH/s on the GPU:
> patpro@...cracker:~$ ./hashcat/hashcat64.bin -m 100 -b
> hashcat (v3.40) starting in benchmark mode...
>
> * Device #2: WARNING! Kernel exec timeout is not disabled, it might cause you errors of code CL_OUT_OF_RESOURCES
> See the wiki on how to disable it: https://hashcat.net/wiki/doku.php?id=timeout_patch
> nvmlDeviceSetPowerManagementLimit(): Insufficient Permissions
>
> OpenCL Platform #1: Intel(R) Corporation
> ========================================
> * Device #1: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, skipped
>
> OpenCL Platform #2: NVIDIA Corporation
> ======================================
> * Device #2: GeForce GTX 1080, 2027/8110 MB allocatable, 20MCU
>
> Hashtype: SHA1
>
> Speed.Dev.#2.....: 8191.7 MH/s (81.86ms)
>
> Started: Thu Mar 23 21:21:58 2017
> Stopped: Thu Mar 23 21:22:01 2017
I wonder if hashcat's H/s is the same thing as john's c/s.
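To put the three figures on the same scale, here is a small sketch of the arithmetic. It assumes that one john "c/s" and one hashcat "H/s" both mean one candidate password tested per second, and that john's "K" means 1000. Neither assumption is confirmed by the output above, and the variable names are only illustrative.

```python
# Figures copied from the benchmark output above.
john_cpu_kcs = 26527      # john raw-sha1 (AVX2), in K c/s
john_gpu_kcs = 87208      # john raw-sha1-opencl on the GTX 1080, in K c/s
hashcat_gpu_mhs = 8191.7  # hashcat -m 100 -b on the GTX 1080, in MH/s

# Assumption: 1 c/s == 1 H/s == one candidate tested per second,
# and K = 1e3, M = 1e6.
john_cpu = john_cpu_kcs * 1e3
john_gpu = john_gpu_kcs * 1e3
hashcat_gpu = hashcat_gpu_mhs * 1e6

gpu_vs_cpu = john_gpu / john_cpu
hashcat_vs_john_gpu = hashcat_gpu / john_gpu

print(f"john GPU vs john CPU:    {gpu_vs_cpu:.1f}x")
print(f"hashcat GPU vs john GPU: {hashcat_vs_john_gpu:.1f}x")
```

If the units really are comparable, the gap between the two GPU results is close to two orders of magnitude, far larger than the 3.3x gain of john's GPU run over its CPU run.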
I also wonder if I did something wrong when compiling john with OpenCL that could explain the low 3.3x gain.
I'm not saying hashcat is better. I just want to understand the difference here (and I'm a big fan of john).
Any info welcome!
Thanks.
john --list=build-info
Version: 1.8.0.9-jumbo-1-bleeding
Build: linux-gnu 64-bit AVX2-ac OMP
SIMD: AVX2, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./john/run/
Format interface version: 14
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 5.4.0
GNU libc version: 2.23 (loaded: 2.23)
OpenCL headers version: 2.0
Crypto library: OpenSSL
OpenSSL library version: 01000207f
OpenSSL 1.0.2g 1 Mar 2016
GMP library version: 6.1.0
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
john --list=opencl-devices
Platform #0 name: Intel(R) OpenCL, version: OpenCL 2.0 LINUX
Device #0 (0) name: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Device vendor: Intel(R) Corporation
Device type: CPU (LE)
Device version: OpenCL 2.0 (Build 25)
Driver version: 1.2.0.25
Native vector widths: char 32, short 16, int 8, long 4
Preferred vector width: char 1, short 1, int 1, long 1
Global Memory: 31.0 GB
Global Memory Cache: 256.2 KB
Local Memory: 32.0 KB (Global)
Max memory alloc. size: 7.0 GB
Max clock (MHz): 2100
Profiling timer res.: 1 ns
Max Work Group Size: 8192
Parallel compute cores: 32
Speed index: 537600
Platform #1 name: NVIDIA CUDA, version: OpenCL 1.2 CUDA 8.0.0
Device #0 (1) name: GeForce GTX 1080
Device vendor: NVIDIA Corporation
Device type: GPU (LE)
Device version: OpenCL 1.2 CUDA
Driver version: 375.39 [recommended]
Native vector widths: char 1, short 1, int 1, long 1
Preferred vector width: char 1, short 1, int 1, long 1
Global Memory: 7.0 GB
Global Memory Cache: 320.3 KB
Local Memory: 48.0 KB (Local)
Max memory alloc. size: 1.0 GB
Max clock (MHz): 1733
Profiling timer res.: 1000 ns
Max Work Group Size: 1024
Parallel compute cores: 20
CUDA cores: 2560 (20 x 128)
Speed index: 4436480
Warp size: 32
Max. GPRs/work-group: 65536
Compute capability: 6.1 (sm_61)
Kernel exec. timeout: yes
NVML id: 0
PCI device topology: 03:00.0
PCI lanes: 16/16
Fan speed: 27%
Temperature: 37°C
Utilization: 0%