john-users - GPU performance

Message-Id: <8039EE63-F47E-43FB-94E2-DF929735C338@m.patpro.net>
Date: Thu, 23 Mar 2017 21:04:39 +0100
From: Patrick Proniewski <p+password@...atpro.net>
To: john-users@...ts.openwall.com
Subject: GPU performance

Hello,

I'm very new to GPU cracking. I've only used hashcat a few times, on a Windows PC with an old Radeon.
I now have a dedicated Linux PC at work with an Nvidia GeForce GTX 1080. I compiled john on Ubuntu 16.x LTS, following doc/INSTALL-UBUNTU, then ran a simple benchmark comparing john and hashcat, and I'm quite surprised by the results:

> patpro@...cracker:~$ ./john/run/john -test -format=raw-sha1
> Benchmarking: Raw-SHA1 [SHA1 256/256 AVX2 8x]... DONE
> Raw:	26527K c/s real, 26527K c/s virtual
> 
> patpro@...cracker:~$ ./john/run/john -test -format=raw-sha1-opencl
> Device 1: GeForce GTX 1080
> Benchmarking: Raw-SHA1-opencl [SHA1 OpenCL]... Build log: 
> ptxas info    : 0 bytes gmem
> ptxas info    : Compiling entry function 'sha1' for 'sm_61'
> ptxas info    : Function properties for sha1
> ptxas         .     64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info    : Used 28 registers, 16388 bytes smem, 400 bytes cmem[0], 28 bytes cmem[2]
> DONE
> Raw:	87208K c/s real, 87208K c/s virtual

On the one hand, I find it surprising that raw-sha1-opencl on the GTX 1080 is only 3.3 times faster than raw-sha1 on the 2.1 GHz Xeon (E5-2620 v4).
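
For reference, the 3.3x figure is simply the ratio of the two "real" c/s values in the john output quoted above. A minimal sketch of that arithmetic (the variable names are only illustrative):

```python
# Benchmark figures copied from the two john runs above, in thousands of c/s.
cpu_kcs = 26527   # raw-sha1 (AVX2) on the Xeon E5-2620 v4
gpu_kcs = 87208   # raw-sha1-opencl on the GeForce GTX 1080

# Both figures use the same unit, so the ratio is unit-free.
speedup = gpu_kcs / cpu_kcs
print(f"GPU/CPU speedup: {speedup:.1f}x")  # prints: GPU/CPU speedup: 3.3x
```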

On the other hand, hashcat reached a very nice 8191.7 MH/s on the GPU:

> patpro@...cracker:~$ ./hashcat/hashcat64.bin -m 100 -b
> hashcat (v3.40) starting in benchmark mode...
> 
> * Device #2: WARNING! Kernel exec timeout is not disabled, it might cause you errors of code CL_OUT_OF_RESOURCES
>              See the wiki on how to disable it: https://hashcat.net/wiki/doku.php?id=timeout_patch
> nvmlDeviceSetPowerManagementLimit(): Insufficient Permissions
> 
> OpenCL Platform #1: Intel(R) Corporation
> ========================================
> * Device #1: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, skipped
> 
> OpenCL Platform #2: NVIDIA Corporation
> ======================================
> * Device #2: GeForce GTX 1080, 2027/8110 MB allocatable, 20MCU
> 
> Hashtype: SHA1
> 
> Speed.Dev.#2.....:  8191.7 MH/s (81.86ms)
> 
> Started: Thu Mar 23 21:21:58 2017
> Stopped: Thu Mar 23 21:22:01 2017


I wonder whether hashcat's H/s measures the same thing as john's c/s.
I also wonder whether I did something wrong when compiling john with OpenCL that could explain the low 3.3x gain.
I'm not saying hashcat is better. I just want to understand the difference here (and I'm a big fan of john).
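
To put a number on the gap, here is a rough sketch that converts the three benchmark figures to a common scale. It assumes that "K" and "M" are decimal multipliers (1,000 and 1,000,000) in both tools, and that c/s and H/s are directly comparable, which is exactly what I'm unsure about. The helper name per_second is made up for this example.

```python
# Decimal multipliers; assumed here for both john's "K" and hashcat's "M".
UNITS = {"": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}

def per_second(value, prefix):
    """Convert a figure such as (87208, "K") to plain units per second."""
    return value * UNITS[prefix]

john_cpu = per_second(26527, "K")      # john raw-sha1, c/s
john_gpu = per_second(87208, "K")      # john raw-sha1-opencl, c/s
hashcat_gpu = per_second(8191.7, "M")  # hashcat -m 100 benchmark, H/s

# Ratios are only meaningful if c/s and H/s count the same thing (assumption).
print(f"hashcat GPU vs john GPU: {hashcat_gpu / john_gpu:.1f}x")
print(f"hashcat GPU vs john CPU: {hashcat_gpu / john_cpu:.1f}x")
```

Under those assumptions, hashcat's GPU figure is roughly 94 times john's GPU figure, which is far more than a tuning difference would normally explain.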

Any info welcome!
Thanks.

john --list=build-info
Version: 1.8.0.9-jumbo-1-bleeding
Build: linux-gnu 64-bit AVX2-ac OMP
SIMD: AVX2, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./john/run/
Format interface version: 14
Max. number of reported tunable costs: 3
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 5.4.0
GNU libc version: 2.23 (loaded: 2.23)
OpenCL headers version: 2.0
Crypto library: OpenSSL
OpenSSL library version: 01000207f
OpenSSL 1.0.2g  1 Mar 2016
GMP library version: 6.1.0
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's

john --list=opencl-devices
Platform #0 name: Intel(R) OpenCL, version: OpenCL 2.0 LINUX
    Device #0 (0) name:     Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
    Device vendor:          Intel(R) Corporation
    Device type:            CPU (LE)
    Device version:         OpenCL 2.0 (Build 25)
    Driver version:         1.2.0.25 
    Native vector widths:   char 32, short 16, int 8, long 4
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          31.0 GB
    Global Memory Cache:    256.2 KB
    Local Memory:           32.0 KB (Global)
    Max memory alloc. size: 7.0 GB
    Max clock (MHz):        2100
    Profiling timer res.:   1 ns
    Max Work Group Size:    8192
    Parallel compute cores: 32
    Speed index:            537600

Platform #1 name: NVIDIA CUDA, version: OpenCL 1.2 CUDA 8.0.0
    Device #0 (1) name:     GeForce GTX 1080
    Device vendor:          NVIDIA Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 1.2 CUDA
    Driver version:         375.39 [recommended]
    Native vector widths:   char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          7.0 GB
    Global Memory Cache:    320.3 KB
    Local Memory:           48.0 KB (Local)
    Max memory alloc. size: 1.0 GB
    Max clock (MHz):        1733
    Profiling timer res.:   1000 ns
    Max Work Group Size:    1024
    Parallel compute cores: 20
    CUDA cores:             2560  (20 x 128)
    Speed index:            4436480
    Warp size:              32
    Max. GPRs/work-group:   65536
    Compute capability:     6.1 (sm_61)
    Kernel exec. timeout:   yes
    NVML id:                0
    PCI device topology:    03:00.0
    PCI lanes:              16/16
    Fan speed:              27%
    Temperature:            37°C
    Utilization:            0%