john-dev - Re: WPA-PSK fixes, OpenMP support

Message-ID: <CABob6iruyHeT=X5q3m1ZJJWP0fm5E4uFSXVTZW6fawt6iTDQdg@mail.gmail.com>
Date: Sat, 23 Jun 2012 22:39:00 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: WPA-PSK fixes, OpenMP support

2012/6/23 Solar Designer <solar@...nwall.com>:
> Lukas, magnum, all -
>
> The attached patch fixes two out of bounds writes that occurred all the
> time (both of them in CPU code, one of them in GPU code), prevents out
> of bounds writes on over-long passwords and on missed set_key() (which
> may happen during self-tests with large max_keys_per_crypt), removes the
> dependency on some char arrays on the stack being int-aligned, and
> finally adds OpenMP support for the CPU code.  (Of course, we'd achieve
> much better speed by also using sse-intrinsics.c code for SHA-1.)
Thank you for working on that. I tried to add OpenMP support, but my code
introduced a bug, so I sent magnum the code without it.
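
The fixes listed above (bounded writes for over-long or never-set keys, no reliance on int alignment of char arrays) and the OpenMP loop can be illustrated with a minimal C sketch. It is not the actual patch: PLAINTEXT_LENGTH, MAX_KEYS, load_be32() and the first_word() stand-in for the real PBKDF2 work are hypothetical, loosely modeled on John's set_key()/get_key()/crypt_all() format methods.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limits for this sketch only. */
#define PLAINTEXT_LENGTH 63
#define MAX_KEYS 64

/* Zero-initialized static storage: a slot that never received
   set_key() reads back as an empty string, not garbage. */
static char saved_key[MAX_KEYS][PLAINTEXT_LENGTH + 1];
static uint32_t out_word[MAX_KEYS];

/* Copy at most PLAINTEXT_LENGTH bytes, so an over-long candidate is
   truncated instead of being written past the end of its slot. */
static void set_key(const char *key, int index)
{
    size_t len = 0;

    assert(index >= 0 && index < MAX_KEYS);
    while (len < PLAINTEXT_LENGTH && key[len] != '\0')
        len++;
    memcpy(saved_key[index], key, len);
    saved_key[index][len] = '\0';
}

static char *get_key(int index)
{
    return saved_key[index];
}

/* Assemble a big-endian 32-bit word byte by byte, instead of casting
   a char array to uint32_t *, which assumes int alignment. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical stand-in for the per-candidate hash: it only reads
   the first (up to) four bytes of the key. */
static uint32_t first_word(const char *key)
{
    unsigned char buf[4] = {0};
    size_t len = strlen(key);

    memcpy(buf, key, len < 4 ? len : 4);
    return load_be32(buf);
}

/* Candidates are independent, so the loop parallelizes directly;
   without -fopenmp the pragma is skipped and it runs serially. */
static void crypt_all(int count)
{
    int i;

    assert(count >= 0 && count <= MAX_KEYS);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < count; i++)
        out_word[i] = first_word(saved_key[i]);
}
```

With this layout, a slot that self-test never filled is hashed as the empty string, and a 199-character candidate is stored as its first 63 characters rather than overflowing.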

> BTW, where does the length 15 limit come from?  Can/should we avoid it?
It is something I had in mind but forgot to change. We can easily
avoid it, even in the GPU patches. I'll send a fix for that.
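
On why lifting the limit should be easy: WPA passphrases are 8 to 63 characters (IEEE 802.11i), and HMAC-SHA-1 only pre-hashes keys longer than the 64-byte SHA-1 block, so even a maximal passphrase can be zero-padded straight into the ipad/opad blocks. The sketch below shows just that step; the function names are hypothetical and this is not the format's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_BLOCK_SIZE 64 /* SHA-1 processes 64-byte blocks */
#define WPA_MIN_PASS 8     /* IEEE 802.11i passphrase length range */
#define WPA_MAX_PASS 63

static int wpa_passphrase_len_ok(size_t len)
{
    return len >= WPA_MIN_PASS && len <= WPA_MAX_PASS;
}

/* Build the HMAC inner and outer pad blocks for a key that fits in
   one SHA-1 block. Returns 0 on success, or -1 if the key is longer
   than a block and would first have to be hashed (not done here). */
static int hmac_sha1_pads(const unsigned char *key, size_t len,
                          unsigned char ipad[SHA1_BLOCK_SIZE],
                          unsigned char opad[SHA1_BLOCK_SIZE])
{
    size_t i;

    assert(key != NULL || len == 0);
    if (len > SHA1_BLOCK_SIZE)
        return -1;
    for (i = 0; i < SHA1_BLOCK_SIZE; i++) {
        unsigned char k = i < len ? key[i] : 0; /* zero-pad the key */
        ipad[i] = k ^ 0x36;
        opad[i] = k ^ 0x5c;
    }
    return 0;
}
```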

> Here are some speeds.  FX-8120, one CPU core in use:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... DONE
> Raw:    401 c/s real, 401 c/s virtual
>
> FX-8120, OpenMP build:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... (8xOMP) DONE
> Raw:    2032 c/s real, 253 c/s virtual
>
> GTX 570 1600 MHz:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [CUDA]... DONE
> Raw:    28444 c/s real, 28595 c/s virtual
>
> HD 7970:
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max Group Work Size 256
> Optimal Group work Size = 96
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw:    42164 c/s real, 121720 c/s virtual
>
> Same two GPUs, OpenMP build:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [CUDA]... (8xOMP) DONE
> Raw:    32385 c/s real, 16541 c/s virtual
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max Group Work Size 256
> Optimal Group work Size = 128
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... (8xOMP) DONE
> Raw:    55138 c/s real, 41890 c/s virtual
>
> Hmm, somehow "Optimal Group work Size" is different here.
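
A quick way to read these figures is speedup (OpenMP rate over single-core rate) and parallel efficiency (speedup over thread count). The helpers below are a small hypothetical sketch of that arithmetic, using only the numbers quoted above.

```c
#include <assert.h>

/* Speedup: OpenMP-build rate divided by the single-core rate. */
static double speedup(double parallel_cps, double serial_cps)
{
    assert(serial_cps > 0);
    return parallel_cps / serial_cps;
}

/* Parallel efficiency: speedup divided by the number of threads. */
static double efficiency(double parallel_cps, double serial_cps,
                         int threads)
{
    assert(threads > 0);
    return speedup(parallel_cps, serial_cps) / threads;
}
```

For the CPU code, speedup(2032, 401) is about 5.07 and efficiency(2032, 401, 8) about 0.63. The OpenMP build also helps the GPU formats: about 1.14x for CUDA (32385 vs. 28444) and about 1.31x for OpenCL (55138 vs. 42164). Assuming "virtual" c/s are measured against process CPU time, 2032 real vs. 253 virtual is a ratio of roughly 8, consistent with all eight threads being busy.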
>
> BTW, "group work size" sounds weird.  Do we actually mean "global work
> size" or "work-group size"?

It should be work-group size.
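
In OpenCL terms, the work-group size is the local_work_size argument of clEnqueueNDRangeKernel(), while the global work size is the total number of work-items. OpenCL 1.x requires the global size to be a multiple of the work-group size, so a host typically rounds its key count up. The helpers below are a hypothetical sketch of that arithmetic, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Round the number of keys up to a multiple of the work-group
   (local) size; surplus work-items must be ignored by the kernel. */
static size_t round_up_global(size_t keys, size_t local)
{
    assert(local > 0);
    return (keys + local - 1) / local * local;
}

/* Number of work-groups launched for a given global size. */
static size_t num_work_groups(size_t global, size_t local)
{
    assert(local > 0 && global % local == 0);
    return global / local;
}
```

For example, 1000 keys with a work-group size of 96 give a global size of 1056 (11 work-groups), while with 128 they give 1024 (8 work-groups).
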
We could get even better results by overlapping CPU and GPU code, so
that the CPU and the GPU are both busy at the same time. If CPU time
turns out to be lower than GPU time, there is a chance that
sse-intrinsics.c won't be needed, since its speedup would hardly be
noticeable given the low CPU usage. This hypothesis still needs to be
verified.
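
The overlap idea can be sketched as a two-stage software pipeline with double buffering: while the GPU works on batch i, the CPU finishes batch i - 1. gpu_stage() and cpu_stage() are hypothetical stand-ins doing trivial arithmetic, and OpenMP sections are only one way to run the two stages concurrently; with a real GPU, asynchronous kernel launches could achieve the same without a second CPU thread.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4 /* hypothetical batch size */

/* Stand-in for launching the GPU kernel and reading back results. */
static void gpu_stage(const int *in, int *out)
{
    for (int j = 0; j < BATCH; j++)
        out[j] = in[j] * 2;
}

/* Stand-in for the CPU-side work that consumes the GPU results. */
static long cpu_stage(const int *gpu_out)
{
    long sum = 0;

    for (int j = 0; j < BATCH; j++)
        sum += gpu_out[j] + 1;
    return sum;
}

/* Step i: the "GPU" fills buf[i % 2] from batch i while the CPU
   consumes buf[(i - 1) % 2] from batch i - 1. The implicit barrier
   at the end of the sections keeps a buffer from being overwritten
   before the CPU is done with it. Without -fopenmp the two stages
   simply run one after the other. */
static long pipeline(const int *keys, size_t nbatches)
{
    int buf[2][BATCH];
    long total = 0;

    assert(keys != NULL || nbatches == 0);
    for (size_t i = 0; i <= nbatches; i++) {
#ifdef _OPENMP
#pragma omp parallel sections
#endif
        {
#ifdef _OPENMP
#pragma omp section
#endif
            if (i < nbatches)
                gpu_stage(keys + i * BATCH, buf[i % 2]);
#ifdef _OPENMP
#pragma omp section
#endif
            if (i > 0)
                total += cpu_stage(buf[(i - 1) % 2]);
        }
    }
    return total;
}
```

If cpu_stage() is the cheaper of the two, total time approaches the GPU time alone, which is the case where extra CPU-side SIMD work would matter little.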

Lukas
