Message-ID: <CABob6iruyHeT=X5q3m1ZJJWP0fm5E4uFSXVTZW6fawt6iTDQdg@mail.gmail.com>
Date: Sat, 23 Jun 2012 22:39:00 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: WPA-PSK fixes, OpenMP support

2012/6/23 Solar Designer <solar@...nwall.com>:
> Lukas, magnum, all -
>
> The attached patch fixes two out of bounds writes that occurred all the
> time (both of them in CPU code, one of them in GPU code), prevents out
> of bounds writes on over-long passwords and on missed set_key() (which
> may happen during self-tests with large max_keys_per_crypt), removes the
> dependency on some char arrays on the stack being int-aligned, and
> finally adds OpenMP support for the CPU code.  (Of course, we'd achieve
> much better speed by also using sse-intrinsics.c code for SHA-1.)
Thank you for working on that. I tried to add OpenMP support, but my
code introduced a bug, so I sent magnum the code without it.

> BTW, where does the length 15 limit come from?  Can/should we avoid it?
That is something I had in mind but forgot to change. We can easily
avoid it, even in the GPU patches. I'll send a fix for that.

> Here are some speeds.  FX-8120, one CPU core in use:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... DONE
> Raw:    401 c/s real, 401 c/s virtual
>
> FX-8120, OpenMP build:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... (8xOMP) DONE
> Raw:    2032 c/s real, 253 c/s virtual
>
> GTX 570 1600 MHz:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [CUDA]... DONE
> Raw:    28444 c/s real, 28595 c/s virtual
>
> HD 7970:
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max Group Work Size 256
> Optimal Group work Size = 96
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw:    42164 c/s real, 121720 c/s virtual
>
> Same two GPUs, OpenMP build:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [CUDA]... (8xOMP) DONE
> Raw:    32385 c/s real, 16541 c/s virtual
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max Group Work Size 256
> Optimal Group work Size = 128
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... (8xOMP) DONE
> Raw:    55138 c/s real, 41890 c/s virtual
>
> Hmm, somehow "Optimal Group work Size" is different here.
>
> BTW, "group work size" sounds weird.  Do we actually mean "global work
> size" or "work-group size"?

It should be "work-group size".
We could get even better results by overlapping the CPU and GPU code
(so that the CPU and the GPU are both busy at the same time). If CPU
time < GPU time, there is a chance that sse-intrinsics.c won't be
needed, since the speedup wouldn't be noticeable with such low CPU
usage - but this hypothesis needs to be verified.
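
To make the overlap reasoning concrete, here is a rough timing model,
not code from the patch. All function names and the idea of measuring
per-batch time in microseconds are assumptions for illustration only.
It assumes perfect overlap in steady state, which a real implementation
would only approximate.

```c
#include <stdint.h>

/* Hypothetical per-batch cost model; names are illustrative only. */

/* No overlap: the CPU part and the GPU part run back to back. */
uint64_t serial_batch_us(uint64_t cpu_us, uint64_t gpu_us)
{
	return cpu_us + gpu_us;
}

/* Full overlap (steady state): the slower side sets the pace. */
uint64_t overlapped_batch_us(uint64_t cpu_us, uint64_t gpu_us)
{
	return cpu_us > gpu_us ? cpu_us : gpu_us;
}

/*
 * Time saved per batch if the CPU part is sped up from cpu_us to
 * cpu_fast_us (caller must pass cpu_fast_us <= cpu_us).
 */
uint64_t overlapped_gain_us(uint64_t cpu_us, uint64_t cpu_fast_us,
    uint64_t gpu_us)
{
	return overlapped_batch_us(cpu_us, gpu_us) -
	    overlapped_batch_us(cpu_fast_us, gpu_us);
}
```

Under this model, overlapped_gain_us() is 0 whenever cpu_us <= gpu_us,
which is the case where faster SHA-1 on the CPU would not show up in
c/s. Real timings would have to be measured to see which case applies.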

Lukas
