Message-ID: <CABob6iruyHeT=X5q3m1ZJJWP0fm5E4uFSXVTZW6fawt6iTDQdg@mail.gmail.com>
Date: Sat, 23 Jun 2012 22:39:00 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: WPA-PSK fixes, OpenMP support

2012/6/23 Solar Designer <solar@...nwall.com>:
> Lukas, magnum, all -
>
> The attached patch fixes two out of bounds writes that occurred all the
> time (both of them in CPU code, one of them in GPU code), prevents out
> of bounds writes on over-long passwords and on missed set_key() (which
> may happen during self-tests with large max_keys_per_crypt), removes the
> dependency on some char arrays on the stack being int-aligned, and
> finally adds OpenMP support for the CPU code. (Of course, we'd achieve
> much better speed by also using sse-intrinsics.c code for SHA-1.)

Thank you for working on that. I tried to add OpenMP support, but my
code introduced a bug, so I sent magnum the code without it.

> BTW, where does the length 15 limit come from? Can/should we avoid it?

It is something I had in mind and forgot to change. We can easily avoid
it, even in the GPU patches. I'll send a fix for that.

> Here are some speeds. FX-8120, one CPU core in use:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... DONE
> Raw: 401 c/s real, 401 c/s virtual
>
> FX-8120, OpenMP build:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [32/64]... (8xOMP) DONE
> Raw: 2032 c/s real, 253 c/s virtual
>
> GTX 570 1600 MHz:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [CUDA]... DONE
> Raw: 28444 c/s real, 28595 c/s virtual
>
> HD 7970:
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max Group Work Size 256
> Optimal Group work Size = 96
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw: 42164 c/s real, 121720 c/s virtual
>
> Same two GPUs, OpenMP build:
>
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [CUDA]... (8xOMP) DONE
> Raw: 32385 c/s real, 16541 c/s virtual
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Max Group Work Size 256
> Optimal Group work Size = 128
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... (8xOMP) DONE
> Raw: 55138 c/s real, 41890 c/s virtual
>
> Hmm, somehow "Optimal Group work Size" is different here.
>
> BTW, "group work size" sounds weird. Do we actually mean "global work
> size" or "work-group size"?

It should be "work-group size".

We could get even better results by overlapping the CPU and GPU code
(we would utilize the CPU AND the GPU at the same time). If CPU time <
GPU time, there is a chance that sse-intrinsics.c won't be needed, since
the speedup won't be noticeable because of low CPU usage, but this
hypothesis needs to be verified.

Lukas
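
For illustration only, the two CPU-side ideas discussed above (clamping
over-long passwords in set_key(), and running the per-key work under
OpenMP) could be sketched roughly as follows. This is not the attached
patch. The names key_slot, inbuffer, outbuffer, toy_kdf, crypt_all,
MAX_KEYS and PLAINTEXT_LENGTH are assumptions made for this sketch, and
toy_kdf is a trivial stand-in for the real PBKDF2-HMAC-SHA-1
computation.

```c
#include <string.h>

#define MAX_KEYS 8          /* hypothetical batch size */
#define PLAINTEXT_LENGTH 15 /* the length 15 limit mentioned in the thread */

/* Hypothetical per-candidate slot: explicit length plus fixed-size buffer. */
typedef struct {
    unsigned char length;
    unsigned char v[PLAINTEXT_LENGTH];
} key_slot;

/* Static storage is zero-initialized, so a slot for which set_key() was
 * never called simply has length 0 instead of holding garbage. */
static key_slot inbuffer[MAX_KEYS];
static unsigned int outbuffer[MAX_KEYS];

/* Store a candidate, truncating it rather than writing past the slot. */
static void set_key(const char *key, int index)
{
    size_t len = strlen(key);

    if (len > PLAINTEXT_LENGTH)
        len = PLAINTEXT_LENGTH;
    inbuffer[index].length = (unsigned char)len;
    memcpy(inbuffer[index].v, key, len);
}

/* Stand-in for the expensive per-key work: a plain byte sum. */
static unsigned int toy_kdf(const key_slot *k)
{
    unsigned int sum = 0;

    for (unsigned int i = 0; i < k->length; i++)
        sum += k->v[i];
    return sum;
}

/* Each candidate is independent, so the loop can be split across threads.
 * Without OpenMP enabled at compile time the loop just runs serially. */
static void crypt_all(int count)
{
    int i;

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < count; i++)
        outbuffer[i] = toy_kdf(&inbuffer[i]);
}
```

With GCC this would be built with -fopenmp to enable the parallel loop.
Each iteration writes only its own outbuffer[i], so no locking is needed
in this sketch.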