Message-ID: <20120328002849.GD19375@openwall.com>
Date: Wed, 28 Mar 2012 04:28:49 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: rawsha256.cu patch(using shared memory)

On Tue, Mar 27, 2012 at 11:26:58PM +0800, myrice wrote:
> I used shared memory in rawsha256.cu(Just as Lukas comments as to-do)
> There are still space for improvement. I think sha256 access patterns have
> bank conflict.
> Overall speedup by ~6% in sha256 and 8% in sha224
...
> =====Before===============
...
> Benchmarking: raw-sha256-cuda [SHA256]... DONE
> Raw: 1979K c/s real, 1998K c/s virtual
> Average: 1933.3 c/s real, 1965.6 c/s virtual
>
> ============After=================
...
> Benchmarking: raw-sha256-cuda [SHA256]... DONE
> Raw: 2062K c/s real, 2085K c/s virtual
> Average: 2048.6 c/s real, 2080.0 c/s virtual
>
> Speedup: ~6%

That's nice, but this is still awfully slow.  In fact, even the
benchmarks we have on the wiki somehow show higher speeds, even though
you have a faster card (GTX-580, right?)

* C-01: i3 2100, 4GB 1333MHz, GeForce 9800GT, slackware 13.1 32bit
* C-03: C2Duo P7350 2GHz,GF 9600m
* C-04: 9800GTX
* C-06: GTX 460 1024M

Benchmarking: SHA256CUDA [SHA256] DONE
john-1.7.6-sha256cuda-0.diff
* C-01 : Raw: 5734K c/s real, 5745K c/s virtual
* C-03 : Raw: 1795k c/s real, 1795k c/s virtual
* C-04 : Raw: 4456k c/s real 4412k c/s virtual
* C-06 : Raw: 10443K c/s real, 10527K c/s virtual

(This is for an older revision of Lukas' code.)

Here's what I am getting on CPU with OpenSSL calls:

Benchmarking: Raw SHA-256 [32/64]... DONE
Raw: 1565K c/s real, 1565K c/s virtual

Benchmarking: Raw SHA-256 [32/64]... (8xOMP) DONE
Raw: 6342K c/s real, 791325 c/s virtual

The formats interface bottleneck is somewhere above 50M c/s.  Actually,
--format=dummy shows it at around 130M c/s on Core i7-2600, which is
what you said you use, but indeed interfacing to the GPU takes time.
With Samuele's fast hash implementations in OpenCL and running on GPU,
we're getting close to 50M c/s.  So you also need to get close to that.
This is a good thing for you to attempt.  (And once you get there, you'd
need to somehow demonstrate that your code would be even faster without
the interface bottleneck - e.g., by starting to implement candidate
password generation and hash comparison on GPU in whatever quick way you
can for the demo.)

Thanks,

Alexander
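
For reference, the figures quoted in this message can be put side by side
with a few lines of arithmetic.  This is only a rough sketch: the numbers
are copied from the benchmark output above, the quoted "Average" values
are used as printed, the variable and function names are made up for
illustration, and "close to 50M c/s" is taken as exactly 50000K c/s.

```python
# Speeds in K c/s ("real"), copied from the benchmark output quoted above.
before_raw, after_raw = 1979.0, 2062.0   # raw-sha256-cuda, Raw line
before_avg, after_avg = 1933.3, 2048.6   # raw-sha256-cuda, Average line (as printed)
wiki_gtx460 = 10443.0                    # wiki entry C-06, older revision of the code
target = 50000.0                         # assumption: "close to 50M c/s" taken as 50000K


def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


raw_gain = gain_percent(before_raw, after_raw)
avg_gain = gain_percent(before_avg, after_avg)
wiki_ratio = wiki_gtx460 / after_raw     # how far ahead the wiki GTX 460 figure is
target_ratio = target / after_raw        # factor still missing to reach the target

print(f"Raw gain:      {raw_gain:.1f}%")
print(f"Average gain:  {avg_gain:.1f}%")
print(f"Wiki GTX 460:  {wiki_ratio:.1f}x the patched speed")
print(f"50M c/s goal:  {target_ratio:.1f}x the patched speed")
```

The reported ~6% matches the Average lines, while the Raw lines alone
give a somewhat smaller gain.  Either way, the gap to the 50M c/s target
is well over an order of magnitude.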