Message-ID: <20120415013038.GD1296@openwall.com>
Date: Sun, 15 Apr 2012 05:30:38 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: MSCash2 OpenCL (was: OpenCL tests on HD 7970)

Hi Sayantan,

On Fri, Apr 13, 2012 at 10:44:41AM +0530, SAYANTAN DATTA wrote:
> I have posted my final performance update(+ 13%) to magnum. It would be
> really great if you could test them on 7970 and post the results.

It became a lot slower:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw: 36781 c/s real, 52459 c/s virtual

GPU load is now reported at 94%. Probably it's not such a good
indicator, then. I am also able to get it to 99% by simultaneously
running two instances of JtR using the 7970, but the cumulative speed
does not improve much (46k c/s above, 60k c/s with your previous code
version - still slower than the 75k c/s with one instance of the
previous version).

For the sake of completeness:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw: 13631 c/s real, 13631 c/s virtual

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl -pla=1 -dev=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 1: AMD FX(tm)-8120 Eight-Core Processor
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw: 624 c/s real, 78.4 c/s virtual

CPU benchmark with old version (that did 75k c/s on 7970):

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl
OpenCL platform 0: AMD Accelerated Parallel Processing, 1 device(s).
Using device 0: AMD FX(tm)-8120 Eight-Core Processor
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw: 642 c/s real, 90.7 c/s virtual

For reference, CPU non-OpenCL benchmarks:

-64i:

One core (4.5 GHz):

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... DONE
Raw: 1291 c/s real, 1291 c/s virtual

OpenMP (something like 3.7 GHz, bumps into 125 W TDP):

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... (8xOMP) DONE
Raw: 3584 c/s real, 446 c/s virtual

-xop:

One core:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... DONE
Raw: 1784 c/s real, 1784 c/s virtual

OpenMP:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... (8xOMP) DONE
Raw: 4928 c/s real, 612 c/s virtual

With 8 independent processes, I am getting 720 c/s per process, for a
total of 5760 c/s (so our OpenMP parallelization for MSCash2 is not
perfect - need to improve it).

Comparing the best CPU vs. GPU benchmarks, we achieve a 13x speedup by
going from XOP with 8 independent processes on FX-8120 o/c to your
previous version of the OpenCL code on 7970 (stock clocks so far).

Thanks,

Alexander
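
For anyone who wants to recheck the ratios quoted above, here is a minimal Python sketch of the arithmetic. It is only an illustration, not part of JtR: the constant and function names are made up for this example, the c/s values are copied from the benchmark output in this message, and 75000 stands in for the rounded "75k c/s" figure for the previous OpenCL code.

```python
# Hypothetical helper for rechecking the ratios in this message.
# All c/s figures are copied from the benchmarks above; names are illustrative.

XOP_OPENMP = 4928        # -xop build, 8 OpenMP threads, "real" c/s
XOP_PER_PROCESS = 720    # -xop build, c/s for each independent process
PROCESSES = 8            # number of independent processes
GPU_PREVIOUS = 75_000    # assumption: exact value behind the rounded "75k c/s"
GPU_CURRENT = 36_781     # current OpenCL code on the 7970, "real" c/s


def summarize():
    """Return the combined CPU rate and the ratios derived from it."""
    independent_total = XOP_PER_PROCESS * PROCESSES
    return {
        # Combined rate of the 8 independent processes
        "independent_total": independent_total,
        # How close the OpenMP build gets to 8 independent processes
        "openmp_vs_independent": XOP_OPENMP / independent_total,
        # Previous OpenCL code on the 7970 vs. the best CPU figure
        "gpu_previous_vs_cpu": GPU_PREVIOUS / independent_total,
        # Current OpenCL code on the 7970 vs. the best CPU figure
        "gpu_current_vs_cpu": GPU_CURRENT / independent_total,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

The OpenMP ratio comes out below 1, which is the imperfect scaling mentioned above, and the previous-version GPU ratio rounds to the 13x quoted.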