Message-ID: <cbb3706666d30d0b31c93dd9644fe162@smtp.hushmail.com>
Date: Sun, 26 Feb 2012 14:31:09 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: OpenCL for CPU (AMD vs. Intel)

On 02/23/2012 09:54 PM, magnum wrote:
> I had a look at Intel's OpenCL SDK today. This one makes your Intel CPU
> (or better, all CPUs and cores) work just like a GPU with OpenCL.

Intel's OpenCL SDK works fine with an AMD CPU and vice versa, and there's no problem installing both. Armed with the new --platform option in the git version of Jumbo, I can now do some comparisons. Here's the output of --platform=LIST with both SDKs installed:

$ ../run/john -platform=LIST
Platform #0 name: Intel(R) OpenCL
Platform version: OpenCL 1.1 LINUX
    Device #0 name: Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz
    Device vendor: Intel(R) Corporation
    Device type: CPU
    Device version: OpenCL 1.1 (Build 15293.6649)
    Driver version: 1.1
    Global Memory: 3855 MB
    Global Memory Cache: 3 MB
    Local Memory: 32 KB
    Max clock (MHz) : 2400
    Max Work Group Size: 1024
    Parallel compute cores: 2
Platform #1 name: AMD Accelerated Parallel Processing
Platform version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
    Device #0 name: Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz
    Device vendor: GenuineIntel
    Device type: CPU
    Device version: OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
    Driver version: 2.0
    Global Memory: 3855 MB
    Global Memory Cache: 0 MB
    Local Memory: 32 KB
    Max clock (MHz) : 2401
    Max Work Group Size: 1024
    Parallel compute cores: 2

So, let's try the phpass format:

$ ../run/john -test -form:phpass-opencl -platform=0
OpenCL Platforms: 2
OpenCL Platform: <<<Intel(R) OpenCL>>> 1 device(s), using device: <<<Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz>>>
Compilation log: Build started
Kernel <phpass> was successfully vectorized
Done.
Optimal Group work Size = 64
Benchmarking: PHPASS-OPENCL [PORTABLE-MD5]... DONE
Raw: 17226 c/s real, 8613 c/s virtual

$ ../run/john -test -form:phpass-opencl -platform=1
OpenCL Platforms: 2
OpenCL Platform: <<<AMD Accelerated Parallel Processing>>> 1 device(s), using device: <<<Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz>>>
Optimal Group work Size = 8
Benchmarking: PHPASS-OPENCL [PORTABLE-MD5]... DONE
Raw: 6063 c/s real, 3051 c/s virtual

Intel's compiler says it vectorized phpass. The result confirms it: the Intel run is almost three times faster than the AMD run (17226 vs. 6063 c/s real). I'm not sure AMD will report if or when it vectorizes, and I'm not even sure it will ever auto-vectorize code.

$ ../run/john -test -form:cryptmd5-opencl
OpenCL Platforms: 2
OpenCL Platform: <<<Intel(R) OpenCL>>> 1 device(s), using device: <<<Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz>>>
Compilation log: Build started
Kernel <cryptmd5> was not vectorized
Done.
Benchmarking: CRYPTMD5-OPENCL [MD5-based CRYPT]... DONE
Raw: 8694 c/s real, 4306 c/s virtual

$ ../run/john -test -form:cryptmd5-opencl -platform=1
OpenCL Platforms: 2
OpenCL Platform: <<<AMD Accelerated Parallel Processing>>> 1 device(s), using device: <<<Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz>>>
Benchmarking: CRYPTMD5-OPENCL [MD5-based CRYPT]... DONE
Raw: 9404 c/s real, 4714 c/s virtual

Here AMD beats Intel, but both are slower than even the generic 32-bit code. So let's try SHA-1:

$ ../run/john -test -form:mysql-sha1-opencl
OpenCL Platforms: 2
OpenCL Platform: <<<Intel(R) OpenCL>>> 1 device(s), using device: <<<Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz>>>
Compilation log: Build started
Kernel <sha1_crypt_kernel> was not vectorized
Done.
Max Group Work Size 1024
Optimal local work size 128 (to avoid this test on next run do export LWS=128)
Local work size (LWS) 128, Keys per crypt (KPC) 2097152
Benchmarking: MySQL 4.1 double-SHA-1 [mysql-sha1-opencl]... DONE
Many salts: 1923K c/s real, 1069K c/s virtual
Only one salt: 1923K c/s real, 1064K c/s virtual

$ ../run/john -test -form:mysql-sha1-opencl -platform=1
OpenCL Platforms: 2
OpenCL Platform: <<<AMD Accelerated Parallel Processing>>> 1 device(s), using device: <<<Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz>>>
Max Group Work Size 1024
Optimal local work size 1024 (to avoid this test on next run do export LWS=1024)
Local work size (LWS) 1024, Keys per crypt (KPC) 2097152
Benchmarking: MySQL 4.1 double-SHA-1 [mysql-sha1-opencl]... DONE
Many salts: 2995K c/s real, 1638K c/s virtual
Only one salt: 2853K c/s real, 1638K c/s virtual

AMD wins again. The AMD figure is better than the generic 32-bit code but slower (per core) than generic 64-bit. Figures might improve a little with an optimised keys-per-crypt (KPC) value. However, finding that value takes an awful lot of time with the current scheme.

Maybe the default KPC should depend on the reported number of cores. And maybe the auto-KPC code should use some kind of binary search to home in on the optimum: start low, double until it's slower, back off half the last increment, and so on.

magnum
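The KPC search proposed above can be sketched in C as follows. This is a minimal illustration under stated assumptions. It is not John the Ripper's actual auto-tuning code, and the following pieces are hypothetical:

* The names `speed_fn`, `find_kpc`, `default_kpc` and `parabola_speed` are made up for this sketch.
* The per-compute-unit starting value of 32 is an arbitrary placeholder.
* The benchmark hook stands in for timing the real OpenCL kernel.
* `parabola_speed` is a synthetic model used only to exercise the search.

The sketch also assumes that speed is roughly unimodal in KPC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical benchmark hook: returns the measured speed (c/s) when the
 * kernel runs with the given keys-per-crypt (KPC). In john this would time
 * the OpenCL kernel; here it is only an assumed interface. */
typedef double (*speed_fn)(size_t kpc, void *ctx);

/* Hypothetical default: scale the starting KPC with the number of compute
 * units the device reports (e.g. CL_DEVICE_MAX_COMPUTE_UNITS). The base
 * value per unit is an arbitrary placeholder. */
size_t default_kpc(unsigned compute_units)
{
    const size_t base_per_unit = 32;
    return base_per_unit * (compute_units ? compute_units : 1);
}

/* Start low and double KPC until the speed stops improving. Then back off
 * half of the failed increment and keep halving the step, moving to
 * whichever neighbour is faster. This is a greedy search that assumes
 * speed() is roughly unimodal; it may settle near, not at, the optimum. */
size_t find_kpc(speed_fn speed, void *ctx, size_t start, size_t max_kpc)
{
    assert(speed != NULL && max_kpc >= 1);
    size_t best = start ? start : 1;
    if (best > max_kpc)
        best = max_kpc;
    double best_speed = speed(best, ctx);

    /* Phase 1: keep doubling while it helps (and stays within max_kpc). */
    while (best <= max_kpc / 2) {
        double s = speed(best * 2, ctx);
        if (s <= best_speed)
            break; /* doubling was not faster: stop */
        best *= 2;
        best_speed = s;
    }

    /* Phase 2: the failed doubling added `best`, so first probe best/2
     * away (i.e. best + best/2), then halve the step each round. */
    for (size_t step = best / 2; step >= 1; step /= 2) {
        if (step <= max_kpc - best) {
            double s = speed(best + step, ctx);
            if (s > best_speed) {
                best += step;
                best_speed = s;
                continue; /* moved up; next round uses a smaller step */
            }
        }
        if (step < best) {
            double s = speed(best - step, ctx);
            if (s > best_speed) {
                best -= step;
                best_speed = s;
            }
        }
    }
    return best;
}

/* Synthetic stand-in for a real benchmark, for illustration only: speed
 * peaks at *(size_t *)ctx and falls off quadratically on either side. */
double parabola_speed(size_t kpc, void *ctx)
{
    double d = (double)kpc - (double)*(size_t *)ctx;
    return 1e9 - d * d;
}
```

Suppose the synthetic model peaks at 3000 and the device reports 2 compute units, so the search starts at `default_kpc(2)` = 64. The search then doubles up to 2048, finds 4096 slower, and refines to 3000. Real timings are noisy, so a practical version would probably average several runs per probe. It might also need to keep KPC a multiple of the local work size.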