Message-Id: <78F1A508-2CBB-4581-89CE-A9C20357F1EF@gmail.com>
Date: Mon, 14 Sep 2015 22:16:54 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Sep 14, 2015, at 8:27 PM, Solar Designer <solar@...nwall.com> wrote:
>
> On Mon, Sep 14, 2015 at 04:33:55PM +0800, Lei Zhang wrote:
>> In case it's helpful, here're some benchmark figures of JtR running on SDE:
>
> And, did you fully make use of them (turning all of the
> 3-input basic functions of MD4/MD5/SHA-1/SHA-2 into single instructions)
> or only to define vcmov() for now?

I turned all 3-input functions to using a single TERNLOG instruction, except for those that are already using a single CMOV (I thought one CMOV is good enough, but forgot it might be emulated).

Now that you mentioned it, I also used TERNLOG to emulate CMOV. Here's the latest results:

Benchmarking: Raw-MD4 [MD4 512/512 AVX512F 16x3]... DONE
Raw:	219184 c/s real, 219184 c/s virtual

Benchmarking: Raw-MD5 [MD5 512/512 AVX512F 16x3]... DONE
Raw:	138917 c/s real, 140293 c/s virtual

Benchmarking: Raw-SHA1 [SHA1 512/512 AVX512F 16x]... DONE
Raw:	99216 c/s real, 99216 c/s virtual

Benchmarking: Raw-SHA256 [SHA256 512/512 AVX512F 16x]... DONE
Raw:	48839 c/s real, 49328 c/s virtual

Benchmarking: Raw-SHA512 [SHA512 512/512 AVX512F 8x]... DONE
Raw:	22019 c/s real, 22019 c/s virtual

Compared to the previous figures (please refer to my last message), using TERNLOG to emulate CMOV makes JtR slower on SDE. Maybe SDE's emulation of TERNLOG is just not efficient.

And here's the results without using any TERNLOG instructions:

Benchmarking: Raw-MD4 [MD4 512/512 AVX512F 16x3]... DONE
Raw:	444356 c/s real, 448800 c/s virtual

Benchmarking: Raw-MD5 [MD5 512/512 AVX512F 16x3]... DONE
Raw:	225172 c/s real, 227424 c/s virtual

Benchmarking: Raw-SHA1 [SHA1 512/512 AVX512F 16x]... DONE
Raw:	212784 c/s real, 212784 c/s virtual

Benchmarking: Raw-SHA256 [SHA256 512/512 AVX512F 16x]... DONE
Raw:	63413 c/s real, 63413 c/s virtual

Benchmarking: Raw-SHA512 [SHA512 512/512 AVX512F 8x]... DONE
Raw:	27440 c/s real, 27168 c/s virtual

I think that further confirms my statement above: SDE's emulation of TERNLOG is inefficient.

>> BTW, SDE runs much more smoothly than I expected. At least those formats listed above ran quite fast on it.
>
> You mean the program's interactive response time, not the c/s rates.

That's exactly what I meant :D


Lei
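
For readers following the thread: turning a 3-input basic function into a single TERNLOG instruction comes down to choosing the right 8-bit truth table (the imm8 operand). The sketch below is not JtR's code. It is a portable scalar illustration, and all function names in it (f_select, f_majority, f_parity, f_md5_i, ternlog_imm8, ternlog_model, ternlog_selfcheck) are made up for this example. It assumes the usual operand order of the real intrinsic _mm512_ternarylogic_epi32(a, b, c, imm8), where bit ((a<<2)|(b<<1)|c) of imm8 gives the result for each bit position.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar versions of the 3-input basic functions. */

/* Bitwise select, i.e. what vcmov() computes: x ? y : z.
 * Used as F in MD4/MD5, in SHA-1 rounds 0..19, and as Ch in SHA-2. */
uint32_t f_select(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

/* Majority: MD4 G, SHA-1 rounds 40..59, SHA-2 Maj. */
uint32_t f_majority(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Parity: MD4/MD5 H, SHA-1 rounds 20..39 and 60..79. */
uint32_t f_parity(uint32_t x, uint32_t y, uint32_t z)
{
    return x ^ y ^ z;
}

/* MD5 I. */
uint32_t f_md5_i(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

typedef uint32_t (*bool3_fn)(uint32_t, uint32_t, uint32_t);

/* Derive the imm8 truth table for f: evaluate it on the constants whose
 * bit i holds the value of each input for table index i
 * (0xF0 for the first operand, 0xCC for the second, 0xAA for the third). */
uint8_t ternlog_imm8(bool3_fn f)
{
    return (uint8_t)f(0xF0u, 0xCCu, 0xAAu);
}

/* Scalar model of one 32-bit lane of TERNLOG: for each bit position,
 * look up the result in imm8 using the three input bits as the index. */
uint32_t ternlog_model(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << bit;
    }
    return r;
}

/* Cross-check the table lookup against the direct formulas on
 * arbitrary test words. */
void ternlog_selfcheck(void)
{
    const uint32_t x = 0x67452301u, y = 0xEFCDAB89u, z = 0x98BADCFEu;
    assert(ternlog_model(x, y, z, ternlog_imm8(f_select))   == f_select(x, y, z));
    assert(ternlog_model(x, y, z, ternlog_imm8(f_majority)) == f_majority(x, y, z));
    assert(ternlog_model(x, y, z, ternlog_imm8(f_parity))   == f_parity(x, y, z));
    assert(ternlog_model(x, y, z, ternlog_imm8(f_md5_i))    == f_md5_i(x, y, z));
}
```

With these definitions, ternlog_imm8(f_select) is 0xCA, so under the assumed operand order a vcmov-style select would map to _mm512_ternarylogic_epi32(x, y, z, 0xCA). Majority and parity come out as 0xE8 and 0x96. Whether a single TERNLOG is actually faster can only be judged on real hardware; as the figures above suggest, timings under SDE mostly reflect the cost of emulation.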