Message-ID: <4740e3f0386027415061100893262a59@smtp.hushmail.com>
Date: Tue, 12 May 2015 02:11:03 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5

On 2015-05-11 23:06, magnum wrote:
> On my core i7 laptop, OMP_SCALE 4 is best, HT or not. Bumping to 8
> slightly degrades HT but does not change non-HT at all. This is with 4:
>
> $ OMP_NUM_THREADS=4 ../run/john -test -form:sunmd5 && ../run/john -test
> -form:sunmd5
> Will run 4 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (4xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw: 2497 c/s real, 629 c/s virtual
>
> Will run 8 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (8xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw: 2671 c/s real, 345 c/s virtual

After replacing the bad #ifdefs for MAX_KEYS_PER_CRYPT (mentioned in
another thread) with just SIMD_COEF_32 * MD5_SSE_PARA, I saw a slowdown.
So I added a fixed multiplier and bumped it, running a single thread,
until I seemed to hit a sweet spot. It ended up as

#define MIN_KEYS_PER_CRYPT SIMD_COEF_32
#define MAX_KEYS_PER_CRYPT (32 * SIMD_COEF_32 * MD5_SSE_PARA)

That ends up, in this case, as 384 while the old ifdefs would pick 96. I
got a 6% speedup for single-thread compared to old non-OMP code.

Then I ran with 8 threads HT and verified OMP_SCALE. It's now best kept
at 1 or 2. New speed:

$ ../run/john -test -form:sunmd5
Will run 8 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (8xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw: 2682 c/s real, 351 c/s virtual

Then I took it to Super and tried it as-is, hoping for the best figure
yet. Unfortunately it did not fly:

$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form:sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw: 8330 c/s real, 260 c/s virtual

Frank had 9909 c/s at some point so this is not quite it. But the speed
is now much more stable between runs (btw this was also with
MEM_ALIGN_CACHE). In fact I even got the exact same speed using
OMP_NUM_THREADS=64. That's odd though.

magnum