Message-ID: <20150423103753.GA4815@openwall.com>
Date: Thu, 23 Apr 2015 13:37:53 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

Hi Lei,

On Thu, Apr 23, 2015 at 06:25:27PM +0800, Lei Zhang wrote:
> I just finished adding MIC/AVX512 support to the remaining formats in JtR (great thanks to magnum's work). Now all formats with MIC intrinsics enabled passed self-tests on MIC.

Great. What speeds are you getting? Have you tried tuning the interleave factors already? And simpler things such as OMP_SCALE?

Regarding OpenMP offload experiments:

> BF_std:
> Currently this is the only one that works.
> -----------------------------------------------------
> [zhanglei@...ter src]$ ../run/john --test --format=bcrypt
> Will run 12 OpenMP threads
> Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... DONE
> Raw: 1552 c/s real, 1555 c/s virtual
> -----------------------------------------------------

What exactly is benchmarked here? Is this 12 threads running on MIC? I guess 12 came from the host CPU's number of hardware threads, and as we know it is way too low for MIC. What will happen if you force OMP_NUM_THREADS=240 in this test?

Anyway, we should have it run the proper number of threads for the device it's offloading to - but only on that device, obviously.

In fact, the performance you're seeing here is too good to be for 12 threads (out of 240 possible) on MIC, but too poor to be for 12 threads on host. So I am puzzled. Can you figure this out? Check "micsmc -a | less" and "top" (on both host and MIC) while this is running, etc.

> Yet I found another minor issue. There're some highly-optimized functions defined in nonstd.c, named s1, s2, etc. Most of them have several implementations, and the preprocessor chooses the best implementation depending on the underlying CPU. When compiling on offload mode, the best implementation for the host CPU is not necessary best for MIC. This won't affect correctness, though.

When compiling a given function for offload, I'd expect the various cpp macros to be set corresponding to the target device's architecture. So this issue shouldn't be specific to the case of offloading. We need to tune things for MIC either way, enabling such tuning in proper #ifdef's.

Thanks,

Alexander
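
The per-target tuning in #ifdef's discussed above could look roughly like the sketch below. It is an illustration only, not code from JtR: the SKETCH_* names, the two helper functions, and the interleave values are all hypothetical. The one thing taken as given is that Intel's compiler defines __MIC__ when it builds code for the coprocessor, so the same source selects a different branch in the device pass than in the host pass.

```c
#include <stdio.h>

/*
 * Hypothetical tuning knobs. The names and the numeric values are
 * placeholders for illustration and are not taken from JtR.
 */
#if defined(__MIC__)
/* Device pass: Intel's compiler defines __MIC__ for coprocessor code. */
#define SKETCH_TARGET     "mic"
#define SKETCH_INTERLEAVE 4
#elif defined(__AVX2__)
/* Host pass on a CPU built with AVX2 enabled. */
#define SKETCH_TARGET     "avx2"
#define SKETCH_INTERLEAVE 2
#else
/* Fallback when no target-specific tuning applies. */
#define SKETCH_TARGET     "generic"
#define SKETCH_INTERLEAVE 1
#endif

/* Name of the branch the preprocessor selected for this compilation. */
const char *sketch_target(void)
{
	return SKETCH_TARGET;
}

/* Interleave factor selected for this compilation. */
int sketch_interleave(void)
{
	return SKETCH_INTERLEAVE;
}

/* Print the selection, e.g. from a self-test or benchmark banner. */
void sketch_report(void)
{
	printf("target=%s interleave=%d\n",
	    sketch_target(), sketch_interleave());
}
```

Calling sketch_report() from code built for the host and from code built for the device would show which branch each compilation picked, which is one way to confirm that the macros follow the target architecture rather than the host CPU.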