Message-ID: <20150425124621.GA20694@openwall.com>
Date: Sat, 25 Apr 2015 15:46:21 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

Lei,

On Thu, Apr 23, 2015 at 11:35:44PM +0800, Lei Zhang wrote:
> > Regarding OpenMP offload experiments:
> >
> >> BF_std:
> >> Currently this is the only one that works.
> >> -----------------------------------------------------
> >> [zhanglei@...ter src]$ ../run/john --test --format=bcrypt
> >> Will run 12 OpenMP threads
> >> Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... DONE
> >> Raw: 1552 c/s real, 1555 c/s virtual
> >> -----------------------------------------------------
> >
> > What exactly is benchmarked here? Is this 12 threads running on MIC?
> > I guess 12 came from the host CPU's number of hardware threads, and as
> > we know it is way too low for MIC. What will happen if you force
> > OMP_NUM_THREADS=240 in this test? Anyway, we should have it run the
> > proper number of threads for the device it's offloading to - but only on
> > that device, obviously.
> >
> > In fact, the performance you're seeing here is too good to be for 12
> > threads (out of 240 possible) on MIC, but too poor to be for 12 threads
> > on host. So I am puzzled. Can you figure this out? Check "micsmc -a |
> > less" and "top" (on both host and MIC) while this is running, etc.
>
> Actually, in BF_std.c, I only added a single line of pragma directive (plus a bunch of "__attribute__((target(mic)))"s):
> -----------------------------------------------------
> #pragma offload target(mic) inout(salt:length(1))
> #pragma omp parallel for ...
> -----------------------------------------------------
> The '12 OpenMP threads' reported should've been detected by host code. The default number of threads used by offloaded code for MIC should be 236. I tried adding a "printf("%d\n", omp_get_num_threads());" in the offloaded code, and the output confirmed my expectation.
>
> BTW, I did some experiment to find out the default number of threads is 240 in native mode, but 236 in offload mode. I guess that, in offload mode, one of MIC's 60 cores is preserved for communicating with the host.

Yes, I had read about that. I think they similarly allocate the last
core for communication when using MIC via OpenCL.

So, any idea about the weird speed you got for bcrypt here? Here's
mine: maybe max_keys_per_crypt is set based on the host's number of
threads, so is too low for MIC?

Alexander