Message-Id: <17D1860D-8597-4637-99AC-9B947F8141B5@gmail.com>
Date: Thu, 23 Apr 2015 23:35:44 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements
> On Apr 23, 2015, at 6:37 PM, Solar Designer <solar@...nwall.com> wrote:
>
> Hi Lei,
>
> On Thu, Apr 23, 2015 at 06:25:27PM +0800, Lei Zhang wrote:
>> I just finished adding MIC/AVX512 support to the remaining formats in JtR (great thanks to magnum's work). Now all formats with MIC intrinsics enabled passed self-tests on MIC.
>
> Great. What speeds are you getting?
Please see the attachment for a full report.
> Have you tried tuning the interleave factors already? And simpler
> things such as OMP_SCALE?
I did tune a bunch of OMP_SCALEs. Some of them are too big by default and would exhaust MIC's memory if not tuned. There are just too many formats to check thoroughly, so I picked out the ones with an overly large OMP_SCALE (e.g. > 4096) and tuned them experimentally, one by one.
I'm not sure what you mean by "interleave factors". Could you be more specific?
> Regarding OpenMP offload experiments:
>
>> BF_std:
>> Currently this is the only one that works.
>> -----------------------------------------------------
>> [zhanglei@...ter src]$ ../run/john --test --format=bcrypt
>> Will run 12 OpenMP threads
>> Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... DONE
>> Raw: 1552 c/s real, 1555 c/s virtual
>> -----------------------------------------------------
>
> What exactly is benchmarked here? Is this 12 threads running on MIC?
> I guess 12 came from the host CPU's number of hardware threads, and as
> we know it is way too low for MIC. What will happen if you force
> OMP_NUM_THREADS=240 in this test? Anyway, we should have it run the
> proper number of threads for the device it's offloading to - but only on
> that device, obviously.
>
> In fact, the performance you're seeing here is too good to be for 12
> threads (out of 240 possible) on MIC, but too poor to be for 12 threads
> on host. So I am puzzled. Can you figure this out? Check "micsmc -a |
> less" and "top" (on both host and MIC) while this is running, etc.
Actually, in BF_std.c, I only added a single line of pragma directive (plus a bunch of "__attribute__((target(mic)))"s):
-----------------------------------------------------
#pragma offload target(mic) inout(salt:length(1))
#pragma omp parallel for ...
-----------------------------------------------------
The '12 OpenMP threads' reported would have been detected by the host code. The default number of threads used by offloaded code on MIC should be 236. I tried adding a "printf("%d\n", omp_get_num_threads());" to the offloaded code, and the output confirmed this.
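For reference, a minimal sketch of that kind of check, not the actual BF_std.c change. The helper name team_size() is hypothetical. The query has to sit inside the parallel region, since omp_get_num_threads() returns 1 outside of one. The Intel-specific offload pragma is left out so the snippet also builds with other compilers, which means it reports the host's team size unless the region is offloaded.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the team size seen inside a parallel region.
 * With the Intel compiler, an offload pragma placed before the parallel
 * region would run it on the MIC instead (omitted here on purpose). */
int team_size(void)
{
    int n = 1; /* fallback when built without OpenMP */
#ifdef _OPENMP
#pragma omp parallel
    {
        /* only the master thread writes the shared result */
#pragma omp master
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Printing team_size() from host code and from an offloaded region is one way to compare the two defaults.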
BTW, I did some experiments and found that the default number of threads is 240 in native mode, but 236 in offload mode. My guess is that, in offload mode, one of MIC's 60 cores is reserved for communicating with the host.
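The arithmetic behind that guess, as a small sketch. It assumes 4 hardware threads per core, and the helper name usable_threads() is made up for illustration.

```c
/* Hypothetical helper: hardware threads left for OpenMP when some
 * cores are set aside (e.g. for host communication in offload mode). */
int usable_threads(int cores, int threads_per_core, int reserved_cores)
{
    return (cores - reserved_cores) * threads_per_core;
}
```

With 60 cores and nothing reserved this gives 240 (native mode); with one core reserved it gives 236, matching what offload mode reports.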
Lei
View attachment "log.txt" of type "text/plain" (44926 bytes)