Message-ID: <20150423103753.GA4815@openwall.com>
Date: Thu, 23 Apr 2015 13:37:53 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

Hi Lei,

On Thu, Apr 23, 2015 at 06:25:27PM +0800, Lei Zhang wrote:
> I just finished adding MIC/AVX512 support to the remaining formats in JtR (great thanks to magnum's work). Now all formats with MIC intrinsics enabled passed self-tests on MIC.

Great.  What speeds are you getting?

Have you tried tuning the interleave factors already?  And simpler
things such as OMP_SCALE?

Regarding OpenMP offload experiments:

> BF_std:
> Currently this is the only one that works.
> -----------------------------------------------------
> [zhanglei@...ter src]$ ../run/john --test --format=bcrypt
> Will run 12 OpenMP threads
> Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... DONE
> Raw:    1552 c/s real, 1555 c/s virtual
> -----------------------------------------------------

What exactly is benchmarked here?  Is this 12 threads running on MIC?
I guess 12 came from the host CPU's number of hardware threads, and as
we know it is way too low for MIC.  What will happen if you force
OMP_NUM_THREADS=240 in this test?  Anyway, we should have it run the
proper number of threads for the device it's offloading to - but only on
that device, obviously.

In fact, the performance you're seeing here is too good to be for 12
threads (out of 240 possible) on MIC, but too poor to be for 12 threads
on host.  So I am puzzled.  Can you figure this out?  Check "micsmc -a |
less" and "top" (on both host and MIC) while this is running, etc.
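
To make the comparison above concrete, one could divide the reported aggregate rate by the thread count and compare the per-thread figure against known single-thread speeds for each architecture.  A minimal sketch (the helper name is hypothetical and not part of JtR; the 1552 c/s and 12 threads are the numbers from the benchmark quoted above):

```c
/* Hypothetical helper, not part of JtR: average c/s per thread,
 * assuming the aggregate rate is spread evenly across threads. */
double per_thread_cps(double total_cps, int threads)
{
    if (threads <= 0)
        return 0.0; /* guard against a bogus thread count */
    return total_cps / threads;
}
```

For the quoted run, per_thread_cps(1552.0, 12) comes to a little over 129 c/s per thread.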

> Yet I found another minor issue. There're some highly-optimized functions defined in nonstd.c, named s1, s2, etc. Most of them have several implementations, and the preprocessor chooses the best implementation depending on the underlying CPU. When compiling on offload mode, the best implementation for the host CPU is not necessary best for MIC. This won't affect correctness, though.

When compiling a given function for offload, I'd expect the various cpp
macros to be set corresponding to the target device's architecture.  So
this issue shouldn't be specific to the case of offloading: a native
MIC build would hit it too.  We need to tune things for MIC either way,
enabling such tuning in proper #ifdef's.
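
A rough sketch of what such an #ifdef could look like.  The macro and function names (SBOX_VARIANT, sbox_variant) and the variant labels are placeholders of mine, not taken from the JtR source.  I am relying on Intel's compiler defining __MIC__ when it builds code for the coprocessor, which I'd expect to also hold for the device side of an offload build:

```c
#include <stddef.h>

/* Hypothetical sketch: names and labels are placeholders, not JtR code. */
#if defined(__MIC__)
/* Compiling for the MIC target (native build, or device side of offload) */
#define SBOX_VARIANT "mic-tuned"
#elif defined(__x86_64__)
/* Host x86-64 build */
#define SBOX_VARIANT "x86-64"
#else
/* Fallback for any other architecture */
#define SBOX_VARIANT "generic"
#endif

/* Reports which variant the preprocessor selected for this compilation. */
const char *sbox_variant(void)
{
    return SBOX_VARIANT;
}
```

The point is that the selection is keyed on the architecture the function is compiled for, rather than on the host CPU.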

Thanks,

Alexander
