john-dev - ignore or limit Idle=Y for non-CPU-only?

Message-ID: <20150602005614.GB12572@openwall.com>
Date: Tue, 2 Jun 2015 03:56:14 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: ignore or limit Idle=Y for non-CPU-only?

magnum, all -

For a few years now, JtR's default is Idle=Y.  This works well when
targeting the host's CPUs only and no synchronization is needed.

We already have logic in place to ignore Idle=Y when using OpenMP and
the thread count is greater than 1, because in that case yielding CPU
impacts more than just the current thread (it may also unnecessarily
make other threads wait when they reach the end of a parallel region).

Now, there's a similar issue when targeting non-CPU devices.  When we
yield CPU (because there's other demand for CPU on the host, whether
from another instance of JtR or from something else), we additionally
risk having an external device stall waiting for input from that CPU.

I found that when I use both GPUs and CPUs on a machine at once, with
multiple instances of john, I end up editing john.conf to set Idle=Y in
the CPU-using instances and Idle=N in the GPU-using ones.  When I forget
to do that, my GPU usage percentage drops.

Should we possibly ignore Idle=Y when running OpenCL and CUDA formats?

There's another aspect here, though.  When targeting NVIDIA GPUs, we
often end up having a thread busily looping on the CPU.  This is
described e.g. here:

https://devtalk.nvidia.com/default/topic/494659/execute-kernels-without-100-cpu-busy-wait-/

an old thread, but I don't know if things have improved since.  I've
seen the busy loop issue recently.

So to avoid wasting that CPU core for potential concurrent CPU-using
instances of john, maybe we can check if the target device is an NVIDIA
card and if so only partially ignore Idle=Y: do invoke nice(20), but
don't use SCHED_IDLE and don't invoke sched_yield().  Unfortunately,
this would still cause some reduction in GPU usage when there's a
concurrent CPU-using john - just not as much reduction as the current
idle.c code causes.
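
To make the cases discussed so far concrete, here is a minimal sketch in C of
the decision as a pure function.  It is not taken from idle.c: the names
device_kind, idle_actions and idle_policy are hypothetical, and treating
non-NVIDIA GPU devices as "ignore Idle=Y entirely" is only the option floated
above, not a settled design.  It also assumes that the existing OpenMP rule
skips all three measures when the thread count is greater than 1.

```c
#include <stdbool.h>

/* Hypothetical device classification; not taken from the JtR source. */
enum device_kind { DEV_CPU_ONLY, DEV_GPU_NVIDIA, DEV_GPU_OTHER };

/* Which idle-priority measures one john instance would apply. */
struct idle_actions {
    bool lower_priority; /* lower priority via nice() */
    bool use_sched_idle; /* request the SCHED_IDLE scheduling policy */
    bool yield_cpu;      /* call sched_yield() between work units */
};

/*
 * Sketch of the policy discussed in this message (an assumption, not the
 * current idle.c behavior for GPU formats):
 *  - Idle=N: nothing is applied.
 *  - CPU-only, one thread: all measures are applied.
 *  - CPU-only, more than one OpenMP thread: Idle=Y is ignored.
 *  - NVIDIA GPU: only the priority is lowered, so the busy-looping
 *    host thread gives way a little without stalling the device as much.
 *  - Other GPU: Idle=Y is ignored.
 */
static struct idle_actions idle_policy(bool idle_enabled, int omp_threads,
                                       enum device_kind dev)
{
    struct idle_actions a = { false, false, false };

    if (!idle_enabled)
        return a;

    switch (dev) {
    case DEV_CPU_ONLY:
        if (omp_threads <= 1)
            a.lower_priority = a.use_sched_idle = a.yield_cpu = true;
        break;
    case DEV_GPU_NVIDIA:
        a.lower_priority = true;
        break;
    case DEV_GPU_OTHER:
        break;
    }
    return a;
}
```

A caller would then invoke nice(), sched_setscheduler() and sched_yield() only
for the flags that come back set.  Keeping the decision separate from the
system calls makes it easy to adjust one case without touching the others.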

Or maybe we need to make Idle tri-state?  If so, what exactly would the
three states correspond to?

Ideally, the default would normally not need to be adjusted and would
result in near-optimal behavior.

Alexander
