john-dev - Re: ignore or limit Idle=Y for non-CPU-only?

Message-ID: <20150620211012.GA10409@openwall.com>
Date: Sun, 21 Jun 2015 00:10:12 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ignore or limit Idle=Y for non-CPU-only?

magnum - what do you think of the below?  I think that for now we should
simply make Idle=Y ignored when running OpenCL/CUDA formats, just like
it is already ignored for OpenMP.

On Tue, Jun 02, 2015 at 03:56:14AM +0300, Solar Designer wrote:
> magnum, all -
> 
> For a few years now, JtR's default has been Idle=Y.  This works well
> when targeting the host's CPUs only and when no synchronization is needed.
> 
> We already have logic in place to ignore Idle=Y when using OpenMP and
> the thread count is greater than 1, because in that case yielding CPU
> impacts more than just the current thread (it may also unnecessarily
> make other threads wait when they reach the end of a parallel region).
> 
> Now, there's a similar issue when targeting non-CPU devices.  When we
> yield CPU (because there's other demand for CPU on the host, whether
> from another instance of JtR or from something else), we additionally
> risk having an external device stall waiting for input from that CPU.
> 
> I found that when I use both GPUs and CPUs on a machine at once, with
> multiple instances of john, I end up editing john.conf to set Idle=Y in
> the CPU-using instances and Idle=N in the GPU-using ones.  When I forget
> to do that, my GPU usage percentage drops.
> 
> Should we possibly ignore Idle=Y when running OpenCL and CUDA formats?
> 
> There's another aspect here, though.  When targeting NVIDIA GPUs, we
> often end up having a thread busily looping on the CPU.  This is
> described e.g. here:
> 
> https://devtalk.nvidia.com/default/topic/494659/execute-kernels-without-100-cpu-busy-wait-/
> 
> an old thread, but I don't know if things have improved since.  I've
> seen the busy loop issue recently.
> 
> So to avoid wasting that CPU core for potential concurrent CPU-using
> instances of john, maybe we can check if the target device is an NVIDIA
> card and if so only partially ignore Idle=Y: do invoke nice(20), but
> don't use SCHED_IDLE and don't invoke sched_yield().  Unfortunately,
> this would still cause some reduction in GPU usage when there's a
> concurrent CPU-using john - just not as much reduction as the current
> idle.c code causes.
> 
> Or maybe we need to make Idle tri-state?  If so, what exactly would the
> three states correspond to?
> 
> Ideally, the default would normally not need to be adjusted and would
> result in near-optimal behavior.
> 
> Alexander
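
To make the options above concrete, here is a rough sketch in C of how the
decision could be structured.  It is only an illustration under stated
assumptions: the names idle_policy, device_kind and choose_idle_policy are
hypothetical and do not come from idle.c, and the mapping for non-NVIDIA
devices (ignore Idle=Y entirely) is just one of the possibilities discussed.
The "nice only" state corresponds to the partial-ignore idea for NVIDIA
cards: lower the priority, but skip SCHED_IDLE and sched_yield().

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from JtR's idle.c. */
enum idle_policy {
    IDLE_NONE,      /* ignore Idle=Y entirely */
    IDLE_NICE_ONLY, /* lower priority only; no SCHED_IDLE, no sched_yield() */
    IDLE_FULL       /* lower priority + SCHED_IDLE + sched_yield() */
};

enum device_kind {
    DEV_CPU,        /* CPU-only format */
    DEV_GPU_NVIDIA, /* OpenCL/CUDA on an NVIDIA card */
    DEV_GPU_OTHER   /* any other OpenCL device */
};

/*
 * Pick how much of Idle=Y to honor.
 * idle_cfg:    the Idle setting from john.conf (true for Y)
 * omp_threads: OpenMP thread count (1 if not using OpenMP)
 * dev:         kind of device the format targets
 */
enum idle_policy choose_idle_policy(bool idle_cfg, int omp_threads,
                                    enum device_kind dev)
{
    if (!idle_cfg)
        return IDLE_NONE;

    /* Existing behavior: yielding with multiple OpenMP threads may make
     * other threads wait at the end of a parallel region. */
    if (omp_threads > 1)
        return IDLE_NONE;

    switch (dev) {
    case DEV_GPU_NVIDIA:
        /* Assumed: a host thread may busy-loop, so stay "nice" to
         * concurrent CPU-using instances, but never yield. */
        return IDLE_NICE_ONLY;
    case DEV_GPU_OTHER:
        /* Assumed: avoid stalling the device while it waits for input. */
        return IDLE_NONE;
    default:
        return IDLE_FULL;
    }
}
```

With this shape, a tri-state Idle setting would amount to letting the user
force one of the three idle_policy values instead of having it derived.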
