john-dev - Re: [GSoC] John the Ripper support for PHC finalists

Message-ID: <20150427015007.GB27289@openwall.com>
Date: Mon, 27 Apr 2015 04:50:07 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

On Sat, Apr 25, 2015 at 11:53:15PM +0200, Agnieszka Bielec wrote:
> 2015-04-25 23:33 GMT+02:00 Solar Designer <solar@...nwall.com>:
> > Oh, maybe it happens to work precisely because those 4 work-items tend
> > to be part of the same SIMD vector in hardware, so with current OpenCL
> > drivers they happen to be "guaranteed" to be ready at the same time?
> 
> Did you mean CPU, not GPU??

No, I meant GPU.  Why?  Your code looks broken regardless of target
device, even if it happens to work on some devices currently.

> Sorry I forgot to mention that it works on
> GPU because instructions are executed at the same time but this fails
> on CPU.

... and this confirms that the code is broken.  You're relying on things
that are not guaranteed, not even on GPU.

> I was focused only on GPU because coalescing decreased the speed on
> CPU significantly

Why did it?

I think you might be misinterpreting the reasons for speedup and
slowdown with these changes.  While coalescing is relevant, it is not
the only thing that changes.

When you use 4-way SIMD, you take advantage of the parallelism available
within one instance of POMELO.  This might make a lower GWS optimal (now
that you've tried using the proper vector type), and along with it lower
GPU global memory usage.  Unless and until you actually bump into the
total GPU global memory size with a would-be-optimal GWS, this aspect
might not matter (except that caching is slightly more effective when
the total size isn't as much larger than what can be cached), but you
should keep it in mind anyway.

BTW, bumping into total GPU global memory size may be realistic with
these memory-hard hashes.  Our TITAN's 6 GB was the performance
limiting factor in some of the benchmarks here:
http://www.openwall.com/lists/crypt-dev/2014/03/13/1

You might want to list GPU memory usage along with your benchmark
results, so that we can assess how close we're getting to that limit.
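
To make the memory point concrete, here is a rough sketch of the
arithmetic, assuming each work-item holds one hash instance with its own
state in global memory.  The helper names and the example sizes are
hypothetical and are not taken from the actual POMELO kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helper: total global memory used when each of gws
 * work-items keeps its own state of state_bytes bytes.
 * Assumes the product fits in 64 bits.
 */
static uint64_t total_state_bytes(uint64_t gws, uint64_t state_bytes)
{
    return gws * state_bytes;
}

/*
 * Hypothetical helper: largest GWS whose per-instance states still fit
 * in mem_bytes of global memory, ignoring any other allocations.
 */
static uint64_t max_gws(uint64_t mem_bytes, uint64_t state_bytes)
{
    return mem_bytes / state_bytes;
}
```

For example, with an assumed 1 MiB of state per instance, a GWS of 4096
needs 4 GiB, and a device with 6 GiB cannot go beyond a GWS of 6144.
Reporting the first figure next to each benchmark result would show how
much headroom is left.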

> and if somebody want opencl on CPU they should use a old version

OK, but we're also talking about code correctness here.  Except for today's
research purposes, we're not interested in code that just happens to
work today and is expected to break any time.

Alexander
