Message-ID: <20120417163445.GA11450@openwall.com>
Date: Tue, 17 Apr 2012 20:34:45 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Weekly report 1

myrice -

Thank you for starting with the weekly reports early.  This is very nice.
Since we're going to ask other students working on JtR to post weekly
reports in here as well, please include your nickname in the Subjects of
further weekly reports so that it is more obvious which report any
replies refer to.

On Tue, Apr 17, 2012 at 03:10:07PM +0800, myrice wrote:
> Last week, my accomplishments were:
> 
> 1. For correctness, implemented cmp_exact for xsha512-cuda
> 2. Reversed last 3 rounds of sha512 in xsha512
> 3. Implemented unoptimized xsha512-opencl

Great.

> 4. Tried async copy on the GPU, but no performance gains; still tuning.

IIRC, what you tried was not supposed to result in any speedup because
your GPU code was invoked and required the data to be already available
right after you started the async copy - so you had it waiting for data
right at that point anyway.
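
To illustrate (a hypothetical CUDA fragment with made-up names, not your
actual code): when the kernel is queued right behind the asynchronous copy
on the same stream, with nothing else in between, the launch simply waits
for the data, so the copy hides nothing:

#include <cuda_runtime.h>

__global__ void xsha512_kernel(const char *keys, unsigned long long *out);

void crypt_one_shot(const char *h_keys, char *d_keys,
    unsigned long long *d_out, size_t keys_bytes,
    int blocks, int threads, cudaStream_t stream)
{
    cudaMemcpyAsync(d_keys, h_keys, keys_bytes,
        cudaMemcpyHostToDevice, stream);
    /* launched immediately and needs all of d_keys, so it just waits
     * for the copy to complete */
    xsha512_kernel<<<blocks, threads, 0, stream>>>(d_keys, d_out);
    cudaStreamSynchronize(stream); /* total time is still copy + kernel */
}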

Lukas' code was different: IIRC, he split the buffered candidate
passwords into three smaller chunks, so two of the three could in fact be
transferred to the GPU asynchronously while the previous chunk was being
processed.  You may implement that too, and I suggest that you make the
number of chunks to use configurable and try values larger than 3 (e.g.,
10 might be reasonable - letting you hide the latency for 9 out of 10
transfers while hopefully not exceeding the size of a CPU data cache).
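
Roughly like this (just a sketch, assuming pinned host memory and made-up
kernel/buffer names rather than the real cuda_xsha512 symbols): each
chunk's host-to-GPU copy is queued on its own stream, so it can overlap
with the kernel that is still processing the previous chunk, and only the
first chunk's transfer remains fully exposed:

#include <cuda_runtime.h>

#define NUM_CHUNKS 10    /* would become a configurable setting */

__global__ void xsha512_chunk_kernel(const char *keys,
    unsigned long long *out);

void crypt_all_chunked(const char *h_keys,    /* pinned (cudaHostAlloc) */
    char *d_keys, unsigned long long *d_out,
    size_t chunk_bytes, size_t keys_per_chunk,
    int blocks_per_chunk, int threads_per_block)
{
    cudaStream_t stream[NUM_CHUNKS];
    int i;

    for (i = 0; i < NUM_CHUNKS; i++)
        cudaStreamCreate(&stream[i]);

    for (i = 0; i < NUM_CHUNKS; i++) {
        size_t off = (size_t)i * chunk_bytes;
        /* the copy of chunk i proceeds while the kernel for chunk i-1
         * (queued on another stream) is still running */
        cudaMemcpyAsync(d_keys + off, h_keys + off, chunk_bytes,
            cudaMemcpyHostToDevice, stream[i]);
        /* 8 64-bit words of output per computed SHA-512 hash */
        xsha512_chunk_kernel<<<blocks_per_chunk, threads_per_block,
            0, stream[i]>>>(d_keys + off,
            d_out + (size_t)i * keys_per_chunk * 8);
    }

    cudaDeviceSynchronize();    /* all chunks done */
    for (i = 0; i < NUM_CHUNKS; i++)
        cudaStreamDestroy(stream[i]);
}

The chunk count here is what I'd make configurable, keeping
NUM_CHUNKS * chunk_bytes small enough to stay within a CPU data cache.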

> In the next week, my priorities are:
> 
> 1. Optimize sha512 stuff in xsha512
>     For one-round SHA-512, the ctx can be replaced by a string. Merge init,
> update, and final into one function.

OK.  I think the compiler does something very much like this already,
but doing it manually may allow for further optimizations, so we should
do it.
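
For illustration, roughly what I have in mind (a sketch only -
sha512_block() stands in for whatever the compression function ends up
being called, and the buffer handling is simplified): since salt plus
password fits in a single 128-byte SHA-512 block, the block can be laid
out and padded up front and fed to one compression call, with the usual
init/update/final bookkeeping gone:

#include <string.h>

/* stands in for the existing compression function */
__device__ void sha512_block(const unsigned char block[128],
    unsigned long long hash[8]);

__device__ void xsha512_single(const unsigned char *salt,    /* 4 bytes */
    const unsigned char *key, int key_len, unsigned long long hash[8])
{
    unsigned char block[128] = { 0 };
    int len = 4 + key_len;            /* salt . password */

    memcpy(block, salt, 4);
    memcpy(block + 4, key, key_len);
    block[len] = 0x80;                /* padding bit */
    /* message length in bits, big-endian, in the last two bytes */
    block[126] = (unsigned char)((len << 3) >> 8);
    block[127] = (unsigned char)(len << 3);

    /* SHA-512 initial values set directly - no sha512_init() */
    hash[0] = 0x6a09e667f3bcc908ULL; hash[1] = 0xbb67ae8584caa73bULL;
    hash[2] = 0x3c6ef372fe94f82bULL; hash[3] = 0xa54ff53a5f1d36f1ULL;
    hash[4] = 0x510e527fade682d1ULL; hash[5] = 0x9b05688c2b3e6c1fULL;
    hash[6] = 0x1f83d9abfb41bd6bULL; hash[7] = 0x5be0cd19137e2179ULL;

    sha512_block(block, hash);        /* one call, no final() */
}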

> 2. Merge cmp_all() with crypt_all()
>     For crypt_all(), we just return. In cmp_all(), we invoke the GPU and
> return a value indicating whether there is a matched hash.

This is going to be problematic.  It will only work well for the special
(albeit most common) case of exactly one hash per salt.  When there are
a few more hashes per salt, cmp_all() is called multiple times, so you
will once again have increased CPU/GPU interaction overhead.  When there
are many more hashes per salt, cmp_all() is not called at all, but
instead a get_hash*() function is called.

This is why I suggested caching of loaded hashes in previous calls to
cmp_all(), such that you can move the comparisons into crypt_all()
starting with the second call to that function.  Then your GPU code for
crypt_all() will return a flag telling your CPU code for cmp_all() to
just return that fixed value instead of invoking any GPU code.
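
In rough pseudo-code (the helper names below are made up for
illustration, not actual code): the first pass over a salt caches the
binaries that cmp_all() sees, and from the second crypt_all() on, the
kernel also does the comparison and leaves a single flag behind:

/* made-up helpers standing in for the obvious host-side CUDA operations */
extern void copy_keys_to_gpu(int count);
extern void xsha512_crypt_gpu(int count);
extern int xsha512_crypt_and_cmp_gpu(int count, int ncached);
extern int xsha512_cmp_gpu(void *binary, int count);
extern void cache_binary_for_gpu(void *binary, int idx);

static int ncached;      /* loaded hashes cached so far */
static int gpu_found;    /* flag from the combined kernel, -1 = not valid */

static void crypt_all(int count)
{
    copy_keys_to_gpu(count);            /* candidate passwords */
    if (ncached) {
        /* second call onwards: compute and compare in one launch */
        gpu_found = xsha512_crypt_and_cmp_gpu(count, ncached);
    } else {
        xsha512_crypt_gpu(count);       /* first call: just compute */
        gpu_found = -1;                 /* no GPU-side comparison yet */
    }
}

static int cmp_all(void *binary, int count)
{
    if (gpu_found >= 0)
        return gpu_found;               /* fixed value, no GPU calls */

    /* first pass: remember this binary so that later crypt_all() calls
     * can compare against it on the GPU, and compare the old way once */
    cache_binary_for_gpu(binary, ncached++);
    return xsha512_cmp_gpu(binary, count);
}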

> 3. Keep optimizing xsha512-opencl

OK.

> 4. Discuss password generation; maybe not implement this next week.

Yes.  Definitely don't implement it that soon - we need to discuss first.

Thanks again,

Alexander
