Message-ID: <20120417163445.GA11450@openwall.com>
Date: Tue, 17 Apr 2012 20:34:45 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Weekly report 1

myrice -

Thank you for starting with the weekly reports early.  This is very
nice.

Since we're going to ask other students working on JtR to post weekly
reports in here as well, please include your nickname in the Subjects of
further weekly reports so that it is more obvious which report any
replies refer to.

On Tue, Apr 17, 2012 at 03:10:07PM +0800, myrice wrote:
> Last week, my accomplishments are:
>
> 1. For correctness, implemented cmp_exact for xsha512-cuda
> 2. Reversed last 3 rounds of sha512 in xsha512
> 3. Implemented unoptimized xsha512-opencl

Great.

> 4. Tried async cpy on GPU but no performance gains, still keep tuning.

IIRC, what you tried was not supposed to result in any speedup because
your GPU code was invoked and required the data to be already available
right after you started the async copy - so you had it waiting for data
right at that point anyway.

Lukas' code was different: IIRC, he split the buffered candidate
passwords in three smaller chunks, where two of the three may in fact be
transferred to the GPU asynchronously while the previous chunk is being
processed.  You may implement that too, and I suggest that you make the
number of chunks to use configurable and try values larger than 3 (e.g.,
10 might be reasonable - letting you hide the latency for 9 out of 10
transfers while hopefully not exceeding the size of a CPU data cache).

> In the next week, my priorities are:
>
> 1. Optimize sha512 stuff in xsha512
> For one round sha512, ctx can be replaced by a string. Merge init,
> update, final in one function

OK.  I think the compiler does something very much like this already,
but doing it manually may allow for further optimizations, so we should
do it.

> 2. Merge cmp_all() with crypt_all()
> For crypt_all(), we just return. In cmp_all(), we invoke GPU and return
> a value indicate if there is a matched hash.

This is going to be problematic.  It will only work well for the special
case (albeit most common) of exactly one hash per salt.  When there are
a few more hashes per salt, cmp_all() is called multiple times, so you
will once again have increased CPU/GPU interaction overhead.  When there
are many more hashes per salt, cmp_all() is not called at all, but
instead a get_hash*() function is called.

This is why I suggested caching of loaded hashes in previous calls to
cmp_all(), such that you can move the comparisons into crypt_all()
starting with the second call to that function.  Then your GPU code for
crypt_all() will return a flag telling your CPU code for cmp_all() to
just return that fixed value instead of invoking any GPU code.

> 3. Keep optimizing xsha512-opencl

OK.

> 4. Discussing password generation, maybe not implement this in next week.

Yes.  Definitely don't implement it that soon - we need to discuss
first.

Thanks again,

Alexander
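
For illustration, the chunked transfer idea above might be sketched as
follows.  This is only a CPU-side model of the scheduling, not code from
JtR or from Lukas' patch: process_chunked(), struct chunk_ops, and the
trace_* helpers are hypothetical names, and the two hooks stand in for
whatever asynchronous copy and kernel launch calls a real CUDA or OpenCL
format would use.  The sketch assumes that compute() for a chunk
implicitly waits for that chunk's own copy to finish (as operations
queued in order on one stream would).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend hooks; a real format would map these to an
 * asynchronous host-to-device copy and a kernel launch. */
struct chunk_ops {
    void (*copy_async)(size_t first, size_t count, void *ctx);
    void (*compute)(size_t first, size_t count, void *ctx);
};

/* Process `total` buffered candidates in `nchunks` pieces.  The copy of
 * chunk i+1 is issued before the compute of chunk i, so every copy except
 * the first can overlap with computation.  Returns the number of chunks
 * actually used. */
static size_t process_chunked(size_t total, size_t nchunks,
                              const struct chunk_ops *ops, void *ctx)
{
    assert(ops != NULL);
    if (total == 0)
        return 0;
    if (nchunks == 0)
        nchunks = 1;
    if (nchunks > total)
        nchunks = total;

    size_t base = total / nchunks, extra = total % nchunks;
    size_t first = 0, count = base + (extra > 0 ? 1 : 0);

    ops->copy_async(first, count, ctx);   /* chunk 0: nothing to hide behind */
    for (size_t i = 0; i < nchunks; i++) {
        size_t next_first = first + count, next_count = 0;
        if (i + 1 < nchunks) {
            next_count = base + (i + 1 < extra ? 1 : 0);
            ops->copy_async(next_first, next_count, ctx);  /* overlaps compute below */
        }
        ops->compute(first, count, ctx);
        first = next_first;
        count = next_count;
    }
    return nchunks;
}

/* Tracing hooks used only to show the resulting order of operations:
 * 'c' marks a copy being issued, 'k' a kernel run. */
struct trace {
    char events[64];
    size_t n;
    size_t computed;
};

static void trace_copy(size_t first, size_t count, void *ctx)
{
    struct trace *t = ctx;
    (void)first; (void)count;
    if (t->n + 1 < sizeof t->events)
        t->events[t->n++] = 'c';
}

static void trace_compute(size_t first, size_t count, void *ctx)
{
    struct trace *t = ctx;
    (void)first;
    if (t->n + 1 < sizeof t->events)
        t->events[t->n++] = 'k';
    t->computed += count;
}
```

With nchunks set to 10, copies 1 through 9 are each issued before the
previous chunk's compute step, which is the "9 out of 10 transfers"
case mentioned above.  Making nchunks a parameter keeps the value easy
to tune.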
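
Similarly, the hash caching suggestion could look roughly like the
sketch below.  It is a deliberately simplified, CPU-only model that
remembers a single loaded hash: sketch_crypt_all(), sketch_cmp_all(),
and toy_hash() are hypothetical names with simplified signatures (not
JtR's actual format interface), 64-bit integers stand in for keys and
binaries, and toy_hash() is a placeholder rather than SHA-512.  The
slow_compares counter models a separate GPU invocation made only for
the comparison.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_KEYS 8

static uint64_t computed[MAX_KEYS]; /* stands in for hashes computed on the GPU */
static uint64_t cached_target;      /* binary remembered from an earlier cmp_all() */
static int have_cached;             /* nonzero once cached_target is valid */
static int any_match;               /* flag produced during crypt_all() */
static int slow_compares;           /* separate "GPU compare" invocations so far */

/* Placeholder hash (NOT SHA-512): multiplication by an odd constant. */
static uint64_t toy_hash(uint64_t key)
{
    return key * 0x9E3779B97F4A7C15ULL;
}

/* Hash all keys; once a target is cached, also do the comparison here. */
static void sketch_crypt_all(const uint64_t *keys, int count)
{
    if (count > MAX_KEYS)
        count = MAX_KEYS;
    any_match = 0;
    for (int i = 0; i < count; i++) {
        computed[i] = toy_hash(keys[i]);
        if (have_cached && computed[i] == cached_target)
            any_match = 1;
    }
}

/* Return nonzero if any computed hash equals `binary`. */
static int sketch_cmp_all(uint64_t binary, int count)
{
    if (count > MAX_KEYS)
        count = MAX_KEYS;

    /* Fast path: crypt_all() already compared against this hash. */
    if (have_cached && binary == cached_target)
        return any_match;

    /* Slow path: compare separately, then remember the hash so that the
     * next crypt_all() can do the comparison itself. */
    slow_compares++;
    int found = 0;
    for (int i = 0; i < count; i++)
        if (computed[i] == binary)
            found = 1;
    cached_target = binary;
    have_cached = 1;
    any_match = found;
    return found;
}
```

Only the first sketch_cmp_all() call for a given hash takes the slow
path; later rounds just return the flag set by sketch_crypt_all().  A
real implementation would need to cache several hashes per salt and
invalidate the cache when the salt changes, which this sketch leaves
out.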