Message-ID: <CANJ2NMOq0DxCJ5K4rLgq0y3x3ZziuP2Shh_uxMKkY7ZDurqE-g@mail.gmail.com>
Date: Wed, 18 Apr 2012 20:38:47 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Weekly report 1

On Wed, Apr 18, 2012 at 12:34 AM, Solar Designer <solar@...nwall.com> wrote:
>
> IIRC, what you tried was not supposed to result in any speedup because
> your GPU code was invoked and required the data to be already available
> right after you started the async copy - so you had it waiting for data
> right at that point anyway.
>
> Lukas' code was different: IIRC, he split the buffered candidate
> passwords in three smaller chunks, where two of the three may in fact be
> transferred to the GPU asynchronously while the previous chunk is being
> processed.  You may implement that too, and I suggest that you make the
> number of chunks to use configurable and try values larger than 3 (e.g.,
> 10 might be reasonable - letting you hide the latency for 9 out of 10
> transfers while hopefully not exceeding the size of a CPU data cache).
>
I tried this after Lukas posted his code. If you remember, I have an
ITERATIONS constant in my code, and I split max_keys_per_crypt into
ITERATIONS parts. I have not posted my new results yet: according to the
profiler, the cudaMemcpy still does not overlap with the compute kernel,
and there is a performance regression.
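For reference, here is a rough sketch of the chunked scheme as I understand
it (all names here - NCHUNKS, md5_crypt_kernel, crypt_chunked - are mine,
not from Lukas' patch). One detail that matters: the host buffers must be
pinned (cudaMallocHost / cudaHostAlloc), otherwise cudaMemcpyAsync silently
degrades to a blocking copy and the profiler will show no overlap at all,
which could explain what I am seeing:

```cuda
#include <cuda_runtime.h>

#define NCHUNKS 10              /* configurable, per Solar's suggestion */

/* placeholder for the real crypt kernel */
extern __global__ void md5_crypt_kernel(const char *keys,
                                        unsigned int *out, int count);

static void crypt_chunked(const char *host_keys, unsigned int *host_out,
                          int total, int key_len)
{
	char *dev_keys;
	unsigned int *dev_out;
	cudaStream_t stream[NCHUNKS];
	int chunk = (total + NCHUNKS - 1) / NCHUNKS;
	int i;

	/* host_keys/host_out must come from cudaMallocHost(); with
	 * pageable memory cudaMemcpyAsync blocks and nothing overlaps */
	cudaMalloc(&dev_keys, (size_t)total * key_len);
	cudaMalloc(&dev_out, (size_t)total * 4 * sizeof(unsigned int));
	for (i = 0; i < NCHUNKS; i++)
		cudaStreamCreate(&stream[i]);

	for (i = 0; i < NCHUNKS; i++) {
		int off = i * chunk;
		int n = (off + chunk > total) ? total - off : chunk;
		if (n <= 0)
			break;
		/* the copy for chunk i can overlap the kernel that is
		 * still running on chunk i-1 (different stream) */
		cudaMemcpyAsync(dev_keys + (size_t)off * key_len,
		    host_keys + (size_t)off * key_len,
		    (size_t)n * key_len, cudaMemcpyHostToDevice, stream[i]);
		md5_crypt_kernel<<<(n + 255) / 256, 256, 0, stream[i]>>>(
		    dev_keys + (size_t)off * key_len,
		    dev_out + (size_t)off * 4, n);
		cudaMemcpyAsync(host_out + (size_t)off * 4,
		    dev_out + (size_t)off * 4,
		    (size_t)n * 4 * sizeof(unsigned int),
		    cudaMemcpyDeviceToHost, stream[i]);
	}
	cudaDeviceSynchronize();

	for (i = 0; i < NCHUNKS; i++)
		cudaStreamDestroy(stream[i]);
	cudaFree(dev_keys);
	cudaFree(dev_out);
}
```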


> > 2. Merge cmp_all() with crypt_all()
> >     For crypt_all(), we just return. In cmp_all(), we invoke GPU and
> > return a value indicating if there is a matched hash.
>
> This is going to be problematic.  It will only work well for the special
> case (albeit most common) of exactly one hash per salt.  When there are
> a few more hashes per salt, cmp_all() is called multiple times, so you
> will once again have increased CPU/GPU interaction overhead.  When there
> are many more hashes per salt, cmp_all() is not called at all, but
> instead a get_hash*() function is called.
>
Yes, I just noticed this. I took a look at crack.c: in
crk_password_loop(), we invoke crypt_all() to crypt a bunch of passwords,
and then we invoke cmp_all() for all hashes with the same salt. But I am
still not sure how to use get_hash*().


> This is why I suggested caching of loaded hashes in previous calls to
> cmp_all(), such that you can move the comparisons into crypt_all()
> starting with the second call to that function.  Then your GPU code for
> crypt_all() will return a flag telling your CPU code for cmp_all() to
> just return that fixed value instead of invoking any GPU code.
>
This is just like the caching of salts that you suggested too; I think
they are the same kind of question. With the current interface, cmp_all()
is invoked from fmt_self_test(), from benchmark_format(), and from the
real cracking code. The first problem is to distinguish these callers and
cache only the useful loaded hashes. The next problem is what we actually
gain from merging the GPU code of cmp_all() into crypt_all(): according
to the profiler, cmp_all() takes 1% or less of the GPU time (including
memcpyHtoD), so it barely impacts performance. Does that mean the merge
would not make sense?

Thanks!
Dongdong Li

