Message-ID: <20120624234948.GA7400@openwall.com>
Date: Mon, 25 Jun 2012 03:49:48 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: get_hash*() in GPU formats (was: Jumbo candidate vs Test Suite)

Lukas, myrice -

On Mon, Jun 25, 2012 at 01:00:50AM +0200, Lukas Odzioba wrote:
> 2012/6/25 Solar Designer <solar@...nwall.com>:
> > Why are you dropping the get_hash*() functions?  This would be a
> > performance hit when there are more than a few hashes per salt.
> >
> > Normally, this shouldn't be the case for md5crypt and phpass hashes, but
> > things may be weird in the real world - and even more so in contests as
> > we have recently seen. ;-)
> 
> 1) "crack checking" was moved to gpu code so now we copy back just 1
> byte per hash not BINARY_SIZE bytes.

Why not make it 1 byte per crypt_all() in the typical case (when nothing
got cracked with that one call)?
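
To illustrate what I mean, here is a minimal host-only sketch.  It assumes
the kernel maintains one extra "anything cracked?" byte next to the
per-hash flags.  All names here (dev_cracked, dev_any_cracked,
copy_from_device(), fake_kernel(), fetch_results(), any_cracked(),
is_cracked()) are hypothetical, and memcpy() stands in for the real
device-to-host copy, so this shows only the control flow, not actual
CUDA or OpenCL code:

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 1024

/* Stand-ins for buffers that would live in GPU memory (assumption). */
static unsigned char dev_cracked[MAX_KEYS];
static unsigned char dev_any_cracked;

/* Host-side copies. */
static unsigned char host_cracked[MAX_KEYS];
static unsigned char host_any_cracked;

/* Instrumentation for this sketch only: bytes "transferred" so far. */
static size_t bytes_copied;

/* Placeholder for a device-to-host copy; a real format would call the
   CUDA or OpenCL transfer function here instead of memcpy(). */
static void copy_from_device(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    bytes_copied += n;
}

/* Fake "kernel": flags each candidate equal to target, and also
   maintains the single summary byte. */
static void fake_kernel(const int *candidates, int count, int target)
{
    int i;

    assert(count <= MAX_KEYS);
    dev_any_cracked = 0;
    for (i = 0; i < count; i++) {
        dev_cracked[i] = (candidates[i] == target);
        dev_any_cracked |= dev_cracked[i];
    }
}

/* End of crypt_all(): always fetch 1 byte, and fetch the per-hash
   flags only when that byte says something was cracked. */
static void fetch_results(int count)
{
    copy_from_device(&host_any_cracked, &dev_any_cracked, 1);
    if (host_any_cracked)
        copy_from_device(host_cracked, dev_cracked, (size_t)count);
}

/* What a cmp_all()-like check would consult. */
static int any_cracked(void)
{
    return host_any_cracked;
}

/* Only meaningful after any_cracked() returned non-zero; otherwise
   host_cracked[] is stale. */
static int is_cracked(int index)
{
    return host_cracked[index];
}
```

In the typical case (no match), fetch_results() moves exactly 1 byte
regardless of count.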

> 2) code looks cleaner

Yes.

> 3) those are slow formats and I was hoping that it won't be a problem.

Understood, but by that logic your #1 reason doesn't matter. ;-)

Anyway, what I think you could do is have partial hashes sufficient
for get_hash*() to work transferred from GPU the first time a
get_hash*() function is called (if one is called).  That is, have a
global variable ("static" inside the format file) that you'd reset on
crypt_all() to indicate that you do not have the hashes on CPU side yet.
Have a function that transfers the partial hashes from GPU and sets the
variable.  Call this function from all get_hash*() functions when the
variable is zero.
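
A rough sketch of that pattern, with the check placed directly in the
get_hash*() functions.  This is not taken from any existing format
file: gpu_partial[] merely simulates GPU memory, memcpy() stands in for
the real device-to-host transfer, the "hash" computed in crypt_all() is
fake, and the names have_hashes, fetch_partial_hashes() and
transfer_count as well as the 0xf / 0xff masks are assumptions made for
illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 8

/* Stand-in for partial hashes residing in GPU memory (assumption). */
static uint32_t gpu_partial[MAX_KEYS];

/* Host-side copy, valid only while have_hashes is non-zero. */
static uint32_t host_partial[MAX_KEYS];

/* Reset by crypt_all(), set by the first get_hash*() call after it. */
static int have_hashes;

/* Instrumentation for this sketch only. */
static int transfer_count;

/* Transfers the partial hashes from "GPU" and sets the variable.
   A real format would do the device-to-host copy here. */
static void fetch_partial_hashes(void)
{
    memcpy(host_partial, gpu_partial, sizeof(host_partial));
    transfer_count++;
    have_hashes = 1;
}

static void crypt_all(int count)
{
    int i;

    assert(count <= MAX_KEYS);
    for (i = 0; i < count; i++) /* fake "hash" computed "on GPU" */
        gpu_partial[i] = 0x9e3779b9u * (uint32_t)(i + 1);
    have_hashes = 0; /* hashes are not on CPU side yet */
}

/* The flag is tested inline, so the common case (hashes already
   fetched) costs no extra function call. */
static int get_hash_0(int index)
{
    if (!have_hashes)
        fetch_partial_hashes();
    return (int)(host_partial[index] & 0xf);
}

static int get_hash_1(int index)
{
    if (!have_hashes)
        fetch_partial_hashes();
    return (int)(host_partial[index] & 0xff);
}
```

With this, a crypt_all() that is never followed by a get_hash*() call
causes no transfer at all, and any number of get_hash*() calls after
one crypt_all() cause exactly one.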

There's already similar code in cuda_xsha512_fmt.c and cuda/xsha512.cu,
except that the variable is only checked inside cuda_xsha512_cpy_hash().
I think those checks should be moved right into get_hash*() to avoid
the function call in the typical case.  And the variable itself should
be moved from the .cu to the .c file.

BTW, you won't notice this in --test benchmarks.  You need to actually
simulate different hashes-per-salt ratios in sample password hash files
to see the effect of changes in this area.

A next step could be to have this data transfer from GPU overlap with
computation on GPU.  You could achieve this e.g. by predicting that
hashes will be requested (due to past requests for this same salt) and
starting to transfer the first half of hashes while the second half is
being computed, then starting the second transfer right before leaving
crypt_all().  get_hash*() calls are made in order of increasing index,
so this may help.  This is probably overkill for slow and salted hashes,
but e.g. for raw SHA-512 it may be done.

An alternative to this is to completely disable the use of CPU side
bitmaps and hash tables for formats that support offload of hash
comparisons onto GPU - and to implement similar bitmaps and hash tables
on GPU side (along with caching of loaded hashes).  A drawback is that
we might need to have fallback code for the case of loaded hashes not
fitting in GPU memory, though.

I think a mix of both approaches may work best: have a higher threshold
for CPU-side bitmaps and hash tables, but do support them - with the
optimizations I described above.

Alexander
