john-dev - Re: Password Generation on GPU

Message-ID: <20120605180335.GB19746@openwall.com>
Date: Tue, 5 Jun 2012 22:03:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Password Generation on GPU

On Tue, Jun 05, 2012 at 02:12:33AM +0800, myrice wrote:
> Sorry for unclearness. I mean, for example, we may implement
> incremental mode on GPU. The password will generate totally on GPU. No
> password candidate will copy to GPU.

Yes, we may do that eventually.  It will be slower than mask mode in
terms of c/s rate, though, since it'd need to use global memory
(incremental mode currently needs about 6 MB for length 8 with 95
different chars).  Yet it may be faster in terms of success rate.

> With set_mask(), we still have to
> copy password candidates, which are generated on CPU, to GPU.

Not exactly.  We only copy some base strings, from which actual
candidates are formed on GPU.  For example, if the mask is for 3
character positions where each can take 10 possible values (e.g.,
they're digits), then we only copy one base string per 1000 candidate
passwords.  The latter are generated on the GPU itself, in fast on-die
memory or even in registers (they can be hashed and forgotten right away).
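
A minimal sketch of the base-string idea, written as a CPU-side Python
model rather than real GPU code.  The names candidate() and expand() and
the "pass___" base string are hypothetical; the assumption is that each
GPU work-item would derive its own candidate from its index, treating
the index as a mixed-radix number with one digit per masked position.

```python
# CPU-side model of per-work-item candidate generation (illustrative only).
DIGITS = "0123456789"


def candidate(base, positions, charsets, index):
    """Return the candidate that a work-item with this index would build.

    The index is decomposed as a mixed-radix number, one digit per masked
    position, with the last masked position varying fastest.
    """
    chars = list(base)
    for pos, charset in zip(reversed(positions), reversed(charsets)):
        index, digit = divmod(index, len(charset))
        chars[pos] = charset[digit]
    return "".join(chars)


def expand(base, positions, charsets):
    """Enumerate every candidate derived from a single base string."""
    total = 1
    for charset in charsets:
        total *= len(charset)
    return [candidate(base, positions, charsets, i) for i in range(total)]


# One base string is "transferred"; 10 * 10 * 10 = 1000 candidates result.
# The '_' characters are placeholders that the mask overwrites.
cands = expand("pass___", [4, 5, 6], [DIGITS] * 3)
```

Here only "pass___" would cross from CPU to GPU, while all 1000
candidates ("pass000" through "pass999") are formed from it locally.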

> But you said "So theoretically it might provide greater speed than
> what we'd achieve by having each thread generate its entirely
> independent stream of candidate passwords."

I was primarily referring to CPU-only code in that quote (no GPU
involved at all), like my LM hash code.

> So does it mean that
> implement password generation totally on GPU is not effective?

No, it doesn't mean that.  However, reducing the number of strings
transferred from CPU to GPU by a factor of, say, 1000 is already good
enough.

> Nevertheless, as you point out in other parts, I will happy to
> implement set_mask() first and it is good to begin with.

I am trying to offer a simpler initial task for you and something that
would almost fully remove the performance bottleneck.  Incremental mode
on GPU (as you suggested) is not that easy - for example, you'd need to
deal with issues relating to interrupt/restore of sessions, which I
guess you haven't even thought of yet - and it would have its own
bottleneck (use of global memory), which wouldn't let your fast hash
code seriously compete e.g. with hashcat's.

> I become more clear of this. So we first create and implement a new
> mode - mask mode. But only with the mask, we don't have password
> candidates base for the mask to apply on. So we have to at least
> coordinate with one exist mode first? Or we may make mask mode more
> powerful. So we can generate password candidates from the mask(rule?)?

Mask mode is going to be just that - mask only.  At a later time, we
might also introduce the ability to combine it with other modes, but
this is not to be implemented right away.

Please see how other tools implement mask modes - InsidePro's programs
and hashcat have them.  We will even need to use compatible syntax for
the mask.
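
For illustration, here is a small Python sketch of parsing a
hashcat-style mask.  It assumes the commonly documented built-in
placeholders ?l (lowercase), ?u (uppercase), ?d (digits), ?s (space and
punctuation) and ?a (all of these), with ?? standing for a literal
question mark.  parse_mask() and keyspace() are hypothetical helper
names, and the exact syntax John would end up accepting is not settled
by this sketch.

```python
import string

# Built-in character sets, assumed to match hashcat's documented ones.
BUILTIN = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": " " + string.punctuation,
}
BUILTIN["a"] = BUILTIN["l"] + BUILTIN["u"] + BUILTIN["d"] + BUILTIN["s"]


def parse_mask(mask):
    """Turn a mask string into one character set per output position."""
    charsets = []
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch != "?":
            # A literal character is a one-element character set.
            charsets.append(ch)
            i += 1
            continue
        if i + 1 >= len(mask):
            raise ValueError("dangling '?' at end of mask")
        key = mask[i + 1]
        if key == "?":
            charsets.append("?")
        elif key in BUILTIN:
            charsets.append(BUILTIN[key])
        else:
            raise ValueError("unknown placeholder ?" + key)
        i += 2
    return charsets


def keyspace(charsets):
    """Number of candidates the parsed mask describes."""
    total = 1
    for charset in charsets:
        total *= len(charset)
    return total


three_digits = parse_mask("pass?d?d?d")
```

With these sets, ?a covers 95 characters, and the mask "pass?d?d?d"
describes 1000 candidates.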

> >> 3) We still have to call multiple time of set_key()
> >
> > Maybe, but not necessarily.  This depends on cracking mode and its
> > implementation.  For mask mode, we may choose to call set_key() just
> > once per crypt_all() - having the pre-set mask do everything else.
> 
> So we could just base on a simple password candidate and enumerate
> many more with the mask?

Yes.  That base string is not a password candidate yet, though - it's
just a base string to apply the mask to.
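
The call pattern discussed here could be modelled roughly as follows.
This is a Python sketch only, not John's actual format interface: the
MaskFormat class, the method signatures, and the use of SHA-256 as a
stand-in hash are all assumptions made for the example.  The point it
shows is that set_mask() is called once up front, set_key() once per
crypt_all(), and crypt_all() then forms and hashes every candidate
derived from the base string.

```python
import hashlib
import itertools


class MaskFormat:
    """Toy model of a format with a pre-set mask (hypothetical API)."""

    def __init__(self):
        self.charsets = []
        self.base = ""
        self.digests = {}

    def set_mask(self, charsets):
        # Called once; describes the positions appended to each base string.
        self.charsets = list(charsets)

    def set_key(self, base):
        # Called once per crypt_all(); the base is not a candidate by itself.
        self.base = base

    def crypt_all(self):
        # Form each candidate from the base string and hash it right away.
        self.digests = {}
        for suffix in itertools.product(*self.charsets):
            cand = self.base + "".join(suffix)
            digest = hashlib.sha256(cand.encode("ascii")).hexdigest()
            self.digests[digest] = cand
        return len(self.digests)


fmt = MaskFormat()
fmt.set_mask(["0123456789"] * 3)
fmt.set_key("pass")
count = fmt.crypt_all()
```

In this model a single set_key("pass") call leads to 1000 hash
computations, one for each of "pass000" through "pass999".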

Alexander