john-dev - Re: Mask mode for GPU

Message-ID: <20130615144557.GC20946@openwall.com>
Date: Sat, 15 Jun 2013 18:45:57 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Mask mode for GPU

On Sat, Jun 15, 2013 at 09:32:32AM +0530, Sayantan Datta wrote:
> On Saturday 15 June 2013 06:36 AM, Solar Designer wrote:
> >As to introducing support for format's set_mask() into this - now that's
> >possibly more difficult than it would be with a specialized implementation.
> >Yet I think we should not give up on this approach.  Perhaps we'd have
> >to untie mask mode from rpp, but we may nevertheless start by duplicating
> >much of rpp's structure and initially even code - and only then proceed
> >to customize it for optional use of set_mask().
> 
> I looked into the patch. You are using rpp to generate passwords on cpu 
> even though rpp was primarily meant to process rules which are very 
> similar to password generation. But if I understand correctly, we need 
> only the set of characters for each placeholder that will be used on 
> the GPU to generate the required password.  I should find a way to do that 
> using rpp's format, right?

You may.  Actually, there's nothing to "find" - it's obvious.  You just
take ctx.ranges[i].chars.
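
A minimal sketch of what "just take ctx.ranges[i].chars" could look like
follows.  The structures are simplified stand-ins, not rpp's real
declarations: the names range_sketch, ctx_sketch, collect_charsets(), the
two size limits, and the count fields are all assumptions made for
illustration.  Only the ranges[i].chars access comes from the text above.

```c
#include <string.h>

#define MAX_RANGES 8    /* hypothetical limit, not rpp's actual constant */
#define MAX_CHARS  256  /* hypothetical per-range capacity */

/* Simplified stand-ins for rpp's structures (assumed layout). */
struct range_sketch {
	int count;              /* number of characters in this range */
	char chars[MAX_CHARS];  /* the characters for this position */
};

struct ctx_sketch {
	int count;                              /* number of ranges */
	struct range_sketch ranges[MAX_RANGES];
};

/*
 * Copy each position's character set into a flat table that the caller
 * could later upload to the GPU.  Returns the number of positions copied.
 */
static int collect_charsets(const struct ctx_sketch *ctx,
    char out[][MAX_CHARS], int counts[], int max_out)
{
	int i;

	for (i = 0; i < ctx->count && i < max_out; i++) {
		counts[i] = ctx->ranges[i].count;
		memcpy(out[i], ctx->ranges[i].chars, (size_t)counts[i]);
	}
	return i;
}
```

The flat out[][] table is only one possible layout; what a format's
set_mask() would actually want to receive is still open.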

Maybe we need to generalize rpp somewhat further and introduce into it
the ability to skip iterating over some of the ranges, leaving that for
set_mask() (presumed to be done in the caller of rpp).  To avoid
confusion, we could rename it from rpp to something different - a name
that would be fine for both uses at once (rules preprocessing and
mask mode).

For now, though, I suggest that you keep your changes to rpp or
rpp-derived code to a minimum (to the extent possible) and focus on the
GPU side of things - actual implementations of the functionality needed
for set_mask().  Frankly, I don't expect sufficiently clean host side
mask mode code from you - I'd expect to have to rewrite it anyway - so
what you need to provide is some working throw-away implementation that
would show what functionality I'd need to implement in a clean fashion.

> Also we need to parallelize rpp's algorithm of password generation for 
> gpu SIMDs.

Not quite.  While we sort of have this issue for --node/--fork, we don't
really have it for fast hashes on GPU, which is where we need on-GPU
set_mask().  Rather, those hashes and those GPUs are so fast that we'd
get acceptable kernel running times when we simply iterate over some
character ranges for some character positions (perhaps two or so) inside
each work-item.  That's what myrice did (with hard-coded ranges for two
character positions) with raw-md5, and it worked well.  Parallelization
thus comes from the host iterating over the rest of character positions.

For example, in a given kernel invocation the host might provide strings
"aaaaaa.." to "aaaapq..", where ".." are any placeholder chars, which
the GPU will overstrike anyway, and the GPU (separately in each
work-item) will iterate from "aa" to "zz" in the last two character
positions.  On the next kernel invocation, the host will start with
"aaaapr..".  (By choosing the weird end/start of range for these two
kernel invocations, except for the very first invocation's range start,
I illustrate that they don't have to be at prettier looking points in
the keyspace, and in practice they generally won't be.  Just whatever
fits in the optimal GWS.)
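
The split described above can be sketched in plain C, with ordinary
functions standing in for the kernel.  This is an illustration under
stated assumptions, not code from the patch or from myrice's raw-md5:
the names next_prefix(), work_item() and run_invocation() are
hypothetical, the character ranges are hard-coded to 'a'..'z', the
candidate length is fixed at 8, and a string comparison takes the place
of hashing.

```c
#include <string.h>

#define KEY_LEN  8                    /* as in the "aaaaaa.." example */
#define GPU_POS  2                    /* positions iterated per work-item */
#define HOST_POS (KEY_LEN - GPU_POS)  /* positions iterated by the host */

/* Host side: advance the prefix like an odometer, last position fastest.
 * Returns 0 once the prefix has wrapped around to all 'a'. */
static int next_prefix(char *prefix)
{
	int i;

	for (i = HOST_POS - 1; i >= 0; i--) {
		if (prefix[i] < 'z') {
			prefix[i]++;
			return 1;
		}
		prefix[i] = 'a';
	}
	return 0;
}

/* Stand-in for one work-item: overstrike the last two positions with
 * "aa".."zz".  Returns the number of candidates tried. */
static unsigned work_item(const char *base, const char *target, int *found)
{
	char key[KEY_LEN + 1];
	unsigned tried = 0;
	char c0, c1;

	memcpy(key, base, KEY_LEN);
	key[KEY_LEN] = '\0';
	for (c0 = 'a'; c0 <= 'z'; c0++)
		for (c1 = 'a'; c1 <= 'z'; c1++) {
			key[HOST_POS] = c0;
			key[HOST_POS + 1] = c1;
			tried++;
			if (!strcmp(key, target))
				*found = 1;
		}
	return tried;
}

/* Stand-in for one kernel invocation of gws work-items.  The prefix is
 * advanced in place, so the next invocation resumes where this one ended. */
static unsigned run_invocation(char *prefix, unsigned gws,
    const char *target, int *found)
{
	char base[KEY_LEN + 1];
	unsigned i, total = 0;

	for (i = 0; i < gws; i++) {
		memcpy(base, prefix, HOST_POS);
		memset(base + HOST_POS, '.', GPU_POS); /* placeholder chars */
		base[KEY_LEN] = '\0';
		total += work_item(base, target, found);
		if (!next_prefix(prefix))
			break;
	}
	return total;
}
```

With the prefix starting at "aaaaaa", a GWS of 407 covers "aaaaaa.."
through "aaaapq.." (15 * 26 + 16 + 1 = 407 prefixes) and leaves the
prefix at "aaaapr" for the next invocation, which is how the odd-looking
boundaries in the example arise.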

Alexander