john-dev - Re: bitslice DES on GPU



Message-ID: <20120813054011.GA6392@openwall.com>
Date: Mon, 13 Aug 2012 09:40:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

Sayantan -

On Mon, Aug 13, 2012 at 09:19:35AM +0400, Solar Designer wrote:
> On Mon, Aug 13, 2012 at 08:36:46AM +0530, Sayantan Datta wrote:
> > What should be the value of DES_BS_EXPAND for a GPU implementation?
[...]
> To summarize, you'd need to try three approaches (and their variations):
> 
> 1. Keys in global memory, expanded.
> 
> 2. Keys in global memory, not expanded.
> 
> 3. Keys in local memory, not expanded.
> 
> For LM hashes, it's just #2 or #3 above.  Maybe start with #3?

There's yet another option:

4. Unroll the entire 16-round DES loop.  Then you'll have all 768 key
bit indices (16 rounds x 48 key bits each, with repeats) right in the
code.  If the unrolled code fits in the same cache level that a mere
2-round unroll would, then this will be faster than any of the
approaches discussed above.
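
To make the 768 figure concrete, here is a rough sketch in C that
enumerates the indices a full unroll would hard-code.  It assumes the
textbook DES key schedule (the PC-2 table and per-round rotation counts
from the DES standard) and numbers key bits 0..55 by their position in
the C||D register right after PC-1.  That numbering is my assumption for
illustration; the bit ordering used by the actual bitslice code may
differ, and des_key_indices() is a hypothetical helper, not an existing
function.

```c
#include <assert.h>
#include <stddef.h>

#define DES_ROUNDS 16
#define DES_ROUND_KEY_BITS 48
#define DES_TOTAL_INDICES (DES_ROUNDS * DES_ROUND_KEY_BITS) /* 768 */

/* PC-2 from the DES standard: 1-based positions in the 56-bit C||D register. */
static const unsigned char pc2[DES_ROUND_KEY_BITS] = {
    14, 17, 11, 24,  1,  5,  3, 28, 15,  6, 21, 10,
    23, 19, 12,  4, 26,  8, 16,  7, 27, 20, 13,  2,
    41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
    44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32
};

/* Left rotation applied to each 28-bit half before each round. */
static const unsigned char shifts[DES_ROUNDS] = {
    1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
};

/*
 * For every round and every round-key bit, store the index (0..55) of the
 * key bit it comes from, counted in the initial C||D register (assumed
 * numbering, see above).
 */
static void des_key_indices(unsigned char out[DES_TOTAL_INDICES])
{
    size_t n = 0;
    unsigned int rot = 0; /* cumulative left rotation so far */

    for (int r = 0; r < DES_ROUNDS; r++) {
        rot += shifts[r];
        for (int j = 0; j < DES_ROUND_KEY_BITS; j++) {
            unsigned int p = pc2[j] - 1u;   /* 0-based position in C||D */
            unsigned int half = p / 28;     /* 0 = C half, 1 = D half */
            /* Undo the rotation within the 28-bit half. */
            out[n++] = (unsigned char)(half * 28 + (p % 28 + rot) % 28);
        }
    }
    assert(n == DES_TOTAL_INDICES);
}
```

A code generator could emit each out[] entry as a literal array
subscript in the unrolled rounds.  The 768 slots are filled from only 56
distinct key bits, which is what "with repeats" refers to.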

In the current CPU code, I use a 2x unroll in this loop - that is, 8
iterations of a loop body containing 2 DES rounds.  There's not enough
L1 instruction cache on a typical CPU for a full 16-round unroll.  Maybe
on some GPUs this is different.
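
The two loop shapes can be sketched roughly as follows.  This is only an
illustration of the structure: toy_round() is a made-up stand-in for a
DES round (not the real S-box code), and all names here are hypothetical.
In a real full unroll the individual key bit indices would also become
literals in the code; here only the per-round table offset becomes a
constant.

```c
#include <stdint.h>

#define ROUNDS 16
#define ROUND_KEY_BITS 48

/* Made-up stand-in for one DES round: mixes in one key word per index. */
static uint32_t toy_round(uint32_t state, const uint32_t *key,
                          const unsigned char *idx)
{
    for (int j = 0; j < ROUND_KEY_BITS; j++)
        state = ((state << 1) | (state >> 31)) ^ key[idx[j]];
    return state;
}

/* 2x unroll: 8 iterations, 2 rounds per body, indices walked at run time. */
static uint32_t run_unroll2(uint32_t state, const uint32_t *key,
                            const unsigned char *idx)
{
    for (int i = 0; i < ROUNDS / 2; i++) {
        state = toy_round(state, key, idx);
        state = toy_round(state, key, idx + ROUND_KEY_BITS);
        idx += 2 * ROUND_KEY_BITS;
    }
    return state;
}

/* Full unroll: no loop counter, every round's offset is a constant. */
#define TOY_ROUND(n) \
    state = toy_round(state, key, idx + (n) * ROUND_KEY_BITS)

static uint32_t run_unroll16(uint32_t state, const uint32_t *key,
                             const unsigned char *idx)
{
    TOY_ROUND(0);  TOY_ROUND(1);  TOY_ROUND(2);  TOY_ROUND(3);
    TOY_ROUND(4);  TOY_ROUND(5);  TOY_ROUND(6);  TOY_ROUND(7);
    TOY_ROUND(8);  TOY_ROUND(9);  TOY_ROUND(10); TOY_ROUND(11);
    TOY_ROUND(12); TOY_ROUND(13); TOY_ROUND(14); TOY_ROUND(15);
    return state;
}
```

Both functions compute the same result; the trade-off is code size
against the loop overhead and the run-time index handling.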

Alexander
