Message-ID: <20120920045015.GA27255@openwall.com>
Date: Thu, 20 Sep 2012 08:50:15 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

Sayantan -

On Wed, Sep 19, 2012 at 05:42:40PM +0530, Sayantan Datta wrote:
> On Fri, Sep 14, 2012 at 2:02 AM, Solar Designer <solar@...nwall.com> wrote:
>
> > So that's about 41M c/s at 25 iterations if you somehow manage to remove
> > the overhead. In other words, the overhead still corresponds to about
> > 50% of total running time.
>
> I'm trying to figure out where the overhead lies. I thought it might be the
> cmp_all() function. So I tried using openmp in cmp_all. However there was
> no improvement at all.

Yet this test doesn't mean that cmp_all() doesn't correspond to a
significant portion of the overhead. OpenMP does not always speed
things up, and there are specific reasons why it tends to perform poorly
inside the bitslice DES cmp_all() (I tried this before).

Also note that cmp_all() normally tests only a few elements of B[] -
e.g., around 5 of them when dealing with 32-bit vector elements - yet
you're transferring the entire 64-element B[] from GPU. So you'd
probably avoid more overhead by transferring a portion of B[] only
until/unless more of it is actually needed, than by speeding up
cmp_all() itself.

Anyhow, a next step would be to do comparisons on the GPU side anyway -
using the interfaces and approaches myrice experimented with during the
summer.

> This makes me think that the overhead lies primarily
> in set_key which is called maximum number of times. Can you provide any
> suggestions?

set_key() definitely corresponds to a large portion of the overhead.
But a few other things do as well. Note that LM hashes use the same
kind of set_key(), yet they achieve speeds of 50M to 110M c/s on CPU (on
one core). So the slowdown from 41M (on GPU alone) to <20M (on GPU with
CPU side's overhead) can't be explained by set_key() alone.
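
To illustrate the point about cmp_all() above: with 32-bit vector elements, each B[i] carries one hash bit for 32 candidates, and a candidate stays alive only while its bit matches the target. Assuming roughly random hash bits and a single target hash, about half of the survivors drop out per element, so after around 5 elements (2^5 = 32) usually none remain. Below is a minimal sketch of such an early-exit loop. The name cmp_all_sketch(), its parameters, and the single-target simplification are assumptions made for illustration; this is not the actual JtR code.

```c
#include <stdint.h>

#define HASH_BITS 64  /* B[] has 64 elements, one per hash bit */

/*
 * Hypothetical, simplified model of a bitslice comparison.
 * B[i] holds bit i of the computed hash for 32 candidates, one per lane.
 * target is the hash we are looking for (bit i of target pairs with B[i]).
 * Returns 1 if at least one lane may match, 0 otherwise.
 * *words_read reports how many elements of B[] were actually examined.
 */
int cmp_all_sketch(const uint32_t B[HASH_BITS], uint64_t target,
                   int *words_read)
{
    uint32_t mask = ~(uint32_t)0;  /* lanes still matching so far */
    int i;

    for (i = 0; i < HASH_BITS; i++) {
        /* Broadcast bit i of the target to all 32 lanes. */
        uint32_t want = ((target >> i) & 1) ? ~(uint32_t)0 : 0;

        /* Keep only the lanes whose bit i equals the target's bit i. */
        mask &= ~(B[i] ^ want);

        if (!mask) {
            /* No candidate can match: the rest of B[] is not needed. */
            *words_read = i + 1;
            return 0;
        }
    }

    *words_read = HASH_BITS;
    return 1;
}
```

In this model, words_read is the part that matters for transfers: when it is typically far below 64, copying just the first few elements of B[] from the GPU and fetching the remainder only on a possible match would move much less data.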

> How about doing the set key in GPU using another kernel?

Note that our current set_key() doesn't do anything other than copying
the data bytes in the right places. Even if you do this on GPU, you'd
have an equivalent amount of work on CPU side anyway, to get the keys
saved into a buffer and transferred to GPU at once. So that's not a
solution at all. The solution is to generate keys on the GPU, not just
"set" them on the GPU. This is also something myrice experimented with
during the summer, and we'll need to do it for descrypt on GPU
eventually.

At this time, though, I suggest that you try to improve upon the 41M
without-overhead speed. It is not good enough anyway (roughly twice
slower than hashcat's actual/usable speed).

I think you need to try DES_BS_EXPAND=0. The target without-overhead
speed is ~300M.

Thanks,

Alexander
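
The difference between "setting" and "generating" keys discussed in the message can be sketched in plain C as follows. Everything here is hypothetical and only illustrates the argument: set_key_sketch() models a host-side byte copy into a staging buffer (work the CPU has to do per key wherever the final layout step runs), while generate_key_sketch() derives a key from a candidate number alone, which is the kind of computation a GPU work-item could do from its own index. It is not the approach myrice used, and the charset-based enumeration is an assumption chosen for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 8  /* descrypt uses at most 8 key characters */

/*
 * "Set": copy one key into slot `index` of a staging buffer that is
 * later transferred to the GPU. This costs the CPU one copy per key.
 */
void set_key_sketch(unsigned char *staging, size_t index, const char *key)
{
    unsigned char *dst = staging + index * KEY_LEN;
    size_t n = strlen(key);

    if (n > KEY_LEN)
        n = KEY_LEN;
    memcpy(dst, key, n);
    memset(dst + n, 0, KEY_LEN - n);  /* zero-pad short keys */
}

/*
 * "Generate": derive the key from a candidate number by treating it as
 * a number written in base strlen(charset), least significant digit
 * first. No per-key input data is needed, only the id.
 */
void generate_key_sketch(uint64_t id, const char *charset, size_t length,
                         unsigned char out[KEY_LEN])
{
    size_t base = strlen(charset);
    size_t i;

    if (length > KEY_LEN)
        length = KEY_LEN;
    memset(out, 0, KEY_LEN);
    if (base == 0)
        return;  /* empty charset: leave the key empty */

    for (i = 0; i < length; i++) {
        out[i] = (unsigned char)charset[id % base];
        id /= base;
    }
}
```

With the second form, the host would only need to pass a starting id and a charset per batch instead of a full buffer of keys, which is why moving just the copy step to the GPU does not remove the CPU-side work.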