Message-ID: <20120922045839.GB3458@openwall.com>
Date: Sat, 22 Sep 2012 08:58:39 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

Sayantan,

On Sat, Sep 22, 2012 at 01:49:48AM +0530, Sayantan Datta wrote:
>
> On Sat, Sep 22, 2012 at 1:41 AM, Sayantan Datta <std2048@...il.com> wrote:
>
>> In the previous test Global no. of work items were half of this time. So
>> the overhead is double in this test than the last one.

Doesn't this also double the amount of useful work being done? If so,
the relative overhead has actually stayed the same.

> Here's the correct one:
>
> 1/(1/78 + 1/35) = 24

Is the 78 obtained as 2*39, where the 39 is my 39M figure for "overhead
speed" of your previous code revision? If so, if the overhead actually
doubled, you'd need to halve its "speed" figure. So you'd use 19.5 in
this equation, not 78. But your assumption that the overhead has
doubled is probably wrong anyway, as I tried to explain above. If the
overhead did in fact double, you'd obtain total speed no better than
19.5.

Anyhow, is your new code revision available anywhere?

Meanwhile, I had a nice conversation with atom on #openwall. He had
found this paper, which he wanted to share with us:

https://www.emsec.rub.de/media/crypto/attachments/files/2011/03/DA_Schober.pdf

This is the diploma thesis of Marc Schober. On page 56 (page 64 per the
PDF file's numbering), Marc talks about bitslice DES on GPU. This
continues on page 111 (119 per PDF), including a funny quote from me
dating back to 2006 (yes, I really did not look into GPUs at the time,
and they were not used for password cracking until 2007).

Marc's work apparently occurred some time between 2008 and 2010 (the
PDF was generated on 2010-09-13). This also explains the comparison
against JtR's DES running on one CPU core only. (I only implemented
OpenMP for DES in May of 2010 in the form of a separate patch, which
Marc might not have been aware of, or his experiments might have
occurred earlier than May 2010.)

Anyway, Marc achieved 12.9M c/s with bitslice DES on GTX 260, and he
estimated a theoretical peak performance of 27M to 31M on that GPU.
This is with Matthew Kwan's S-boxes (which we've since replaced with
Roman's smaller ones).

As is common with academic papers, no code is provided.

Alexander
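
The speed arithmetic in the message above can be sketched in a few lines of Python. This is only an illustration, and it assumes that the time spent per candidate is the sum of the overhead time and the compute time, so that the two "speeds" combine as 1/(1/a + 1/b). The function name combined_speed is hypothetical, and treating 35 as the compute-only speed is an assumption. The figures 78, 39, 19.5 and 35 are the ones quoted in the thread, in millions of c/s.

```python
def combined_speed(*speeds):
    """Overall speed when the per-candidate times of each stage add up.

    Hypothetical helper: each argument is a stage speed (here in M c/s),
    and the result is the reciprocal of the summed reciprocals.
    """
    return 1.0 / sum(1.0 / s for s in speeds)


# Figures quoted in the thread (M c/s). Reading 35 as the compute-only
# speed is an assumption made for this sketch.
compute = 35.0
overhead_previous = 39.0

# The equation as posted: overhead "speed" taken as 2 * 39 = 78.
as_posted = combined_speed(2 * overhead_previous, compute)

# If the overhead really doubled, its "speed" halves to 19.5 instead.
if_overhead_doubled = combined_speed(overhead_previous / 2, compute)

print(f"as posted:           {as_posted:.1f}M")
print(f"if overhead doubled: {if_overhead_doubled:.1f}M")
```

With a doubled overhead the combined figure necessarily stays below the overhead speed alone, which is the "no better than 19.5" bound mentioned above.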