Message-ID: <CANJ2NMO=y9ibM1He=6sSeH7LHxdqjrm6fqOhVpHGchyN5F9YNQ@mail.gmail.com>
Date: Tue, 26 Jun 2012 17:13:07 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: async key transfers to GPU

On Mon, Jun 25, 2012 at 3:07 AM, Solar Designer <solar@...nwall.com> wrote:
> myrice -
>
> Yes, I did not suggest to use multiple streams.  I am not familiar with
> this, but Lukas was able to have data transfers to GPU overlap with
> computation on GPU by interleaving these inside crypt_all().  I am
> suggesting an improvement upon this where you'd only need two chunks for
> (potentially) full efficiency, whereas Lukas' inside-crypt_all()
> approach would need more chunks to get close to full efficiency (but not
> reach it).
>
> Please do try this out and post your results.
>

I split the memcpyH2D into two: one in set_key(), one in crypt_all().
Everything else remains the same. With 1/4 long password, there is an
improvement of ~4M c/s with many salts and ~3M c/s with only one salt.

======Before=============
[12:35:02 myrice] run $ ./john -te=1 -fo=xsha512-cuda
Benchmarking: Mac OS X 10.7+ salted SHA-512 [CUDA]... DONE
Many salts:     61086K c/s real, 61652K c/s virtual
Only one salt:  17476K c/s real, 17096K c/s virtual

======After===============
[12:35:53 myrice] run $ ./john -te=1 -fo=xsha512-cuda
Benchmarking: Mac OS X 10.7+ salted SHA-512 [CUDA]... DONE
Many salts:     65278K c/s real, 65925K c/s virtual
Only one salt:  20695K c/s real, 21254K c/s virtual

Thanks
myrice
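
For readers who want to see the shape of the two-chunk split described in this message, here is a minimal CUDA sketch. It is not the actual xsha512-cuda code: the buffer names, the KEYS and KEY_LEN sizes, the toy hash kernel and the single stream are all assumptions made for illustration. It also assumes set_key() is called with increasing indices and that the host buffer is pinned, since cudaMemcpyAsync() can only overlap with CPU work when copying from page-locked memory. The first half of the key buffer is sent as soon as it is filled, so the transfer can proceed while the CPU is still filling the second half; crypt_all() then sends whatever is left.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

#define KEYS    (1 << 16)   /* hypothetical number of keys per crypt_all() */
#define KEY_LEN 16          /* hypothetical fixed-size key slot */
#define HALF    (KEYS / 2)

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); exit(1); } } while (0)

static char *host_keys, *dev_keys;      /* host_keys is pinned memory */
static unsigned *host_out, *dev_out;
static cudaStream_t stream;
static int first_half_sent;

/* Toy stand-in for the real hash; used on both host and device. */
__host__ __device__ unsigned toy_hash(const char *key) {
    unsigned h = 0;
    for (int j = 0; j < KEY_LEN; j++) h = h * 31u + (unsigned char)key[j];
    return h;
}

__global__ void toy_kernel(const char *keys, unsigned *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = toy_hash(keys + i * KEY_LEN);
}

static void init(void) {
    CHECK(cudaHostAlloc((void **)&host_keys, KEYS * KEY_LEN, cudaHostAllocDefault));
    CHECK(cudaHostAlloc((void **)&host_out, KEYS * sizeof(unsigned), cudaHostAllocDefault));
    CHECK(cudaMalloc((void **)&dev_keys, KEYS * KEY_LEN));
    CHECK(cudaMalloc((void **)&dev_out, KEYS * sizeof(unsigned)));
    CHECK(cudaStreamCreate(&stream));
}

/* Assumes indices arrive in increasing order, starting at 0. */
static void set_key(const char *key, int index) {
    strncpy(host_keys + index * KEY_LEN, key, KEY_LEN);   /* zero-pads the slot */
    if (index == HALF - 1) {
        /* First chunk: returns immediately, overlaps with later set_key() calls. */
        CHECK(cudaMemcpyAsync(dev_keys, host_keys, HALF * KEY_LEN,
                              cudaMemcpyHostToDevice, stream));
        first_half_sent = 1;
    }
}

static void crypt_all(int count) {
    int done = first_half_sent ? HALF : 0;
    if (count > done)   /* Second chunk: everything not yet transferred. */
        CHECK(cudaMemcpyAsync(dev_keys + done * KEY_LEN, host_keys + done * KEY_LEN,
                              (size_t)(count - done) * KEY_LEN,
                              cudaMemcpyHostToDevice, stream));
    toy_kernel<<<(count + 255) / 256, 256, 0, stream>>>(dev_keys, dev_out, count);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(host_out, dev_out, count * sizeof(unsigned),
                          cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));   /* host buffers are reusable after this */
    first_half_sent = 0;
}

int main(void) {
    char buf[KEY_LEN];
    init();
    for (int i = 0; i < KEYS; i++) {
        snprintf(buf, sizeof(buf), "pw%d", i);
        set_key(buf, i);
    }
    crypt_all(KEYS);
    for (int i = 0; i < KEYS; i++)          /* compare against a CPU reference */
        if (host_out[i] != toy_hash(host_keys + i * KEY_LEN)) {
            printf("mismatch at %d\n", i);
            return 1;
        }
    printf("GPU results match CPU reference\n");
    return 0;
}
```

Operations issued on one stream execute in order, so the kernel cannot start before both chunks have arrived, and no second stream is needed. Any gain comes only from hiding the first transfer behind the CPU's key-setup work, which fits the message's result of a larger relative gain with only one salt, where transfer time is a bigger share of each crypt_all() call.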