john-dev - Re: async key transfers to GPU



Message-ID: <CANJ2NMO=y9ibM1He=6sSeH7LHxdqjrm6fqOhVpHGchyN5F9YNQ@mail.gmail.com>
Date: Tue, 26 Jun 2012 17:13:07 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: async key transfers to GPU

On Mon, Jun 25, 2012 at 3:07 AM, Solar Designer <solar@...nwall.com> wrote:
> myrice -
> Yes, I did not suggest to use multiple streams.  I am not familiar with
> this, but Lukas was able to have data transfers to GPU overlap with
> computation on GPU by interleaving these inside crypt_all().  I am
> suggesting an improvement upon this where you'd only need two chunks for
> (potentially) full efficiency, whereas Lukas' inside-crypt_all()
> approach would need more chunks to get close to full efficiency (but not
> reach it).
>
> Please do try this out and post your results.
>

I split the host-to-device copy (memcpyH2D) into two: one in set_key() and
one in crypt_all(). Everything else remains the same.
With 1/4 long passwords, throughput improves by about 4M c/s with many
salts and by about 3M c/s with one salt.
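As a rough illustration of the two-chunk split, here is a minimal Python model of which key ranges get sent to the GPU and from which function. It is a sketch under assumptions, not the actual xsha512-cuda code: KeyBuffer, async_copy_to_gpu(), split_at and KEY_LEN are hypothetical names. In real CUDA code the stub would be a cudaMemcpyAsync(..., cudaMemcpyHostToDevice, stream) from a page-locked host buffer, because copies from pageable memory do not overlap with host work. The post does not say exactly when set_key() starts its copy, so the split point is a parameter here.

```python
KEY_LEN = 16  # hypothetical fixed-size key slot, in characters


class KeyBuffer:
    """Models which key ranges are sent to the GPU, and from where."""

    def __init__(self, max_keys, split_at):
        self.keys = [""] * max_keys   # host key slots (page-locked memory in real CUDA code)
        self.split_at = split_at      # assumption: size of the chunk sent from set_key()
        self.first_chunk_sent = False
        self.transfers = []           # log of (caller, first_index, count)

    def async_copy_to_gpu(self, caller, first, count):
        # Hypothetical stand-in for cudaMemcpyAsync(..., cudaMemcpyHostToDevice, stream);
        # here it only records the range that would be copied.
        self.transfers.append((caller, first, count))

    def set_key(self, key, index):
        self.keys[index] = key[:KEY_LEN]
        # Once the first chunk is complete, start sending it so that the copy
        # overlaps with the CPU filling in the remaining keys.
        if index == self.split_at - 1:
            self.async_copy_to_gpu("set_key", 0, self.split_at)
            self.first_chunk_sent = True

    def crypt_all(self, count):
        # Send whatever set_key() has not sent yet. In real code the kernel would
        # then be launched on the same stream, so it starts only after both copies.
        start = self.split_at if self.first_chunk_sent else 0
        if count > start:
            self.async_copy_to_gpu("crypt_all", start, count - start)
        sent, self.transfers = self.transfers, []
        self.first_chunk_sent = False  # the next batch starts from scratch
        return sent


buf = KeyBuffer(max_keys=8, split_at=4)
for i in range(8):
    buf.set_key(f"pass{i}", i)
print(buf.crypt_all(8))  # [('set_key', 0, 4), ('crypt_all', 4, 4)]
```

A batch smaller than split_at never triggers the early copy, so crypt_all() sends the whole batch itself.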
======Before=============
[12:35:02 myrice] run $ ./john -te=1 -fo=xsha512-cuda
Benchmarking: Mac OS X 10.7+ salted SHA-512 [CUDA]... DONE
Many salts:     61086K c/s real, 61652K c/s virtual
Only one salt:  17476K c/s real, 17096K c/s virtual
======After===============
[12:35:53 myrice] run $ ./john -te=1 -fo=xsha512-cuda
Benchmarking: Mac OS X 10.7+ salted SHA-512 [CUDA]... DONE
Many salts:     65278K c/s real, 65925K c/s virtual
Only one salt:  20695K c/s real, 21254K c/s virtual
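The "~4M" and "~3M" figures follow from the "real" c/s numbers above. The short Python check below recomputes the absolute and relative gains using only those benchmark values. The gain() helper is just for illustration.

```python
# Benchmark figures (K c/s, "real") copied from the runs above.
before = {"many salts": 61086, "one salt": 17476}
after = {"many salts": 65278, "one salt": 20695}


def gain(case):
    """Return (absolute gain in K c/s, relative gain in percent) for one case."""
    delta = after[case] - before[case]
    return delta, 100.0 * delta / before[case]


for case in before:
    delta, pct = gain(case)
    print(f"{case}: +{delta}K c/s ({pct:.1f}%)")
# many salts: +4192K c/s (6.9%)
# one salt: +3219K c/s (18.4%)
```

In relative terms, the one-salt case benefits considerably more than the many-salts case.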

Thanks
myrice
