john-dev - Re: JtR: GPU for slow hashes



Message-ID: <CA+TsHUAhiPcgwExGqVRsORVBx5dadz+5i-_UVRM1VHQ1t3VDTA@mail.gmail.com>
Date: Sat, 31 Mar 2012 15:42:03 +0530
From: SAYANTAN DATTA <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: JtR: GPU for slow hashes

On Thu, Mar 29, 2012 at 5:37 AM, Solar Designer <solar@...nwall.com> wrote:

> On Wed, Mar 28, 2012 at 11:21:50PM +0530, SAYANTAN DATTA wrote:
> > On Wed, Mar 28, 2012 at 5:54 PM, SAYANTAN DATTA <std2048@...il.com> wrote:
> > >   Here are a few problems I'm facing. Since ATI 4000 series GPUs don't
> > > support byte_addressable_store, I have to work around this by using
> > > only uint as the data type for temporary data storage. This problem
> > > exists with many of the hash algorithms already implemented in OpenCL,
> > > e.g. MD5, MD4, etc. However, ATI 5000 series and above seem to support
> > > byte_addressable_store, so the existing code should work fine on 5000
> > > series or newer GPUs, but for 4000 series or older it needs to be
> > > reimplemented. The workaround also causes some performance penalties.
> ...
> >   Since my GPU doesn't support byte_addressable_store, it is becoming an
> > increasingly uphill task to implement the HMAC_SHA1 algorithm. Using
> > uint[] instead of uchar[] is a possible solution, but debugging the
> > code becomes very time-consuming.
> >    I have also considered using 4 uchar16 vectors to replace a single
> > uchar[64] array, but it results in too much branching in the code. If
> > you have any suggestions, please let me know.
>
> I am totally unfamiliar with this - maybe someone else will comment.
> Lukas, Milen, Samuele, Claudio, magnum - maybe some of you?
>
> It is not necessarily a bad thing that the task turned out to be more
> complicated - you have a better chance to demonstrate your ability to
> work on complex tasks in this way. ;-)
>
> Thanks,
>
> Alexander
>

Hi Alexander,

I'm pleased to inform you that I have finished implementing the PBKDF2
step on the GPU (OpenCL). The code is primarily based on the sample program
that you mentioned in an earlier post, but I had to modify it heavily to
make it work on the ATI RV790 architecture, which is why it took a lot
more time than expected.
I have compared the outputs with those of the sample code you provided, and
they match perfectly. There is also room for a lot more optimization.
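For reference, the PBKDF2 step as specified in RFC 2898 (PKCS #5 v2.0), with HMAC-SHA1 as the PRF, can be summarized as below. P is the password, S the salt, c the iteration count, and INT(i) the block index i encoded as a 4-byte big-endian integer; the last block T_l is truncated to fit dkLen. The parameter values used by the sample program are not stated in this message. Because each U_j depends on U_{j-1}, the c-step chain inside one T_i is sequential, so a GPU implementation would typically parallelize across candidate passwords rather than within a single derivation.

```latex
\begin{aligned}
\mathrm{DK} &= T_1 \,\|\, T_2 \,\|\, \cdots \,\|\, T_l, \qquad l = \lceil \mathrm{dkLen} / 20 \rceil \\
T_i &= U_1 \oplus U_2 \oplus \cdots \oplus U_c \\
U_1 &= \mathrm{HMAC}\big(P,\; S \,\|\, \mathrm{INT}(i)\big), \qquad U_j = \mathrm{HMAC}(P,\; U_{j-1}) \quad (2 \le j \le c) \\
\mathrm{HMAC}(K, m) &= \mathrm{SHA1}\big((K \oplus \mathrm{opad}) \,\|\, \mathrm{SHA1}((K \oplus \mathrm{ipad}) \,\|\, m)\big)
\end{aligned}
```

Here K is zero-padded to the 64-byte SHA-1 block size, ipad is the byte 0x36 repeated and opad the byte 0x5c repeated.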
    One drawback I found is that, because the code is very long, the
compilation (clBuildProgram()) time is rather long. As I've already told you,
my GPU doesn't support byte_addressable_store, so I had to improvise a
workaround, which resulted in lengthier code.
   I'm attaching the unoptimized version of the host and device code here.

Regards,
-Sayantan

