Message-ID: <CA+TsHUCPLZ2TGrE50OsBgKwPEDt=BjZjaUkAakwxP2GGiMaP7A@mail.gmail.com>
Date: Sat, 31 Mar 2012 15:43:31 +0530
From: SAYANTAN DATTA <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: JtR: GPU for slow hashes

On Sat, Mar 31, 2012 at 3:42 PM, SAYANTAN DATTA <std2048@...il.com> wrote:
>
> On Thu, Mar 29, 2012 at 5:37 AM, Solar Designer <solar@...nwall.com> wrote:
>
>> On Wed, Mar 28, 2012 at 11:21:50PM +0530, SAYANTAN DATTA wrote:
>> > On Wed, Mar 28, 2012 at 5:54 PM, SAYANTAN DATTA <std2048@...il.com> wrote:
>> > > Here are a few problems I'm facing. Since ATI 4000 series GPUs don't
>> > > support byte_addressable_store, I have to work around this problem by
>> > > using only uint as the data type for temporary data storage. This
>> > > problem exists in many of the hash algorithms already implemented in
>> > > OpenCL, e.g. MD5, MD4, etc. However, the ATI 5000 series and above seem
>> > > to support byte_addressable_store, so the existing code should work
>> > > fine on 5000 series or newer GPUs, but for the 4000 series or below it
>> > > needs to be reimplemented. The workaround also causes some performance
>> > > penalty.
>> ...
>> > Since my GPU doesn't support byte_addressable_store, implementing the
>> > HMAC_SHA1 algorithm is becoming an increasingly uphill task. Using
>> > uint[] instead of uchar[] is a probable solution, but debugging the
>> > code becomes very time consuming.
>> > I have also considered using 4 uchar16 vectors to replace a single
>> > uchar[64] array, but that results in too much branching in the code. If
>> > you have any suggestions, please let me know.
>>
>> I am totally unfamiliar with this - maybe someone else will comment.
>> Lukas, Milen, Samuele, Claudio, magnum - maybe some of you?
>>
>> It is not necessarily a bad thing that the task turned out to be more
>> complicated - you have a better chance to demonstrate your ability to
>> work on complex tasks in this way. ;-)
>>
>> Thanks,
>>
>> Alexander
>
> Hi Alexander,
>
> I'm pleased to inform you that I have finished the implementation of the
> PBKDF2 step on GPU (OpenCL). The code is primarily based on the sample
> program that you mentioned in the earlier post, but I had to modify it
> heavily in order to implement it on the ATI RV790 architecture, which is
> why it took a lot more time than expected.
> I have compared the outputs with the sample code you provided and they
> match perfectly. There is also room for a lot more optimization.
> One drawback I found is that, due to the very large length of the code,
> the compilation (clBuildProgram()) time is a bit long. As I've already
> told you, my GPU doesn't support byte_addressable_store, so I had to
> improvise a workaround, which resulted in lengthier code.
> I'm attaching the unoptimized versions of the host and device code here.
>
> Regards,
> -Sayantan

Content of type "text/html" skipped
View attachment "MSCash2_sample.cpp" of type "text/x-c++src" (26707 bytes)
View attachment "MSCash2_host.cpp" of type "text/x-c++src" (12685 bytes)
Download attachment "PBKDF2.cl" of type "application/octet-stream" (35849 bytes)
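
For readers unfamiliar with the uint[]-instead-of-uchar[] workaround discussed in the thread, here is a minimal sketch of the general idea. It is not taken from the attached PBKDF2.cl. The helper names put_byte and get_byte are hypothetical, and the big-endian byte order within each word is an assumption, chosen because SHA-1 reads its message block as big-endian 32-bit words. The sketch is plain C so it can be tried on the host; in an OpenCL kernel the same shifts and masks would presumably be applied to a uint array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not from the attached code): emulate byte stores
 * on a buffer that may only be written in whole 32-bit words, as on
 * devices without byte_addressable_store. */

/* Write byte `val` at byte offset `idx` of a buffer of `nwords` words.
 * Assumption: bytes are packed big-endian within each word. */
static void put_byte(uint32_t *buf, size_t nwords, size_t idx, uint8_t val)
{
    assert(idx < nwords * 4);                        /* stay inside the buffer */
    size_t word = idx >> 2;                          /* which 32-bit word */
    unsigned shift = (3u - (unsigned)(idx & 3u)) * 8u; /* bit position in word */
    uint32_t mask = (uint32_t)0xFFu << shift;
    /* Clear the old byte, then OR in the new one: a read-modify-write
     * of the whole word replaces the single byte store. */
    buf[word] = (buf[word] & ~mask) | ((uint32_t)val << shift);
}

/* Read the byte at byte offset `idx` using the same packing. */
static uint8_t get_byte(const uint32_t *buf, size_t nwords, size_t idx)
{
    assert(idx < nwords * 4);
    unsigned shift = (3u - (unsigned)(idx & 3u)) * 8u;
    return (uint8_t)(buf[idx >> 2] >> shift);
}
```

With this packing, storing the bytes 0x61, 0x62, 0x63 and the padding byte 0x80 at offsets 0 to 3 leaves the first word holding 0x61626380. Every byte write turns into a read-modify-write of a full word, which illustrates why such a workaround tends to make the code longer and somewhat slower, as the messages above report.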