Message-ID: <CANJ2NMMWaRzVQzTA_EQ-DudK5kjBqE9nTivpA7Hbi9JVSmtmPg@mail.gmail.com>
Date: Fri, 23 Mar 2012 21:23:48 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Possible improvement of cryptsha256-cuda
Hi,
Lukas, I am reading your cryptsha256-cuda code. The CUDA output buffer is
not accessed in a coalesced way. See lines 284-286 of cuda/cryptsha256.cu:
#pragma unroll 8
for (i = 0; i < 8; i++)
    tresult[hash_addr(i, idx)] = alt_result[i];
The hash_addr macro is:
#define hash_addr(j,idx) (((j)*(KEYS_PER_CRYPT))+(idx))
However, the access pattern is not regular: for example, thread 0 writes
to offsets 0, 2000, 4000, and so on, and each access takes many memory
cycles. The CUDA 4.1 profiler also reports that the global memory stores
are very inefficient. I think we could change the index to idx*8+i and
do the address translation on the CPU side. I am working on this now!
I will let you know the result.
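A minimal C sketch of the proposed CPU-side address translation follows. It assumes KEYS_PER_CRYPT is 2000 (chosen only to match the 0, 2000, 4000 offsets above) and that each key produces eight 32-bit words stored as uint32_t. The hash_addr macro is copied from cryptsha256.cu; HASH_WORDS, TOTAL_WORDS, hash_addr_packed and unpack_results are hypothetical names for illustration, not part of the existing code. The idea is that the kernel would write tresult[hash_addr_packed(i, idx)], and the host would copy the buffer back into the hash_addr layout so the rest of the code stays unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2000 keys per crypt, matching the 0, 2000, 4000 offsets. */
#define KEYS_PER_CRYPT 2000
/* Hypothetical: SHA-256 result is 8 words of 32 bits per key. */
#define HASH_WORDS 8
#define TOTAL_WORDS (KEYS_PER_CRYPT * HASH_WORDS)

/* Current layout (from cryptsha256.cu): word j of key idx at j*KEYS_PER_CRYPT + idx. */
#define hash_addr(j, idx) (((j) * (KEYS_PER_CRYPT)) + (idx))

/* Proposed layout (hypothetical name): word j of key idx at idx*8 + j. */
#define hash_addr_packed(j, idx) (((idx) * (HASH_WORDS)) + (j))

/* Host-side translation: copy results from the packed layout the kernel
   would write into the original hash_addr layout. */
void unpack_results(const uint32_t *packed, uint32_t *out)
{
    for (int idx = 0; idx < KEYS_PER_CRYPT; idx++) {
        for (int j = 0; j < HASH_WORDS; j++) {
            /* Both index schemes must stay inside the buffer. */
            assert(hash_addr(j, idx) < TOTAL_WORDS);
            assert(hash_addr_packed(j, idx) < TOTAL_WORDS);
            out[hash_addr(j, idx)] = packed[hash_addr_packed(j, idx)];
        }
    }
}
```

With the packed layout, thread idx writes its eight words to one contiguous run, idx*8 through idx*8+7. The cost is one extra pass over the whole buffer on the CPU after each copy back from the device.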