|
Message-ID: <20120817050711.GC23259@openwall.com> Date: Fri, 17 Aug 2012 09:07:11 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: bitslice DES on GPU Sayantan - On Mon, Aug 13, 2012 at 07:54:47PM +0530, Sayantan Datta wrote: > I'm currently trying to understand what you just wrote, correct me if I am > wrong. > > For DES_BS_EXPAND==0 > We have 768 indices for a 56 element array. The 56 element array is > different for each hash. Yes. (In fact, it holds many keys at once in its different bit layers, for computation of that many different hashes.) > However the 768 indices are same for all hashes > (Right?). Correct. They're part of the specification of DES. > So why can't we put the 768 indices in local memory? We can. A potential tradeoff here is between indices and pointers. Indices can be shared, but they may need to be scaled to byte offsets (multiply by 4) and a per work-item base address for the 56-element K array needs to be added to them. Likely this is free or cheap. However, if not then you could instead have a per work-item 768-element array of pointers, which would provide K elements directly (no additional base address needs to be added). The latter is almost certainly slower on current GPUs, where the local memory size would become the limiting factor. (On CPUs, the latter is faster.) The scaling of indices to byte offsets you may perform out of the loop (in fact, they will be constants). 56*4 still fits in 8 bits. This may or may not make any difference (depending on available addressing modes on a particular architecture and costs or lack thereof of making use of those addressing modes). BTW, this assumes gather addressing, even though we don't really need it (the accesses by different work-items executing on a SIMD unit will be to adjacent addresses). I wonder if writing the vectorized code more explicitly would result in savings. Then the scaling of indices to byte offsets would be by more than *4, and maybe there would more likely be a benefit from precomputing it - although then you need larger than 8-bit elements in the 768-element array to store them. > For DES_BS_EXPAND==1 > We expand the 56 element array to a 768 element array(Right?). This time > 768 element array is different for all hashes(Right?). So we can't put > them in local memory. Yes. > Did I get the problem correct or am I totally wrong? You mostly got it right. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.