|
Message-ID: <20150819044057.GA16484@openwall.com> Date: Wed, 19 Aug 2015 07:40:57 +0300 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: PHC: Argon2 on GPU Agnieszka, On Wed, Aug 19, 2015 at 07:10:42AM +0300, Solar Designer wrote: > I think the modulo division operations are causing a lot of latency: > > [solar@...er opencl]$ fgrep % argon2*.cl > argon2d_kernel.cl: reference_block_offset = (phi % r); > argon2i_kernel.cl: uint reference_block_index = addresses[0] % r; > argon2i_kernel.cl: uint reference_block_index = addresses[i] % r; You might also achieve speedup by moving these operations up in code, to be performed as soon as their input data is available. Maybe the compiler already does it for you, or maybe not. In 2d, you could compute an equivalent of "phi % r" inside ComputeBlock, after having invoked 9 out of 16 BLAKE2 rounds. (The remaining 7 only affect state[] elements other than the one used for phi.) In 2i, you could compute "addresses[...] % r" on writes to addresses[] (thus, store the block indices instead of the original addresses). "r" might not yet have the correct value at that point, but you should be able to correctly predict what it would have been at the "%" time. However, I doubt this will fully do the latency hiding trick on GPUs. Parallel processing capabilities to perform the modulo operations concurrently are simply not available at that level in the OpenCL programming model: For 2d, there's a chance you'd save the latency of one modulo operation (and that's all you need) if the GPU has an instruction like this in hardware (and microcode?) and it doesn't block further processing (until there's a data dependency on the result). For 2i, there's no way those 256 modulo operations would be run concurrently from one work-item. And besides, to run them concurrently you'd need to provide storage for the results (you already have that addresses[] array) and then it's no better than copying this data e.g. from a larger array (holding all precomputed indices) in global memory. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.