Message-ID: <CAKGDhHVad7OFG8=RiVupEGu8txP=ib46KLuhx3gHibvkw6eL1g@mail.gmail.com>
Date: Sun, 30 Aug 2015 01:44:32 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-29 8:48 GMT+02:00 Solar Designer <solar@...nwall.com>:
> As to loop unrolling, there's "#pragma unroll N", and when you specify
> N=1 so "#pragma unroll 1" I think it prevents unrolling. As an
> experiment, I tried adding "#pragma unroll 1" before all loops in
> argon2d_kernel.cl, and the PTX instruction count reduced - but not a
> lot.

Can I get this code?

> We need to figure out why it doesn't get lower. ~80k is still a lot.
> Are there many inlined functions and unrolled loops in the .h files?

there are also blake2 files

> Maybe some pre- and/or post-processing should be kept on host to make
> the kernel simpler and smaller. This is bad in terms of Amdahl's law,
> but it might help us figure things out initially.

I will think about it and split kernels, even small pomelo was slightly faster
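
For readers following the thread, here is a minimal sketch of where the "#pragma unroll 1" hint discussed above would sit relative to a loop. It is written as plain C so it can be compiled on a host; it is not the code from argon2d_kernel.cl. The names `rotl32` and `mix_rounds`, the round body, and the constant are hypothetical stand-ins. The preprocessor guard is also an assumption made for portability: Clang accepts `#pragma unroll 1`, GCC 8 and later use `#pragma GCC unroll 1`, and whether a given OpenCL compiler honors the hint depends on the vendor.

```c
#include <stdint.h>

/* Rotate left by r bits; r must be in 1..31 for this sketch. */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32 - r));
}

/* Hypothetical round loop, standing in for a loop in a hashing kernel. */
uint32_t mix_rounds(uint32_t state, unsigned rounds)
{
    unsigned i;

    /* Ask the compiler to keep the loop rolled (unroll factor 1).
     * The hint must come immediately before the loop statement. */
#if defined(__clang__) || defined(__OPENCL_VERSION__)
#pragma unroll 1
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 1
#endif
    for (i = 0; i < rounds; i++)
        state = rotl32(state ^ i, 7) + 0x9e3779b9u;

    return state;
}
```

The hint only affects code generation, so the result is the same with or without it. Comparing the generated instruction count with and without the pragma, as described in the quoted message, is the way to see whether it had any effect.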