Message-ID: <20150705075304.GA27232@openwall.com>
Date: Sun, 5 Jul 2015 10:53:04 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

Agnieszka,

On Sat, Jul 04, 2015 at 02:04:26AM +0200, Agnieszka Bielec wrote:
> my optimizations are based on transfer one table to local memory and
> copying small portions of global memory into local buffers, I didn't
> saw any sense i coalescing and I didn't tried it

Please also try going in the opposite direction: keep more stuff in
global memory, reduce use of local memory per instance to the point
where you can use a lot higher GWS - like 20480 (10x higher than what's
auto-tuned now) or even higher.  This may result in a speedup through
hiding of global memory access latencies due to the greater concurrency.

You'll need to benchmark these approaches against each other at
different cost settings.

Alexander
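
To illustrate the trade-off described above, here is a rough sketch (not taken from the Lyra2 code) of how local memory use per instance can cap the number of instances resident on one compute unit, and hence the useful GWS. The function name, the 32 KiB local memory size, and the hardware cap of 2048 are all hypothetical values chosen for illustration. The model is deliberately simplified: it ignores registers and other occupancy limits, so real numbers have to come from benchmarking.

```python
def resident_instances(local_mem_per_cu, local_bytes_per_instance, hw_max_per_cu):
    """Estimate how many instances fit on one compute unit at once.

    Simplified model: only local memory and a fixed hardware cap are
    considered; register pressure and other limits are ignored.
    """
    if local_bytes_per_instance <= 0:
        # No local memory used, so only the hardware cap applies.
        return hw_max_per_cu
    return min(hw_max_per_cu, local_mem_per_cu // local_bytes_per_instance)


# Hypothetical device figures, for illustration only.
LOCAL_MEM_PER_CU = 32 * 1024  # bytes of local memory per compute unit
HW_MAX_PER_CU = 2048          # cap on resident instances per compute unit

if __name__ == "__main__":
    # Shrinking the per-instance local buffer raises possible concurrency.
    for per_instance in (2048, 512, 128):
        n = resident_instances(LOCAL_MEM_PER_CU, per_instance, HW_MAX_PER_CU)
        print(f"{per_instance:5d} bytes/instance -> {n} resident instances")
```

Under these assumed figures, cutting the per-instance local buffer from 2048 to 128 bytes raises the estimate from 16 to 256 resident instances per compute unit. Whether that extra concurrency hides enough global memory latency to give a net speedup is exactly what the benchmarks at different cost settings would need to show.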