Message-ID: <20150705075304.GA27232@openwall.com>
Date: Sun, 5 Jul 2015 10:53:04 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

Agnieszka,

On Sat, Jul 04, 2015 at 02:04:26AM +0200, Agnieszka Bielec wrote:
> my optimizations are based on transferring one table to local memory
> and copying small portions of global memory into local buffers. I
> didn't see any sense in coalescing, so I didn't try it

Please also try going in the opposite direction: keep more data in
global memory and reduce the use of local memory per instance to the
point where you can use a much higher GWS (global work size), such as
20480 (10x higher than what's auto-tuned now) or even higher.  This may
result in a speedup by hiding global memory access latencies through
greater concurrency.
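
To illustrate the trade-off, here is a rough sketch in Python of how
local memory use per instance can cap the number of instances that run
concurrently.  The device figures (32 compute units, 64 KiB of local
memory each, at most 2560 resident work-items per unit) and the function
names are made up for the example and do not describe any particular
GPU; real limits also depend on registers and other resources.

```python
def max_resident_instances(local_mem_per_cu, local_mem_per_instance, hw_limit):
    """Rough upper bound on work-items resident on one compute unit."""
    if local_mem_per_instance <= 0:
        # No local memory used: only the hardware cap applies.
        return hw_limit
    return min(hw_limit, local_mem_per_cu // local_mem_per_instance)


# Hypothetical device parameters (assumptions, not measured values).
COMPUTE_UNITS = 32
LOCAL_MEM_PER_CU = 64 * 1024   # bytes of local memory per compute unit
HW_LIMIT_PER_CU = 2560         # max resident work-items per compute unit


def usable_gws(local_mem_per_instance):
    """Device-wide count of instances that can be resident at once."""
    per_cu = max_resident_instances(
        LOCAL_MEM_PER_CU, local_mem_per_instance, HW_LIMIT_PER_CU
    )
    return per_cu * COMPUTE_UNITS


for bytes_per_instance in (1024, 512, 128, 0):
    print(bytes_per_instance, usable_gws(bytes_per_instance))
```

Under these assumed numbers, cutting local memory per instance from
1024 to 128 bytes raises the bound from 2048 to 16384 concurrent
instances.  Whether that concurrency outweighs the extra global memory
traffic is what has to be measured.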

You'll need to benchmark these approaches against each other at
different cost settings.
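
A comparison like this could be organized along the following lines.
This is only a generic timing sketch in Python: `benchmark`, `compare`,
and the two placeholder variants are hypothetical names, and the
placeholders stand in for whatever actually launches each kernel version
at a given cost setting.

```python
import time


def benchmark(run, cost, repeats=3):
    """Best wall-clock time of run(cost) over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run(cost)
        best = min(best, time.perf_counter() - start)
    return best


def compare(variants, costs):
    """Map each cost setting to the name of the fastest variant."""
    winners = {}
    for cost in costs:
        timings = {name: benchmark(run, cost) for name, run in variants.items()}
        winners[cost] = min(timings, key=timings.get)
    return winners


# Placeholder workloads (assumptions); replace with real kernel launches.
variants = {
    "local-memory": lambda cost: sum(range(cost * 1000)),
    "global-memory": lambda cost: sum(range(cost * 1200)),
}

print(compare(variants, costs=[1, 4, 16]))
```

The point of the per-cost table is that the winner may change as the
cost grows, so a single benchmark at the default settings is not enough.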

Alexander
