john-dev - Re: PHC: Lyra2 on GPU

Message-ID: <20150705075304.GA27232@openwall.com>
Date: Sun, 5 Jul 2015 10:53:04 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

Agnieszka,

On Sat, Jul 04, 2015 at 02:04:26AM +0200, Agnieszka Bielec wrote:
> my optimizations are based on transferring one table to local memory
> and copying small portions of global memory into local buffers. I
> didn't see any sense in coalescing and I didn't try it

Please also try going in the opposite direction: keep more stuff in
global memory, and reduce use of local memory per instance to the point
where you can use a much higher GWS (global work size) - like 20480
(10x higher than what's auto-tuned now) or even higher.  This may result
in a speedup, since the greater concurrency helps hide global memory
access latencies.
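
To make the trade-off concrete, here is a rough C sketch of how local
memory use per instance can cap the number of work-items resident on one
compute unit. The function name, its parameters, and the numbers in the
note below are all assumptions for illustration; they are not taken from
the Lyra2 code or from any particular GPU, and real devices have other
limits (registers, wavefront slots) that this ignores.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of John the Ripper or Lyra2).
 * Estimates how many work-items can be resident on one compute unit
 * when local memory and a fixed hardware cap are the only limits.
 *
 * local_mem_per_cu     - bytes of local memory per compute unit
 * local_bytes_per_item - bytes of local memory used by each work-item
 * lws                  - local work size (work-items per work-group)
 * max_items_per_cu     - assumed hardware cap on resident work-items
 */
size_t resident_items_per_cu(size_t local_mem_per_cu,
                             size_t local_bytes_per_item,
                             size_t lws,
                             size_t max_items_per_cu)
{
    assert(lws > 0);

    /* Whole work-groups allowed by the hardware cap alone. */
    size_t groups = max_items_per_cu / lws;

    /* Local memory is allocated per work-group, so only whole
     * groups that fit in local memory can be resident. */
    size_t bytes_per_group = local_bytes_per_item * lws;
    if (bytes_per_group > 0) {
        size_t groups_by_mem = local_mem_per_cu / bytes_per_group;
        if (groups_by_mem < groups)
            groups = groups_by_mem;
    }

    return groups * lws;
}
```

With assumed values of 32768 bytes of local memory per compute unit, a
local work size of 64, and a cap of 2560 resident work-items, using 256
bytes of local memory per instance allows only 128 resident work-items,
32 bytes allows 1024, and no local memory at all allows the full 2560.
Whether the extra concurrency outweighs the slower global memory
accesses is exactly what needs to be measured.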

You'll need to benchmark these approaches against each other at
different cost settings.

Alexander
