Message-ID: <20151006001911.GA4927@openwall.com>
Date: Tue, 6 Oct 2015 03:19:11 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: yescrypt on GPU

On Tue, Oct 06, 2015 at 03:16:09AM +0300, Solar Designer wrote:
> Then there's also this weird trick I just posted about to the PHC list:
> 
> http://thread.gmane.org/gmane.comp.security.phc/2938/focus=3496
> 
> where a BSTY miner implementation author chose to split the S-box
> lookups across multiple work-items.  He chose 16, but I think 4 would be
> optimal (or maybe 8, if additionally splitting loads of the two S-boxes
> across two sets of 4 work-items).  This might in fact speed them up, so
> might be worth trying (as an extra option, on top of 3 main ones).
> He reported 372 h/s at 2 MB (N=2048 r=8) on HD 7750.  Scaling to 7970,

https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Radeon_HD_7xxx_Series

> this could be up to 372*2048/512*1000/800 = 1860, but probably a lot
> less than that in practice (7750's narrower memory bus might be a better
> fit).  Your reported best result is 914 for 1.5 MB (r=6), so seemingly
> much slower than his:
> 
> http://www.openwall.com/lists/john-dev/2015/07/27/6
> 
> We have a 7750 (a version with DDR3 memory, though) in "well", so you
> may try your code on it and compare against the 372 figure directly.
> And like I wrote, his byte-granular loads are likely not optimal;
> uint loads would likely be better.
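
For reference, the scaling estimate quoted above can be reproduced with a short sketch. It assumes the factors in the formula are shader counts (512 for the 7750, 2048 for the 7970) and core clocks in MHz (800 and 1000), which is a reading of the formula and not something stated outright. The function name and parameters are made up for illustration. This is linear scaling only, so it gives an upper bound and ignores memory bus differences:

```python
def scale_hashrate(base_hs, base_shaders, base_mhz, target_shaders, target_mhz):
    """Linearly scale a measured hash rate by shader count and core clock.

    Hypothetical helper: this is a naive upper bound that ignores memory
    bandwidth and bus width, which may well dominate in practice.
    """
    return base_hs * target_shaders / base_shaders * target_mhz / base_mhz


# Figures from the quoted message: 372 h/s on HD 7750, scaled to 7970.
# The shader counts and clocks are assumed meanings of the formula's factors.
estimate = scale_hashrate(372, 512, 800, 2048, 1000)
print(estimate)  # 1860.0
```

The real figure is probably a lot lower, as the quoted text already says.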

Alexander
