Message-ID: <20151006001911.GA4927@openwall.com>
Date: Tue, 6 Oct 2015 03:19:11 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: yescrypt on GPU

On Tue, Oct 06, 2015 at 03:16:09AM +0300, Solar Designer wrote:
> Then there's also this weird trick I just posted about to the PHC list:
>
> http://thread.gmane.org/gmane.comp.security.phc/2938/focus=3496
>
> where a BSTY miner implementation author chose to split the S-box
> lookups across multiple work-items. He chose 16, but I think 4 would be
> optimal (or maybe 8, if additionally splitting loads of the two S-boxes
> across two sets of 4 work-items). This might in fact speed them up, so
> might be worth trying (as an extra option, on top of 3 main ones).
> He reported 372 h/s at 2 MB (N=2048 r=8) on HD 7750. Scaling to 7970,

https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Radeon_HD_7xxx_Series

> this could be up to 372*2048/512*1000/800 = 1860, but probably a lot
> less than that in practice (7750's narrower memory bus might be a better
> fit). Your reported best result is 914 for 1.5 MB (r=6), so seemingly
> much slower than his:
>
> http://www.openwall.com/lists/john-dev/2015/07/27/6
>
> We have a 7750 (a version with DDR3 memory, though) in "well", so you
> may try your code on it and compare against the 372 figure directly.
> And like I wrote, his byte-granular loads are likely not optimal, with
> uint likely more optimal.

Alexander
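The upper-bound estimate in the quoted text, 372*2048/512*1000/800 = 1860, can be reproduced with a small sketch. It assumes (our reading of the formula, not something the message spells out) that 2048/512 is the ratio of stream processors between the HD 7970 and the HD 7750, that 1000/800 is the ratio of their core clocks in MHz, and that hash rate scales linearly with both. The function name `scale_hash_rate` is hypothetical.

```python
def scale_hash_rate(measured_hs: float,
                    from_units: int, to_units: int,
                    from_mhz: float, to_mhz: float) -> float:
    """Naive linear scaling of a measured hash rate to another GPU.

    Assumption: throughput grows in proportion to the number of stream
    processors and to the core clock. Memory bus width, bandwidth and
    latency are ignored, so the result is an optimistic upper bound.
    """
    return measured_hs * (to_units / from_units) * (to_mhz / from_mhz)


# 372 h/s measured on HD 7750 (assumed 512 units at 800 MHz),
# scaled to HD 7970 (assumed 2048 units at 1000 MHz).
estimate = scale_hash_rate(372, 512, 2048, 800, 1000)
print(f"{estimate:.0f} h/s")  # prints "1860 h/s"
```

Because the model ignores the memory subsystem, it only bounds the achievable rate from above, which is why the message expects "a lot less than that in practice" on a card whose wider memory bus may be a worse fit for this access pattern.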