Message-ID: <CAKGDhHX80khhY6v4yrwYUtFHBeB_z3m37GrKnx_D6+L1tPwxWA@mail.gmail.com>
Date: Sat, 25 Jul 2015 00:08:29 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: yescrypt on GPU

2015-07-23 4:00 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Thu, Jul 23, 2015 at 01:33:26AM +0200, magnum wrote:
>> On 2015-07-23 00:36, Agnieszka Bielec wrote:
>> >has anyone idea why copying parts of memory from __global to __private
>> >makes my code slower when there are different passwords and faster
>> >where all passwords are the same?
>
> Why faster for same passwords:
>
> This is puzzling, but my guess (which could well be wrong) is that the
> remaining global memory accesses have better locality of reference
> (resulting in better cache hit rate) and/or coalescing potential than
> all of them did before you moved some to private memory. In other
> words, you moved the "bad" ones to private and kept the "good" ones in
> global. But they are only "good" when the passwords are the same (and I
> guess the salts as well, or there are few different ones), so this is of
> no practical use.
>
> Why slower for different passwords:
>
> I guess your LWS or/and GWS became lower.
>
>> >I did in lyra2 something very
>> >similar, maybe my code is too big and I have to do split kernels?
>
> Split kernel may be good anyway, but this is most likely unrelated to
> this specific occasion.
>
>> Are there differences in length distribution in the two cases?
>
> This should be irrelevant. The PHC finalists process the plaintext
> password into a hash early on, and do not use the plaintext password
> frequently. They are not like e.g. md5crypt in this respect.
>
>> If not,
>> Maybe in the slow case they end up spilling to local memory due to
>> harder register pressure.
>
> Maybe. This is a possibility with any changes to a kernel.

I had this in my code:

    for () {
        copy to private
        some operations on private
        copy to global
    }

I changed this code to:

    // memset because I noticed, when I was working on parallel,
    // that a kernel can slow down after using an uninitialized array
    memset(this private array, 0, size of private array);
    for () {
        some operations on private
    }

and ran it with --skip-self-test. The speed was the same, even without this memset. This is a big array (8 KB), but in another place I copy 64 B, and this also decreases speed, even when the 8 KB copying is turned off.