john-dev - Re: PHC: yescrypt on GPU



Message-ID: <CAKGDhHX80khhY6v4yrwYUtFHBeB_z3m37GrKnx_D6+L1tPwxWA@mail.gmail.com>
Date: Sat, 25 Jul 2015 00:08:29 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: yescrypt on GPU

2015-07-23 4:00 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Thu, Jul 23, 2015 at 01:33:26AM +0200, magnum wrote:
>> On 2015-07-23 00:36, Agnieszka Bielec wrote:
>> >has anyone idea why copying parts of memory from __global to __private
>> >makes my code slower when there are different passwords and faster
>> >where all passwords are the same?
>
> Why faster for same passwords:
>
> This is puzzling, but my guess (which could well be wrong) is that the
> remaining global memory accesses have better locality of reference
> (resulting in better cache hit rate) and/or coalescing potential than
> all of them did before you moved some to private memory.  In other
> words, you moved the "bad" ones to private and kept the "good" ones in
> global.  But they are only "good" when the passwords are the same (and I
> guess the salts as well, or there are few different ones), so this is of
> no practical use.
>
> Why slower for different passwords:
>
> I guess your LWS or/and GWS became lower.
>
>> >I did in lyra2 something very
>> >similar, maybe my code is too big and I have to do split kernels?
>
> Split kernel may be good anyway, but this is most likely unrelated to
> this specific occasion.
>
>> Are there differences in length distribution in the two cases?
>
> This should be irrelevant.  The PHC finalists process the plaintext
> password into a hash early on, and do not use the plaintext password
> frequently.  They are not like e.g. md5crypt in this respect.
>
>> If not,
>> Maybe in the slow case they end up spilling to local memory due to
>> harder register pressure.
>
> Maybe.  This is a possibility with any changes to a kernel.

In my code I had

for()
{
     copy to private
     some operations on private
     copy to global
}

I changed this code to

memset(this private array, 0, size of private array) // because I noticed,
// when I was working on parallel, that a kernel can slow down after
// using an uninitialized array
for()
{
     some operations on private
}

and ran it with --skip-self-test, and the speed was the same, even
without this memset. This is a big array (8 KB), but in another place I
copy 64 B, and this also decreases speed even when the 8 KB copy is
turned off.
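
For reference, the two loop shapes described above can be written out as a
minimal host-side sketch. It is plain C rather than OpenCL so that it runs
without a device, and it does not model GPU address spaces or timing. The
names variant_with_copies, variant_private_only and mix_block are
hypothetical, and mix_block is only a stand-in for "some operations on
private", not yescrypt. The second variant returns a checksum so that its
result is actually used; this is an assumption of the sketch, not something
stated in the message.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 2048 /* 8 KB of 32-bit words, the array size mentioned above */

/* Hypothetical stand-in for "some operations on private"; not yescrypt. */
static void mix_block(uint32_t *b)
{
    for (int i = 0; i < BLOCK_WORDS; i++)
        b[i] = (b[i] ^ (uint32_t)i) * 2654435761u + 1u;
}

/* Original shape: copy to private, operate, copy back to global each round. */
void variant_with_copies(uint32_t *global_mem, int rounds)
{
    uint32_t priv[BLOCK_WORDS];

    for (int r = 0; r < rounds; r++) {
        memcpy(priv, global_mem, sizeof(priv)); /* copy to private */
        mix_block(priv);                        /* operations on private */
        memcpy(global_mem, priv, sizeof(priv)); /* copy to global */
    }
}

/* Test shape: zero the private array once, then operate on it only. */
uint32_t variant_private_only(int rounds)
{
    uint32_t priv[BLOCK_WORDS];
    uint32_t sum = 0;

    memset(priv, 0, sizeof(priv)); /* avoid working on an uninitialized array */
    for (int r = 0; r < rounds; r++)
        mix_block(priv);

    /* Fold the array into one value so the work above has a visible result. */
    for (int i = 0; i < BLOCK_WORDS; i++)
        sum += priv[i];
    return sum;
}
```

Starting from a zeroed global buffer, both variants perform the same
arithmetic, so timing one against the other isolates the cost of the copies
alone.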

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.