Message-ID: <20150823055305.GA15169@openwall.com>
Date: Sun, 23 Aug 2015 08:53:05 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

Agnieszka,

There might also be room for improvement of Argon2 performance on GPUs
through special handling of BLAKE2b's 64-bit operations.  See:

http://hashcat.net/forum/archive/index.php?thread-3422.html

"All the 64-bit based algorithms like SHA512, Keccak etc dropped in
performance with each new driver a little bit.  So it was hard to notice.
GPUs instructions operate still on 32-bit only, so the 64-bit mode is
emulated.  But the way how it is emulated was somehow broken.  I was
able to pinpoint the problem where the biggest drop came from and I
managed to workaround it.  For NVidia it took me a little PTX hack, for
AMD luckily there was no binary hack required."

Unfortunately, atom doesn't go into further detail there (but we could
try asking him).  I guess the approach amounts to explicitly building
64-bit addition out of 32-bit additions.  Maybe having it split like
that right away (rather than only in the PTX or IL to ISA translation)
is somehow friendlier to current compilers.
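To illustrate what "explicitly building 64-bit addition out of 32-bit additions" could look like, here is a minimal sketch in plain C. This is only a guess at the general idea, not atom's actual workaround (which he does not describe). The type u64_split and the helpers split64, join64 and add64_split are hypothetical names made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper type: one 64-bit word kept as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_split;

/* Split a native 64-bit value into its halves. */
static u64_split split64(uint64_t x)
{
    u64_split r;
    r.lo = (uint32_t)x;
    r.hi = (uint32_t)(x >> 32);
    return r;
}

/* Recombine the halves into a native 64-bit value. */
static uint64_t join64(u64_split x)
{
    return ((uint64_t)x.hi << 32) | x.lo;
}

/* 64-bit addition built from 32-bit additions plus an explicit carry. */
static u64_split add64_split(u64_split a, u64_split b)
{
    u64_split r;
    uint32_t carry;

    r.lo = a.lo + b.lo;        /* wraps modulo 2^32 */
    carry = (r.lo < a.lo);     /* 1 if the low half wrapped around */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

In a real kernel the values would stay in split form throughout, so that split64 and join64 are only needed at the edges (if at all).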

I guess this is part of why oclHashcat is faster than JtR at SHA-512
based hashes (per later announcements, oclHashcat's performance at
those has improved much further since that old forum posting above).

In a vectorized kernel, we'd switch from ulong2 to uint4.
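
As a rough sketch of that switch, again in plain C rather than OpenCL: two 64-bit lanes are held in one four-component 32-bit vector. The struct uint4_sketch stands in for OpenCL's uint4, and the layout (x/y being the low/high halves of lane 0, z/w those of lane 1) as well as the function name add64x2 are assumptions made for this example.

```c
#include <stdint.h>

/* Stand-in for OpenCL's uint4.  Assumed layout:
 * x = low half of lane 0, y = high half of lane 0,
 * z = low half of lane 1, w = high half of lane 1. */
typedef struct {
    uint32_t x, y, z, w;
} uint4_sketch;

/* Add two pairs of 64-bit values (what ulong2 + ulong2 would do),
 * using only 32-bit additions and explicit carries. */
static uint4_sketch add64x2(uint4_sketch a, uint4_sketch b)
{
    uint4_sketch r;

    r.x = a.x + b.x;                 /* lane 0, low half */
    r.y = a.y + b.y + (r.x < a.x);   /* lane 0, high half plus carry */
    r.z = a.z + b.z;                 /* lane 1, low half */
    r.w = a.w + b.w + (r.z < a.z);   /* lane 1, high half plus carry */
    return r;
}
```

Whether this actually helps would need to be benchmarked per device and driver version.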

Alexander
