Message-ID: <41325962340d5e8bf2b1e3428030d11b@smtp.hushmail.com>
Date: Mon, 01 Jun 2015 13:37:14 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

On 2015-05-31 11:19, magnum wrote:
> On 2015-05-30 04:55, Solar Designer wrote:
>> These are reasonable results for pbkdf2-hmac-sha512, but there's
>> something "wrong" for pbkdf2-hmac-sha256.  It is suspicious whenever
>> there's a performance regression for going from x1 to x2 interleaving,
>> but then a performance improvement at higher interleaving factors.  This
>> suggests there's some overhead incurred with interleaving, and it is
>> probably avoidable.
>
> Perhaps the para loops don't always unroll in sha256 and we end up
> with actual loops, as discussed below? The overhead would be less
> significant for higher paras.

Or perhaps as soon as we use interleaving, things like tmp[SIMD_PARA] 
end up being stack arrays? That should hurt a lot.

Actually, here's a bug we have: Using the wide loops as in SHA2, we 
don't need to use "tmp[i]" at all - we do fine with just "tmp". I tried 
this, and there was very little difference (though for the better).
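
To illustrate the tmp[i] versus tmp point, here is a minimal sketch. It is 
not the actual intrinsics code: plain uint32_t stands in for a SIMD vector 
type, PARA stands in for SIMD_PARA, and the round step itself (xor, add, 
rotate) and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PARA 4            /* hypothetical stand-in for SIMD_PARA */
typedef uint32_t vec;     /* scalar stand-in for a SIMD vector type */

/* Narrow loops: one loop per operation. The temporary has to survive
 * from one loop to the next, so it needs a slot per interleaved lane. */
void step_narrow(vec a[PARA], const vec b[PARA], const vec c[PARA])
{
    vec tmp[PARA];
    int i;

    for (i = 0; i < PARA; i++)
        tmp[i] = b[i] ^ c[i];
    for (i = 0; i < PARA; i++)
        a[i] += tmp[i];
    for (i = 0; i < PARA; i++)
        a[i] = (a[i] << 3) | (a[i] >> 29);
}

/* Wide loop: the whole step sits in one loop body, so the temporary
 * is dead at the end of each iteration and a single "tmp" is enough. */
void step_wide(vec a[PARA], const vec b[PARA], const vec c[PARA])
{
    int i;

    for (i = 0; i < PARA; i++) {
        vec tmp = b[i] ^ c[i];
        a[i] += tmp;
        a[i] = (a[i] << 3) | (a[i] >> 29);
    }
}

/* Returns 1 if both variants produce identical results. */
int steps_agree(void)
{
    vec a1[PARA], a2[PARA], b[PARA], c[PARA];
    int i;

    for (i = 0; i < PARA; i++) {
        a1[i] = a2[i] = 0x67452301u + (vec)i;
        b[i] = 0xefcdab89u * (vec)(i + 1);
        c[i] = 0x98badcfeu ^ (vec)i;
    }
    step_narrow(a1, b, c);
    step_wide(a2, b, c);
    for (i = 0; i < PARA; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}

/* Quick self-check; call it from a test harness or from main(). */
void self_test(void)
{
    assert(steps_agree());
}
```

Whether the array version actually ends up on the stack depends on the 
compiler fully unrolling the loops, so this only shows the source-level 
difference, not what any particular build generates.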

I tried changing MD4/5 and SHA1 to use fewer, wider loops similar to 
SHA2 and consequently use single temps instead of arrays. There was 
about a 4% boost for MD4/MD5 but SHA1 got slightly worse. Why?

None of this was very conclusive. I'm not sure what to make of it, 
but I'm committing it to a topic branch "intrinsics-loops" for now.


Here's a somewhat unrelated note: While MD4/5 just use the w[16] pad, 
SHA1 and SHA2 use w[80] internally. We handle this differently in all 
three: SHA1 keeps a sliding window of tmpR[16] and some EXPAND macros 
(Jim did this, for a 10% boost over Simon's original code that had w[80]). 
SHA256 seems to manage with just tmp1 and the R() macro. And SHA512 
actually uses an expensive w[80]. This should be looked into. I'll have a 
peek at Alain's code again. Maybe SHA1 and SHA512 could do it more like 
SHA256 does it?
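
For reference, the difference between a full w[80] and a 16-word sliding 
window can be sketched in scalar C for the SHA-1 message schedule. This is 
not the tmpR/EXPAND macro code from the tree; the function names are 
hypothetical and only the recurrence itself is the standard SHA-1 one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Full schedule: expand the 16 input words into all 80 words up front. */
void schedule_full(const uint32_t block[16], uint32_t w[80])
{
    int t;

    memcpy(w, block, 16 * sizeof(uint32_t));
    for (t = 16; t < 80; t++)
        w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}

/* Sliding window: only the last 16 words are kept. Slot (t & 15) holds
 * word t-16 on entry and is overwritten with word t. Must be called
 * with t = 16, 17, ..., 79 in order. */
uint32_t schedule_next(uint32_t w[16], int t)
{
    uint32_t x = w[(t - 3) & 15] ^ w[(t - 8) & 15] ^
                 w[(t - 14) & 15] ^ w[t & 15];

    return w[t & 15] = rotl32(x, 1);
}

/* Returns 1 if the window yields the same words 16..79 as the full array. */
int schedules_agree(const uint32_t block[16])
{
    uint32_t full[80], win[16];
    int t;

    schedule_full(block, full);
    memcpy(win, block, sizeof(win));
    for (t = 16; t < 80; t++)
        if (schedule_next(win, t) != full[t])
            return 0;
    return 1;
}

/* Quick self-check on an arbitrary input block. */
void self_test(void)
{
    uint32_t block[16];
    int i;

    for (i = 0; i < 16; i++)
        block[i] = 0x01234567u * (uint32_t)(i + 1);
    assert(schedules_agree(block));
}
```

The window trades 64 extra words of storage for some index masking, which 
unrolled code can resolve at compile time. With SIMD and interleaving each 
word is a vector per para, so the smaller footprint presumably matters more.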

magnum
