Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20230529193523.GA2921@openwall.com>
Date: Mon, 29 May 2023 21:35:23 +0200
From: Solar Designer <solar@...nwall.com>
To: Rich Felker <dalias@...c.org>
Cc: Markus Wichmann <nullplan@....net>, musl@...ts.openwall.com
Subject: Re: crypt: dependency needed between main and test run?

Hi,

On Sat, May 27, 2023 at 09:55:13AM -0400, Rich Felker wrote:
> On Sat, May 27, 2023 at 11:52:23AM +0200, Markus Wichmann wrote:
> > continuing my theme of not providing patches but rather asking
> > questions, I noticed something about all of the crypt() backends: They
> > all use a self test, and have no dependency between main and test calls
> > to their respective backends. E.g. in __crypt_sha512():
> > 
> > 
> > |p = sha512crypt(key, setting, output);
> > |/* self test and stack cleanup */
> > |q = sha512crypt(testkey, testsetting, testbuf);
> > 
> > The backend is itself a pure function. For any given input, it will
> > always make the same output. So I don't see a reason why the compiler
> > can't reorder these calls. The backend is a static function (and only
> > calling static functions) defined in the same file, so all of this is
> > available for the compiler to see.
> > 
> > Indeed the compiler might inline the second call, then keep constant
> > folding and loop unrolling until it is entirely evaluated at compile
> > time, and only the result remains. Constant folding and loop unrolling
> > aren't new ideas, but sha512 (and actually all of the hash functions)
> > would require the compiler to be a lot more agressive about them than
> > it normally is at this time. But that doesn't mean it cannot change in
> > future.
> > 
> > I'd propose to add an explicit dependency from the second to the first
> > call. Something like this:
> > 
> > p = sha512crypt(key, setting, output);
> > __asm__("" : "=x"(q) : "0"(testsetting), "x"(p));
> > q = sha512crypt(testkey, q, testbuf);
> > 
> > The asm statement basically tells the compiler that
> > 
> > q = f(testsetting, p);
> > 
> > where f is some unknown pure function. The fact that that function
> > happens to be the identity is unknown to the compiler.
> > 
> > Doing this has two purposes: One, it makes the second call depend on the
> > result of the first, and two, it obscures to the compiler that all
> > inputs to the second call are constant, and thus makes it extremely
> > unlikely that the test backend call would be entirely eliminated.
> > 
> > What do you guys think?
> 
> This is probably the right thing to do. I've CC'd Solar in case he has
> any good ideas on this, since it was originally his design to do the
> tests this way.

Yes, I agree that we have the theoretical issues here that Markus
described, and the dummy inline asm is a way to avoid them.  Perhaps the
volatile keyword could be another way to avoid compile-time computation
of the self-test, but it might not prevent the reordering.

In crypt_blowfish.c, where I had originally introduced this design, we
partially mitigate these issues by having a dependency on the first
call's return value to decide between one of two test vectors for the
second call.  Theoretically, the compiler can still precompute the test
hashes for both possible inputs, but this is less likely.

While we're at it, I suggest usage of (almost?) the lowest supported
settings in the self-test - so for sha*crypt use rounds=1000 or e.g.
rounds=1001, not 1234.  In fact, in crypt_blowfish.c we override the
minimum for the self-test, to make the self-test's performance cost
negligible.  Maybe we should in sha*crypt, too.

Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.