|
Message-ID: <20120809164355.GA28398@openwall.com> Date: Thu, 9 Aug 2012 20:43:55 +0400 From: Solar Designer <solar@...nwall.com> To: musl@...ts.openwall.com Subject: Re: crypt* files in crypt directory On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote: > it's scary how these crypto codes mix various > signed and unsigned integer types Yes, there's room for improvement here. I'd probably write this differently if I were writing it from scratch now. Perhaps a cleanup is in order, similarly to what we did for the FreeSec code recently. Specifically, we could have a higher-level wrapper function accept char * like crypt() is defined to accept, but pass these to a function accepting unsigned char * and go with that from this point. > i added review comments > the style comments are subjective of course Thanks. Note that I deliberately did not merge Rich's cleanups yet in order to be able to continue testing this as part of my tree, with its wrapper.c still intact, etc. As discussed with Rich on IRC, I left merging those changes for musl's use to Rich. > * Solar Designer <solar@...nwall.com> [2012-08-09 14:53:48 +0400]: > > typedef unsigned int BF_word; > > typedef signed int BF_word_signed; > > > > /* Number of Blowfish rounds, this is also hardcoded into a few places */ > > #define BF_N 16 > > > > typedef BF_word BF_key[BF_N + 2]; > > i don't like these typedefs > it seems the code assumes 32bit unsigned BF_word Yes. As you suggested while we were discussed FreeSec, in musl's revision of the code we should use <stdint.h> types here. I made those changes to that code, but not yet to crypt_blowfish code - I think Rich will do that. > > #define BF_safe_atoi64(dst, src) \ > > { \ > > tmp = (unsigned char)(src); \ > > if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \ > > tmp is already unsigned int Rich is dropping BF_safe_atoi64() anyway, in favor of different code. Thanks for the comment, though. I might look into this when cleaning this up for non-musl use. > > #define BF_ROUND(L, R, N) \ > > tmp1 = L & 0xFF; \ > > tmp2 = L >> 8; \ > > tmp2 &= 0xFF; \ > > tmp3 = L >> 16; \ > > tmp3 &= 0xFF; \ > > tmp4 = L >> 24; \ > > tmp1 = ctx->s.S[3][tmp1]; \ > > tmp2 = ctx->s.S[2][tmp2]; \ > > tmp3 = ctx->s.S[1][tmp3]; \ > > tmp3 += ctx->s.S[0][tmp4]; \ > > tmp3 ^= tmp2; \ > > R ^= ctx->s.P[N + 1]; \ > > tmp3 += tmp1; \ > > R ^= tmp3; > > i guess this is performance critical, but > i wouldn't spread those expressions over > several lines > > tmp1 = ctx->S[3][L & 0xff]; > tmp2 = ctx->S[2][L>>8 & 0xff]; > tmp3 = ctx->S[1][L>>16 & 0xff]; > tmp4 = ctx->S[0][L>>24 & 0xff]; > R ^= ctx->P[N+1]; > R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1; I think this rewrite is likely to cause a slowdown with some compilers. My revision of BF_ROUND makes the available parallelism explicit - for example, that L & 0xFF may be computed early, but the corresponding array lookup postponed until after other indices are also computed - which leaves more time for the mask and address generation latency. Sure, an optimizing compiler is supposed to figure this out and do proper instruction scheduling on its own. However, there exist cases when some optimizations of a compiler have to be disabled - e.g., an Apple person suggested disabling clang/llvm's instruction scheduling when compiling this very code (well, a two hashes at a time revision of it) a couple of years ago, as a workaround for the compiler doing it poorly. My hand-written instruction scheduling as above turned out to work a lot better: http://www.openwall.com/lists/john-users/2010/08/08/4 (and two follow-ups - click "thread-next"). It would be curious to see what the same compiler would do with your "unoptimized" revision of the source code, but I doubt that it would perform better. > > do { > > ptr += 2; > > L ^= ctx->s.P[0]; > > BF_ROUND(L, R, 0); > > BF_ROUND(R, L, 1); > > BF_ROUND(L, R, 2); > > BF_ROUND(R, L, 3); > > BF_ROUND(L, R, 4); > > BF_ROUND(R, L, 5); > > BF_ROUND(L, R, 6); > > BF_ROUND(R, L, 7); > > BF_ROUND(L, R, 8); > > BF_ROUND(R, L, 9); > > BF_ROUND(L, R, 10); > > BF_ROUND(R, L, 11); > > BF_ROUND(L, R, 12); > > BF_ROUND(R, L, 13); > > BF_ROUND(L, R, 14); > > BF_ROUND(R, L, 15); > > tmp4 = R; > > R = L; > > L = tmp4 ^ ctx->s.P[BF_N + 1]; > > *(ptr - 1) = R; > > *(ptr - 2) = L; > > } while (ptr < end); > > why increase ptr at the begining? To hide the latency. Indeed, a smart compiler should do this on its own, but not all do it well. > it seems the idiomatic way would be > > *ptr++ = L; > *ptr++ = R; Yes, but note that the writes are made in the other order, for a reason: R is computed first, so it is available for the write first. All of this is a result of experiments, some on old systems many years ago, some recent. > > tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */ > > i don't think the BF_word_signed cast helps here > signed char is promoted to int and then > it will be converted to BF_word Yes, I just like it explicit to show what exactly is happening, because it matters a lot. > > if (setting[0] != '$' || > > setting[1] != '2' || > > setting[2] < 'a' || setting[2] > 'z' || > > !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] || > > setting[2] is already range checked so > the casts are not necessary > (assuming 'a' < 'z') Casting chars to (unsigned int)(unsigned char) for array indices is an idiom for me. Now, we have a promotion to (int) due to "- 'a'", but if this is ever compiled with a C++ compiler, the 'a' constant would be of type char rather than int, IIRC. So better safe than sorry. Yes, the range checking should have helped in this case, but I recall seeing weird behavior with in-range (implies non-negative) char subscripts on Alpha in 1990s before I started using this idiom. IIRC, instructions were generated such that high bytes of the array index register were not cleared before the lookup. Compiler bug? But it was seen with different compilers. > i'd use for (;;) for infinite loops I saw Rich do that, but I kept the infinite loops in this file I posted consistent with those I use elsewhere for now. Rich can musl'ify this code in this way if desired. > i like for better > > for (done = 0; ; done++) { > ... > if (done) > break; > ... > } I might try that and see if it's actually cleaner and as fast. Instead of "done++", I'd use "done = 1". > > count = 64; > > do { > > L = BF_encrypt(&data.ctx, L, LR[1], > > &LR[0], &LR[0]); > > } while (--count); > > for (count = 0; count < 64; count++) > > i assume it will be optimized to the same thing > (count is not used later) I hope/expect that it will be optimized to the same thing, but I am not sure it'd happen with all relevant compilers all the time. I like writing performance-critical code closer to the desired assembly code. This one loop is only moderately performance relevant, though. > > output[7 + 22 - 1] = BF_itoa64[(int) > > BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30]; > > is setting[28] range checked at this point? Yes, as part of the salt decoding at function start. > the int casts do not help Yes, there's promotion to int here anyway. > > _crypt_output_magic(setting, output, size); > > retval = BF_crypt(key, setting, output, size, 16); > > save_errno = errno; > > why save errno before the self test? > > if the self test fails then errno can be anything anyway Yes, if the test fails, we set errno explicitly to EINVAL after this point. > if it does not fail then errno is unchanged i guess Yes, this should be the case currently. I just did not want to have a pitfall in there in case of future changes to BF_crypt() where errno could potentially be altered on a successful call. > > memcpy(buf.s, test_setting, sizeof(buf.s)); > > sizeof buf.s > without () I am consistently using sizeof() with the braces. Maybe musl's coding style is different. If so, this may be changed indeed. > > memset(buf.o, 0x55, sizeof(buf.o)); > > buf.o[sizeof(buf.o) - 1] = 0; > > p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1); > > ditto > > i'd write (1 + 1) as 2 I'm not sure which is cleaner. It can be said that both the '\x55' and the NUL after it are canary bytes and thus they're the same kind of thing and we can talk of a canary of size 2. Then we'd need to use this "2" in three places, not just here. However, in the way we specify the test_hash[] strings, the NUL is implicit, which kind of makes it separate from the '\x55'. > > test_hash[(unsigned int)(unsigned char)buf.s[2] & 1], > > the extra (unsigned int) cast is unnecessary Yes, we could break the idiom and rely on the promotion to int here. (Without such promotion or cast, IIRC, things would sometimes fail on Alpha.) Thanks, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.