Message-ID: <20120809232132.GX27715@brightrain.aerifal.cx>
Date: Thu, 9 Aug 2012 19:21:32 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: crypt* files in crypt directory

On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > #define BF_ROUND(L, R, N) \
> >     tmp1 = L & 0xFF; \
> >     tmp2 = L >> 8; \
> >     tmp2 &= 0xFF; \
> >     tmp3 = L >> 16; \
> >     tmp3 &= 0xFF; \
> >     tmp4 = L >> 24; \
> >     tmp1 = ctx->s.S[3][tmp1]; \
> >     tmp2 = ctx->s.S[2][tmp2]; \
> >     tmp3 = ctx->s.S[1][tmp3]; \
> >     tmp3 += ctx->s.S[0][tmp4]; \
> >     tmp3 ^= tmp2; \
> >     R ^= ctx->s.P[N + 1]; \
> >     tmp3 += tmp1; \
> >     R ^= tmp3;
>
> i guess this is performance critical, but
> i wouldn't spread those expressions over
> several lines
>
> tmp1 = ctx->S[3][L & 0xff];
> tmp2 = ctx->S[2][L>>8 & 0xff];
> tmp3 = ctx->S[1][L>>16 & 0xff];
> tmp4 = ctx->S[0][L>>24 & 0xff];
> R ^= ctx->P[N+1];
> R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;

My first modified version to remove the manual scheduling is
significantly slower than the hand-scheduled version. I haven't tried
your version here yet, but it looks nicer and I think it would be
reasonable to compare and see if it's better.

> >     do {
> >         ptr += 2;
> >         L ^= ctx->s.P[0];
> >         BF_ROUND(L, R, 0);
> >         BF_ROUND(R, L, 1);
> >         BF_ROUND(L, R, 2);
> >         BF_ROUND(R, L, 3);
> >         BF_ROUND(L, R, 4);
> >         BF_ROUND(R, L, 5);
> >         BF_ROUND(L, R, 6);
> >         BF_ROUND(R, L, 7);
> >         BF_ROUND(L, R, 8);
> >         BF_ROUND(R, L, 9);
> >         BF_ROUND(L, R, 10);
> >         BF_ROUND(R, L, 11);
> >         BF_ROUND(L, R, 12);
> >         BF_ROUND(R, L, 13);
> >         BF_ROUND(L, R, 14);
> >         BF_ROUND(R, L, 15);
> >         tmp4 = R;
> >         R = L;
> >         L = tmp4 ^ ctx->s.P[BF_N + 1];
> >         *(ptr - 1) = R;
> >         *(ptr - 2) = L;
> >     } while (ptr < end);
>
> why increase ptr at the beginning?
> it seems the idiomatic way would be
>
> *ptr++ = L;
> *ptr++ = R;

For me, making this change makes it 5% faster.
I suspect the difference comes from the fact that gcc is not smart
enough to move the ptr+=2; across the rest of the loop body, and that
ptr gets spilled to the stack and reloaded at *both* points of use
rather than just one. The original version may perform better on
machines with A LOT more registers, but I'm doubtful...

Rich