Message-ID: <008901cc369a$f29f9570$d7dec050$@net>
Date: Wed, 29 Jun 2011 15:27:20 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: Lukas's Status Report - #7 of 15

>-----Original Message-----
>From: Łukasz Odzioba [mailto:lukas.odzioba@...il.com]
>
>This week:
>...
>Try MSCash2 for nvidia cards.

Before you do mscash2, get with me about some significant optimizations and a major simplification. The current mscash2 does twice the work needed inside the inner loop.

I have rewritten mscash2 in preparation for converting it to SSE intrinsics. My main goal was to simplify the code significantly. However, while converting the inline code into OpenSSL (oSSL) calls, I found that the SHA-1 hashing of the ipad/opad blocks could easily be pulled out of the inner loop. This gives a 2x speedup, on top of any previous optimizations.

I would send it to you now, but I have it partly ripped open at the moment, waiting on Simon to help me figure out why multi-block SHA-1 with the SSE intrinsics is not working the way I thought it would (most likely it is my usage, but I cannot find the problem).

Jim.

Here is my code for pbkdf2(). This is the entire function. It is vastly simpler than the original, which called a HUGE hmac_sha1 function. In this code, salt_buffer is a simple 'flat' char * buffer (already converted to UTF-16), and salt_len is the length of that salt in bytes, not in UTF-16 characters. The _key[] array holds the original DCC1 value (the mscache hash) for this password/user, and the same buffer is used to return the DCC2 value to the caller.

Even though my changes to mscash2_fmt.c are not ready for prime time, I hope this function helps show what I am working on.

#include <string.h>
#include <openssl/sha.h>

// salt_buffer (UTF-16 salt) and salt_len (its length in bytes) are
// file-scope variables defined elsewhere in mscash2_fmt.c (not shown).
static void pbkdf2(unsigned int _key[]) // key is also 'final' digest.
{
	SHA_CTX ctx1, ctx2, tmp_ctx1, tmp_ctx2;
	unsigned char ipad[SHA_CBLOCK+1], opad[SHA_CBLOCK+1];
	unsigned int tmp_hash[SHA_DIGEST_LENGTH/4];
	unsigned i, j;
	unsigned char *key = (unsigned char*)_key;

	// HMAC key setup: the 16-byte DCC1 value, zero-padded to a 64-byte block
	for(i = 0; i < 16; i++) {
		ipad[i] = key[i]^0x36;
		opad[i] = key[i]^0x5C;
	}
	memset(&ipad[16], 0x36, sizeof(ipad)-16);
	memset(&opad[16], 0x5C, sizeof(opad)-16);

	// hash the ipad/opad blocks ONCE, outside the loop, and save the states
	SHA1_Init(&ctx1);
	SHA1_Init(&ctx2);
	SHA1_Update(&ctx1, ipad, SHA_CBLOCK);
	SHA1_Update(&ctx2, opad, SHA_CBLOCK);
	memcpy(&tmp_ctx1, &ctx1, sizeof(SHA_CTX));
	memcpy(&tmp_ctx2, &ctx2, sizeof(SHA_CTX));

	// U1 = HMAC(key, salt || INT(1))
	SHA1_Update(&ctx1, salt_buffer, salt_len);
	SHA1_Update(&ctx1, "\x0\x0\x0\x1", 4);
	SHA1_Final((unsigned char*)tmp_hash, &ctx1);
	SHA1_Update(&ctx2, (unsigned char*)tmp_hash, SHA_DIGEST_LENGTH);
	// we have to sha1 final to a 'temp' buffer, since we can only overwrite the first 16 bytes
	// of the _key buffer. If we overwrote 20 bytes, we would lose the first 4 bytes
	// of the next element (and overwrite the end of the buffer on the last element).
	SHA1_Final((unsigned char*)tmp_hash, &ctx2);
	// only copy the first 16 bytes, since that is ALL this format uses
	memcpy(_key, tmp_hash, 16);

	// U2 .. U10240, each XORed into the result
	for(i = 1; i < 10240; i++) {
		// we only need to restore h0..h4 and the Nl/Nh counters from the CTX: the
		// ipad/opad input was exactly one full 64-byte block, so data[] and num can
		// be skipped (relies on OpenSSL's SHA_CTX layout; SHA1_Final leaves num at 0).
		memcpy(&ctx1, &tmp_ctx1, sizeof(SHA_CTX)-(64+sizeof(unsigned int)));
		SHA1_Update(&ctx1, (unsigned char*)tmp_hash, SHA_DIGEST_LENGTH);
		SHA1_Final((unsigned char*)tmp_hash, &ctx1);
		memcpy(&ctx2, &tmp_ctx2, sizeof(SHA_CTX)-(64+sizeof(unsigned int)));
		SHA1_Update(&ctx2, (unsigned char*)tmp_hash, SHA_DIGEST_LENGTH);
		SHA1_Final((unsigned char*)tmp_hash, &ctx2);
		// only xor the first 16 bytes, since that is ALL this format uses
		for(j = 0; j < 4; j++)
			_key[j] ^= tmp_hash[j];
	}
}
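A self-contained sketch of the same optimization follows, for readers without OpenSSL at hand. It is not the mscash2_fmt.c code: the names (sha1_compress, sha1_finish_after_block, hmac_sha1_precompute, hmac_sha1_cached, pbkdf2_sha1_block1) are hypothetical, the SHA-1 compression function is written out by hand so the per-iteration cost is visible, and keys are assumed to be at most 64 bytes (the 16-byte DCC1 value qualifies). With a 20-byte message, a plain HMAC-SHA1 call runs four SHA-1 compressions: two for the inner hash (the 64-byte key^ipad block, then one padded block holding the message) and two for the outer hash (key^opad, then the inner digest). Hashing the ipad/opad blocks once, before the loop, leaves two compressions per iteration, so 10240 iterations cost about 20480 compressions instead of 40960, which matches the 2x figure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch only: hypothetical helper names; HMAC keys assumed <= 64 bytes. */
/* SHA-1 compression function (FIPS 180-4) on one 64-byte block. */
static uint32_t rol32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

static void sha1_compress(uint32_t h[5], const unsigned char blk[64])
{
    uint32_t w[80], a, b, c, d, e, f, k, t; int i;
    for (i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4*i] << 24 | (uint32_t)blk[4*i+1] << 16 |
               (uint32_t)blk[4*i+2] << 8 | (uint32_t)blk[4*i+3];
    for (i = 16; i < 80; i++)
        w[i] = rol32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = rol32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rol32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Finish a SHA-1 whose first 64-byte block is already folded into start[]. */
static void sha1_finish_after_block(const uint32_t start[5], const unsigned char *msg,
                                    size_t len, unsigned char out[20])
{
    uint32_t h[5]; unsigned char blk[64]; int i;
    uint64_t bits = (uint64_t)(64 + len) * 8;   /* total message length in bits */
    memcpy(h, start, sizeof h);
    for (; len >= 64; msg += 64, len -= 64)
        sha1_compress(h, msg);
    memset(blk, 0, sizeof blk);
    memcpy(blk, msg, len);
    blk[len] = 0x80;                            /* padding marker */
    if (len >= 56) { sha1_compress(h, blk); memset(blk, 0, sizeof blk); }
    for (i = 0; i < 8; i++)                     /* big-endian 64-bit length */
        blk[63 - i] = (unsigned char)(bits >> (8 * i));
    sha1_compress(h, blk);
    for (i = 0; i < 20; i++)
        out[i] = (unsigned char)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Hash the key^ipad and key^opad blocks ONCE; the loop reuses these states. */
static void hmac_sha1_precompute(const unsigned char *key, size_t keylen,
                                 uint32_t istate[5], uint32_t ostate[5])
{
    static const uint32_t iv[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE,
                                    0x10325476, 0xC3D2E1F0 };
    unsigned char ipad[64], opad[64]; size_t i;
    assert(keylen <= 64);
    memset(ipad, 0x36, sizeof ipad); memset(opad, 0x5C, sizeof opad);
    for (i = 0; i < keylen; i++) { ipad[i] ^= key[i]; opad[i] ^= key[i]; }
    memcpy(istate, iv, sizeof iv); sha1_compress(istate, ipad);
    memcpy(ostate, iv, sizeof iv); sha1_compress(ostate, opad);
}

/* HMAC-SHA1 from cached states: out = H(opad || H(ipad || msg)). */
static void hmac_sha1_cached(const uint32_t istate[5], const uint32_t ostate[5],
                             const unsigned char *msg, size_t len, unsigned char out[20])
{
    unsigned char inner[20];
    sha1_finish_after_block(istate, msg, len, inner);
    sha1_finish_after_block(ostate, inner, sizeof inner, out);
}

/* PBKDF2-HMAC-SHA1, first 20-byte output block only (T1 = U1 ^ ... ^ Uc). */
static void pbkdf2_sha1_block1(const unsigned char *pw, size_t pwlen,
                               const unsigned char *salt, size_t saltlen,
                               unsigned iterations, unsigned char dk[20])
{
    uint32_t istate[5], ostate[5];
    unsigned char buf[128], u[20]; unsigned i, j;
    assert(saltlen <= sizeof buf - 4 && iterations >= 1);
    hmac_sha1_precompute(pw, pwlen, istate, ostate);       /* outside the loop */
    memcpy(buf, salt, saltlen);
    memcpy(buf + saltlen, "\x00\x00\x00\x01", 4);          /* INT(1), big-endian */
    hmac_sha1_cached(istate, ostate, buf, saltlen + 4, u); /* U1 */
    memcpy(dk, u, sizeof u);
    for (i = 1; i < iterations; i++) {
        hmac_sha1_cached(istate, ostate, u, sizeof u, u);  /* U_i from U_(i-1) */
        for (j = 0; j < sizeof u; j++) dk[j] ^= u[j];
    }
}
```

Under these assumptions, calling pbkdf2_sha1_block1(dcc1, 16, salt_utf16, salt_len, 10240, out) (hypothetical argument names) and keeping the first 16 bytes of out should correspond to what pbkdf2() computes for mscash2. The RFC 6070 PBKDF2-HMAC-SHA1 test vectors (for example P = "password", S = "salt", c = 1, whose expected DK begins 0c60c80f) are a convenient check for the sketch.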