Message-ID: <op.x3i6zq1kzz6j51@1pqhgq1.dtn.com>
Date: Mon, 17 Aug 2015 16:22:28 -0500
From: JimF <jfoug@....net>
To: john-dev@...ts.openwall.com
Subject: Re: Formats using non-SIMD SHA2 implementations

On Mon, 17 Aug 2015 15:57:26 -0500, magnum <john.magnum@...hmail.com> wrote:

> On 2015-08-17 09:40, Lei Zhang wrote:
>> On Aug 17, 2015, at 2:26 PM, magnum <john.magnum@...hmail.com> wrote:
>>> On 2015-08-17 05:07, Lei Zhang wrote:
>>>> I finally got 7z to work correctly with SIMD :)
>>
>>> Are you sorting lengths, like Jim hinted? Or are you handling
>>> diverging lengths like in SAP F/G?
>>
>> No, I haven't done that yet. I may give that a try too. I hope it's not
>> too tricky to implement. The code already looks ugly enough...
>
> Are you saying you do neither? That can't work. If it seems to work,
> it's only because all test vectors are same length. The same applies to
> RAR3.

dynamic is another format that works like SAP F/G (assuming SAP F/G keeps
iterating until all inputs have been completed). You can look at any of the
low-level SIMD functions in dynamic_big_crypt.c to see how dynamic is doing it.

Here is an example showing SHA256:

static void DoSHA256_crypt_sse(void *in, uint32_t ilen[SHA256_LOOPS],
                               void *out[SHA256_LOOPS], uint32_t *tot_len,
                               uint32_t tid)
{
    JTR_ALIGN(MEM_ALIGN_SIMD) ARCH_WORD_32 a[(32*SHA256_LOOPS)/sizeof(ARCH_WORD_32)];
    union yy {
        unsigned char u[32];
        ARCH_WORD_32 a[32/sizeof(ARCH_WORD_32)];
    } y;
    uint32_t i, j, loops[SHA256_LOOPS], bMore, cnt;
    unsigned char *cp = (unsigned char*)in;

    /* loops[i] = number of SHA256 block calls that input i needs */
    for (i = 0; i < SHA256_LOOPS; ++i) {
        loops[i] = Do_FixBufferLen32(cp, ilen[i], 1);
        cp += 64*4;
    }
    cp = (unsigned char*)in;
    bMore = 1;
    cnt = 1;
    while (bMore) {
        SIMDSHA256body(cp, a, a, SSEi_FLAT_IN|SSEi_4BUF_INPUT_FIRST_BLK|(cnt==1?0:SSEi_RELOAD));
        bMore = 0;
        for (i = 0; i < SHA256_LOOPS; ++i) {
            if (cnt == loops[i]) {
                /* input i finished on this pass: pull its digest out of
                   the interleaved SIMD state and emit it */
                uint32_t offx = ((i/SIMD_COEF_32)*32/sizeof(ARCH_WORD_32)*SIMD_COEF_32)+(i&(SIMD_COEF_32-1));
                for (j = 0; j < 32/sizeof(ARCH_WORD_32); ++j) {
                    y.a[j] = JOHNSWAP(a[(j*SIMD_COEF_32)+offx]);
                }
                *(tot_len+i) += large_hash_output(y.u, &(((unsigned char*)out[i])[*(tot_len+i)]), 32, tid);
            } else if (cnt < loops[i])
                bMore = 1;  /* input i still has blocks left */
        }
        cp += 32*2;
        ++cnt;
    }
}

What we do here is put the data into our buffers (we are using flat
buffers), and the function that does this tells us how many SHA256 calls
each specific string takes. Then we loop until ALL the values have been
completed (i.e. when cnt reaches the max of loops[x]).

Ignore some of the strangeness, such as cp += 32*2; there is some
weird-looking code because this file is auto-generated, and the same
template for this function is used for ALL SIMD formats.

This could also be done by loading the buffers within the
"while (bMore) {}" loop, and SHOULD be done there with only 1 buffer if the
number of crypt calls is large or unknown.

In the dynamic format, I do have 4 crypt limbs (or 2 limbs for 64-bit SIMD),
and use them 'intact'. But I do that because there is a lot of reading and
writing in these buffers, and no real way to know how or what will be
written to them (it is dynamic, btw).
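
The lock-step idea described above can be shown without any JtR code. The following is only a minimal sketch under stated assumptions: LANES, BLOCK, MAX_BLOCKS, toy_compress, toy_hash and toy_hash_lanes are all hypothetical names, the "compression" step is a toy mixer and NOT SHA-256, and the inner per-lane loop merely stands in for a single SIMD call. What it illustrates is the control flow: every pass advances all lanes by one block, each lane's result is captured on the pass where cnt equals that lane's block count, and the loop ends once no lane has blocks left.

```c
#include <assert.h>
#include <stdint.h>

#define LANES      4   /* hypothetical stand-in for SHA256_LOOPS */
#define BLOCK      8   /* toy block size in bytes (SHA-256 uses 64) */
#define MAX_BLOCKS 4   /* toy upper bound on blocks per lane */

/* Toy chaining step. NOT SHA-256; it only shows state carried per block. */
static uint32_t toy_compress(uint32_t state, const unsigned char *block)
{
    for (int k = 0; k < BLOCK; ++k)
        state = state * 31u + block[k];
    return state;
}

/* Scalar reference: hash one input on its own, block by block. */
static uint32_t toy_hash(const unsigned char *data, uint32_t nblocks)
{
    uint32_t state = 7;
    for (uint32_t b = 0; b < nblocks; ++b)
        state = toy_compress(state, data + b * BLOCK);
    return state;
}

/* Lock-step version: all lanes advance together, one block per pass. */
static void toy_hash_lanes(unsigned char data[LANES][MAX_BLOCKS * BLOCK],
                           const uint32_t loops[LANES], uint32_t out[LANES])
{
    uint32_t state[LANES];
    uint32_t cnt = 1;
    int more = 1;

    for (int i = 0; i < LANES; ++i) {
        assert(loops[i] >= 1 && loops[i] <= MAX_BLOCKS);
        state[i] = 7;
    }
    while (more) {
        /* Stands in for one SIMD call that processes every lane at once. */
        for (int i = 0; i < LANES; ++i)
            state[i] = toy_compress(state[i], data[i] + (cnt - 1) * BLOCK);

        more = 0;
        for (int i = 0; i < LANES; ++i) {
            if (cnt == loops[i])
                out[i] = state[i];   /* lane i is done: capture it now */
            else if (cnt < loops[i])
                more = 1;            /* lane i still has blocks left */
        }
        ++cnt;
    }
}
```

Lanes that finish early keep being fed blocks on later passes, so their state is overwritten with garbage. That is harmless only because the result was already copied out, and it is exactly what goes wrong if the capture step is skipped and the final state is read for every lane.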
BUT as magnum has stated, you MUST handle this, in some way.
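
The other option raised in the quoted text, sorting by length, can be sketched as follows. This is an illustration only, not how any JtR format does it: the candidate struct and the functions sha256_blocks, by_blocks, sort_by_blocks and lockstep_passes are hypothetical names. The one fact relied on is SHA-256 padding (a 0x80 byte plus an 8-byte length field, rounded up to 64-byte blocks). The sketch sorts candidates by block count and counts how many lock-step passes each group of lanes then needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: original position plus input length in bytes. */
typedef struct {
    uint32_t index;
    uint32_t len;
} candidate;

/* 64-byte SHA-256 blocks needed for len bytes: padding adds the 0x80
   marker and the 8-byte bit length, i.e. 9 bytes, before rounding up. */
static uint32_t sha256_blocks(uint32_t len)
{
    return (len + 9 + 63) / 64;
}

/* qsort comparator: order candidates by the number of blocks they need. */
static int by_blocks(const void *a, const void *b)
{
    uint32_t x = sha256_blocks(((const candidate *)a)->len);
    uint32_t y = sha256_blocks(((const candidate *)b)->len);
    return (x > y) - (x < y);
}

static void sort_by_blocks(candidate *c, size_t n)
{
    qsort(c, n, sizeof(*c), by_blocks);
}

/* Total passes a lock-step loop needs when candidates are processed in
   groups of `lanes`: each group runs for its largest block count. */
static uint32_t lockstep_passes(const candidate *c, size_t n, size_t lanes)
{
    uint32_t total = 0;
    for (size_t g = 0; g < n; g += lanes) {
        uint32_t max_blocks = 0;
        for (size_t i = g; i < n && i < g + lanes; ++i) {
            uint32_t b = sha256_blocks(c[i].len);
            if (b > max_blocks)
                max_blocks = b;
        }
        total += max_blocks;
    }
    return total;
}
```

Sorting only reduces wasted passes. A group can still mix block counts (for example 3 and 4), so the per-lane check is still needed unless every group is guaranteed to be uniform.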