|
Message-ID: <709a17f82b8b0338075b142d5cf0a863@smtp.hushmail.com> Date: Tue, 22 May 2012 21:39:35 +0200 From: magnum <john.magnum@...hmail.com> To: john-dev@...ts.openwall.com Subject: Re: Nvidia compiler bug On 05/22/2012 06:06 PM, Claudio André wrote: > Em 21-05-2012 21:04, magnum escreveu: >> Interesting. What went wrong and how did you mitigate it? > I had something like this > __kernel > void kernel_crypt(parameters, ... > __local crypt_sha512_salt * salt_data, > __local working_memory * fast_tmp_memory) { > code; > } > > Even if it was ok on runtime, it was hurting my possibilities. So i > changed to: > __kernel > void kernel_crypt(parameters, ...){ > code; > __local crypt_sha512_salt * salt_data[1]; > __local working_memory * tmp_memory[SIZE]; > > more code; > } So this made the compiler actually use local memory? Weird. I'll try that in RAR. > The point is that i was misunderstanding what was generated as object > code, so i understood wrong the results i got. Maybe, i shouldn't call > this an Nvidia bug (i had troubles using __local pointers and i did 1+1 > and, there is a bug somewhere). I solved my misunderstandings and: > 1. shake the code. > 2. make correct assumptions and conclusions. > > So, another 2 important things happen: > 1. i realized LWS (or LWS + KPC) is much more important than i was > thinking. Yes, once the code is good enough. Then when you find a good LWS and a good minumum multiple for GWS (KPC), it's like the GPU suddenly gets airborne. > 2. i found a better solution for the real bug on an unroll (the most > important) i have. >> Btw I'm curious why your attempt at avoiding byte addressable store >> failed. When/where was it misaligned? > After this, i was afraid i'm facing other crazy thing: > - TESTE *not* defined: CPU: Ok, GPU: ok. > - if TESTE is defined: CPU: worse performance GPU: FAILED > (get_hash[0](0)) > ---------------- > > void insert_to_buffer(sha512_ctx * ctx, > const uint8_t * string, > const uint32_t len) { // len range: 1 to 64 > #ifdef TESTE > uint32_t *d = (uint32_t *) (ctx->buffer->mem_08 + ctx->buflen); > #define PUTCHAR_MAGNUM(buf, index, val) (buf)[(index)>>2] = > ((buf)[(index)>>2] & ~(0xffU << (((index) & 3) << 3))) + ((val) << > (((index) & 3) << 3)) > #else > uint8_t *d = ctx->buffer->mem_08 + ctx->buflen; > #define PUTCHAR_MAGNUM(buf, index, val) (buf)[index] = (val) > #endif > for (uint32_t i = 0; i < len; i++) > PUTCHAR_MAGNUM(d, i, GETCHAR(string, i)); > > ctx->buflen += len; > } > Thanks, I'll have a look at this some time later. magnum
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.