Message-ID: <709a17f82b8b0338075b142d5cf0a863@smtp.hushmail.com>
Date: Tue, 22 May 2012 21:39:35 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Nvidia compiler bug

On 05/22/2012 06:06 PM, Claudio André wrote:
> On 21-05-2012 21:04, magnum wrote:
>> Interesting. What went wrong and how did you mitigate it?
> I had something like this
> __kernel
> void kernel_crypt(parameters, ...
>                   __local    crypt_sha512_salt     * salt_data,
>                   __local    working_memory      * fast_tmp_memory) {
>      code;
> }
> 
> Even though it was OK at runtime, it was limiting my options. So I
> changed it to:
> __kernel
> void kernel_crypt(parameters, ...){
>      code;
>     __local crypt_sha512_salt     * salt_data[1];
>     __local working_memory      * tmp_memory[SIZE];
> 
>      more code;
> }

So this made the compiler actually use local memory? Weird. I'll try
that in RAR.

> The point is that I misunderstood what was generated as object code,
> so I misread the results I got. Maybe I shouldn't call this an Nvidia
> bug (I had trouble using __local pointers, added 1+1, and concluded
> there was a bug somewhere). I cleared up my misunderstandings by:
> 1. shaking up the code.
> 2. making correct assumptions and drawing correct conclusions.
> 
> So, two other important things happened:
> 1. I realized LWS (or LWS + KPC) is much more important than I had
> thought.

Yes, once the code is good enough. Then when you find a good LWS and a
good minimum multiple for GWS (KPC), it's like the GPU suddenly gets
airborne.
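
To make the GWS/LWS relationship concrete, here is a minimal sketch in
plain C. It assumes GWS has to be a whole multiple of LWS, which is what
OpenCL 1.x requires of clEnqueueNDRangeKernel when a local size is
given. The helper name round_up_gws is hypothetical and not taken from
the JtR source.

```c
#include <assert.h>
#include <stddef.h>

/* Round a requested global work size (GWS) up to the next multiple of
 * the local work size (LWS). Hypothetical helper for illustration. */
size_t round_up_gws(size_t requested, size_t lws)
{
    assert(lws > 0);                      /* a zero LWS makes no sense here */
    return ((requested + lws - 1) / lws) * lws;
}
```

For example, round_up_gws(1000, 64) gives 1024, so no work-group is left
partially filled.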

> 2. I found a better solution for the real bug, which was in the most
> important loop unroll I have.
>> Btw I'm curious why your attempt at avoiding byte addressable store
>> failed. When/where was it misaligned?
> After this, I was afraid I was facing another crazy thing:
> - TESTE *not* defined: CPU: Ok,  GPU: ok.
> - if TESTE is defined: CPU: worse performance   GPU: FAILED
> (get_hash[0](0))
> ----------------
> 
> void insert_to_buffer(sha512_ctx    * ctx,
>                       const uint8_t * string,
>                       const uint32_t len) { // len range: 1 to 64
> #ifdef TESTE
>     uint32_t *d = (uint32_t *) (ctx->buffer->mem_08 + ctx->buflen);
>     #define PUTCHAR_MAGNUM(buf, index, val) (buf)[(index)>>2] = \
>         ((buf)[(index)>>2] & ~(0xffU << (((index) & 3) << 3))) + \
>         ((val) << (((index) & 3) << 3))
> #else
>     uint8_t *d = ctx->buffer->mem_08 + ctx->buflen;
>     #define PUTCHAR_MAGNUM(buf, index, val) (buf)[index] = (val)
> #endif
>     for (uint32_t i = 0; i < len; i++)
>         PUTCHAR_MAGNUM(d, i, GETCHAR(string, i));
> 
>     ctx->buflen += len;
> }
> 

Thanks, I'll have a look at this some time later.
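
A possible explanation, offered as an untested guess and not as a
confirmed diagnosis: with TESTE defined, the uint32_t pointer is formed
from mem_08 + ctx->buflen, so it is not 4-byte aligned whenever buflen is
not a multiple of 4. The sketch below, in plain C, keeps the word pointer
at the aligned start of the buffer and folds the byte offset into the
index instead. The names PUTCHAR_WORD and insert_bytes_wordwise are
hypothetical, and the flat uint32_t buffer stands in for the real
sha512_ctx layout, which is not shown in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Store one byte at byte position `index` of a buffer that is only
 * accessed as 32-bit words; byte 0 lives in the low 8 bits of word 0. */
#define PUTCHAR_WORD(buf, index, val) \
    ((buf)[(index) >> 2] = \
        ((buf)[(index) >> 2] & ~(0xffU << (((index) & 3) << 3))) | \
        ((uint32_t)(val) << (((index) & 3) << 3)))

/* Append `len` bytes from `src` at byte offset `buflen`. `words` must
 * point at the aligned start of the buffer, not at words + offset. */
void insert_bytes_wordwise(uint32_t *words, uint32_t buflen,
                           const uint8_t *src, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        PUTCHAR_WORD(words, buflen + i, src[i]);
}
```

The cast to uint32_t before the shift also avoids shifting a promoted
signed int into its sign bit when the byte value is 0x80 or higher.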

magnum
