Message-ID: <CALaL1qC+uwDUPyfWinvfH6ZMkRhemP5eedVm1+tdpK0849DZzA@mail.gmail.com>
Date: Mon, 25 Jun 2012 18:11:48 -0700
From: Bit Weasil <bitweasil@...il.com>
To: magnum <john.magnum@...hmail.com>
Cc: john-dev@...ts.openwall.com
Subject: Re: Re: OpenCL kernel max running time vs. "ASIC hang"
>> I simply store the intermediate values in the GPU global memory.
>> The access (if done sanely) is coalesced, and is roughly speaking a
>> "best case" memory access pattern for both the load and the store.
>> I'm using a high resolution timer class to dynamically adjust the
>> work done per kernel invocation. If I'm below 90% or above 110% of
>> my target time, I adjust the steps per invocation for the next call.
>> It seems to work nicely, and also properly handles conditions like an
>> overheating GPU that throttles, or someone gaming in the background.
>>
>
> You make it sound very easy :)
>
I try. I started my kernels on CUDA, on a card that was also driving a
display - so I had to do this. Once you design it into the kernels, it's not
that bad. I reuse the same timing code for almost everything.
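
Roughly, the host-side loop looks something like this - a simplified sketch,
not the actual Cryptohaze code. runKernel() here is just a stand-in for your
"launch the kernel and wait for completion" wrapper, and the 50 ms target and
starting step count are arbitrary:

#include <chrono>
#include <cstdint>
#include <thread>

// Stand-in for the real "enqueue kernel, wait for completion" call.
static void runKernel(uint32_t steps) {
    std::this_thread::sleep_for(std::chrono::microseconds(steps * 10));
}

static uint32_t stepsPerInvocation = 1024;  // tuned at runtime
static const double targetMs = 50.0;        // target kernel runtime

void runOneInvocation() {
    auto start = std::chrono::high_resolution_clock::now();
    runKernel(stepsPerInvocation);
    auto end = std::chrono::high_resolution_clock::now();

    double elapsedMs =
        std::chrono::duration<double, std::milli>(end - start).count();

    // Outside the 90%-110% band around the target?  Rescale the work for
    // the next call.  This also absorbs a throttling GPU or someone gaming
    // in the background.
    if (elapsedMs < 0.9 * targetMs || elapsedMs > 1.1 * targetMs) {
        stepsPerInvocation = static_cast<uint32_t>(
            stepsPerInvocation * (targetMs / elapsedMs));
        if (stepsPerInvocation < 1)
            stepsPerInvocation = 1;
    }
}

Proportional rescaling is only one way to do the adjustment - a fixed step up
or down works too, it just converges more slowly after a big disturbance.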
>
>> It shouldn't be difficult to take a single execution kernel and break
>> it into multiple steps. If you would like a starting point, the
>> Cryptohaze tools have this done for all the GPU kernels - feel free
>> to take a look around.
>>
>
> Thanks, I will do that!
>
Feel free! I'll definitely dig through the OpenCL kernels & see whether your
implementations are faster than mine. :)
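
For the multi-step structure itself, the shape is roughly this - an
illustrative CUDA sketch with made-up names, not the actual Cryptohaze
kernels. Each invocation loads the per-thread state from global memory, runs
some number of steps, and writes the state back; the state is laid out
word-major (word i for every thread is contiguous), so consecutive threads
hit consecutive addresses and both the loads and the stores coalesce:

#include <cstdint>

__global__ void hashStepKernel(uint32_t *state,      // numThreads * 4 words
                               uint32_t numThreads,
                               uint32_t stepsThisCall) {
    uint32_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= numThreads)
        return;

    // Coalesced loads: adjacent threads read adjacent words.
    uint32_t a = state[0 * numThreads + tid];
    uint32_t b = state[1 * numThreads + tid];
    uint32_t c = state[2 * numThreads + tid];
    uint32_t d = state[3 * numThreads + tid];

    for (uint32_t i = 0; i < stepsThisCall; i++) {
        // ... one step of the actual hash/chain computation on a, b, c, d ...
    }

    // Coalesced stores, same layout as the loads.
    state[0 * numThreads + tid] = a;
    state[1 * numThreads + tid] = b;
    state[2 * numThreads + tid] = c;
    state[3 * numThreads + tid] = d;
}

The host just calls this repeatedly, with stepsThisCall set by the timing
loop above, until the full chain length is done.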
>
> magnum