Message-ID: <87572723674fd11fbc586991ff135ec9@smtp.hushmail.com>
Date: Sat, 21 Apr 2012 01:26:21 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: cl_khr_byte_addressable_store

On 04/21/2012 01:03 AM, Solar Designer wrote:
> On Sat, Apr 21, 2012 at 12:45:56AM +0200, magnum wrote:
>> Then I'm afraid you lost me. Just how should I approach this? Should I
>> do two separate kernels or should I try some kind of bit-flipping
>> madness that just might work on both AMD and nvidia?
>
> I can't speak for Milen, but I guess that to write a byte you need to
> read a naturally aligned 4-byte word, mask out the original byte in it,
> OR in your new byte value, and write that word back. Of course, this is
> non-atomic, but you should not be accessing nearby bytes from another
> thread anyway.
>
> An obvious optimization would be to combine multiple byte writes
> together such that you read/write fewer words (such as one per 4 bytes).

Yes, thanks. I already do things similar to what you say for performance
reasons, but the non-aligned cases will get nasty (or tedious at the very
least) if I am not allowed to ever write an unaligned byte.

I am really surprised by this limitation; these were not the obstacles I
was picturing when I got into this game. The older I get, the older I
become :)

magnum
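
For reference, the read/mask/OR/write-back approach quoted above, and the combined variant that touches each word only once, could look roughly like the sketch below. It is plain C rather than an actual OpenCL kernel, the names put_byte and put_bytes are made up for illustration, and it assumes little-endian byte numbering within each 32-bit word. In a kernel the buffer would presumably be a __global uint pointer instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store one byte at byte offset `index` using only
 * aligned 32-bit accesses. Assumes little-endian byte order in a word.
 * Non-atomic: no other thread may touch the same word concurrently. */
static void put_byte(uint32_t *words, size_t index, uint8_t value)
{
    size_t   word  = index >> 2;                 /* containing 32-bit word */
    unsigned shift = (unsigned)(index & 3) * 8;  /* bit offset of the byte */
    uint32_t mask  = (uint32_t)0xFF << shift;

    /* read the word, clear the old byte, OR in the new one, write back */
    words[word] = (words[word] & ~mask) | ((uint32_t)value << shift);
}

/* Hypothetical helper: copy `len` bytes to byte offset `dst`, which may
 * be unaligned, with one read and one write per touched word. */
static void put_bytes(uint32_t *words, size_t dst,
                      const uint8_t *src, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t   word = (dst + i) >> 2;
        uint32_t w    = words[word];             /* single read */

        /* patch every source byte that lands in this word */
        do {
            unsigned shift = (unsigned)((dst + i) & 3) * 8;
            w = (w & ~((uint32_t)0xFF << shift))
              | ((uint32_t)src[i] << shift);
            i++;
        } while (i < len && ((dst + i) & 3) != 0);

        words[word] = w;                         /* single write */
    }
}
```

The unaligned head and tail fall out of the inner loop without special cases, at the cost of a branch per byte. A tuned kernel would more likely handle full words separately.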