Message-ID: <20121209234436.GD4261@openwall.com>
Date: Mon, 10 Dec 2012 03:44:36 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: GCN: indexed access to VGPRs

On Sun, Dec 09, 2012 at 02:38:18PM +0200, Milen Rangelov wrote:
> I thought about that when doing the bcrypt kernel. There is one problem
> with that though - we have a hard limit of 256 VGPRs per workitem

Yeah, I thought of this shortly after I sent the messages yesterday, but
I was unsure. The GCN instruction encoding only allows for fixed VGPR
register numbers in the 0 to 255 range, but it is unclear if this
limitation applies to indexed access to VGPRs as well or not (there's no
fixed-width field for the register number then). Anyhow, OpenCL might
impose this limitation universally, regardless of what the hardware is
capable of.

> and it
> does not matter how many workitems per group we spawn, the limit stays even
> if we run the kernel with worksize of say just 2 items (effectively that
> means we'd underuse the register file a lot). So we can utilize at most 1KB
> of registers for our sbox data. What eventually happens though is that the
> compiler spills registers into global memory (and this register spill is
> much worse than I expected). I tried having one of the 4 sboxes as a
> private array and got a lot of spilled registers, the end result being
> slower even given the increased occupancy and finally for some reason the
> kernel was not calculating the hash correctly (might be mistake on my part
> or a compiler issue, didn't investigate).

Understood. Placing one of the 4 S-boxes into registers was one of my
ideas, too (was not mentioned yet).

> Perhaps though, smaller chunk of the sbox in VGPRs would be beneficial, I
> just did not try that possibility.

We'd have an if/else then - and if it's implemented with eager
execution, then we incur the LDS access latency even when the data is in
fact in a register.
What we gain is a slightly higher number of concurrent bcrypt instances
per CU (18 instead of 16 if we put one half of one S-box into
registers?) This is worth experimenting with, but if the 256 registers
per work-item limit does in fact apply, then any possible gain is quite
minor.

Alexander
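
The figures quoted in this thread (1 KB of registers, 16 vs. 18 instances per CU) can be reproduced with a little arithmetic. The sketch below is only a back-of-the-envelope check under assumptions the message does not spell out: 64 KB of LDS per GCN compute unit, 32-bit VGPRs, bcrypt state of four S-boxes with 256 32-bit words each, LDS capacity as the only limiting factor, and the small P-array ignored. All names in the code are made up for illustration.

```python
# Back-of-the-envelope check of the numbers discussed in the thread.
# Assumptions (not stated in the message itself): 64 KB of LDS per GCN
# compute unit, 32-bit VGPRs, four bcrypt S-boxes of 256 32-bit words,
# LDS capacity as the only limit, P-array and other LDS use ignored.

LDS_BYTES_PER_CU = 64 * 1024
WORD_BYTES = 4
SBOX_WORDS = 256
NUM_SBOXES = 4
MAX_VGPRS_PER_WORKITEM = 256


def sbox_bytes_in_lds(words_in_vgprs):
    """S-box bytes left in LDS per bcrypt instance after moving
    `words_in_vgprs` 32-bit words into registers."""
    return (NUM_SBOXES * SBOX_WORDS - words_in_vgprs) * WORD_BYTES


def instances_per_cu(words_in_vgprs):
    """Concurrent bcrypt instances per CU if only LDS capacity limits them."""
    return LDS_BYTES_PER_CU // sbox_bytes_in_lds(words_in_vgprs)


# Upper bound on per-work-item register storage under the 256 VGPR limit.
vgpr_bytes_limit = MAX_VGPRS_PER_WORKITEM * WORD_BYTES

print(vgpr_bytes_limit)                   # 1024 bytes, i.e. 1 KB
print(instances_per_cu(0))                # all S-boxes in LDS: 16
print(instances_per_cu(SBOX_WORDS // 2))  # half of one S-box in VGPRs: 18
```

Under these assumptions, half of one S-box already takes 128 of the 256 VGPRs, and a full S-box would take all of them, leaving nothing for the rest of the kernel. That is consistent with the spilling described in the quoted text and with the expectation that the gain is minor if the 256-register limit applies.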