Message-ID: <20130617172221.GC30970@openwall.com>
Date: Mon, 17 Jun 2013 21:22:21 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Mask mode for GPU

On Mon, Jun 17, 2013 at 10:10:35AM +0530, Sayantan Datta wrote:
> In a little experiment I simply put the kernel inside a 10 iter loop.
> Nothing else is changed but somehow I was only able to 20M c/s(with
> updated *pcount) whereas I can get around
> 80M c/s with just one loop. I think it may be problem with the opencl
> compiler trying to unroll the 10 loops causing i-cache overrun.

Oh. Can you investigate that and try to get around it perhaps with
different source code layout?

BTW, out of different GPUs, AMD GCN GPUs appeared to tolerate the code
exceeding I-cache size pretty well - although perhaps not in all cases.
I think the DES cracker referenced from that Reddit thread I mentioned
did exceed I-cache size, but it ran pretty fast on GCN GPUs (and only on
those...), I guess due to each fetch into the I-cache being made use of
multiple times (by multiple wavefronts).

> In my approach I was only going to generate 32 keys per work item and
> avoid any loop inside the kernel for descrypt. Whereas in your approach
> we would be generating 26*26 passwords per kernel requiring (26*26)/ 32
> kernel iterations per kernel invocation.

Oh, I overlooked the fact that with bitslice DES you obviously need to
process a multiple of 32 candidate passwords per work-item.

A drawback of your approach is that you'll probably end up doing modulo
division of get_global_id(), which may be slow. Also, you may have to
compute all of the GPU-generated characters (such as 2 or 3) in each
work-item, whereas with my suggested approach you'd only update one
character in most loop iterations.

A hybrid approach might work best - leaving the somewhat slow
computation mentioned above outside of a loop iterating over one or two
characters.
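
The trade-off described above can be modelled on the host side. The
following is only a rough Python sketch, not kernel code and not taken
from the JtR source. The names CHARSET, decode_all, hybrid and
batches_needed are hypothetical, and it assumes a 26-letter lowercase
charset with two GPU-generated positions. It contrasts deriving every
character from the work-item id by modulo division with a hybrid where
one modulo picks the outer character and a loop updates only the last
one. It also shows that 26*26 candidates do not divide evenly into
bitslice batches of 32, so the last batch would presumably need padding.

```python
import string

CHARSET = string.ascii_lowercase   # assumed mask charset (26 characters)
N = len(CHARSET)
BITSLICE_WIDTH = 32                # candidates per bitslice batch


def decode_all(gid, positions=2):
    """Derive every generated character from the id (one divmod each)."""
    chars = []
    for _ in range(positions):
        gid, r = divmod(gid, N)
        chars.append(CHARSET[r])
    # The most significant character comes first.
    return "".join(reversed(chars))


def hybrid(outer_id):
    """One modulo selects the outer character; the loop only updates
    the last character on each iteration."""
    outer = CHARSET[outer_id % N]
    for inner in CHARSET:
        yield outer + inner


def batches_needed(candidates, width=BITSLICE_WIDTH):
    """Number of bitslice batches, rounding up for a partial last batch."""
    return (candidates + width - 1) // width


if __name__ == "__main__":
    print(decode_all(27))            # candidate for id 27
    print(list(hybrid(1))[:3])       # first candidates for outer id 1
    print(batches_needed(N * N))     # batches for 26*26 candidates
```

Both functions enumerate the same 676 candidates in the same order; the
difference is only in how much per-candidate arithmetic is done. Whether
the hybrid form is actually faster on a given GPU would have to be
measured.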

Yes, you'd need to deal with the 80M to 20M slowdown you identified
above somehow... in fact, since 80M is ~3x+ less than our target speed
anyway, you'd need to figure out why this code runs slower than it
should and try to fix it either way.

Alexander