Message-ID: <470618e39e5c8a37b25d5a3424b220c8@smtp.hushmail.com>
Date: Sat, 14 Apr 2012 12:59:51 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Find best LWS in OpenCL formats

On 04/14/2012 02:05 AM, Lukas Odzioba wrote:
> Currently OpenCL LWS testing takes ages on GPU devices because testing
> starts from 1 thread and goes up to the maximum value for the particular
> device/thread.
> I think we should change it to start from 32 for GPUs; it simply
> does not make sense to use small values.
> For example, for wpapsk this change gives a much shorter -test run (46s
> reduced to 6s).
>
> ///Find best local work size
> my_work_group = 1;
> if(device_type==CL_DEVICE_TYPE_GPU) my_work_group=32;
> for (; (int) my_work_group <= (int) max_group_size;
> my_work_group *= 2) {
> (...)
>

To be more future-proof, it might be better to start at
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (this will be 32 for
current nvidia).

Also, the global work size should be lower for CPU.

I have experimented with limiting the time for a run. This too is good
for CPU. If the run time exceeds 10 seconds (for RAR, possibly much less
for most formats), we should stop trying even higher numbers. There
really is no point in trying a work size of 1024 on a dual-core CPU :)

BTW, I believe CL_KERNEL_WORK_GROUP_SIZE is a better maximum than
CL_DEVICE_MAX_WORK_GROUP_SIZE.

magnum
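
The three suggestions in this reply (start at the preferred multiple, cap at the per-kernel maximum, stop once a run gets too slow) could be combined roughly as in the sketch below. It is only an illustration, not the actual John the Ripper code: `find_best_lws`, `lws_bench_fn` and `demo_run` are hypothetical names, and the two size limits are passed in as plain numbers so the sketch compiles without OpenCL headers. In real code they would presumably be queried with `clGetKernelWorkGroupInfo()` using `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` and `CL_KERNEL_WORK_GROUP_SIZE`. The sketch also assumes every run processes the same global work size, so a lower elapsed time means a better LWS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: run the kernel once with the given local work
 * size and return the elapsed time in seconds. */
typedef double (*lws_bench_fn)(size_t lws, void *ctx);

/*
 * start      - first LWS to try (assumed to come from
 *              CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE)
 * max_lws    - upper bound (assumed to come from CL_KERNEL_WORK_GROUP_SIZE)
 * time_limit - once a single run takes longer than this many seconds,
 *              do not try larger sizes
 * Returns the fastest LWS tried, or 0 if start > max_lws.
 */
size_t find_best_lws(size_t start, size_t max_lws, double time_limit,
                     lws_bench_fn run, void *ctx)
{
    size_t best = 0;
    double best_time = 0.0;
    size_t lws;

    assert(run != NULL);
    if (start == 0)
        start = 1;

    for (lws = start; lws <= max_lws; lws *= 2) {
        double t = run(lws, ctx);

        /* Remember the fastest size seen so far. */
        if (best == 0 || t < best_time) {
            best = lws;
            best_time = t;
        }
        /* Too slow already: larger sizes are not worth trying. */
        if (t > time_limit)
            break;
        /* The next doubling would exceed max_lws (also avoids overflow). */
        if (lws > max_lws / 2)
            break;
    }
    return best;
}

/* Toy stand-in for a timed kernel run (made-up numbers): fastest at an
 * LWS of 64 and rapidly slower above it. If ctx is non-NULL it is
 * treated as an int counter of how many runs were made. */
double demo_run(size_t lws, void *ctx)
{
    double r = (double)lws / 64.0;

    if (ctx != NULL)
        ++*(int *)ctx;
    return lws <= 64 ? 1.0 / r : r * r;
}
```

With the toy timings above, `find_best_lws(32, 1024, 10.0, demo_run, &runs)` tries 32, 64, 128 and 256, stops because the 256 run exceeds the 10-second limit, and returns 64 without ever trying 512 or 1024.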