Message-ID: <CAKGDhHV0O8wEm5+j6V9oMr6VHQiN0tFJfV=FvVbO+1a1_Ri-WA@mail.gmail.com>
Date: Fri, 22 May 2015 02:00:27 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: autotune_run problem

hi,

I've fixed one bug in my parallel-opencl, but I have a problem with
autotune_run.

I discovered that, depending on the arguments passed, a different
algorithm decides when tuning of gws stops.

If I call

    autotune_run(self, 1000, 0, 500);

the stop is determined by this code (common-opencl.c):

    if (duration_time && (endTime - startTime) > duration_time) {
        runtime = looptime = 0;

        if (options.verbosity > 4)
            fprintf(stderr, " (exceeds %s)", ns2string(duration_time));
        break;
    }

which means that we try various gws values until we exceed the time
limit, and then choose the gws that gave the best speed.

The computed gws is optimal on my laptop and with --dev=1, but not with
--dev=5: it prints "exceeds" for the optimal value, and setting a higher
duration_time doesn't help.

When my autotune_run call looks like

    autotune_run(self, 1, 1000, 100000);

the point at which we stop is determined by:

    if (best_speed && speed < 1.8 * best_speed &&
        max_run_time && run_time > max_run_time) {
        if (!optimal_gws)
            optimal_gws = num;

        if (options.verbosity > 3)
            fprintf(stderr, " - too slow\n");
        break;
    }

So, once run_time exceeds max_run_time, we keep trying new gws values
only while the new speed is at least 1.8 times the best speed so far,
and 1.8 is a wrong value for parallel. Changing 1.8 to 1.1 works well
for --dev=1 and on my laptop, but for --dev=5 it stops at the suboptimal
gws=4096. It stops at 4096 because there is no improvement in speed
between gws=4096 and gws=8192; for gws=32768 the speed is better.

The speed for parallel on --dev=5 for various gws:

    Local worksize (LWS) 64, global worksize (GWS) 4096
    Benchmarking: parallel-opencl, parallel SHA-512 [ ]... DONE
    Speed for cost 1 (N) of 0
    Many salts:     23405 c/s real, 23630 c/s virtual
    Only one salt:  23630 c/s real, 23630 c/s virtual

    Local worksize (LWS) 64, global worksize (GWS) 8192
    Benchmarking: parallel-opencl, parallel SHA-512 [ ]... DONE
    Speed for cost 1 (N) of 0
    Many salts:     23405 c/s real, 23630 c/s virtual
    Only one salt:  21557 c/s real, 21557 c/s virtual

    Local worksize (LWS) 64, global worksize (GWS) 32768
    Benchmarking: parallel-opencl, parallel SHA-512 [ ]... DONE
    Speed for cost 1 (N) of 0
    Many salts:     33098 c/s real, 33098 c/s virtual
    Only one salt:  33098 c/s real, 33098 c/s virtual

Any idea how I can get the optimal gws for --dev=5 as well?

The results above might suggest that some other hashes are not
autotuned properly either, but people who know autotune_run better
should comment on this.
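
To make the --dev=5 behaviour easier to reason about, here is a minimal sketch of the "too slow" stop rule run against the three "Only one salt" real speeds quoted above. It is not the code from common-opencl.c: the names gws_sample, too_slow and pick_gws are hypothetical, the loop is a simplified stand-in for the real tuning loop, and the run_time values used with it are made up, since the post does not report them.

```c
#include <assert.h>
#include <stddef.h>

/* One tuning step: a GWS value, the speed measured for it (c/s), and
 * how long the run took. The run_time unit is arbitrary here. */
struct gws_sample {
    size_t gws;
    double speed;
    double run_time;
};

/* Same shape as the quoted condition: stop when the new speed is below
 * factor * best_speed AND the run took longer than max_run_time.
 * A max_run_time of 0 disables the rule, as in the quoted code. */
static int too_slow(double best_speed, double speed, double factor,
                    double max_run_time, double run_time)
{
    assert(factor > 0.0);
    return best_speed > 0.0 && speed < factor * best_speed &&
           max_run_time > 0.0 && run_time > max_run_time;
}

/* Simplified tuning loop (an assumption, not the real one): walk the
 * samples in order, stop at the first "too slow" step, and return the
 * GWS with the best speed seen before stopping. */
static size_t pick_gws(const struct gws_sample *s, size_t n,
                       double factor, double max_run_time)
{
    double best_speed = 0.0;
    size_t optimal_gws = 0;

    for (size_t i = 0; i < n; i++) {
        if (too_slow(best_speed, s[i].speed, factor,
                     max_run_time, s[i].run_time))
            break;
        if (s[i].speed > best_speed) {
            best_speed = s[i].speed;
            optimal_gws = s[i].gws;
        }
    }
    return optimal_gws;
}
```

Fed the samples (4096, 23630), (8192, 21557), (32768, 33098) with factor 1.1 and every run assumed to exceed max_run_time, pick_gws stops at the 8192 step and returns 4096, so 32768 is never tried. With max_run_time set to 0 the rule is disabled and the same data yields 32768. Under these assumptions, a dip at one step is enough to end tuning early regardless of how small the factor is, as long as it stays above 1.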