Message-ID: <CA+TsHUDMHG+7K3udGvi6gLMxJQ-T1Z5gg8d3uRyVxuSzZ0Q3og@mail.gmail.com>
Date: Tue, 1 May 2012 10:58:22 +0530
From: SAYANTAN DATTA <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Sayantan :Weekly Report #2
>
> > 1. Implemented a function to find the optimum local work group size in
> > opencl-mscash2.
> Is this in magnum-jumbo? It seems not. If so, where is it?
>
> OK. The code currently in magnum-jumbo only achieves 36k c/s on 7970:
>
> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl -pla=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
> Raw: 35754 c/s real, 50592 c/s virtual
>
I haven't posted a patch yet because it would require two or more kernels
to make good use of different GPUs. As my kernel is quite large, I
should first make it as compact as possible before putting two or
more kernels in one file. This could easily be done using function inlining
and macros, but I need more time. In any case, I will try to post the patch
by the 8th or 9th of this month. Currently I have my code on bull, which
achieves 73k c/s on the 7970; you might test it if you want to.
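
For readers following along, here is a minimal sketch of how a search for the best local work group size could look. It is not the function from the report, which has not been posted. The callback type bench_fn, the function find_best_lws() and the stand-in fake_bench() are hypothetical names. In a real OpenCL host, the callback would time one kernel launch, and max_lws would typically come from clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs the kernel once with the given local work
 * size and returns the elapsed time in seconds, or a negative value if
 * the launch failed. */
typedef double (*bench_fn)(size_t lws, void *ctx);

/* Try power-of-two local sizes from 1 up to max_lws; return the fastest.
 * Returns 0 if no size could be launched. */
size_t find_best_lws(bench_fn run, void *ctx, size_t max_lws)
{
    size_t best = 0;
    double best_time = 0.0;

    assert(run != NULL && max_lws >= 1);
    for (size_t lws = 1; ; lws *= 2) {
        double t = run(lws, ctx);
        if (t >= 0.0 && (best == 0 || t < best_time)) {
            best = lws;
            best_time = t;
        }
        if (lws > max_lws / 2)  /* next doubling would exceed max_lws */
            break;
    }
    return best;
}

/* Stand-in timing model for demonstration only: fastest at lws == 64. */
double fake_bench(size_t lws, void *ctx)
{
    (void)ctx;
    return lws > 64 ? (double)lws / 64.0 : 64.0 / (double)lws;
}
```

Calling find_best_lws(fake_bench, NULL, 256) picks 64 under this made-up timing model. Real timings are noisy, so an actual implementation would probably average several runs per size.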
> > rotate() function caused huge performance drop on 7970 on bull.
> This is puzzling. Perhaps you can try reviewing the generated code (IL
> or native) to figure out the cause of the performance drop? In general,
> I think we (as a team) should learn to do that.
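
For context, OpenCL C has a built-in rotate() that rotates left, and the usual alternative is an explicit shift-and-or. The thread does not say which form the kernel was compared against, so the sketch below only illustrates the shift-and-or form in plain C. The name rotl32() is hypothetical. Comparing the IL or native code generated for the two forms is one way to see what the compiler does with each.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits using shifts and OR.
 * Masking keeps both shift counts in 0..31, so n == 0 is well defined. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

For example, rotl32(0x12345678, 8) yields 0x34567812.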
> One task I think you could approach slightly later is trying to
> implement and optimize Eksblowfish on GPU. As discussed before, we
> expect it to be slow, but it'd be useful to have some hard data to prove
> this - or maybe disprove it (unlikely), and to have some OpenCL code
> (and maybe CUDA as well) that we could run on future GPUs easily as they
> become available. Specifically, this may be helpful for design of
> future password hashing methods. Additionally, this OpenCL code may
> happen to be readily capable of making use of AVX2's VSIB addressing
> with Intel's OpenCL SDK - if so, it may actually be faster (on those
> future CPUs) than the existing CPU code for bcrypt, until we implement
> proper AVX2 code more directly (perhaps with intrinsics).
> I had mentioned this task in GSoC student selection context before, but
> it may also be approached outside of that context and with slightly
> different goals as above. In that way, it will actually be useful even
> if the implementation is indeed slower than the current CPU code on
> current hardware.
Okay, we will do that.
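
As background on why Eksblowfish is expected to be slow on GPUs: each Blowfish round makes four data-dependent lookups into 256-entry tables of 32-bit words, which is 4 KB of S-boxes per hash being computed. That is also the access pattern that gather-style addressing targets. The following is a rough plain-C sketch of the round function only, not a bcrypt implementation. The names bf_sboxes, bf_F() and bf_fill_demo() are hypothetical, and the demo table contents are made up rather than being the real Blowfish initial values.

```c
#include <assert.h>
#include <stdint.h>

/* Four S-boxes of 256 32-bit words each: 4 KB per instance. */
typedef struct {
    uint32_t S[4][256];
} bf_sboxes;

/* Blowfish F function: four table lookups indexed by the bytes of x. */
uint32_t bf_F(const bf_sboxes *c, uint32_t x)
{
    uint32_t a = c->S[0][x >> 24];
    uint32_t b = c->S[1][(x >> 16) & 0xff];
    uint32_t d = c->S[2][(x >> 8) & 0xff];
    uint32_t e = c->S[3][x & 0xff];

    return ((a + b) ^ d) + e;
}

/* Demonstration only: fill the tables with S[i][j] = 256 * i + j.
 * Real Blowfish starts from digits of pi and then rekeys the tables. */
void bf_fill_demo(bf_sboxes *c)
{
    assert(c != 0);
    for (uint32_t i = 0; i < 4; i++)
        for (uint32_t j = 0; j < 256; j++)
            c->S[i][j] = 256 * i + j;
}
```

Because the indices depend on the data, the lookups cannot be predicted or coalesced ahead of time, and the tables are modified throughout the key setup, so every concurrent instance needs its own writable 4 KB copy.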