john-dev - Re: Sayantan :Weekly Report #2



Message-ID: <CA+TsHUDMHG+7K3udGvi6gLMxJQ-T1Z5gg8d3uRyVxuSzZ0Q3og@mail.gmail.com>
Date: Tue, 1 May 2012 10:58:22 +0530
From: SAYANTAN DATTA <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Sayantan :Weekly Report #2

>
> > 1. Implemented a function to find the optimum local work group size in
> > opencl-mscash2.
> Is this in magnum-jumbo?  It seems not.  If so, where is it?
>
> OK.  The code currently in magnum-jumbo only achieves 36k c/s on 7970:
>
> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl -pla=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
> Raw:    35754 c/s real, 50592 c/s virtual
>

I haven't posted a patch yet because it would require two or more kernels
for better utilization of various GPUs. As my kernel is quite large, I
should first make it as compact as possible before putting two or
more kernels in one file. This could easily be done using function inlining
and macros, but I need more time. In any case, I will try to post the patch
by the 8th or 9th of this month. Currently I have my code on bull, which
produces 73k c/s on the 7970; you may test it if you want to.
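
To illustrate what a function that finds the optimum local work group size might do, here is a minimal sketch in plain C. It is not the code from the pending patch. The names find_best_lws, time_kernel_fn and fake_timer are hypothetical, and the timing callback stands in for an actual OpenCL kernel launch, which this sketch does not perform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs the kernel once with the given local work
 * size and returns the elapsed time in seconds (lower is better). A real
 * host program would wrap an OpenCL kernel launch and a timer here. */
typedef double (*time_kernel_fn)(size_t local_work_size, void *ctx);

/* Try power-of-two local work sizes from 1 up to max_lws and return the
 * one with the lowest measured time. */
size_t find_best_lws(size_t max_lws, time_kernel_fn time_kernel, void *ctx)
{
    assert(max_lws >= 1 && time_kernel != NULL);

    size_t best = 1;
    double best_time = time_kernel(1, ctx);
    size_t lws = 1;

    /* Comparing against max_lws / 2 avoids overflow when doubling. */
    while (lws <= max_lws / 2) {
        lws *= 2;
        double t = time_kernel(lws, ctx);
        if (t < best_time) {
            best_time = t;
            best = lws;
        }
    }
    return best;
}

/* Stand-in timing model for demonstration only: fastest at 64. */
double fake_timer(size_t lws, void *ctx)
{
    (void)ctx;
    double d = (double)lws - 64.0;
    return 1.0 + d * d;
}
```

Calling find_best_lws(256, fake_timer, NULL) walks the candidates 1, 2, 4, ..., 256 and picks the fastest under this made-up model. In a real host program the upper bound would come from the device and kernel limits rather than a constant.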

> > rotate() function caused huge performance drop on 7970 on bull.
> This is puzzling.  Perhaps you can try reviewing the generated code (IL
> or native) to figure out the cause of the performance drop?  In general,
> I think we (as a team) should learn to do that.
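
For reference when comparing generated code, the operation in question is a 32-bit left rotation, which SHA-1 uses with shift counts 1, 5 and 30. OpenCL C provides a built-in rotate(x, n); the sketch below is the equivalent shift-and-or form in plain C, under a hypothetical name rotl32. It only shows what result either form should produce. Which form compiles to better code on a given device is exactly what inspecting the IL or native output would have to answer.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, 0 <= n < 32.
 * The "& 31" keeps the right shift count in range when n is 0. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return (x << n) | (x >> ((32 - n) & 31));
}
```

For example, rotl32(0x12345678, 8) moves the top byte to the bottom and yields 0x34567812.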


> One task I think you could approach slightly later is trying to
> implement and optimize Eksblowfish on GPU.  As discussed before, we
> expect it to be slow, but it'd be useful to have some hard data to prove
> this - or maybe disprove it (unlikely), and to have some OpenCL code
> (and maybe CUDA as well) that we could run on future GPUs easily as they
> become available.  Specifically, this may be helpful for design of
> future password hashing methods.  Additionally, this OpenCL code may
> happen to be readily capable of making use of AVX2's VSIB addressing
> with Intel's OpenCL SDK - if so, it may actually be faster (on those
> future CPUs) than the existing CPU code for bcrypt, until we implement
> proper AVX2 code more directly (perhaps with intrinsics).
> I had mentioned this task in GSoC student selection context before, but
> it may also be approached outside of that context and with slightly
> different goals as above.  In that way, it will actually be useful even
> if the implementation is indeed slower than the current CPU code on
> current hardware.
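
As background on why this is expected to be slow on GPUs, here is a minimal sketch of the Blowfish round function F in plain C. The name bf_f and the flat 1024-word S-box layout are choices made for this illustration, not taken from any existing John the Ripper code. Every call performs four table lookups whose indices depend on the data, into 4 KiB of S-boxes, and in bcrypt those S-boxes are per-hash state. These are also the kind of indexed loads that gather-style addressing is meant to serve.

```c
#include <stdint.h>

/* Blowfish F function.
 * sbox points to four consecutive 256-entry tables (1024 words total).
 * Each byte of x selects one entry from one table. */
uint32_t bf_f(const uint32_t *sbox, uint32_t x)
{
    uint32_t a = (x >> 24) & 0xff;  /* index into S-box 0 */
    uint32_t b = (x >> 16) & 0xff;  /* index into S-box 1 */
    uint32_t c = (x >> 8) & 0xff;   /* index into S-box 2 */
    uint32_t d = x & 0xff;          /* index into S-box 3 */

    /* Additions are modulo 2^32, which uint32_t arithmetic provides. */
    return ((sbox[a] + sbox[256 + b]) ^ sbox[512 + c]) + sbox[768 + d];
}
```

Because the four indices are only known once the previous round has finished, the lookups cannot be scheduled ahead of time, and each work-item would need its own copy of the tables.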


Okay, we will do that.

