john-dev - Re: Lukas's Status Report


Message-ID: <CABob6irSwZK+60=K8nqOvn+oN1vLi-oQMGpO3qybWtr_50UT6w@mail.gmail.com>
Date: Thu, 25 Aug 2011 19:15:25 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Lukas's Status Report - #15 of 15

2011/8/25 Solar Designer <solar@...nwall.com>:
> After applying your john-1.7.8-mscash2cuda-0.diff, I changed:
>
>        mscash2_init(1);
>
> to:
>
>        mscash2_init(0);
>
> or it was failing trying to use a non-existent second GPU as far as I
> could tell.  Do you have two NVidia GPUs in your machine now? :-)
Yes, I've got a GTX 460 in a PCIe x16 slot and a 9800 GT in a PCIe x4
slot. This bug was caused by code duplication: every format had its own
init function. In the next revision there will be one common init that
takes the device from a command-line parameter.
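
A minimal sketch of the validation a shared init could do before
selecting a device. The helper name parse_gpu_index and its "default to
device 0" behaviour are assumptions for illustration, not code from the
patch; in a real build device_count would come from cudaGetDeviceCount()
and the result would be passed to cudaSetDevice().

```c
#include <stdlib.h>

/* Hypothetical helper: turn a command-line device argument into a GPU
 * index. Returns the index, or -1 if no usable device matches.
 * device_count is assumed to come from cudaGetDeviceCount(). */
int parse_gpu_index(const char *arg, int device_count)
{
    char *end;
    long idx;

    if (device_count <= 0)
        return -1;              /* no GPU present at all */
    if (arg == NULL || *arg == '\0')
        return 0;               /* no argument: default to the first GPU */

    idx = strtol(arg, &end, 10);
    if (*end != '\0' || idx < 0 || idx >= device_count)
        return -1;              /* not a number, or no such device */
    return (int)idx;
}
```

With a check like this, asking for device 1 on a single-GPU machine is
rejected up front instead of failing later inside the format.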


> Running two instances at once, I got:
>
> Raw:    784 c/s real, 819 c/s virtual
> Raw:    934 c/s real, 961 c/s virtual
>
> which is slightly faster (1700 c/s combined).
On my PC, CPU time is 0.6% of the total. With a slower CPU (in terms of
per-thread speed) it might be even more, so a small cpu_speed/gpu_speed
ratio increases GPU idle time. I can divide the computation into two
parts and compute the second CPU part while the first GPU part executes.
For now it runs sequentially: do_cpu -> do_gpu.
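
The effect of that split can be estimated with a simple timing model.
This is only a sketch: the function names are made up here, and it
assumes the batch splits evenly and that the CPU and GPU halves can run
fully in parallel with no transfer overhead.

```c
/* Time for one batch when the CPU and GPU stages run strictly in
 * sequence (do_cpu -> do_gpu). */
double sequential_time(double cpu, double gpu)
{
    return cpu + gpu;
}

/* Time for one batch split into two halves, where the CPU prepares the
 * second half while the GPU is still processing the first one. */
double overlapped_time(double cpu, double gpu)
{
    double half_cpu = cpu / 2.0;
    double half_gpu = gpu / 2.0;
    /* The second GPU half can start only when both the first GPU half
     * and the second CPU half are done. */
    double wait = half_cpu > half_gpu ? half_cpu : half_gpu;

    return half_cpu + wait + half_gpu;
}
```

Under this model, cpu = 0.6 and gpu = 99.4 (the 0.6% share quoted above)
give about 99.7 time units instead of 100, so the gain is small on a
fast CPU and grows as the CPU share grows.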

> Overall, this feels somewhat slow - comparable to a quad-core CPU.
> There's probably a lot of room for optimization.
>
> Your 8160 c/s for a faster GPU is much better, though. :-)
The mscash2 and sha512crypt kernels require quite a lot of registers,
and because Fermi has more of them, it is able to run more threads at
once and hide memory latency better.
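
A rough sketch of how register use limits the number of resident
threads per multiprocessor. The per-SM limits in the usage note are the
documented ones for compute capability 1.1 (9800 GT) and 2.1 (GTX 460);
the figure of 40 registers per thread is a made-up value for
illustration, not a measurement of these kernels, and the function name
is hypothetical.

```c
/* Threads resident on one multiprocessor, limited by the register file,
 * by the thread limit and by the block limit. Ignores shared memory and
 * the hardware's register allocation granularity. */
int resident_threads(int regs_per_sm, int max_threads_per_sm,
                     int max_blocks_per_sm, int regs_per_thread,
                     int threads_per_block)
{
    int by_regs = regs_per_sm / (regs_per_thread * threads_per_block);
    int by_threads = max_threads_per_sm / threads_per_block;
    int blocks = by_regs < by_threads ? by_regs : by_threads;

    if (blocks > max_blocks_per_sm)
        blocks = max_blocks_per_sm;
    return blocks * threads_per_block;
}
```

With 128 threads per block and the assumed 40 registers per thread,
resident_threads(8192, 768, 8, 40, 128) gives 128 threads on a 1.1
device, while resident_threads(32768, 1536, 8, 40, 128) gives 768 on
Fermi, which leaves far more threads available to cover memory latency.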

>> Patch is configured for older devices (sm=10,128threads) to be more
>> portable. As Solar stated only pbkdf2 is on gpu side.
>
> Yet you implemented the on-CPU mscash portion of mscash2 in the .cu
> source file - wouldn't it be cleaner/easier to have it in .c?  (Maybe
> this is how it should be.  I am merely asking.)
I assumed that it would be better to have all the computing code in one
file rather than having the CPU, GPU and common function preproc
duplicated in two separate files.

>> It is basically Sn3f's implementation with JimF's optimizations, and
>> it's not (yet) fully optimal. I estimate that optimal should do around
>> 13k c/s on gtx460.
>
> How did you arrive at this estimate?
Someone stated on the john-contest list that an AMD 5870 running
oclhashcat does 59k c/s.
I took Ivan Golubev's SHA-1 estimates for both cards and compared the
results. Yes, it's not exact, but it gives a rough idea.
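
The scaling behind such an estimate can be sketched as below. The SHA-1
figures for the two cards are not given in this message, so they are
left as parameters; the function names are hypothetical. Only the
59k c/s and 13k c/s values come from the thread.

```c
/* Scale a known speed on a reference card to a target card, using the
 * ratio of some other benchmark (here: SHA-1 estimates) measured on
 * both cards. */
double scale_estimate(double ref_cs, double ref_sha1, double target_sha1)
{
    return ref_cs * (target_sha1 / ref_sha1);
}

/* Ratio between the two cards implied by an estimate and a reference
 * speed. */
double implied_ratio(double estimate_cs, double ref_cs)
{
    return estimate_cs / ref_cs;
}
```

Working backwards, implied_ratio(13000.0, 59000.0) is about 0.22, so the
13k c/s figure corresponds to the GTX 460 being taken as roughly 22% of
the 5870's SHA-1 speed.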

Lukas
