Message-ID: <01f801cd28cc$eaa53f30$bfefbd90$@net>
Date: Wed, 2 May 2012 20:34:43 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: New JtR functionality, re-build lost salts

From: magnum [mailto:john.magnum@...hmail.com]
>> This modification to JtR will allow these (missing salts) to be found
>> (albeit, pretty slowly).
>
>This is a curious patch; I haven't had time to try it out, but I will
>later.
>
>A thing that hits me is that this is a task a fast GPU format like
>raw-md5 could do very well, without the bandwidth problems it has
>otherwise... we just supply a fairly small buffer of words (perhaps just
>one word) and the GPU code generates all salts itself. But I guess it
>would need some support from the format interface.
>
>I suppose this patch as-is could be used with a slightly modified GPU
>format with less work, but then we'd have to transfer salts from the CPU
>side. That is much lighter than transferring millions of keys, though.

The way things work is that for each key, you run 'almost' the same crypt
code X times, where X is the universe of salts.  So for OSC, that means
95**2 runs for every candidate password.  The 'generation' of the salts
is trivial.
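
For concreteness, here is a minimal sketch of that inner salt loop for an
OSC-style md5($s.$p) hash with a lost 2-character salt.  This is not JtR
code; it uses OpenSSL's MD5() for brevity and assumes the salt is two
printable ASCII characters prepended to the password:

/* Brute-force a lost 2-char salt for one candidate (md5($s.$p)).
 * Sketch only -- assumes password length <= 64 bytes. */
#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

void try_all_salts(const char *password, const unsigned char *target)
{
    unsigned char buf[2 + 64];                /* salt + candidate */
    unsigned char digest[MD5_DIGEST_LENGTH];
    size_t plen = strlen(password);
    int c1, c2;

    memcpy(buf + 2, password, plen);          /* password part is fixed */
    for (c1 = 0x20; c1 <= 0x7e; c1++)         /* 95 printable chars */
        for (c2 = 0x20; c2 <= 0x7e; c2++) {   /* 95**2 salts total */
            buf[0] = (unsigned char)c1;
            buf[1] = (unsigned char)c2;
            MD5(buf, 2 + plen, digest);
            if (!memcmp(digest, target, MD5_DIGEST_LENGTH))
                printf("salt found: %c%c\n", c1, c2);
        }
}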

For some formats (md5-6 and md5-9), the MD5 of the candidate only needs to
be computed once, and then ALL salts reuse that result.  I am not sure how
easily that would 'scale' to GPU code.  If each GPU thread could set its
own 3 bytes of salt, and all of them could then hash the 32 bytes of the
one common buffer holding that pre-computed MD5 value at the same time,
that would make the GPU code very fast.  It would totally eliminate moving
that md5 hash into the buffers X times.
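
A sketch of that reuse, again using OpenSSL's MD5() and made-up helper
names rather than JtR's real code, for the md5(md5($p).$s) case with a
3-byte salt:

/* Sketch for md5(md5($p).$s) with a 3-byte salt: the inner hash depends
 * only on the password, so compute its 32-char hex form once and reuse
 * it for every salt. */
#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

static void hex32(const unsigned char *digest, char *out)
{
    static const char tab[] = "0123456789abcdef";
    int i;
    for (i = 0; i < MD5_DIGEST_LENGTH; i++) {
        out[2 * i]     = tab[digest[i] >> 4];
        out[2 * i + 1] = tab[digest[i] & 0x0f];
    }
}

void try_all_salts_outer(const char *password, const unsigned char *target)
{
    unsigned char inner[MD5_DIGEST_LENGTH], digest[MD5_DIGEST_LENGTH];
    unsigned char buf[32 + 3];                /* md5 hex + 3-byte salt */
    int c1, c2, c3;

    MD5((const unsigned char *)password, strlen(password), inner); /* once */
    hex32(inner, (char *)buf);                /* shared by all salts */

    for (c1 = 0x20; c1 <= 0x7e; c1++)
    for (c2 = 0x20; c2 <= 0x7e; c2++)
    for (c3 = 0x20; c3 <= 0x7e; c3++) {
        buf[32] = c1; buf[33] = c2; buf[34] = c3;
        MD5(buf, sizeof(buf), digest);        /* md5(md5($p).$s) */
        if (!memcmp(digest, target, MD5_DIGEST_LENGTH))
            printf("salt found: %c%c%c\n", c1, c2, c3);
    }
}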

It 'is' a curious patch, and it does things a little differently from the
'normal' JtR way of doing things.  However, with your reply (and my reply
to yours), it probably could be made quite a bit faster even on the CPU
side, by doing some creative SSE coding.  Only the first 4 bytes (per
interleaved SSE buffer) would need to be set independently (for the
md5($s.md5($p)) or md5(md5($p).$s) formats, which are PHPS/MediaWiki), and
a single 64-byte buffer would be 'shared' among all of the simultaneous
SSE lanes.  I think this would greatly reduce the memory movement, and it
should speed things up a bit, since 'almost' all of each of the buffers is
the exact same data.
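
To illustrate just the buffer-layout idea (the names below are
hypothetical, not JtR's actual SSE code): with four interleaved 32-bit
MD5 lanes, word w of lane l sits at block[w*4 + l], so the shared part
can be copied from one template and only the salt word is set per lane:

/* Layout sketch for 4 interleaved MD5 lanes in one 128-bit SSE2 vector.
 * For md5($s.md5($p)) with a 3-byte salt, only the first 32-bit word of
 * the 64-byte input block differs between lanes; the remaining 15 words
 * (shared md5($p) hex, padding, length) come from one prebuilt template. */
#include <stdint.h>

#define SSE_LANES 4      /* 4 x 32-bit lanes per 128-bit vector */
#define MD5_WORDS 16     /* one 64-byte MD5 input block */

void fill_block(uint32_t block[MD5_WORDS * SSE_LANES],
                const uint32_t shared_template[MD5_WORDS],
                const uint32_t salt_word[SSE_LANES])
{
    int w, l;

    /* shared part: identical in every lane */
    for (w = 1; w < MD5_WORDS; w++)
        for (l = 0; l < SSE_LANES; l++)
            block[w * SSE_LANES + l] = shared_template[w];

    /* per-lane part: only the word holding the salt differs */
    for (l = 0; l < SSE_LANES; l++)
        block[0 * SSE_LANES + l] = salt_word[l];
}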

Jim.
