john-dev - Re: best way to get ciphertext

Message-ID: <20111121165227.GA26729@openwall.com>
Date: Mon, 21 Nov 2011 20:52:27 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: best way to get ciphertext

Hi Samuele,

Thank you for bringing this topic up - I mean not just "getting the
ciphertext", but scalability issues for fast hashes in general.

On Mon, Nov 21, 2011 at 02:59:23PM +0100, Samuele Giovanni Tonon wrote:
> i'm trying to add key comparison inside the opencl kernel code
> trying to see if this add more speed to the process.

Yes, it can add speed, but this is difficult to do.

> at the moment all the job is done inside crypt_all() in which
> i set the salt, the list of cleartext password to hash, the output
> buffer .
> 
> i tried also to pass to opencl kernel ciphertext password by calling
> "binary", however with my great disappoint i'm not getting the password
> but some random data.

What "ciphertext password"?  John normally has many hashes loaded at
once, including often many per salt.  Having just one hash to crack (per
salt, if applicable) is only a special case.

> i tried to print inside crypt_all and cmp_all binary value with a simple:
> 
> printf("cry %x %x %x %x %x \n ", ((ARCH_WORD_32 *)binary)[0],
> ((ARCH_WORD_32 *)binary)[1], ((ARCH_WORD_32 *)binary)[2], ((ARCH_WORD_32
> *)binary)[3], ((ARCH_WORD_32 *)binary)[4]);

No idea what binary value you're trying to print here.  There might not
even be a symbol called "binary" and available inside crypt_all().
Well, maybe you happen to have a function called binary(), like many
formats do, and you print portions of its code here? ;-)

> however while on cmp_all i get the right "numbers", on crypt_all
> i get nothing valuable.
> 
> since it looks like binary is not available inside crypt_all
> (because not yet setted?)

Yes, it is not available there.  No, not because it is not yet set, but
for more fundamental reasons.

> i'm wondering which is best to do to solve
> the problem which in the end is quite simple:
> is there a good way to crypt and compare at the same time using the same
> function or shall i go with some nasty hacks ?
> Has anyone found similar problem on other formats ?

This is not simple at all, and it applies to all formats indeed.

There's currently no interface to communicate hashes loaded for cracking
into the format code.  You may introduce a dirty hack where binary()
would record the hashes and then cmp_all() would compare against those,
but another hurdle is that cmp_all() is not always called - when there
are a lot of hashes for a given salt, the cracker.c code will use
get_hash*() instead and do comparisons on its own.  So you'd also need
to introduce an FMT_* flag maybe to disable that or to have it enabled
only when a much higher threshold of hashes per salt is reached.
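
To make the hack described above more concrete, here is a minimal
sketch under stated assumptions.  It is not John's actual interface:
record_binary(), find_loaded(), any_match() and all of the constants
are hypothetical names, and the real binary() and cmp_all() methods
would only call into something like this.  The sketch assumes a 128-bit
hash stored as four 32-bit words and uses a small open-addressing table
keyed on the first word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BINARY_WORDS 4    /* assumption: 128-bit hash as four 32-bit words */
#define MAX_LOADED   1024 /* hypothetical cap on recorded hashes */
#define TABLE_SIZE   4096 /* power of two, larger than MAX_LOADED */

static uint32_t loaded[MAX_LOADED][BINARY_WORDS];
static int loaded_count;
static int table[TABLE_SIZE]; /* holds index + 1 into loaded[], 0 = empty */

/* Hypothetical helper that a format's binary() could call to remember
 * each hash it decodes. */
static void record_binary(const uint32_t *bin)
{
    unsigned int slot = bin[0] & (TABLE_SIZE - 1);

    assert(loaded_count < MAX_LOADED);
    memcpy(loaded[loaded_count], bin, sizeof(loaded[0]));
    while (table[slot]) /* linear probing; the table is never full */
        slot = (slot + 1) & (TABLE_SIZE - 1);
    table[slot] = ++loaded_count;
}

/* Returns the index of a recorded hash equal to "hash", or -1. */
static int find_loaded(const uint32_t *hash)
{
    unsigned int slot = hash[0] & (TABLE_SIZE - 1);

    while (table[slot]) {
        int i = table[slot] - 1;
        if (!memcmp(loaded[i], hash, sizeof(loaded[0])))
            return i;
        slot = (slot + 1) & (TABLE_SIZE - 1);
    }
    return -1;
}

/* What a cmp_all()-like check could do: "computed" holds "count" hashes
 * of BINARY_WORDS words each, stored back to back. */
static int any_match(const uint32_t *computed, int count)
{
    int i;

    for (i = 0; i < count; i++)
        if (find_loaded(computed + (size_t)i * BINARY_WORDS) >= 0)
            return 1;
    return 0;
}
```

The same lookup could in principle be moved into the OpenCL kernel, but
the sketch does nothing about cmp_all() being bypassed in favor of
get_hash*() when many hashes share a salt.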

Better yet, we'd actually need to enhance the formats interface.

Another difficulty is that your format would need to duplicate
cracker.c's removal of already cracked hashes from your own data
structures.  If you don't do that, you happen to have an easily cracked
hash loaded initially, and you happen to try the corresponding password
multiple times (e.g., as a result of wordlist rules producing some
duplicates, which is normally acceptable when attacking fast hashes),
you might end up having cmp_all() return true too often (and then you
hit slow code paths).
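
A minimal sketch of what such duplicated removal could look like,
assuming the format keeps its own flat array of not-yet-cracked hashes.
All names here (add_hash(), find_hash(), remove_hash(), remaining[])
are hypothetical and not part of John's interface; a real format would
more likely use a hash table than a linear scan.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BINARY_WORDS 4    /* assumption: 128-bit hash as four 32-bit words */
#define MAX_LOADED   1024 /* hypothetical cap on tracked hashes */

static uint32_t remaining[MAX_LOADED][BINARY_WORDS];
static int remaining_count;

/* Track one more hash as not yet cracked. */
static void add_hash(const uint32_t *bin)
{
    assert(remaining_count < MAX_LOADED);
    memcpy(remaining[remaining_count++], bin, sizeof(remaining[0]));
}

/* Returns the index of "hash" among the remaining hashes, or -1. */
static int find_hash(const uint32_t *hash)
{
    int i;

    for (i = 0; i < remaining_count; i++)
        if (!memcmp(remaining[i], hash, sizeof(remaining[0])))
            return i;
    return -1;
}

/* Drop a cracked hash by moving the last entry into its place.
 * Returns 1 if something was removed, 0 if the hash was not present. */
static int remove_hash(const uint32_t *hash)
{
    int i = find_hash(hash);

    if (i < 0)
        return 0;
    remaining_count--;
    if (i != remaining_count)
        memcpy(remaining[i], remaining[remaining_count],
            sizeof(remaining[0]));
    return 1;
}
```

With this in place, a second attempt at an already cracked password
finds no match in the format's own structures, so it would not trigger
the slow path again.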

Besides comparison of computed hashes, another bottleneck is set_key(),
which is currently called by just one thread in the main program.  (This
is a problem for fast hashes with OpenMP builds of John as well - e.g.,
this is why LM does not scale beyond 100M c/s or so on CPU currently.)
I think this one can be dealt with in two ways:

1. Have an FMT_* flag that would indicate that set_key() may be called
by multiple threads at once, for different key indices indeed.  Have
cracking mode specific code parallelized with OpenMP as well (the
difficulty of actually doing this will vary by cracking mode).

2. Have an extra per-format method to specify that crypt_all() should
produce and try multiple candidate passwords for every key that was set
with set_key().  For example, it could specify character positions and
charsets.  Then we'd actually have a hybrid of smart high-level cracking
modes with dumb exhaustive search for a few character positions.  It
could be made a bit smarter e.g. by incremental mode altering the
charset for those positions to match what it currently tries for the
rest.  And then we also need a way for a format to say that it computed
more hashes than it was supplied candidate passwords, so get_hash*(),
get_key(), etc. would need to be called for larger index values as well
(up to the actual number of hashes computed) - and the hash comparison
optimizations mentioned above would preferably need to be used.
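
The index arithmetic behind option 2 can be sketched as follows, under
simplifying assumptions: a single extra character position appended to
each key, and a fixed charset for it.  The names set_base_key(),
total_candidates(), get_candidate() and ext_charset are hypothetical;
this only illustrates how a get_key()-like call could serve index
values beyond the number of keys that were actually set.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 64 /* hypothetical number of keys per crypt_all() call */
#define KEY_LEN  31 /* hypothetical maximum candidate length */

static char base_keys[MAX_KEYS][KEY_LEN + 1];
static int base_count;
/* Assumption: one appended position, digits only. */
static const char *ext_charset = "0123456789";

/* Stand-in for set_key(): stores the key, leaving room for one
 * appended character. */
static void set_base_key(const char *key, int index)
{
    size_t len = strlen(key);

    assert(index >= 0 && index < MAX_KEYS);
    if (len > KEY_LEN - 1)
        len = KEY_LEN - 1;
    memcpy(base_keys[index], key, len);
    base_keys[index][len] = '\0';
    if (index >= base_count)
        base_count = index + 1;
}

/* Number of hashes the format would actually compute. */
static int total_candidates(void)
{
    return base_count * (int)strlen(ext_charset);
}

/* Stand-in for get_key() over the larger index range; "out" must hold
 * at least KEY_LEN + 1 bytes. */
static void get_candidate(int index, char *out)
{
    int n = (int)strlen(ext_charset);
    const char *base;
    size_t len;

    assert(index >= 0 && index < total_candidates());
    base = base_keys[index / n];
    len = strlen(base);
    memcpy(out, base, len);
    out[len] = ext_charset[index % n];
    out[len + 1] = '\0';
}
```

Here two keys set with set_base_key() yield 20 candidates, and the
cracking mode would see indices 0 through 19 instead of 0 and 1.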

Overall, this is complicated stuff involving some trade-offs.

Alexander