Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150323024056.GB25495@openwall.com>
Date: Mon, 23 Mar 2015 05:40:56 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Lengths of fields recorded in hashes

On Mon, Mar 23, 2015 at 02:21:27AM +0300, Alexander Cherepanov wrote:
> For some formats, hashes contain variable-length data (say, encoded in 
> hex) and the length of this data is also recorded in the hashes. One of 
> the examples is 7z format which contains 3 data field (salt, iv, data) 
> and 3 lengths for them.
> I think their inclusion was an error

Agreed.

> and they should be removed.

Unclear.

> Mainly because they are useless and only complicate things.

That's (almost) correct.  See below for an exception.

> Does anybody consider them useful? They could potentially be useful when 
> one needs to allocate memory for the data but even in this case they 
> save only one strlen (or equivalent).
> OTOH each superfluous field now requires several lines of code in 
> valid() and get_salt(). And parsing numbers is not that easy, e.g. atoi 
> exhibits undefined behavior on overflows.

Right.  BTW, we should get rid of all atoi()'s, like in favor of a
custom wrapper around strtol() that we'd introduce into our misc.[ch].

Maybe we should adopt OpenBSD's strtonum():

http://www.tedunangst.com/flak/post/the-design-of-strtonum

although its use of "long long" is unnecessarily not nice for 32-bit
builds.  If we deviate from strtonum() in any way, we'll need to use a
different function name.

This is one of many tasks for the "robustness officer" we're seeking.

> Extra fields also take extra 
> space etc. Recently the point was raised that lengths are an obstacle to 
> fuzzing. And it would be nice if most of the hashes are constructed from 
> a small number of simple blocks (a field containing a length of another 
> field is not one of them).

While maybe being a problem for fuzzing, this also suggests that the
fields were maybe not totally useless: they serve as poor man's
checksums, even if only for metadata.  For very long encoded strings,
this makes some sense to me.

> I propose the following plan:
> 1) gradually rewrite valid()s and get_salt()s to ignore lengths recorded 
> in hashes;

No, valid() must continue to check whatever practical.

I am OK with rewriting get_salt().

BTW, officially - such as in formats.h - this function is called
salt().  And ditto for binary().  The get_ prefix, as used by many
formats, isn't exactly right: these function aren't "getters" (from the
format's state), they are "extractors" (from the function argument).
Maybe we should rename them dropping the get_ prefix, or changing it to
extract_ (but that's longer, and ext_ would conflict with the naming
we're already using for functions implementing external mode support).
For now, let's not bother - not the worst of our worries, by far.

> 2) change the format of hashes in john and in 2john tools at some later 
> point.

Hmm, to be compatible with your changes proposed as step 1 above, would
the versions of *2john tools from step 2 produce empty fields in place
of the lengths?  That is, two field separators in a row?  That's still
not pretty.

Anyway, we probably should do this, for the reasons magnum explained.

> Can we change format of such hashes at will (by requiring to always use 
> john and 2john tools of matching versions)?

Not necessarily.  We could if we really needed to, though.

> Attached is a patch to 7z format which implements the first part of the 
> plan as an illustration of the approach. (It also fixes a potential 
> overflow in data field and removes strange dealing with zeroes in iv.)

I didn't review it closely, but in general I think that your changes to
get_salt() are acceptable since get_salt() can assume(*) that its input
is valid, but your removal of checks from valid() is not acceptable.

(*) We should discuss this separately.  In fact, I recall this was
already brought up in another thread.  I think it's another task for
"robustness officer" to review loader.c very carefully, and to fuzz an
instrumented version of JtR with assertions added, and make sure that
only output from valid(), possibly passing through split(), can pass
into salt() and binary().  If this is not currently the case, then
whatever exceptions to this might exist should be patched in loader.c.

Please write down the tasks I am mentioning here into some sort of to-do
list, such as a page you'd create on our wiki for the robustness project.

Thanks,

Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.