Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55189563.4000402@openwall.com>
Date: Mon, 30 Mar 2015 03:14:27 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Lengths of fields recorded in hashes

On 2015-03-23 05:40, Solar Designer wrote:
>> Does anybody consider them useful? They could potentially be useful when
>> one needs to allocate memory for the data but even in this case they
>> save only one strlen (or equivalent).
>> OTOH each superfluous field now requires several lines of code in
>> valid() and get_salt(). And parsing numbers is not that easy, e.g. atoi
>> exhibits undefined behavior on overflows.
>
> Right.  BTW, we should get rid of all atoi()'s, like in favor of a
> custom wrapper around strtol() that we'd introduce into our misc.[ch].
>
> Maybe we should adopt OpenBSD's strtonum():
>
> http://www.tedunangst.com/flak/post/the-design-of-strtonum
>
> although its use of "long long" is unnecessarily not nice for 32-bit
> builds.  If we deviate from strtonum() in any way, we'll need to use a
> different function name.

Such a function doesn't seem to be a good fit for one-pass parsing at 
all. It doesn't return a position where it has stopped parsing and it 
"doesn’t handle [...] trailing non-digit characters". This leads to 
strdup+strtok-like approach which I thought we want to get rid of.

> This is one of many tasks for the "robustness officer" we're seeking.
>
>> Extra fields also take extra
>> space etc. Recently the point was raised that lengths are an obstacle to
>> fuzzing. And it would be nice if most of the hashes are constructed from
>> a small number of simple blocks (a field containing a length of another
>> field is not one of them).
>
> While maybe being a problem for fuzzing, this also suggests that the
> fields were maybe not totally useless: they serve as poor man's
> checksums, even if only for metadata.  For very long encoded strings,
> this makes some sense to me.

Actually, I thought about it but decided that it's minor enough to not 
be worth mentioning.

>> I propose the following plan:
>> 1) gradually rewrite valid()s and get_salt()s to ignore lengths recorded
>> in hashes;
>
> No, valid() must continue to check whatever practical.
>
> I am OK with rewriting get_salt().
>
> BTW, officially - such as in formats.h - this function is called
> salt().  And ditto for binary().  The get_ prefix, as used by many
> formats, isn't exactly right: these function aren't "getters" (from the
> format's state), they are "extractors" (from the function argument).
> Maybe we should rename them dropping the get_ prefix, or changing it to
> extract_ (but that's longer, and ext_ would conflict with the naming
> we're already using for functions implementing external mode support).
> For now, let's not bother - not the worst of our worries, by far.

Ok, I've added it into todo list:

- Unify function names (get_salt -> salt etc.). This makes 
grepping/refactoring easier.

>> 2) change the format of hashes in john and in 2john tools at some later
>> point.
>
> Hmm, to be compatible with your changes proposed as step 1 above, would
> the versions of *2john tools from step 2 produce empty fields in place
> of the lengths?  That is, two field separators in a row?  That's still
> not pretty.
>
> Anyway, we probably should do this, for the reasons magnum explained.

You mean to accept empty length but check that it's right if it's not empty?

>> Can we change format of such hashes at will (by requiring to always use
>> john and 2john tools of matching versions)?
>
> Not necessarily.  We could if we really needed to, though.
>
>> Attached is a patch to 7z format which implements the first part of the
>> plan as an illustration of the approach. (It also fixes a potential
>> overflow in data field and removes strange dealing with zeroes in iv.)
>
> I didn't review it closely, but in general I think that your changes to
> get_salt() are acceptable since get_salt() can assume(*) that its input
> is valid, but your removal of checks from valid() is not acceptable.

Ok

> (*) We should discuss this separately.  In fact, I recall this was
> already brought up in another thread.  I think it's another task for
> "robustness officer" to review loader.c very carefully, and to fuzz an
> instrumented version of JtR with assertions added, and make sure that
> only output from valid(), possibly passing through split(), can pass
> into salt() and binary().  If this is not currently the case, then
> whatever exceptions to this might exist should be patched in loader.c.
>
> Please write down the tasks I am mentioning here into some sort of to-do
> list, such as a page you'd create on our wiki for the robustness project.

I've posted the todo list but I'm yet to create a wiki page for it.

-- 
Alexander Cherepanov

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.