|
Message-ID: <20150323024056.GB25495@openwall.com> Date: Mon, 23 Mar 2015 05:40:56 +0300 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: Lengths of fields recorded in hashes On Mon, Mar 23, 2015 at 02:21:27AM +0300, Alexander Cherepanov wrote: > For some formats, hashes contain variable-length data (say, encoded in > hex) and the length of this data is also recorded in the hashes. One of > the examples is 7z format which contains 3 data field (salt, iv, data) > and 3 lengths for them. > I think their inclusion was an error Agreed. > and they should be removed. Unclear. > Mainly because they are useless and only complicate things. That's (almost) correct. See below for an exception. > Does anybody consider them useful? They could potentially be useful when > one needs to allocate memory for the data but even in this case they > save only one strlen (or equivalent). > OTOH each superfluous field now requires several lines of code in > valid() and get_salt(). And parsing numbers is not that easy, e.g. atoi > exhibits undefined behavior on overflows. Right. BTW, we should get rid of all atoi()'s, like in favor of a custom wrapper around strtol() that we'd introduce into our misc.[ch]. Maybe we should adopt OpenBSD's strtonum(): http://www.tedunangst.com/flak/post/the-design-of-strtonum although its use of "long long" is unnecessarily not nice for 32-bit builds. If we deviate from strtonum() in any way, we'll need to use a different function name. This is one of many tasks for the "robustness officer" we're seeking. > Extra fields also take extra > space etc. Recently the point was raised that lengths are an obstacle to > fuzzing. And it would be nice if most of the hashes are constructed from > a small number of simple blocks (a field containing a length of another > field is not one of them). While maybe being a problem for fuzzing, this also suggests that the fields were maybe not totally useless: they serve as poor man's checksums, even if only for metadata. For very long encoded strings, this makes some sense to me. > I propose the following plan: > 1) gradually rewrite valid()s and get_salt()s to ignore lengths recorded > in hashes; No, valid() must continue to check whatever practical. I am OK with rewriting get_salt(). BTW, officially - such as in formats.h - this function is called salt(). And ditto for binary(). The get_ prefix, as used by many formats, isn't exactly right: these function aren't "getters" (from the format's state), they are "extractors" (from the function argument). Maybe we should rename them dropping the get_ prefix, or changing it to extract_ (but that's longer, and ext_ would conflict with the naming we're already using for functions implementing external mode support). For now, let's not bother - not the worst of our worries, by far. > 2) change the format of hashes in john and in 2john tools at some later > point. Hmm, to be compatible with your changes proposed as step 1 above, would the versions of *2john tools from step 2 produce empty fields in place of the lengths? That is, two field separators in a row? That's still not pretty. Anyway, we probably should do this, for the reasons magnum explained. > Can we change format of such hashes at will (by requiring to always use > john and 2john tools of matching versions)? Not necessarily. We could if we really needed to, though. > Attached is a patch to 7z format which implements the first part of the > plan as an illustration of the approach. (It also fixes a potential > overflow in data field and removes strange dealing with zeroes in iv.) I didn't review it closely, but in general I think that your changes to get_salt() are acceptable since get_salt() can assume(*) that its input is valid, but your removal of checks from valid() is not acceptable. (*) We should discuss this separately. In fact, I recall this was already brought up in another thread. I think it's another task for "robustness officer" to review loader.c very carefully, and to fuzz an instrumented version of JtR with assertions added, and make sure that only output from valid(), possibly passing through split(), can pass into salt() and binary(). If this is not currently the case, then whatever exceptions to this might exist should be patched in loader.c. Please write down the tasks I am mentioning here into some sort of to-do list, such as a page you'd create on our wiki for the robustness project. Thanks, Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.