john-dev - more robustness

Message-ID: <20150622033617.GA16410@openwall.com>
Date: Mon, 22 Jun 2015 06:36:17 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: more robustness

Hi Kai,

I've been thinking of what else you could do on this project, besides
the fuzzing you did so far and the coding style improvements.

A related thought is that GSoC is meant to be about coding to a greater
extent than you've been doing so far.  We appreciate your fuzzing and
coding style experiments a lot, but GSoC has its rules.  You'll need to
upload your code sample to Google once GSoC is over.  What would your
code sample be?  Just your enhancements to my fuzz.pl?  Anything else?

I am not blaming you at all.  In fact, it is our - the mentoring org's -
oversight that we didn't point out the preference for writing more code
under GSoC to you sooner.

Here are a couple of coding-oriented sub-project ideas:

1. JtR builtin extensive self-tests and/or formats fuzzer.  Right now,
--test performs only very basic testing, hashing one password at a time
(albeit in different key indices).  It is not uncommon for a format
being developed to pass self-test, yet fail to crack some passwords when
tested for real on a password hash file and a wordlist.  We could have
a longer-running, yet more extensive self-test and/or fuzzing invoked
via, e.g., --test-full and/or --fuzz options (to be introduced).  Just
like --test, these would be usable along with --format (to lock them to
a specific format or to a specific category of formats) as well as
without (to test all formats at once).

To implement this functionality well, you'll need to learn JtR's internal
formats interface and understand how JtR's cracking modes use it, and
why.

The extensive test should mimic actual cracking (testing groups of mixed
correct and incorrect passwords at once) and perhaps also combine it
with benchmarking.  Right now, our --test starts with a quick self-test
and then proceeds with a benchmark, which are separate stages; with an
extensive self-test that mimics actual cracking, the self-test and
benchmark should be one and the same stage.
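
To illustrate what I mean by testing groups of mixed correct and
incorrect passwords at once, here is a rough sketch in Python.  It is a
toy model only, not JtR code: toy_hash(), crack_batch() and
extensive_self_test() are hypothetical names, and SHA-256 merely stands
in for a real format.

```python
import hashlib
import random


def toy_hash(password: str) -> str:
    """Stand-in for a real format's hashing (assumption: plain SHA-256)."""
    return hashlib.sha256(password.encode()).hexdigest()


def crack_batch(candidates, target):
    """Hash a whole batch at once and return the indices that match target."""
    hashes = [toy_hash(c) for c in candidates]
    return [i for i, h in enumerate(hashes) if h == target]


def extensive_self_test(known, batch_size=8, rounds=100, seed=0):
    """known is a list of (password, expected_hash) test vectors.

    Each round hides one correct password at a random index among wrong
    candidates and checks that exactly that index is reported.
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        password, target = rng.choice(known)
        batch = [f"wrong{rng.randrange(10**6)}" for _ in range(batch_size)]
        batch[rng.randrange(batch_size)] = password
        expected = [i for i, c in enumerate(batch) if c == password]
        if crack_batch(batch, target) != expected:
            return False
    return True
```

A real implementation would go through the formats interface instead,
and could time the batches so that the same loop doubles as the
benchmark.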

The builtin fuzzer might be similar to my fuzz.pl, but it'd work via the
formats interface directly, not needing to exec anything and thus
achieving higher efficiency.  I think JtR's builtin wordlist rules
engine could be made use of for fuzzing of hash encodings (we'd have a
separate ruleset in a .include'd fuzz.conf file for that).
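
As a rough sketch of fuzzing hash encodings in-process (Python here
just for brevity; toy_valid(), the "$toy$" encoding and the mutation
functions are all made up for illustration and are not part of JtR), the
idea is to mutate a known-good encoding many times and feed each result
straight to the format's validation code:

```python
import random


def toy_valid(ciphertext: str) -> bool:
    """Hypothetical validator: "$toy$" followed by 32 lowercase hex digits."""
    prefix = "$toy$"
    if not ciphertext.startswith(prefix):
        return False
    body = ciphertext[len(prefix):]
    return len(body) == 32 and all(c in "0123456789abcdef" for c in body)


def truncate(s, rng):
    return s[:rng.randrange(len(s) + 1)]


def overlong(s, rng):
    return s + s[-1:] * rng.randrange(1, 64)


def drop_delimiter(s, rng):
    return s.replace("$", "", 1)


def replace_char(s, rng):
    pos = rng.randrange(len(s))
    return s[:pos] + chr(rng.randrange(256)) + s[pos + 1:]


MUTATIONS = [truncate, overlong, drop_delimiter, replace_char]


def fuzz(valid, good_encoding, iterations=1000, seed=0):
    """Return (mutated_input, exception) pairs for every input that made
    valid() raise instead of returning True or False."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        mutated = rng.choice(MUTATIONS)(good_encoding, rng)
        try:
            valid(mutated)
        except Exception as exc:  # any exception here models a crash
            failures.append((mutated, exc))
    return failures
```

In JtR itself the mutations would presumably come from the rules engine
rather than from hard-coded functions like these, and a "failure" would
be a crash or a sanitizer report rather than a Python exception.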

Having this functionality built-in will likely result in more people
running those extensive tests and fuzzing (and we could include some
randomization to increase the coverage of such community testing).

Frankly, I think this sub-project might be (too?) challenging for you at
this time, yet I thought I'd offer the possibility.

2. Tests for other JtR features, beyond the formats.  You did fuzz other
features, but you did not test them for proper operation.  ("Didn't
crash" isn't the same as "works as intended".)  We do test them during
development and use, but we possibly don't retest them enough after
making changes.

Perhaps those extra tests should be included in the jumbo tree, and
invoked by "make check".  We do have a "make check" already, but it
merely invokes "../run/john --test=0 --verbosity=2".  We also have a JtR
test suite (originally by JimF), and "make test" to invoke it, but it's
about more extensive tests of the formats and IIUC also about basic
testing of the wordlist mode and character encodings.  It does not test
other cracking modes, does not test interrupt/restore, does not test
--fork and --node, etc.

BTW, one of the areas where bugs are fairly likely is the combination of
--fork, --node, and interrupt/restore.  This is something I've been
testing extensively while adding support for --fork and --node, and I
fixed relevant bugs in my initial code (before making it available to
anyone else), but I think it's fairly likely that new bugs crept in
along with modifications and new cracking modes in jumbo.  (I think
magnum has been testing for this sort of bug too, though.)  A good test
suite should simulate interrupts and --restore's, and ensure that this
does not result in fewer passwords getting cracked than are cracked in
an uninterrupted session.
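
The property to check can be sketched as follows.  This is a toy model
in Python, with a saved wordlist position standing in for a real session
file; run_session() and run_with_interrupts() are hypothetical names,
and an actual test would drive john itself, interrupting it and using
--restore.

```python
import hashlib
import random


def toy_hash(password: str) -> str:
    """Stand-in for a real format's hashing (assumption: plain SHA-256)."""
    return hashlib.sha256(password.encode()).hexdigest()


def run_session(words, targets, start=0, stop_after=None):
    """Try words[start:] against targets; optionally stop ("interrupt")
    after stop_after candidates.  Returns (cracked, saved_position)."""
    cracked = set()
    pos = start
    while pos < len(words):
        if stop_after is not None and pos - start >= stop_after:
            break
        if toy_hash(words[pos]) in targets:
            cracked.add(words[pos])
        pos += 1
    return cracked, pos


def run_with_interrupts(words, targets, rng):
    """Repeatedly interrupt at random points and restore from the saved
    position until the wordlist is exhausted."""
    cracked, pos = set(), 0
    while pos < len(words):
        found, pos = run_session(words, targets, start=pos,
                                 stop_after=rng.randrange(1, 5))
        cracked |= found
    return cracked


words = [f"w{i}" for i in range(50)]
targets = {toy_hash("w3"), toy_hash("w49")}
baseline, _ = run_session(words, targets)
interrupted = run_with_interrupts(words, targets, random.Random(1))
# Interrupting and restoring must not lose any cracked passwords.
assert baseline <= interrupted
```

The same comparison against an uninterrupted baseline would apply when
--fork and --node are added to the mix.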

I'm sure there are many more things to test as well.  And more
importantly, to enable us and others to retest them easily later.

To work on this sub-project well, you'd need to improve your JtR user
skills.  For example, try attacking the hashes from the recent Hash Runner
contest via all of the cracking modes, interrupting and restoring, using
--fork and --node, etc.

What do you think?

Naturally, I am also very interested in Alexander Cherepanov's opinion
on all of this.

Thanks,

Alexander