john-dev - Re: Reload pot file

Message-ID: <20140303024216.GB659@openwall.com>
Date: Mon, 3 Mar 2014 06:42:16 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Reload pot file

magnum -

On Sat, Mar 01, 2014 at 05:09:35PM +0100, magnum wrote:
> Simplest but slowest alternative would be to drop the database, and more 
> or less reload it (from both input hash files and pot files) just like 
> an initial load.

Ouch.  I wouldn't even consider this approach.  Nor the re-exec.

> A faster and safer solution would be to just re-process pot file using 
> existing functions. We miss the opportunity to reload the input files 
> containing hashes to crack but that was never my main goal anyway. The 
> worst problem seems to be the database used during initial load is not 
> exactly the same as the one ultimately used. Perhaps that doesn't 
> necessarily matter?

I don't understand what you mean by "the database used during initial
load is not exactly the same as the one ultimately used".  Can you
please clarify which aspect(s) you're referring to here?  I'd like to
comment on this, but as it is I am just confused.

> I'm not sure but I might need to implement an alternative version of 
> ldr_load_pot_file() to begin with. Anyway the hardest part for me now is 
> deciding what needs to be done - and when it's appropriate to do so. I 
> think this processing could take place right after crk_password_loop() 
> with no side effects.
> 
> I'd appreciate any hints you can come up with!

For --fork, my tentative plan was to introduce a new mem_alloc_shared()
function or such, which would allocate memory from mmap()ed regions that would be
shared between the child processes.  Such allocations would occur before
fork().  We'd use mem_alloc_shared() to allocate the database structures
that are or may be updated when a cracked hash is removed.  And we'd
need to introduce a mutex on such removals - perhaps place it within a
shared allocation of this kind too.  That would be a weird mix of
fork(), partially shared memory, and a mutex (from OpenMP? from
pthreads?), but it might just work and not even look too ugly - I
thought I'd try and then decide whether to keep this in a release or
consider this a funny experiment and throw it away.  Since this is a bit
more operating system specific than the rest of JtR is, I think it'll
need to be optional (compile-time).  Unfortunately, I haven't yet found
time for this experiment since my work on 1.8.0.
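
A minimal sketch of the idea above, assuming POSIX mmap() with
MAP_ANONYMOUS and a pthreads mutex marked PTHREAD_PROCESS_SHARED (one of
the two mutex options mentioned; OpenMP locks are not shown).  Only the
name mem_alloc_shared() comes from this message.  struct shared_db,
shared_db_init(), shared_db_remove_one() and demo_fork_removals() are
hypothetical stand-ins, and the counter stands in for the real database
structures:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed implementation: an anonymous shared mapping, made before fork(),
 * stays shared between the parent and all of its children. */
static void *mem_alloc_shared(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
	    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical stand-in for the database parts updated on hash removal.
 * The mutex lives inside the shared allocation itself. */
struct shared_db {
	pthread_mutex_t lock;
	int remaining;	/* hashes not yet cracked */
};

static struct shared_db *shared_db_init(int count)
{
	struct shared_db *db = mem_alloc_shared(sizeof(*db));
	pthread_mutexattr_t attr;

	if (!db)
		return NULL;
	pthread_mutexattr_init(&attr);
	/* Without this attribute the mutex is only valid within one process */
	pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	pthread_mutex_init(&db->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	db->remaining = count;
	return db;
}

/* Removal of one cracked hash, serialized across processes */
static void shared_db_remove_one(struct shared_db *db)
{
	pthread_mutex_lock(&db->lock);
	if (db->remaining > 0)
		db->remaining--;
	pthread_mutex_unlock(&db->lock);
}

/* Fork the given number of children, let each remove one hash, and return
 * the count the parent sees afterwards (or -1 if allocation failed). */
static int demo_fork_removals(int count, int children)
{
	struct shared_db *db;
	int i, forked = 0, left;

	assert(count >= 0 && children >= 0);
	db = shared_db_init(count);
	if (!db)
		return -1;
	for (i = 0; i < children; i++) {
		pid_t pid = fork();
		if (pid == 0) {
			shared_db_remove_one(db);
			_exit(0);
		}
		if (pid > 0)
			forked++;
	}
	while (forked-- > 0)
		wait(NULL);
	left = db->remaining;
	pthread_mutex_destroy(&db->lock);
	munmap(db, sizeof(*db));
	return left;
}
```

Calling demo_fork_removals(10, 4) should leave 6 in the parent's view if
all four forks succeed, since every child decrements the same shared
counter.  Older systems may need -pthread when linking.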

Re-reading the pot is more generic since it also supports MPI and
independent invocations of john (e.g., if someone manually invokes john
with multiple wordlists one after another while also running it in
incremental mode, the incremental run's john would remove the hashes
cracked by the wordlist runs).  So even if the shared memory approach
above happens to work well for --fork, I realize there may be
demand for re-reading the pot anyway.  So you may implement that in
jumbo, and leave the shared memory for me to eventually experiment with,
or you may try the shared memory thing yourself if you like.

Oh, and when you re-read, you can start reading from a previously
recorded offset (the pot file size as of the last re-read).  Then it may
actually be fast.
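
A rough sketch of re-reading from a recorded offset, not actual JtR
code.  struct pot_reader, pot_reread() and count_line() are hypothetical
names.  It assumes POSIX getline() and one newline-terminated entry per
pot line, and it only advances the offset past complete lines, in case
another process is in the middle of appending an entry:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical state kept between re-reads */
struct pot_reader {
	const char *path;
	long offset;	/* bytes already processed */
};

typedef void (*pot_line_fn)(const char *line, void *ctx);

/* Process only the entries appended since the previous call.
 * Returns the number of new complete lines, or -1 on error. */
static int pot_reread(struct pot_reader *r, pot_line_fn handle, void *ctx)
{
	FILE *f;
	char *line = NULL;
	size_t cap = 0;
	ssize_t len;
	int count = 0;

	assert(r && r->path && handle);
	f = fopen(r->path, "r");
	if (!f)
		return -1;
	if (fseek(f, r->offset, SEEK_SET) != 0) {
		fclose(f);
		return -1;
	}
	while ((len = getline(&line, &cap, f)) > 0) {
		if (line[len - 1] != '\n')
			break;	/* partial entry: retry it on the next re-read */
		line[len - 1] = '\0';
		handle(line, ctx);
		r->offset += (long)len;
		count++;
	}
	free(line);
	fclose(f);
	return count;
}

/* Example callback: just counts the lines it is given */
static void count_line(const char *line, void *ctx)
{
	(void)line;
	++*(int *)ctx;
}
```

In a real implementation the callback would look up each entry's hash in
the loaded database and remove it if present; that part is left out
here.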

Thanks!

Alexander