john-users - Re: .chr files


Message-ID: <20120415233104.GA8929@debian>
Date: Mon, 16 Apr 2012 03:31:04 +0400
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: .chr files

On Fri, Apr 13, 2012 at 10:59:42PM +0200, Simon Marechal wrote:
> On 13/04/2012 22:46, Aleksey Cherepanov wrote:
> > Assume that we have mixed passwords of two patterns. We build a .chr file and
> > number each password according to its position in the list of candidates
> > that this .chr file provides. We drop one password from our set and redo
> > the steps, and the numbers change: if the ratio between the biggest group of
> > passwords and the smallest group is higher than before, then it was a password
> > from the smallest group; otherwise it was a password from the biggest group.
> > I am not sure how to measure the numbers correctly.
> 
> You assume that incremental mode will be a good tool to model password
> patterns. I do not believe this is the case for most, even if it worked
> reasonably well during the contest.

In practice I think my algorithm is too slow. But in general I think it is one
possible form of K-means cluster analysis, with a metric based on the distance
between passwords relative to the best .chr file specific to a cluster.

I think a less brute-force algorithm is to compare a password with the others
using a .chr file built from only that password, so that similar, "close"
passwords would be near the top of the john -i output while significantly
different passwords would come later (it may be necessary to mix in some simple
passwords to provide all letters, so that passwords with different sets of
letters can appear in one list).
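
A minimal sketch of that comparison, under loud assumptions: it does not
read or write real .chr files and does not call john at all. A plain
character-frequency count stands in for the .chr file, and an average
log-probability score stands in for the position in john -i output. The
names build_model, score, rank_by_similarity and the weight parameter are
all hypothetical.

```python
import math
from collections import Counter


def build_model(passwords):
    """Count characters over a list of passwords.

    This is only a crude stand-in for a .chr file (assumption).
    """
    counts = Counter()
    for pw in passwords:
        counts.update(pw)
    return counts


def score(model, candidate):
    """Average log-probability per character; higher means "closer"."""
    total = sum(model.values())
    alphabet = len(model)
    logp = 0.0
    for ch in candidate:
        # Add-one smoothing so an unseen character does not give log(0).
        logp += math.log((model[ch] + 1) / (total + alphabet + 1))
    return logp / max(len(candidate), 1)


def rank_by_similarity(reference, others, baseline, weight=5):
    """Order `others` from most to least similar to `reference`.

    `baseline` plays the role of the simple passwords mixed in to provide
    all letters; `weight` (hypothetical) repeats the reference so that it
    dominates the baseline.
    """
    model = build_model([reference] * weight + list(baseline))
    return sorted(others, key=lambda pw: score(model, pw), reverse=True)


if __name__ == "__main__":
    baseline = ["abcdefghijklmnopqrstuvwxyz0123456789"]
    others = ["zzzzqqqq", "pikachu2", "summer99"]
    for pw in rank_by_similarity("pikachu1", others, baseline):
        print(pw)
```

With this toy model, passwords that share many characters with the reference
sort first; a real experiment would replace the model and the score with an
actual .chr file and the candidate order john produces from it.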

Though I am not a specialist in statistics (yet).

http://en.wikipedia.org/wiki/Cluster_analysis
http://en.wikipedia.org/wiki/K-means_clustering

There are many different methods for grouping passwords; even with K-means
clustering there could be variations in the choice of metric. So I guess
there could be better methods than the ones proposed. It could also be
reasonable to use different methods at the same time.
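
To illustrate how the metric can be swapped, here is a rough sketch. Strings
have no natural mean, so it uses k-medoids (a close relative of K-means that
picks an existing item as each cluster center) rather than K-means itself.
Edit distance is only one example metric; it is not the .chr-based metric
discussed above. The function names and the initial_medoids parameter are
hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def k_medoids(items, initial_medoids, metric, max_iter=20):
    """Group items around medoids using any metric(a, b) -> number.

    Returns the clusters from the last assignment step, one list per medoid.
    """
    medoids = list(initial_medoids)
    clusters = [[] for _ in medoids]
    for _ in range(max_iter):
        # Assignment step: each item goes to its nearest medoid.
        clusters = [[] for _ in medoids]
        for item in items:
            nearest = min(range(len(medoids)),
                          key=lambda i: metric(item, medoids[i]))
            clusters[nearest].append(item)
        # Update step: the new medoid minimizes total distance in its cluster.
        new_medoids = [
            min(c, key=lambda cand: sum(metric(cand, o) for o in c)) if c else m
            for c, m in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return clusters


if __name__ == "__main__":
    pws = ["password1", "password2", "passw0rd1", "123456", "1234567", "12345"]
    for group in k_medoids(pws, ["password1", "123456"], levenshtein):
        print(group)
```

Passing a different function as metric (for example
lambda a, b: abs(len(a) - len(b))) changes the grouping without touching the
clustering loop, which is the kind of variation meant here.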

One thing we need is a method that shows groups that are close in terms of
rule set generation, while another is a method that shows groups by meaning
(for example, a list of Pokemon was a pattern during the contest). Even rule
set generation needs different things: passwords obtained by mutations and
passwords generated from a template like lllddd (while such a template could
be expressed as mutations, it is less likely that two passwords of that
pattern are close in terms of mutations). So several approaches are necessary.
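
The template view can be sketched in a few lines. Only l (lowercase letter)
and d (digit) come from the lllddd example; the u (uppercase) and s (anything
else) classes and the function names are assumptions added for illustration.

```python
from collections import defaultdict


def template(password):
    """Map each character to a class letter, e.g. "abc123" -> "lllddd"."""
    out = []
    for ch in password:
        if ch.islower():
            out.append("l")   # lowercase letter
        elif ch.isupper():
            out.append("u")   # uppercase letter (assumed class)
        elif ch.isdigit():
            out.append("d")   # digit
        else:
            out.append("s")   # everything else (assumed class)
    return "".join(out)


def group_by_template(passwords):
    """Collect passwords that share the same template, in input order."""
    groups = defaultdict(list)
    for pw in passwords:
        groups[template(pw)].append(pw)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_template(["abc123", "xyz789", "Pass!"]))
```

Here abc123 and xyz789 land in the same lllddd group even though they share
no characters, so a mutation-based (edit distance) metric would place them
far apart.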

Regards,
Aleksey Cherepanov
