Message-ID: <20120415233104.GA8929@debian>
Date: Mon, 16 Apr 2012 03:31:04 +0400
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: .chr files

On Fri, Apr 13, 2012 at 10:59:42PM +0200, Simon Marechal wrote:
> On 13/04/2012 22:46, Aleksey Cherepanov wrote:
> > Assume that we have mixed passwords of two patterns. We build a .chr file
> > and number each password according to its position in the list of
> > candidates that this .chr file provides. We drop one password from our set
> > and redo the steps, and the numbers change: if the ratio between the
> > biggest group of passwords and the smallest group is higher than before,
> > then it was a password from the smallest group; otherwise it was a
> > password from the biggest group. I am not sure how to measure the numbers
> > right.
>
> You assume that incremental mode will be a good tool to model password
> patterns. I do not believe this is the case for most, even if it worked
> reasonably well during the contest.

In practice, I think my algorithm is too slow. In general, though, I think it
is one possible form of K-means cluster analysis, with a metric based on the
distance between passwords relative to the best .chr file for a given cluster.
I think a less brute-force algorithm is to compare a password with the others
using a .chr file built from only that password: similar, "close" passwords
would be near the top of the john -i output, while significantly different
passwords would appear later. (It may be necessary to mix in simple passwords
to supply all letters, so that passwords with different sets of letters can
appear in one list.) I am not a specialist in statistics, though (yet).

http://en.wikipedia.org/wiki/Cluster_analysis
http://en.wikipedia.org/wiki/K-means_clustering

There are many different methods for grouping passwords; even with K-means
clustering there can be variations in the choice of metric. So I guess there
could be better methods than the ones proposed.

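To make the "clustering with a choice of metric" idea concrete, here is a
minimal sketch in Python. It is not the .chr-based procedure described above:
strings have no meaningful mean, so it swaps in k-medoids (each cluster is
represented by one of its own passwords) instead of K-means, and it uses plain
Levenshtein edit distance as a stand-in metric. The names edit_distance and
k_medoids are hypothetical; a metric derived from a candidate's position in
john -i output could be passed in the same way.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance. A stand-in metric, NOT the .chr-based one."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def k_medoids(items, k, metric, iterations=20, seed=0):
    """Group items into k clusters around representative items (medoids).

    Assumes metric(x, y) > 0 for distinct x and y, so that every medoid
    stays in its own cluster. Returns a dict: medoid -> list of members.
    """
    items = sorted(set(items))            # duplicates would collapse clusters
    rng = random.Random(seed)
    medoids = rng.sample(items, k)        # raises ValueError if k > len(items)
    clusters = {}
    for _ in range(iterations):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for item in items:
            nearest = min(medoids, key=lambda m: metric(item, m))
            clusters[nearest].append(item)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(metric(c, o) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break                         # converged
        medoids = new_medoids
    return clusters


passwords = ["password1", "password2", "passw0rd1", "123456", "1234567", "12345"]
for medoid, members in k_medoids(passwords, 2, edit_distance).items():
    print(medoid, "->", members)
```

Like K-means, this can settle in a local optimum that depends on the initial
medoids, so trying several seeds is reasonable. Only the metric argument would
need to change to try a different notion of "close".
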
It could also be reasonable to use different methods at the same time. One
thing we need is a method that finds groups that are close in the sense of
rule set generation; another is a method that finds groups by meaning (for
example, a list of Pokemon names was a pattern during the contest). Even rule
set generation needs different things: passwords obtained by mutations, and
passwords generated from a template like lllddd (while such a template could
be expressed as mutations, it is less likely that two passwords of that
pattern are close in terms of mutations). So several approaches are necessary.

Regards,
Aleksey Cherepanov
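
As a rough illustration of grouping by a template like lllddd, the sketch
below maps every password to a character-class mask and collects passwords
that share a mask. The letters l (lowercase) and d (digit) come from the
template mentioned above; u (uppercase) and s (anything else), as well as the
function names mask and group_by_mask, are assumptions made for this example.

```python
from collections import defaultdict


def mask(password):
    """Return the character-class template of a password, e.g. 'lllddd'."""
    classes = []
    for ch in password:
        if ch.islower():
            classes.append("l")   # lowercase letter
        elif ch.isupper():
            classes.append("u")   # uppercase letter (assumed class name)
        elif ch.isdigit():
            classes.append("d")   # digit
        else:
            classes.append("s")   # anything else (assumed class name)
    return "".join(classes)


def group_by_mask(passwords):
    """Group passwords that share the same template, keeping input order."""
    groups = defaultdict(list)
    for pw in passwords:
        groups[mask(pw)].append(pw)
    return dict(groups)


for template, members in group_by_mask(["abc123", "xyz789", "Pass1"]).items():
    print(template, members)
```

Two passwords such as abc123 and xyz789 land in the same group here even
though they share no characters, which is the case where a mutation-based
distance would probably not consider them close.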