john-dev - Re: Upper casing (and lower casing) in john



Message-ID: <11970F45F23845C48CB7E11B2501B5E9@ath64dual>
Date: Thu, 14 Jul 2011 15:30:45 -0500
From: "JFoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: Re: Upper casing (and lower casing) in john

From: "magnum"
> On 2011-07-14 17:18, JimF wrote:
>> 1. rules: l u c C ?l ?u t TN (p P I may also be impacted). S V are also
>> likely candidates.
>
> Sometimes you really only want a-z (see 2. below) so for ANSI mode, I 
> suggest we keep all the existing as-is and add alternate versions for some 
> or all of them that use the new functions.

I do see your point about code pages other than 8859-1.  We need to research 
this a little more.  The new functionality would add good things to john; we 
just need to add it in a way that does no harm.

> In UTF-8 mode, we could add support for (fully) case-shifting whole words 
> but as soon as we try to say "third character" or some such, rules are not 
> UTF-8 aware. I have some vague thoughts about how to add future UTF-8 
> awareness in rules (counting multibyte characters as one) but that is 
> probably far away - and it will be much slower than today so it must be 
> separated so it doesn't hit non-UTF8 mode.

For UTF-8 mode, we should really step back.  First off, working in a 
multi-byte format like UTF-8 is very expensive, especially in the rules 
section, where you often have to convert to another format, do the 'work', 
then convert back.
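To illustrate where the cost comes from (a minimal sketch, not code from john): in UTF-8 a character position cannot be indexed directly, so finding the Nth character means walking the string from the start on every lookup, whereas in a fixed-width UCS-2 buffer it is a plain array index. The helper name utf8_char_offset is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not part of john): byte offset of the n-th (0-based)
   character in a NUL-terminated UTF-8 string, or -1 if the string is shorter.
   Every call scans from the start, unlike buf[n] on a UCS-2 buffer. */
static long utf8_char_offset(const char *s, size_t n)
{
    size_t count = 0;

    for (long i = 0; s[i]; i++) {
        /* Continuation bytes look like 10xxxxxx; any other byte starts a character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (count == n)
                return i;
            count++;
        }
    }
    return -1;
}
```

For the word "été" (bytes C3 A9 74 C3 A9), the third character starts at byte 3, not byte 2, which is exactly the kind of mismatch that breaks position-based rule commands applied to raw UTF-8.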

I think for this, we should pre-process the rule when in -utf8 mode, and 
determine IF conversions are needed.  If all we are doing is appending '123' 
to the tail of the word, then no conversion is needed.  In that case, we 
simply handle the string as though it were ANSI.
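A minimal sketch of such a pre-scan, assuming the rule has already been through the rule preprocessor (so no [...] ranges remain) and using a deliberately small whitelist: ':' (no-op), 'd' (duplicate), '$X' (append) and '^X' (prefix) are treated as byte-safe, and every other command is assumed to need conversion. The function name rule_needs_conversion is hypothetical, not an existing john function, and the sketch ignores that truncating to the format's maximum length could still split a multi-byte character.

```c
#include <stdbool.h>

/* Hypothetical pre-scan of an already-preprocessed rule string.
   Returns true if any command falls outside a small byte-safe whitelist,
   meaning the word should be converted before the rule is applied.
   Conservative by design: anything unrecognized takes the slow path. */
static bool rule_needs_conversion(const char *rule)
{
    while (*rule) {
        switch (*rule) {
        case ':':               /* no-op */
        case 'd':               /* duplicate word: copying bytes keeps UTF-8 valid */
            rule++;
            break;
        case '$':               /* append character X */
        case '^':               /* prefix character X */
            if (!rule[1])
                return true;    /* malformed rule: let the slow path handle it */
            rule += 2;
            break;
        default:
            return true;        /* casing, length, position, reverse, ... */
        }
    }
    return false;
}
```

With this check, a rule like $1$2$3 stays on the byte-oriented fast path, while u or c$1 would trigger conversion.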

However, if we determine that the rule does something which would require 
conversion (length, index-of, casing, etc.), then I would suggest we convert 
the word into UCS-2 (UTF-16) and KEEP it that way until the rule has 
completed, then convert it back into UTF-8 for processing by the format.  It 
would be 'nice' if we could have some rule that says to leave the word in 
the already converted UTF-16 form before passing it into the format (rather 
than converting back to UTF-8, only for the format to convert it into UTF-16 
again).  However, that would likely take some modifications to the formats 
(possibly new function pointers, or different parameters to the existing 
functions).
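A rough sketch of the "convert once, work in UTF-16, convert back" flow for an uppercase rule, under these assumptions: only BMP characters (1 to 3 byte UTF-8 sequences) are handled, overlong forms are not rejected, and the case mapping covers only ASCII plus the Latin-1 letters whose uppercase form is also in Latin-1 (so not ß, µ or ÿ). All function names here are hypothetical, not john's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: decode UTF-8 into UCS-2 (BMP only, no overlong checks).
   Returns the number of 16-bit units written, or -1 on unsupported input. */
static int utf8_to_ucs2(const unsigned char *in, uint16_t *out, int max_out)
{
    int n = 0;

    while (*in) {
        uint16_t cp;

        if (in[0] < 0x80) {
            cp = in[0];
            in += 1;
        } else if ((in[0] & 0xE0) == 0xC0 && (in[1] & 0xC0) == 0x80) {
            cp = (uint16_t)(((in[0] & 0x1F) << 6) | (in[1] & 0x3F));
            in += 2;
        } else if ((in[0] & 0xF0) == 0xE0 && (in[1] & 0xC0) == 0x80 &&
                   (in[2] & 0xC0) == 0x80) {
            cp = (uint16_t)(((in[0] & 0x0F) << 12) | ((in[1] & 0x3F) << 6) |
                            (in[2] & 0x3F));
            in += 3;
        } else {
            return -1;  /* 4-byte sequences (outside the BMP) or bad bytes */
        }
        if (n >= max_out)
            return -1;
        out[n++] = cp;
    }
    return n;
}

/* Hypothetical: encode UCS-2 back to NUL-terminated UTF-8.
   Returns the byte length, or -1 if the output buffer is too small. */
static int ucs2_to_utf8(const uint16_t *in, int len, char *out, size_t max_out)
{
    size_t o = 0;

    if (max_out == 0)
        return -1;
    for (int i = 0; i < len; i++) {
        uint16_t c = in[i];

        if (c < 0x80) {
            if (o + 1 >= max_out) return -1;
            out[o++] = (char)c;
        } else if (c < 0x800) {
            if (o + 2 >= max_out) return -1;
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            if (o + 3 >= max_out) return -1;
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return (int)o;
}

/* Uppercase ASCII and Latin-1 letters in place (fixed-width, no rescanning). */
static void ucs2_upper(uint16_t *s, int len)
{
    for (int i = 0; i < len; i++) {
        uint16_t c = s[i];

        if ((c >= 'a' && c <= 'z') || (c >= 0xE0 && c <= 0xFE && c != 0xF7))
            s[i] = (uint16_t)(c - 0x20);
    }
}

/* Convert once, apply the casing step on UCS-2, convert back once. */
static int upper_utf8_word(const char *word, char *out, size_t out_size)
{
    uint16_t buf[128];
    int len = utf8_to_ucs2((const unsigned char *)word, buf, 128);

    if (len < 0)
        return -1;
    ucs2_upper(buf, len);
    return ucs2_to_utf8(buf, len, out, out_size);
}
```

Only the two conversions at the ends touch variable-width data; the rule step itself works on fixed-width units, which is what makes length, index, and casing commands cheap once the word is in UTF-16.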

I am not looking at making changes right now.  I am more looking at finding 
out WHAT parts of john deal with casing (or lengths, indexes, etc. when 
dealing with dictionary input words), and just what can be done to improve 
the existing word handling/modification which john does.  That 
handling/manipulation is one of the CORE reasons why john is such a great 
tool.  john is often not the fastest tool out there (in 'raw' speed), but 
it is often THE BEST at cracking passwords, because the correct candidate 
can be presented sooner in the cracking session.  So, if we can find the 
places where we can make this tool better, and find good ways to exploit 
them without causing slowdowns for any existing workflow, then that is what 
I would love to look into.

Jim.
