john-dev - Re: Upper casing (and lower casing) in john

Message-ID: <20110715222545.GD6277@openwall.com>
Date: Sat, 16 Jul 2011 02:25:45 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Upper casing (and lower casing) in john

Jim -

On Thu, Jul 14, 2011 at 10:18:36AM -0500, JimF wrote:
> What logic within john is there for casing, (up/down, etc).  From my 
> knowledge, there is:

You pretty much found all those places.

> 1. rules:  l u c C ?l  ?u t TN    (p P I may also be impacted).  S V are 
> also likely candidates.
> 
> 2. Formats (but these are one by one issues which need to be addressed 
> directly).  Oracle/mssql have been handled.  LM has not, but by my 
> understanding, what we have done already is the 'correct' method.
> 
> Now, what about external ??

Not in the external mode support code itself, but there's this sample:

[List.External:Filter_LanMan]
void filter()
{
	int i, c;

	word[7] = 0;                    // Truncate at 7 characters

	i = 0;                          // Convert to uppercase
	while (c = word[i]) {
		if (c >= 'a' && c <= 'z') word[i] &= 0xDF;
		i++;
	}
}

which the documentation suggests for re-generating lanman.chr.

There's also a check in inc.c for mixed-case .chr files.

> I do not think there is case conversion in 
> there now, but is this something we 'should' add, toupper and tolower type 
> functions?

There are currently no function calls from external mode at all,
although this (function calls in general) is something I've been
considering adding.

> Is there anything needing looked at within the pre-processor code?

No.

> Are there other places where letter case, or changing case is required 
> within john?

There are some case-insensitive comparisons, but I see no need for them
to be non-ASCII-aware.  (They are not comparisons of passwords.)

> The reason I ask, is there are now valid toupper/tolower character macros, 
> which will properly up/down all 8 bit ANSI characters properly (with a 
> couple of caveats).   We should be able to make most of these changes, with 
> no impact in performance.  However, we now will crack a LOT more hashes if 
> they have many of the European accent/umlaut type characters.  I believe 
> that the changes to do this should be pretty easy, depending upon how the 
> original code (especially in rules), is put together.  I have not looked 
> 'yet', so I am not sure.

I am not sure that this will crack a lot more hashes as you say.  In
certain cases it will just waste time.  For example, for a Russian word
in some non-UTF-8 Cyrillic encoding, rules like:

:
u Q

would try the word just once now, whereas if you change the "u" to do
upcasing of 8-bit chars for iso-8859-1 (or whatever non-Cyrillic
encoding), the second rule would actually try some weird password that
is unlikely to result in a match against any of the hashes.
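
To illustrate the point above, here is a minimal sketch in C.  It is not
code from john: the helper names (upcase_ascii, upcase_latin1, u_Q_tries)
are hypothetical, the sample word is assumed to be CP866-encoded, and "Q"
is modelled simply as "reject unless the word changed".

```c
#include <assert.h>
#include <string.h>

/* ASCII-only upcasing, as the message describes for "u" today. */
static void upcase_ascii(unsigned char *s)
{
	for (; *s; s++)
		if (*s >= 'a' && *s <= 'z')
			*s &= 0xDF;
}

/* Hypothetical ISO-8859-1 upcasing: also maps 0xE0..0xFE (except 0xF7). */
static void upcase_latin1(unsigned char *s)
{
	for (; *s; s++) {
		if (*s >= 'a' && *s <= 'z')
			*s &= 0xDF;
		else if (*s >= 0xE0 && *s <= 0xFE && *s != 0xF7)
			*s -= 0x20;
	}
}

/*
 * Models the rule "u Q": upcase, then reject unless the word changed.
 * Returns 1 if a second candidate would be tried, 0 if it is rejected.
 */
static int u_Q_tries(const unsigned char *word,
    void (*upcase)(unsigned char *))
{
	unsigned char buf[64];
	size_t len = strlen((const char *)word);

	assert(len < sizeof(buf));
	memcpy(buf, word, len + 1);
	upcase(buf);
	return memcmp(buf, word, len + 1) != 0;
}

/* Russian for "password" in CP866 (assumed bytes), and an ASCII word. */
static const unsigned char word_cp866[] =
    { 0xAF, 0xA0, 0xE0, 0xAE, 0xAB, 0xEC, 0 };
static const unsigned char word_ascii[] = "password";
```

With upcase_ascii, u_Q_tries(word_cp866, ...) returns 0, so the word is
tried only once.  With upcase_latin1 it returns 1: two of the six bytes
move into the CP866 box-drawing range, and that candidate is tried too.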

When implementing rules.c in 1996-1998, I briefly thought of using the
system's ctype macros and initializing locale, but decided not to,
opting to limit the rule commands' case conversions to ASCII and thus to
consistent behavior across systems.
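
The trade-off can be sketched in C as follows.  This is only an
illustration under assumptions, not the code in rules.c; the function
names ascii_toupper and locale_toupper are made up for the example.

```c
#include <assert.h>
#include <ctype.h>

/*
 * ASCII-only conversion: the result depends only on the byte value,
 * so it is the same on every system regardless of locale settings.
 */
static int ascii_toupper(int c)
{
	assert(c >= 0 && c <= 255);	/* callers pass a byte value */
	return (c >= 'a' && c <= 'z') ? (c & 0xDF) : c;
}

/*
 * ctype-based conversion: for bytes above 0x7F the result depends on
 * whether and how the program has called setlocale(), so it may differ
 * from one system to another.
 */
static int locale_toupper(int c)
{
	assert(c >= 0 && c <= 255);
	return toupper((unsigned char)c);
}
```

In the default "C" locale both functions agree on ASCII letters; they
can only diverge on 8-bit bytes once a locale has been initialized.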

Perhaps it's time for a change now, but I am not sure what change is the
right one.

Alexander