Message-ID: <20100227185412.GA21345@openwall.com>
Date: Sat, 27 Feb 2010 21:54:12 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Encoding UTF-8 .pot fails

On Sat, Feb 27, 2010 at 01:07:10PM +0100, websiteaccess@...il.com wrote:
>  My system is OS X (latest), I run JTR 1.7.5 in my terminal (setting 
> with UTF-8 encoding), my wordlist is UTF-8 encoding too. ALL is UTF-8
> 
> At the begining of the crack session JTR store passwords found in the 
> .pot file encoded UTF-8.
> After 1 hour of cracking (-rules), my .pot is no more UTF-8 !

You must be wrong about some of the statements you made above, but I
can't guess which one(s) are wrong.

Since JtR is not aware of different character encodings, it can't
possibly switch from one encoding to another.

To make matters worse (as it relates to the list members helping you),
when you post your sample passwords to the list they might be getting
recoded from one character encoding to another, maybe even more than
once.  It depends on many programs that you use to get the passwords
into an e-mail message draft, to edit it, and to create and send the
message.  All of this might be transparent to you (like a copy & paste),
yet many programs are involved.

> --- passwords found at the begining of my session --
> bären
[...]

As far as I can tell, these display correctly when interpreted as
iso-8859-1 (not UTF-8), but that might be an effect of the way you
placed them into an e-mail message and sent it.  Indeed, your e-mail
message was sent in iso-8859-1 (according to its headers), so you
couldn't correctly include UTF-8 characters in it (the recipients' mail
readers would misinterpret those because the characters would be
inconsistent with your message headers).

> --- after 1 hour, all same passwords cracked before, are now unreadable --
> bären

These look like UTF-8.  They won't display correctly in your e-mail message
for the reason I mentioned above.
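
To illustrate the two cases discussed above, here is a minimal sketch in
Python (chosen only for illustration; it has nothing to do with JtR
itself).  It uses the sample word from the quoted message, shows its
bytes in each encoding, and shows what UTF-8 bytes look like when they
are misread as iso-8859-1.  The variable names are made up for this
example.

```python
# Illustration only: the sample word from the quoted message.
word = "b\u00e4ren"  # "bären"

latin1_bytes = word.encode("iso-8859-1")  # one byte for "ä": 0xe4
utf8_bytes = word.encode("utf-8")         # two bytes for "ä": 0xc3 0xa4

print(latin1_bytes.hex(" "))  # 62 e4 72 65 6e
print(utf8_bytes.hex(" "))    # 62 c3 a4 72 65 6e

# UTF-8 bytes misread as iso-8859-1: every byte becomes one character,
# so the two-byte "ä" turns into two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# The iso-8859-1 bytes are not valid UTF-8 here: 0xe4 announces a
# multi-byte sequence, but the bytes after it are not continuation bytes.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)
```

The same word thus has two different byte representations, and which
one "looks right" depends entirely on how the viewer decodes the bytes.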

>  I have to reencode my .pot in UTF-8 to restore all passwords correctly.

What do you mean by "reencoding" your .pot?  What exactly are you doing?

>  This problem was also with previous version of JTR.

The problem has nothing to do with JtR itself.

>  What is the problem ?

The problem is that there are too many issues involved.  Character
encodings are a complicated topic.  I realize that this is not how you
intended your question to be interpreted, but at least it's a correct
answer and one that I think can actually help (albeit not directly).

To debug the actual problem, I suggest that you try viewing hex dumps of
your files - the wordlist, the .pot file.  One thing this will tell you
is that the encoding of existing entries of the .pot file obviously does
not change as JtR is running (so your guess/statement that it did was
wrong).  It might also help you figure out where/what the problem is.

You may try commands like:

hexdump -C john.pot | less
xxd john.pot | less
od -tx1 john.pot | less

(press "q" to quit the "less" viewer).
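
If reading a full hex dump is too tedious, a small script can narrow it
down.  The following is only a sketch, assuming Python 3.8 or newer is
available; classify_line and report are hypothetical helper names, not
part of JtR.  It reads the file as raw bytes and prints the hex of each
line that contains non-ASCII bytes, along with whether that line decodes
cleanly as UTF-8.

```python
import sys


def classify_line(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'not utf-8' for one line of raw bytes."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


def report(path: str) -> None:
    """Print line number, classification, and hex for each non-ASCII line."""
    with open(path, "rb") as f:  # binary mode: the bytes are not decoded
        for lineno, raw in enumerate(f, start=1):
            raw = raw.rstrip(b"\r\n")
            kind = classify_line(raw)
            if kind != "ascii":
                print(f"{lineno}: {kind}: {raw.hex(' ')}")


if __name__ == "__main__" and len(sys.argv) > 1:
    report(sys.argv[1])
```

Saved under a name of your choice, it would be run with the file to
inspect as its only argument (e.g., john.pot or the wordlist).  Note
that "utf-8" here only means the line decodes cleanly as UTF-8; the
same bytes are always decodable as iso-8859-1 too, so this is a strong
hint rather than proof.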

You may post relevant excerpts from the hex dumps.  This will avoid the
uncertainty associated with possible recoding of characters when you
place them in an e-mail message.

One thing you may want to check is whether your terminal is still set
to UTF-8 when it stops displaying john.pot contents "correctly" (the way
you want).  Maybe there's something that makes it switch to a different
character encoding - e.g., a terminal control sequence, or a sequence of
bytes that is not valid per UTF-8.  Just a guess (maybe a wrong one).
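
To test the control sequence part of that guess, one could look for
control bytes in the file.  This is a rough sketch in Python under my
own assumptions: find_control_bytes is a hypothetical helper, and it
treats tab, LF and CR as harmless while flagging every other byte below
0x20 as well as DEL (0x7f).

```python
def find_control_bytes(data: bytes):
    """Return (offset, byte value) pairs for control bytes in data.

    Tab, LF and CR are allowed; other bytes below 0x20 and DEL (0x7f)
    are reported, since a terminal may act on them (ESC is 0x1b).
    """
    allowed = {0x09, 0x0A, 0x0D}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if (value < 0x20 or value == 0x7F) and value not in allowed
    ]


# Example: an ESC byte hidden inside otherwise plain text.
sample = b"a\x1b[0mb\n"
for offset, value in find_control_bytes(sample):
    print(f"offset {offset}: 0x{value:02x}")
```

An empty result for the whole file would make the control sequence
explanation unlikely, leaving invalid UTF-8 sequences or a terminal
setting as the remaining candidates.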

Alexander
