john-users - Re: Bleeding jumbo now defaults to UTF-8



Message-ID: <9d0b580410269c4c18758a2cd3eea697@smtp.hushmail.com>
Date: Wed, 22 Jul 2015 18:23:39 +0200
From: magnum <john.magnum@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: Bleeding jumbo now defaults to UTF-8

On 2015-07-22 16:34, Marek Wrzosek wrote:
> What is the one - proper way to use --inc=utf8 in new bleeding-jumbo?
> I mean, which encoding option we should use - --input-encoding=utf-8,
> --target-encoding=utf-8, --internal-encoding=utf-8 or just
> --encoding=utf-8. Because none seems to work in case of --inc=utf8.
> For --inc=latin1 --target-encoding=cp1252 is mandatory for pot file
> to be utf-8 only and not mixed with other encodings.

What determines which encoding to use is the encoding that was actually
used by the system that produced the hashes in the first place. If it's
UCS-2/UTF-16 (e.g. NT or MSSQL) you can use any encoding, but if not, you
*need* to tell JtR which -target-enc to use (unless it's your default).
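
To illustrate why the hashing system's encoding matters, here is a rough
sketch in Python. It is not JtR code: MD5 merely stands in for any hash
computed over raw password bytes, and digest_for is a hypothetical helper
made up for this example.

```python
import hashlib


def digest_for(password: str, encoding: str) -> str:
    """Hash the password bytes as a system using `encoding` would have."""
    return hashlib.md5(password.encode(encoding)).hexdigest()


password = "café"

# Byte-oriented hash: the encoding changes the bytes, hence the digest.
latin1_digest = digest_for(password, "latin-1")  # hashes b'caf\xe9'
utf8_digest = digest_for(password, "utf-8")      # hashes b'caf\xc3\xa9'
print(latin1_digest == utf8_digest)

# UTF-16 based hash (NT-style): whatever encoding the candidate arrives in,
# it is converted to the same UTF-16 bytes before hashing.
utf16_from_latin1 = password.encode("latin-1").decode("latin-1").encode("utf-16-le")
utf16_from_utf8 = password.encode("utf-8").decode("utf-8").encode("utf-16-le")
print(utf16_from_latin1 == utf16_from_utf8)
```

The first comparison is false and the second is true, which is the
difference between "you need -target-enc" and "any encoding will do".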

Once the above is established: Will you give your input in some *other*
encoding than your target (or default) encoding? In the case of incremental
mode that would not make any sense: You must use an incremental mode
that corresponds with your encoding (any other approach would be slow).
So instead of -target-encoding, just use -enc (a.k.a. -input-enc) and do
not specify any -target-enc (or set it to the same, which is the default).

Now, if you targeted old web hashes and picked -enc=latin1, you can use 
-inc=latin1. The default is -inc=ascii so it will always work, but 
things like "-inc=utf8 -enc=latin1" will definitely produce garbage.
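
A small Python sketch of the kind of garbage meant here (an illustration
of the byte-level mismatch only, not of JtR internals): bytes generated
as UTF-8 but interpreted as Latin-1 turn one character into two unrelated
ones.

```python
candidate = "é"

# What a UTF-8 charset would emit for this character: two bytes.
utf8_bytes = candidate.encode("utf-8")

# What those same bytes mean if they are treated as Latin-1 input.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes)   # b'\xc3\xa9'
print(misread)      # Ã©
print(len(candidate), len(misread))  # 1 2
```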

-internal-encoding does not apply to incremental mode. It's only used in 
case of "utf8 wordlist -> rules -> utf8/16 hashes" and for "mask mode -> 
utf8/16 hashes" (if your mask contains non-ascii).

> PS. Without any encoding options there are characters that are not from
> utf-8. The same with --enc=raw. Is there a bug with utf8 incremental
> mode after defaulting to utf-8?

Incremental mode was not written with multi-byte charsets like UTF-8 in
mind, so it will sometimes produce worthless candidates containing invalid
byte sequences. You can add "-ext:filter_utf8" to filter them out, but for
fast formats it's better to just ignore them: The filter costs much more
time than the wasted candidates it removes.
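
A minimal Python sketch of the effect, assuming a generator that combines
bytes with no knowledge of UTF-8 sequences. The three-byte charset and
the is_valid_utf8 helper are made up for this example, and this is not
how filter_utf8 itself is implemented.

```python
from itertools import product


def is_valid_utf8(candidate: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        candidate.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 'a' plus the two bytes that together encode 'é' in UTF-8.
charset = [0x61, 0xC3, 0xA9]

# Byte-wise generation of every length-2 candidate.
candidates = [bytes(combo) for combo in product(charset, repeat=2)]
valid = [c for c in candidates if is_valid_utf8(c)]

print(len(candidates), len(valid))  # 9 2
print(valid)                        # [b'aa', b'\xc3\xa9']
```

Only "aa" and "é" survive; the other seven candidates are the waste
described above. Against a fast format, hashing them anyway is cheaper
than checking every candidate.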

magnum
