Date: Wed, 27 May 2015 22:10:45 +0200
From: magnum <john.magnum@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: Bleeding jumbo now defaults to UTF-8

On 2015-05-27 10:11, Albert Veli wrote:
> Is there a recommended way to convert existing john.pot files to utf-8?
>
> I tried:
>
>   iconv -f ISO-8859-1 -t UTF-8 john.pot > john2.pot
>
> and it seems to work, but I am not really sure if it will "remember" all
> cracked hashes or if some hashes will be encoded wrong. I.e. those that
> were not in iso-8859-1 encoding from the beginning. Chinese characters
> and so on.

It depends a lot on what you have in it. An alternative to the above is

	./cprepair -p john.pot > john3.pot

A big difference is that cprepair will quite reliably detect any lines 
that are already UTF-8 and leave them as-is. Also, the -p option 
makes it never touch the hash (anything up to the first ":"). It will 
also fix erroneously double-encoded lines very reliably.
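
To illustrate the idea, here is a rough Python sketch of that kind of 
per-line logic. It is not cprepair's actual code: the function names 
are made up for illustration, it assumes CP1252 as the legacy codepage, 
and its double-encoding check is a simple heuristic.

```python
def repair_text(raw: bytes, codepage: str = "cp1252") -> bytes:
    """Return raw as UTF-8 bytes, guessing its current encoding (heuristic)."""
    if raw.isascii():
        return raw  # pure ASCII is identical in every codepage involved
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the legacy 8-bit codepage.
        # Bytes undefined in that codepage become U+FFFD.
        return raw.decode(codepage, errors="replace").encode("utf-8")
    # Valid UTF-8. If mapping the characters back to the codepage yields
    # valid UTF-8 again, the text was probably encoded twice.
    try:
        inner = text.encode(codepage)
        inner.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return raw  # ordinary UTF-8, leave as-is
    return inner


def repair_pot_line(line: bytes, codepage: str = "cp1252") -> bytes:
    """Repair only the part after the first ':'; the hash is never touched."""
    hash_part, sep, plain = line.partition(b":")
    if not sep:
        return line  # no separator, leave the line alone
    return hash_part + sep + repair_text(plain, codepage)


if __name__ == "__main__":
    h = b"$NT$062de529e54e31079861ec97d666a44f:"
    for plain in (b"m\xfcller", b"m\xc3\xbcller", b"m\xc3\x83\xc2\xbcller"):
        print(repair_pot_line(h + plain).decode("utf-8"))
```

The three sample lines are the same password as 8-bit CP1252, as 
correct UTF-8, and as double-encoded UTF-8. A blind iconv run would only 
get the first one right, which is why some per-line detection is needed.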

---8<--------------8<--------------8<-----------
$ ../run/cprepair -h
Codepage repair (c) magnum 2014
Usage: ../run/cprepair [options] [file] [...]

Options:
  -i <cp>   Codepage to use for 8-bit input
  -f <cp>   Alternate codepage when no ASCII letters (a-z, A-Z) seen
  -n        Do not guess (leave 8-bit as-is)
  -s        Suppress lines that does not need fixing.
  -l        List supported encodings.
  -d        Debug (show conversions).
  -p        Only convert stuff after first ':' (.pot file).

Code pages default to CP1252 (MS Latin-1).
Double-conversions are handled automatically.
UTF-8 BOMs are stripped with no mercy. They should never be used, ever.

---8<--------------8<--------------8<-----------

You can run "./cprepair -s -d -p john.pot" to get an idea of what will 
be converted. The "-s -d" will make it suppress lines that need no 
conversion, and print conversions as "old -> new" like this:

$NT$062de529e54e31079861ec97d666a44f:m?ller -> 
$NT$062de529e54e31079861ec97d666a44f:müller

magnum
