Date: Tue, 3 Nov 2020 19:36:59 +0100
From: François <>
To: "" <>
Subject: Re: Rules characters unicode support.

Thanks for the very thorough response magnum!

Francois Pesce

On Tue, Nov 3, 2020 at 7:27 PM magnum <> wrote:

> On 2020-11-03 15:48, François wrote:
> > While running my tool on a very large (and old) leak, I realized that some
> > character substitutions from ASCII to Unicode were hitting some results (a
> > few hits on a large leak), for example:
> > seé
> > (...)
> > They make sense, because some old RFCs or specs prevent non-ASCII
> > characters from being used in email addresses or login information, but
> > password fields actually accept them now. For example, we could imagine
> > that a password associated with my email address could be close to the way
> > my French first name is actually written, thus "françois" (possibly
> > generated by a single rule substituting ç for c, such as: scç ).
> >
> > However, it seems that currently john(-jumbo) does not support Unicode
> > characters for all rule commands (except for the content of the A"..."
> > command). Is anyone working on supporting that use case, or should I just
> > try to use the A"..." command for my niche finding? What are your thoughts?
>
> While the Unicode support could be better, there are ways to achieve
> what you need. First of all, we need to tell John what encoding we're
> expecting the hashes to be made from. Nowadays that's usually a
> no-brainer: it is almost always UTF-8, and that's also the default in
> john.conf.
>
> Now if your need had been e.g. CP1252, things would be simpler
> since such legacy codepages are all single-byte: you'd simply write your
> rules such as scç and then be sure to save that config file with CP1252
> encoding. Run with --encoding=cp1252 and all should work just fine.
>
> With UTF-8, however, things currently aren't quite that easy because the
> rule engine does not (yet) honor multi-byte characters. But we have a
> work-around called --internal-codepage. With it, we still
> expect UTF-8 input (the hash file, any wordlists) and we still produce
> hashes from a UTF-8 encoded cleartext, but internally within the rule
> engine we run the internal legacy codepage. Just pick any encoding that
> can hold all characters you need to use.
>
> So let's try it out:
> $ echo francois > words.lst
> $ cat john-local.conf
> [List.Rules:subs]
> seé
> suü
> scç
> $ ./john -stdout -w:words.lst -rules=subs -internal-codepage=cp1252
> Invalid rule in (null) at line 2: Unknown command seé
> We get this error because john-local.conf contains UTF-8. John should
> actually be smarter here and handle that, but it does not yet. So let's
> encode our config file in CP1252 instead:
> $ mv john-local.conf john-local.utf8
> $ iconv -t cp1252 < john-local.utf8 > john-local.conf
> $ ./john -stdout -w:words.lst -rules=subs -internal-codepage=cp1252
> francois
> françois
> Another way of achieving the same is to use \xHH hex encoding. The
> value for "ç" in CP1252 is \xe7 so you'd just write it as sc\xe7 instead
> of scç. This way there's less risk of your editor messing things up in
> the future. This notation can also be handy when specifying a rule
> directly on the command line, like so:
> $ ./john -stdout -w:words.lst -internal-c=cp1252 -ru=':sc\xe7 u'
> As you can see, once you run with an internal codepage things like
> case-shifting (and nearly all other commands and character classes) will
> work for non-ASCII letters as well. We wouldn't want that to end up as
>
> A final note: you can set DefaultInternalCodepage in your config file,
> saving you from giving the -internal-codepage option every time. I'd
> actually recommend doing so; the default is empty for backwards
> compatibility only.
> magnum
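
For reference, the byte-level reason behind the advice above can be checked with a short Python sketch. It is only an illustration and not part of John: the helper names rule_hex and utf8_rules_to_codepage are made up for this example, and the second helper assumes the rules text is UTF-8 on input, like the iconv step in the quoted message.

```python
# Sketch only: shows why a single-byte codepage suits a byte-oriented
# rule engine. Helper names are hypothetical, not part of John the Ripper.

def rule_hex(ch: str, codepage: str = "cp1252") -> str:
    """Return the \\xHH escape for ch in a single-byte codepage."""
    raw = ch.encode(codepage)
    if len(raw) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {codepage}")
    return "\\x%02x" % raw[0]


def utf8_rules_to_codepage(data: bytes, codepage: str = "cp1252") -> bytes:
    """Re-encode UTF-8 rule text, similar in effect to `iconv -t cp1252`."""
    return data.decode("utf-8").encode(codepage)


# The three replacement characters from the example rules (é, ü, ç).
for ch in "\u00e9\u00fc\u00e7":
    utf8 = ch.encode("utf-8")
    # Each is two bytes in UTF-8 but a single byte in CP1252.
    print(ascii(ch), "UTF-8:", utf8.hex(), "len", len(utf8),
          "CP1252:", rule_hex(ch))
```

In UTF-8 each of these letters takes two bytes, so a rule like scç no longer parses as "s", one source character, one target character. In CP1252 the same letter is a single byte, which is what the \xHH notation spells out.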
