Message-ID: <CAMGgT5B-z+3uZpAaz-J4FVUn=z8CYPC=+sh3ZbpkP2fQtLpNew@mail.gmail.com>
Date: Tue, 3 Nov 2020 19:36:59 +0100
From: François <francois.pesce@...il.com>
To: "john-users@...ts.openwall.com" <john-users@...ts.openwall.com>
Subject: Re: Rules characters unicode support.

Thanks for the very thorough response, magnum!

Francois Pesce

On Tue, Nov 3, 2020 at 7:27 PM magnum <john.magnum@...hmail.com> wrote:

> On 2020-11-03 15:48, François wrote:
> > While running my tool on a very large (and old) leak, I realized that
> > some character substitutions from ASCII to Unicode were hitting some
> > results (a few hits on a large leak), for example:
> > seé
> > (...)
> > They make sense, because some old RFCs or specs prevent non-ASCII
> > characters from being used in email addresses or login information,
> > but password fields actually accept them now. For example, we could
> > imagine that a password associated with my email address
> > francois.pesce@...il.com could be close to the way my French first
> > name is actually written, thus "françois" (possibly generated by a
> > single rule substituting ç for c, such as: scç).
> >
> > However, it seems that currently, john(-jumbo) does not support
> > Unicode characters in all rule commands (except for the content of the
> > A"..." command). Is anyone working on supporting that use case, or
> > should I just try to use the A"..." command for my niche findings?
> > What are your thoughts?
>
> While the Unicode support could be better, there are ways to achieve
> what you need. First of all, we need to tell John what encoding we
> expect the hashes to be made from. Nowadays that's usually a
> no-brainer: it tends to be UTF-8, and that's also the default in
> john.conf.
>
> Now, if you had needed e.g. CP1252, things would be simpler, since such
> legacy codepages are all single-byte: you'd simply write your rules,
> such as scç, and then be sure to save that config file with CP1252
> encoding. Run with --encoding=cp1252 and all should work just fine.
>
> With UTF-8, however, things currently aren't quite that easy, because
> the rule engine does not (yet) honor multi-byte characters. But we have
> a work-around called --internal-codepage. With it, we still expect
> UTF-8 input (the hash file, any wordlists) and we still produce hashes
> from a UTF-8 encoded cleartext, but internally the rule engine runs in
> the legacy codepage. Just pick any encoding that can hold all the
> characters you need to use.
>
> So let's try it out:
>
> $ echo francois > words.lst
>
> $ cat john-local.conf
> [List.Rules:subs]
> seé
> suü
> scç
>
> $ ./john -stdout -w:words.lst -rules=subs -internal-codepage=cp1252
> Invalid rule in (null) at line 2: Unknown command seé
>
> We get this error because john-local.conf contains UTF-8. John should
> actually be smarter here and handle that, but we don't do so yet. So
> let's encode our config file in CP1252 instead:
>
> $ mv john-local.conf john-local.utf8
> $ iconv -t cp1252 < john-local.utf8 > john-local.conf
>
> $ ./john -stdout -w:words.lst -rules=subs -internal-codepage=cp1252
> francois
> françois
>
> Another way of achieving the same is to use \xHH hex encoding. The
> value for "ç" in CP1252 is \xe7, so you'd just write it as sc\xe7
> instead of scç. This way there's less risk of your editor messing
> things up in the future. This notation can also be handy when
> specifying a rule directly on the command line, like so:
>
> $ ./john -stdout -w:words.lst -internal-c=cp1252 -ru=':sc\xe7 u'
> FRANÇOIS
>
> As you can see, once you run with an internal codepage, things like
> case-shifting (and nearly all other commands and character classes)
> work for non-ASCII letters as well. We wouldn't want that to end up as
> FRANçOIS.
>
> A final note: you can set DefaultInternalCodepage in your config file,
> saving you from giving the -internal-codepage option every time. I'd
> actually recommend doing so; the default is empty for backwards
> compatibility only.
>
> magnum