john-users - Re: Markov phrases in john



Message-ID: <33b86fe5cce1011e19188732b13eae58@smtp.hushmail.com>
Date: Thu, 5 Dec 2024 09:53:37 +0100
From: magnum <magnumripper@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On 2024-12-05 02:45, Solar Designer wrote:
> As to multi-byte strings that are somehow special in UTF-8 (you show
> "\u2028" and "\u0085"), you could exclude (skip in the loop above) their
> individual bytes such as 0xc2 and 0xe2 (if I got these right).  You'd
> also need to decrease $maxtok further to 126.

U+2028 shouldn't be special in any way, but it will look like crap if your
terminal font can't show it (which is likely). U+0085 is indeed special.
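
A minimal Python sketch that prints the UTF-8 byte sequences of the two
code points mentioned above (the helper name `utf8_hex` is made up for
illustration):

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return ch.encode("utf-8").hex(" ")

for ch in ("\u2028", "\u0085"):
    # U+2028 is LINE SEPARATOR; U+0085 is NEL (NEXT LINE), a C1 control.
    print(f"U+{ord(ch):04X}: {utf8_hex(ch)}")
```

This prints "e2 80 a8" for U+2028 and "c2 85" for U+0085, so the lead
bytes are 0xe2 and 0xc2 respectively, as guessed in the quote.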

I'm not sure I understand the change to that script you mention, but if
you want to exclude all UTF-8 first bytes, they are 0xc2, 0xe0, 0xe2, 0xe8
and 0xf0, and you'd decrease $maxtok to 123. With those five excluded, the
tokenizer should never produce anything that can be parsed as valid UTF-8.
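
A minimal Python sketch of the $maxtok arithmetic, assuming (this is an
inference, not something the script is shown to do) that the tokenizer
draws its single-byte tokens from the 128 values 0x80-0xff, which is
consistent with the figures 126 and 123 above. The names `token_bytes` and
`decodes_as_utf8` are hypothetical, not taken from the actual script; the
second is a strict decode check one could run over sample tokenizer output.

```python
# Hypothetical: assumes tokens are single bytes taken from 0x80-0xff.
EXCLUDED = {0xC2, 0xE0, 0xE2, 0xE8, 0xF0}  # bytes listed in the reply above

def token_bytes(excluded):
    """Return the high-bit byte values left for tokens after exclusions."""
    return [b for b in range(0x80, 0x100) if b not in excluded]

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data decodes as well-formed UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(len(token_bytes({0xC2, 0xE2})))  # pool size with two bytes excluded
print(len(token_bytes(EXCLUDED)))      # pool size with all five excluded
```

Under that assumption the two counts come out as 126 and 123, matching the
$maxtok values discussed in this thread.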

Also, Matt mentions using LC_CTYPE=C; perhaps LC_ALL=C is more effective?

magnum
