Message-ID: <fbbbbd752646fa12110822483cd523cb@smtp.hushmail.com>
Date: Wed, 21 Oct 2015 21:50:25 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: locale

On 2015-10-21 20:07, Solar Designer wrote:
> On Wed, Oct 21, 2015 at 09:21:27AM +0200, magnum wrote:
>> I'm now checking/setting locale (if autoconf says I can) and fall back
>> to skipping the degree sign. Let me know if it misbehaves.
>
> Looking at these changes, I realize that my idea was probably bad:
> initializing the locale with setlocale() affects lots of things,
> including the ctype macros.  With some cracking modes, this might affect
> what candidate passwords they generate.  IIRC, we avoided using the
> ctype macros in our wordlist rules engine, but now that I grep e.g. for
> "islower", I find uses in dynamic_compiler.c, jumbo.c, mask.c.

I wasn't aware of these uses, and we should replace them. Actually, the
one in mask.c is arguably correct: it case-toggles the base word in
hybrid mode, and being able to do so only for ASCII is a limitation.
But we must honor our encoding options, not the terminal locale.
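
As an illustration of what a replacement could look like (a minimal
sketch only, not code from the tree; the function names are hypothetical),
ASCII-only helpers give the same result whatever setlocale() was called
with, unlike islower() and friends, which follow LC_CTYPE:

```c
/* Hypothetical helpers, not taken from the JtR source.
 * They look only at the ASCII range, so the current locale
 * cannot change what they return. */
int ascii_islower(int c)
{
	return c >= 'a' && c <= 'z';
}

int ascii_isupper(int c)
{
	return c >= 'A' && c <= 'Z';
}

/* Toggle case of ASCII letters; everything else passes through. */
int ascii_toggle_case(int c)
{
	if (ascii_islower(c))
		return c - 'a' + 'A';
	if (ascii_isupper(c))
		return c - 'A' + 'a';
	return c;
}
```

This deliberately keeps the ASCII limitation; honoring the encoding
options would need a table chosen by those options, which is not shown.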

> While we might later choose to add initializing locale to JtR for other
> reasons, I think DEGREE_SIGN alone isn't a sufficient reason, and if we
> do add locale support, we should make it consistent: initialize it all
> the time and do so early on, and not only do it for OpenCL and CUDA
> formats like the current code does.

I agree that introducing a locale for the degree sign alone is overkill.
I was just moving slowly: I actually had some vague idea that the
debatable UTF-8 defaults (just the parts that affect screen output, in
particular the "AlwaysReportUTF8 = Y") could be made to depend on the
locale. But maybe we should back away from setlocale instead, at least
for now.

> For now, maybe we should in fact check env vars explicitly to decide on
> DEGREE_SIGN.
>
> A maybe acceptable hack (for jumbo) is to do something like:
>
> 	setlocale(LC_ALL, "");
> 	... check for UTF-8 here ...
> 	setlocale(LC_CTYPE, "C");
>
> so that ctype macros are unaffected by the current locale (since our
> uses of them appear to be of the kind where we prefer consistency over
> customization; arguably, this means they are misuses).  But we'll need
> to do it all the time, and early on, to ensure consistent behavior
> regardless of whether an OpenCL or CUDA format is run.
>
> Also, the current checks for strchr(setlocale(LC_ALL, NULL), '.') do not
> tell us whether the locale is UTF-8 or not.  We'll need to do better.

The current implementation is not limited to UTF-8; it will also get you
a proper degree sign for legacy codepages like ISO-8859-*, CP* or
KOI8-R. For this to work I can't reset it back to C, and checking for
UTF-8 is irrelevant (the current check for '.' is mostly a check for
'neither "C" nor "POSIX" but some complete "aa_BB.CCCC" setting').
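
To make the intent of that '.' check concrete, here is a minimal sketch
(the helper name is hypothetical, and this is not the code in the tree).
It takes the locale name as a string so it can be tried without touching
the process locale:

```c
#include <string.h>

/* Hypothetical helper mirroring the strchr(..., '.') test above.
 * Returns 1 if the name looks like a complete "aa_BB.CCCC" setting,
 * 0 for NULL, "", "C", "POSIX", or a name without a codeset part. */
int locale_name_has_codeset(const char *name)
{
	if (!name || !*name)
		return 0;
	if (!strcmp(name, "C") || !strcmp(name, "POSIX"))
		return 0;
	return strchr(name, '.') != NULL;
}
```

In a program the name would come from, for instance,
setlocale(LC_ALL, NULL). As Solar notes, a positive result says nothing
about which codeset it is, only that one was specified.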

Anyway, you point out potential problems I had not considered. I think
I'll just drop the use of setlocale for now, but I'll sleep on it.
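
For reference, the "check env vars explicitly" alternative could be
sketched as below. This is an assumption about how it might be done, not
something implemented; both function names are hypothetical. It follows
the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG, with empty
values treated as unset) and never calls setlocale(), so the ctype
macros stay in the "C" locale:

```c
#include <stdlib.h>

/* Hypothetical: pick the effective locale name from the three
 * candidate values, first non-empty wins, default "C". */
const char *pick_locale_name(const char *lc_all, const char *lc_ctype,
                             const char *lang)
{
	if (lc_all && *lc_all)
		return lc_all;
	if (lc_ctype && *lc_ctype)
		return lc_ctype;
	if (lang && *lang)
		return lang;
	return "C";
}

/* Hypothetical wrapper reading the real environment. */
const char *locale_name_from_env(void)
{
	return pick_locale_name(getenv("LC_ALL"), getenv("LC_CTYPE"),
	                        getenv("LANG"));
}
```

The returned name would still have to be interpreted by hand to decide
which degree sign to print, which is the part setlocale() was doing.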

magnum
