Message-ID: <51892098.6090006@mccme.ru>
Date: Tue, 07 May 2013 19:41:12 +0400
From: Alexander Cherepanov <cherepan@...me.ru>
To: john-dev@...ts.openwall.com
Subject: Re: Non-ASCII characters in various files -- core and jumbo

On 2013-05-07 18:21, magnum wrote:
>> 2. Jumbo.
>>
>> - There are some places where non-ascii char can easily be eliminated -- patches attached.
>
> As long as they are UTF-8 I do not think they need to be fixed.

They need not be fixed but I think it would be good. I don't see any 
reason to have non-ascii chars in john files (except for names and files 
dealing with encodings).
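
For illustration, a rough way to locate such places could look like the sketch below. It is Python rather than anything from the john tree, and the function name is made up for this example. It only flags lines containing bytes >= 0x80; it does not check whether they form valid UTF-8.

```python
def non_ascii_lines(data: bytes):
    """Return (line_number, line) pairs for lines with any byte >= 0x80.

    Hypothetical helper for illustration only; not part of john.
    """
    hits = []
    for number, line in enumerate(data.splitlines(), start=1):
        # Iterating over a bytes object yields integers in 0..255.
        if any(byte >= 0x80 for byte in line):
            hits.append((number, line))
    return hits
```

Reading a file in binary mode and passing its contents to this function gives the line numbers worth inspecting by hand.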

And for (c) there was commit 6915cfd1d0868ae63d83af426e92e27ec37b6f14 .

> The pass_gen.pl fix is probably good though, for avoiding accidental trashing.

>> - There are multiple names in utf-8. This is probably Ok.
>
> It is the canonical character set nowadays.

I agree.

> If my name had non-ascii characters I would hate having it mangled in some inferior and ambiguous encoding.

That's true but I'm not sure how strong this reason is. Would it be 
convenient for others if I write my name in Cyrillic?

>> - There are two strings of lower- and upper-case letters from iso-8859-1 in doc/RULES. They are -- surprise:-) -- in iso-8859-1. IMHO it's better to remove them or to convert the file to utf-8.
>
> Actually, earlier today I converted doc/RULES to UTF-8 independently of your findings :-)

I see, in bleeding. I've only looked at unstable.
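
For reference, an iso-8859-1 to UTF-8 conversion of this kind can be sketched as follows. This is a Python illustration under my own assumptions, not how the conversion in bleeding was actually done, and the function name is hypothetical. The "already valid UTF-8" check is only a heuristic: some iso-8859-1 byte sequences also happen to be valid UTF-8.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as UTF-8, leaving valid UTF-8 untouched.

    Hypothetical helper for illustration only.
    """
    try:
        # Pure ASCII and already-converted files decode cleanly.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Every byte value is a valid iso-8859-1 character, so this
        # decode cannot fail.
        return data.decode("iso-8859-1").encode("utf-8")
```

Leaving valid UTF-8 untouched makes it safe to run the function twice on the same file, which avoids double-encoding.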

>> - src/encoding_data.h contains many chars in utf-8. It's probably Ok to have one file where all such stuff lives.
>
> Yes, that file is clearly documented as being supposed to be UTF-8.
>
>> - src/rules.c contains several comments with non-ascii chars copied from src/encoding_data.h. Not sure, maybe remove them or non-ascii chars?
>
> Replacing them with some other characters would totally void their meaning :-)
> We could drop them though. I really don't think we need to.

The more or less standard approach, I think, is to write Unicode names 
instead of explicit chars. It's impractical for files like 
src/encoding_data.h but it's quite good for just a few mentions like 
those in src/rules.c.
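
To make the idea concrete, here is a small sketch of replacing explicit chars with their Unicode names. It is Python, the function name and the <U+XXXX NAME> output format are my own choices for illustration, and nothing like it exists in the john tree as far as I know.

```python
import unicodedata


def describe_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with <U+XXXX NAME>.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # unicodedata.name() returns the default for unnamed code points.
        name = unicodedata.name(ch, None)
        code = f"U+{ord(ch):04X}"
        out.append(f"<{code} {name}>" if name else f"<{code}>")
    return "".join(out)
```

For example, describe_non_ascii("caf\u00e9") gives "caf<U+00E9 LATIN SMALL LETTER E WITH ACUTE>", which keeps the meaning of a comment while the file itself stays pure ASCII.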

> I'll apply some, most, maybe all of your patches.

-- 
Alexander Cherepanov
