Message-ID: <8ea32632c1c8be83853b95088bf67112@smtp.hushmail.com>
Date: Sun, 6 Jan 2013 13:10:02 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Markov UTF-8 magic

On 6 Jan, 2013, at 11:32 , Frank Dittrich <frank_dittrich@...mail.com> wrote:
> Creating a really good UTF-8 validity checker is even somewhat more
> complicated, since you have to exclude illegal overlong sequences as
> well as invalid Unicode code points.
> 
> See the discussion here (just one example):
> http://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c
> 
> BTW: Here's a perl expression which checks for valid UTF-8, just in case
> we'll need one:
> http://www.w3.org/International/questions/qa-forms-utf-8
> 
> Maybe we should google for a well-tested free C implementation which we
> can use.

I'm pretty sure the original lib I got our Unicode support from had a validity checker; I'll have a look at that. A checker is pretty trivial to write, but if we try to reinvent the wheel we'll probably end up overlooking something.
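
For reference, here is a minimal sketch of what such a checker has to cover, following the byte ranges in the W3C expression Frank linked (which match RFC 3629). It is not taken from our tree or from the lib mentioned above; the names utf8_is_valid and is_cont are made up for illustration. It rejects overlong forms (C0, C1, E0 80..9F, F0 80..8F), UTF-16 surrogates (ED A0..BF), code points above U+10FFFF (F4 90.. and F5..FF), stray continuation bytes and truncated sequences.

```c
#include <stddef.h>

/* Hypothetical helper: true if c is a continuation byte (10xxxxxx). */
static int is_cont(unsigned char c)
{
	return (c & 0xC0) == 0x80;
}

/*
 * Hypothetical sketch of a strict UTF-8 validity check over a buffer of
 * len bytes. Returns 1 if the whole buffer is well-formed, 0 otherwise.
 */
int utf8_is_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned char lo = 0x80, hi = 0xBF; /* allowed range for byte 2 */
		size_t n, k;                        /* n = continuation bytes */

		if (c <= 0x7F) {                    /* plain ASCII */
			i++;
			continue;
		} else if (c >= 0xC2 && c <= 0xDF) {
			n = 1;                      /* C0 and C1 are always overlong */
		} else if (c >= 0xE0 && c <= 0xEF) {
			n = 2;
			if (c == 0xE0)
				lo = 0xA0;          /* exclude overlong 3-byte forms */
			else if (c == 0xED)
				hi = 0x9F;          /* exclude surrogates D800..DFFF */
		} else if (c >= 0xF0 && c <= 0xF4) {
			n = 3;
			if (c == 0xF0)
				lo = 0x90;          /* exclude overlong 4-byte forms */
			else if (c == 0xF4)
				hi = 0x8F;          /* exclude code points > U+10FFFF */
		} else {
			return 0;                   /* stray continuation or F5..FF */
		}

		if (len - i <= n)
			return 0;                   /* sequence is truncated */
		if (s[i + 1] < lo || s[i + 1] > hi)
			return 0;
		for (k = 2; k <= n; k++)
			if (!is_cont(s[i + k]))
				return 0;
		i += n + 1;
	}
	return 1;
}
```

Calling utf8_is_valid() on the two bytes C0 AF (an overlong encoding of '/') or on the three bytes ED A0 80 (a surrogate) returns 0, which is exactly the kind of case a naive "lead byte plus N continuation bytes" check lets through.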

magnum
