Message-ID: <8ea32632c1c8be83853b95088bf67112@smtp.hushmail.com>
Date: Sun, 6 Jan 2013 13:10:02 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Markov UTF-8 magic

On 6 Jan, 2013, at 11:32 , Frank Dittrich <frank_dittrich@...mail.com> wrote:
> Creating a really good UTF-8 validity checker is even somewhat more
> complicated, since you have to exclude illegal overlong sequences as
> well as invalid Unicode code points.
>
> See the discussion here (just one example):
> http://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c
>
> BTW: Here's a Perl expression which checks for valid UTF-8, just in case
> we'll need one:
> http://www.w3.org/International/questions/qa-forms-utf-8
>
> Maybe we should google for a well-tested free C implementation which we
> can use.

I'm pretty sure the original lib I got our Unicode support from had a
validity checker; I'll have a look at that. It's pretty trivial, but if we
try to reinvent the wheel we'll probably end up overlooking something.

magnum
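
For reference, the checks Frank lists (overlong sequences, invalid code points) could be sketched roughly as below. This is only an illustration, not the checker from the library mentioned above and not code from the John the Ripper tree; the function name utf8_is_valid is hypothetical. It assumes "invalid code points" means UTF-16 surrogates (U+D800..U+DFFF) and anything above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects stray or missing continuation bytes, truncated
 * sequences, overlong encodings, surrogates and values above U+10FFFF.
 */
int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        uint32_t cp;    /* decoded code point */
        uint32_t min;   /* smallest code point allowed for this length */
        size_t need;    /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte, or 0xF8..0xFF */
        }

        if (len - i - 1 < need)
            return 0;   /* sequence truncated at end of buffer */

        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];

            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding, e.g. C0 80 */
        if (cp > 0x10FFFF)
            return 0;   /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* UTF-16 surrogate */

        i += need + 1;
    }
    return 1;
}
```

Decoding first and then comparing against the minimum for the sequence length keeps the overlong check in one place, at the cost of doing slightly more work than a table-driven validator. A well-tested existing implementation would still be preferable, for the reason given above.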