Message-ID: <BLU0-SMTP137504CB99B336B43D9B6ABFD260@phx.gbl>
Date: Sun, 6 Jan 2013 11:32:07 +0100
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Markov UTF-8 magic

Hi magnum,

I wasn't fully awake (not enough coffee) when I sent my previous mail.
I hope you can still parse most of it.

Creating a really good UTF-8 validity checker is somewhat more
complicated than I described, since you have to exclude illegal overlong
sequences as well as invalid Unicode code points.
See the discussion here (just one example):
http://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c

BTW: Here's a Perl expression which checks for valid UTF-8, just in case
we need one:
http://www.w3.org/International/questions/qa-forms-utf-8

Maybe we should google for a well-tested free C implementation which we
can use.

Frank
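
To illustrate the two requirements mentioned above (rejecting overlong
sequences and invalid code points), here is a minimal sketch of a strict
checker in C. It is not the well-tested implementation the mail asks for,
and it is not taken from John the Ripper; the function name is_valid_utf8
and its signature are assumptions made for this example. The byte ranges
follow the well-formed UTF-8 table from RFC 3629 / the Unicode Standard,
where the second byte of a sequence is restricted depending on the lead
byte.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects overlong encodings, UTF-16 surrogates
 * (U+D800..U+DFFF) and code points above U+10FFFF.
 */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        size_t need;                        /* continuation bytes expected */
        size_t k;

        if (c <= 0x7F) {                    /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;                       /* 0xC0/0xC1 would be overlong */
        } else if (c == 0xE0) {
            need = 2; lo = 0xA0;            /* excludes overlong 3-byte forms */
        } else if (c == 0xED) {
            need = 2; hi = 0x9F;            /* excludes surrogates */
        } else if (c >= 0xE1 && c <= 0xEF) {
            need = 2;
        } else if (c == 0xF0) {
            need = 3; lo = 0x90;            /* excludes overlong 4-byte forms */
        } else if (c >= 0xF1 && c <= 0xF3) {
            need = 3;
        } else if (c == 0xF4) {
            need = 3; hi = 0x8F;            /* excludes > U+10FFFF */
        } else {
            return 0;                       /* 0x80..0xC1 or 0xF5..0xFF */
        }

        if (len - i <= need)
            return 0;                       /* sequence is truncated */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return 0;                       /* restricted second byte */
        for (k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                   /* not a continuation byte */

        i += need + 1;
    }
    return 1;
}
```

A caller would pass a candidate password as bytes plus its length, e.g.
is_valid_utf8((const unsigned char *)s, strlen(s)). Restricting only the
second byte is enough because the lead byte and second byte together
determine whether the code point falls into an overlong, surrogate, or
out-of-range region.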