Message-ID: <BLU0-SMTP137504CB99B336B43D9B6ABFD260@phx.gbl>
Date: Sun, 6 Jan 2013 11:32:07 +0100
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Markov UTF-8 magic

Hi magnum,

I wasn't fully awake (not enough coffee) when I sent my previous mail.
I hope you can still parse most of it.

Writing a really good UTF-8 validity checker is somewhat more
complicated than that, since it also has to reject illegal overlong
sequences as well as invalid Unicode code points (surrogates
U+D800..U+DFFF and anything above U+10FFFF).
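
As a rough sketch of what such a checker has to do (this is not code
from John the Ripper, and the function name utf8_is_valid is made up
for illustration), something like the following would reject truncated
sequences, stray continuation bytes, overlong encodings, surrogates,
and code points above U+10FFFF. It is untuned and only meant to show
the checks involved:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the len bytes at s form
 * well-formed UTF-8, 0 otherwise. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t n, k;               /* n = number of continuation bytes */

        if (c < 0x80) {            /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;              /* stray continuation or 0xF8..0xFF */
        }

        if (len - i <= n)
            return 0;              /* sequence is truncated */

        for (k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;          /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;              /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;              /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;              /* beyond the Unicode range */

        i += n + 1;
    }
    return 1;
}
```

Decoding the code point and comparing it against the minimum for its
sequence length keeps the overlong check in one place, at the cost of
a little more work per character than a pure byte-range table.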

See the discussion here (just one example):
http://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c

BTW: Here's a Perl regular expression that checks for valid UTF-8, just
in case we need one:
http://www.w3.org/International/questions/qa-forms-utf-8

Maybe we should search for a well-tested, free C implementation that we
can use.

Frank
