Message-ID: <55CD84F6.3060501@mailbox.org>
Date: Fri, 14 Aug 2015 08:04:38 +0200
From: Frank Dittrich <frank.dittrich@...lbox.org>
To: john-dev@...ts.openwall.com
Subject: Re: episerver UTF-8

On 08/14/2015 03:07 AM, jfoug@....net wrote:
> On Thu, 13 Aug 2015 19:35:57 -0500, Lei Zhang <zhanglei.april@...il.com>
> wrote:
>> BTW, I think 3*PLAINTEXT_LENGTH means that we assume
>
> Yes, this is an 'assumption'
>
>> each UTF8 char to be no larger than 3 bytes. Is that assumption true?
>> Or 4-byte UTF8 chars are too rare to be considered?
>
> In real world, they are somewhat rare. But your point is valid. There
> could certainly be a string of X 4 byte utf8 (there are even 5 byte utf8
> characters) which cause something that should handle 25 characters to
> not be able to handle a string of 25 4 (or 5) byte utf8. But we simply
> have drawn a line in the sand where reality vs theoretical limits come
> into play.

For applications that use UTF-16 with surrogates internally, the above
assumption is OK.
If you enter characters that require more than three bytes when converted
to UTF-8, the max. number of characters will be reduced accordingly.

Frank
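
To make the arithmetic behind that concrete, here is a minimal sketch in Python. It is not code from John the Ripper. The value 25 for PLAINTEXT_LENGTH is only borrowed from the 25-character example quoted above, and the helper names are hypothetical. It assumes the length limit is counted in UTF-16 code units: a character needing 4 bytes in UTF-8 is a surrogate pair (2 code units) in UTF-16, so it costs only 2 bytes per code unit and a 3*PLAINTEXT_LENGTH byte buffer still suffices, but fewer such characters fit.

```python
# Assumption: the limit is counted in UTF-16 code units, and 25 is just
# the example figure from the quoted text, not a value from any format.
PLAINTEXT_LENGTH = 25


def utf16_units(s: str) -> int:
    """UTF-16 code units needed for s (a surrogate pair counts as 2)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Bytes needed for s when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def fits(s: str, limit: int = PLAINTEXT_LENGTH) -> bool:
    """True if s stays within a limit counted in UTF-16 code units."""
    return utf16_units(s) <= limit


euro = "\u20ac"       # 3 bytes in UTF-8, 1 UTF-16 code unit
emoji = "\U0001F600"  # 4 bytes in UTF-8, 2 UTF-16 code units

for label, text in [("25 x euro", euro * 25),
                    ("12 x emoji", emoji * 12),
                    ("13 x emoji", emoji * 13)]:
    print(label,
          "units:", utf16_units(text),
          "utf8 bytes:", utf8_bytes(text),
          "fits:", fits(text),
          "within 3*PLAINTEXT_LENGTH bytes:",
          utf8_bytes(text) <= 3 * PLAINTEXT_LENGTH)
```

Under these assumptions, any string that passes fits() also stays within 3*PLAINTEXT_LENGTH bytes of UTF-8. The price is that only 12 of the 4-byte characters fit where 25 of the 3-byte ones would.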