Message-ID: <20150717213521.GD1173@brightrain.aerifal.cx>
Date: Fri, 17 Jul 2015 17:35:22 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Left-shift of negative number

On Fri, Jul 17, 2015 at 09:02:59PM +0200, Jens Gustedt wrote:
> Effectively, the C standard at the place that you cite doesn't define
> a behavior for such shifts of negative values. But this doesn't mean
> that a particular implementation of a C compiler or the C library
> (here musl) can't define a behavior for that.

musl does not assume GCC behavior like this, so the code indeed is
wrong and should be fixed.

> What worries me more than the shift of a negative value, is that this
> code is erroneous if `int` is only 16 bit wide. Whereas we can
> reasonably assume that a shift of a negative value in two's complement
> is the same as an unsigned shift, compilers tend to produce just crap
> if the shift exceeds the width.
>
> So I would feel much more comfortable if we'd use UINT32_C(0x40)
> inside the R macro.

The entire internal API here uses the type unsigned for character
codes and state, so like the rest of musl there is an assumption
(guaranteed by POSIX) that int is at least 32-bit. Since the
UTF-8/multibyte code is written to be largely self-contained and
independent of musl, we could look into enhancing the code to be
portable to systems with 16-bit int, but I suspect this would be
rather useless in practice.

If we did that, we would need to use something ugly like
uint_least32_t rather than uint32_t to gain any portability since the
latter need not even exist. There are also aliasing issues with using
a type different than 'unsigned' for the decoding state since
mbstate_t's members are unsigned.

So at least at this time I'd really rather not pursue this further.

Rich
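
For readers following the thread, here is a minimal sketch of the portability point discussed above. It is not musl's actual R macro or its fix; the helper name shl23 and the shift count of 23 are assumptions chosen for illustration. The idea is to convert to an unsigned type first, so the left shift is well defined even when the original value is negative, and to use uint_least32_t so the code would also work where int is only 16 bits wide.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from musl: left-shift a possibly
 * negative value by 23 bits without undefined behavior. */
static uint_least32_t shl23(int v)
{
    /* Converting a negative int to an unsigned type is well defined
     * (reduction modulo 2^N), so the shift below operates on a
     * non-negative value. uint_least32_t always exists, unlike
     * uint32_t, and is at least 32 bits wide even if int is 16 bits. */
    uint_least32_t u = (uint_least32_t)v;

    /* Mask to 32 bits so the result does not depend on whether
     * uint_least32_t happens to be wider than 32 bits. */
    return (u << 23) & UINT32_C(0xFFFFFFFF);
}
```

Called as shl23(-1), this yields the same bit pattern a two's-complement arithmetic shift would be expected to produce, but without relying on compiler-specific behavior.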