Message-ID: <20130406052121.GA20915@brightrain.aerifal.cx>
Date: Sat, 6 Apr 2013 01:21:21 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: multibyte performance findings

Hi all,

I've been examining performance in the multibyte conversion functions (as part of the POSIX locale controversy), and have some interesting findings so far:

1. Performance of mbrtowc seems to be very sensitive to the compiler's code generation. Even adding code in untaken branches can drastically slow down or speed up the overall runtime. In one case, adding a dummy conditional to mimic locale-dependent encoding actually made the test run faster. I think this means that before we can draw any conclusions we need to figure out what's causing the compiler to behave so wackily, and whether the code can be restructured in such a way that its performance is less vulnerable to the whims of the compiler.

2. Implementing mbtowc (the old non-restartable function) as a wrapper for mbrtowc is a bad idea. The interface contract of mbrtowc forces it to be much slower than desirable; mbtowc's simpler interface can in theory give much better performance, and based on my first rewrite of mbtowc, the difference is big -- around 40% faster than the equivalent mbrtowc calls, and over 50% faster than the wrapper-based mbtowc. This means all musl-internal use of mbrtowc should probably be replaced by mbtowc, or perhaps even an internal-use-only function with a better interface.

3. A significant amount of time is "wasted" checking that the size n of the input buffer is not exceeded when reading; removing the checks speeds up mbtowc by 10%. As such, it might be desirable to break the function into two cases: n>=4 (in which case no further length checks are needed anywhere) and n<4 (in which case, each additional read needs a check). Alternately, for mbtowc, perhaps there's a quick and easy way to check the length against the state mask.
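
To make the split in point 3 concrete, here is a rough sketch of a UTF-8 decoder divided into an n>=4 path with no length checks and an n<4 path that checks before trusting the buffer. This is not musl's mbtowc: the names decode_utf8, decode_unchecked, seq_len and is_cont are hypothetical, the return convention is simplified (bytes consumed, or -1 for invalid or truncated input), and no errno or null-pointer handling is attempted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sequence length implied by a UTF-8 lead byte,
 * or -1 if the byte cannot start a valid sequence. */
static int seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xc2 && b <= 0xdf) return 2;
    if (b >= 0xe0 && b <= 0xef) return 3;
    if (b >= 0xf0 && b <= 0xf4) return 4;
    return -1;
}

/* True if b is a continuation byte (10xxxxxx). */
static int is_cont(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}

/* Decode without any buffer-length checks. The caller must guarantee
 * that every byte of the sequence selected by s[0] is readable. */
static int decode_unchecked(uint32_t *out, const unsigned char *s)
{
    uint32_t c;
    switch (seq_len(s[0])) {
    case 1:
        *out = s[0];
        return 1;
    case 2:
        if (!is_cont(s[1])) return -1;
        *out = (uint32_t)(s[0] & 0x1f) << 6 | (uint32_t)(s[1] & 0x3f);
        return 2;
    case 3:
        if (!is_cont(s[1]) || !is_cont(s[2])) return -1;
        c = (uint32_t)(s[0] & 0x0f) << 12 | (uint32_t)(s[1] & 0x3f) << 6
          | (uint32_t)(s[2] & 0x3f);
        /* Reject overlong forms and UTF-16 surrogates. */
        if (c < 0x800 || (c >= 0xd800 && c <= 0xdfff)) return -1;
        *out = c;
        return 3;
    case 4:
        if (!is_cont(s[1]) || !is_cont(s[2]) || !is_cont(s[3])) return -1;
        c = (uint32_t)(s[0] & 0x07) << 18 | (uint32_t)(s[1] & 0x3f) << 12
          | (uint32_t)(s[2] & 0x3f) << 6 | (uint32_t)(s[3] & 0x3f);
        /* Reject overlong forms and values beyond U+10FFFF. */
        if (c < 0x10000 || c > 0x10ffff) return -1;
        *out = c;
        return 4;
    }
    return -1; /* invalid lead byte */
}

/* Decode one character from at most n bytes at s. */
int decode_utf8(uint32_t *out, const unsigned char *s, size_t n)
{
    int len;

    /* n >= 4: the longest sequence fits, so no length checks are needed. */
    if (n >= 4) return decode_unchecked(out, s);

    /* n < 4: make sure the whole sequence fits before reading past s[0]. */
    if (n == 0) return -1;
    len = seq_len(s[0]);
    if (len < 0 || (size_t)len > n) return -1;
    return decode_unchecked(out, s);
}
```

Because this sketch derives the full sequence length from the lead byte, the n<4 path needs only one comparison against n. A decoder that consumes bytes incrementally through a state machine would instead need a check before each additional read, which is the cost described in point 3.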

I'm probably going to go ahead and commit some changes that seem to be clear wins in the above areas, but there's definitely room for discussion. If anybody's interested in poking around at what's going on with the optimizer or testing these functions heavily on other CPU variants, let me know.

Rich