![]() |
|
Message-ID: <20250629000843.GA19826@brightrain.aerifal.cx> Date: Sat, 28 Jun 2025 20:08:43 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Paintcans for reverse iterating strings One thing we're going to need for LC_COLLATE in locales where second-level weights are applied in reverse order (diacritic marks later in the string weigh more than earlier ones) is the ability to traverse (& live transform to NFD) the input string in reverse. In order not to take quadratic time doing this, we need a strategy for laying down a logarithmic (thus bounded by a small constant) number of waypoints ("paintcans") we can resume forward-processing at to simulate reverse-order random access. I'm posting here a proposal for comments on whether this seems fairly optimal or has any obvious improvements: Maintain an array of up to N waypoints, initially populated just by the halfway point. When requesting a position, first remove any waypoints past the requested position from the array, then start from the last remaining waypoint and work forward. When crossing the halfway point between the last waypoint and the desired position, if the waypoint array is not already full, add the halfway position as a waypoint. With unlimited N, I'm pretty sure this gives O(N log N) total traversal time. In a finite real-machine model, N=8*sizeof(size_t) then gives the same. And if we know ahead of time that the size of the input is way less than the maximum possible length (note: total length is known from a previous pass), we can pre-populate linear-spaced early waypoints or space them differently (like 1/4 instead of 1/2 the way to the target) so that less time is spent backtracking (I haven't yet thought much about which is best). Note that N=64 is no problem in terms of storage space; it's under 4k. And the majority would not even get used/touched except on pathologically long inputs to strcoll/strxfrm, and only in the case of a locale with reversed second-level weights. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.