|
Message-ID: <20130221010328.GN20323@brightrain.aerifal.cx> Date: Wed, 20 Feb 2013 20:03:28 -0500 From: Rich Felker <dalias@...ifal.cx> To: musl@...ts.openwall.com Subject: Re: strcasestr.c On Wed, Feb 20, 2013 at 11:28:31PM +0100, John Spencer wrote: > On 02/17/2013 08:04 PM, Rich Felker wrote: > >On Thu, Feb 14, 2013 at 10:23:49AM -0500, Rich Felker wrote: > >>On Thu, Feb 14, 2013 at 09:59:56AM -0500, Todd C. Miller wrote: > >>>When investigating using the musl strstr.c ofr OpenBSD I noticed > >>>that musl only has a stub for strcasestr() that calls strstr(). > > according to git log, this function dates back to the very first > musl commit ever... Yes, it seems to have been an early mistake I made getting wget to work. Unfortunately, I think it would be bad policy to remove it now since that would break existing dynamic binaries using it, but one could make an argument that breaking them is "right" since they were already broken (not behaving as intended)... > >Since strcasestr is nonstandard and not clearly specified, > > it's so non-standard that even nobody uses it. > i looked up the usage of the function in codesearch.debian.net, and > the only *user* (from all ~20K debian packages) of the function is > gnu wget. Are you sure this search was correct? IIRC there were more... > every other occurence of the function is from gnulib embedded into > other packages. > so the current status is: gnulib includes the function, but nobody > besides gnu wget uses it. > > >Comments from anybody else? > > given the above findings, we can just leave the function as-is (plus > a comment that nobody uses it anyway) or remove it entirely. I think leaving it as-is is the worst case. It's impossible to detect without runtime checks that it's incorrect, so it might encourage configure scripts to add runtime checks for broken strcasestr that break cross compiling. Or it might lead to programs just assuming it's correct, then breaking. > in any case it doesnt make sense to put much work and especially > much code into it. > if it's gonna be implemented "correctly" at all, it should be as > slim as possible, in the order of 3-5 LOC. As much as I appreciate Todd's interest in contributing a 2way-based version, I tend to agree. Not only would adopting the 2way code be a fairly large code addition for a never-used feature, but it would also bind us to the choice to do ASCII-only case mapping or drop performance drastically in the future if we want to change that decision. My leaning right now would be to write the naive strstr loop using strcasecmp instead of strcmp (or an inline loop) for the inner loop. This will cause strcasestr to have the exact same case-folding semantics as strcasecmp, whatever those are in the future. This is best for consistency. Unfortunately, it's very bad from a performance standpoint, but I don't know of any code using this function for high-performance use. The other somewhat reasonable option would be removing the function, which would expose breakage in programs that were already using the broken version in musl. I'm mildly against this, but I'd be interested in hearing arguments either way. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.