Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130221010328.GN20323@brightrain.aerifal.cx>
Date: Wed, 20 Feb 2013 20:03:28 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: strcasestr.c

On Wed, Feb 20, 2013 at 11:28:31PM +0100, John Spencer wrote:
> On 02/17/2013 08:04 PM, Rich Felker wrote:
> >On Thu, Feb 14, 2013 at 10:23:49AM -0500, Rich Felker wrote:
> >>On Thu, Feb 14, 2013 at 09:59:56AM -0500, Todd C. Miller wrote:
> >>>When investigating using the musl strstr.c ofr OpenBSD I noticed
> >>>that musl only has a stub for strcasestr() that calls strstr().
> 
> according to git log, this function dates back to the very first
> musl commit ever...

Yes, it seems to have been an early mistake I made getting wget to
work. Unfortunately, I think it would be bad policy to remove it now
since that would break existing dynamic binaries using it, but one
could make an argument that breaking them is "right" since they were
already broken (not behaving as intended)...

> >Since strcasestr is nonstandard and not clearly specified,
> 
> it's so non-standard that even nobody uses it.
> i looked up the usage of the function in codesearch.debian.net, and
> the only *user* (from all ~20K debian packages) of the function is
> gnu wget.

Are you sure this search was correct? IIRC there were more...

> every other occurence of the function is from gnulib embedded into
> other packages.
> so the current status is: gnulib includes the function, but nobody
> besides gnu wget uses it.
> 
> >Comments from anybody else?
> 
> given the above findings, we can just leave the function as-is (plus
> a comment that nobody uses it anyway) or remove it entirely.

I think leaving it as-is is the worst case. It's impossible to detect
without runtime checks that it's incorrect, so it might encourage
configure scripts to add runtime checks for broken strcasestr that
break cross compiling. Or it might lead to programs just assuming it's
correct, then breaking.

> in any case it doesnt make sense to put much work and especially
> much code into it.
> if it's gonna be implemented "correctly" at all, it should be as
> slim as possible, in the order of 3-5 LOC.

As much as I appreciate Todd's interest in contributing a 2way-based
version, I tend to agree. Not only would adopting the 2way code be a
fairly large code addition for a never-used feature, but it would also
bind us to the choice to do ASCII-only case mapping or drop
performance drastically in the future if we want to change that
decision.

My leaning right now would be to write the naive strstr loop using
strcasecmp instead of strcmp (or an inline loop) for the inner loop.
This will cause strcasestr to have the exact same case-folding
semantics as strcasecmp, whatever those are in the future. This is
best for consistency. Unfortunately, it's very bad from a performance
standpoint, but I don't know of any code using this function for
high-performance use.

The other somewhat reasonable option would be removing the function,
which would expose breakage in programs that were already using the
broken version in musl. I'm mildly against this, but I'd be interested
in hearing arguments either way.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.