Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20121206021438.GP20323@brightrain.aerifal.cx>
Date: Wed, 5 Dec 2012 21:14:38 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: Fix strverscmp

On Wed, Dec 05, 2012 at 08:00:00PM -0500, Rich Felker wrote:
> > This should not be equal, but not for the reason I'd expected: a
> > leading 0 is supposed to be interpreted as "0.0". Decimal points are
> > not factored in...
> 
> My understanding of the code is that, after it hits the first place
> where the strings differ, it switches to reading a digit string as a
> decimal number, and the result is the difference of those decimal
> numbers. I just used the decimal point as an example because it
> terminates the loop.
> 
> I also don't understand what you mean about leading zero. If leading
> zeros are not considered equal to the number without leading zeros,
> then a simple algorithm would be to, on the first mismatching byte,
> remember the difference, then count the number of consecutive digits
> starting at that position in both strings. If this count is the same
> (possibly zero) for both strings, return the saved byte difference. If
> the count is different, consider the string with fewer digits to be
> less than the one with more digits. This is trivial to implement with
> no arithmetic, but I'm not sure it matches the original semantics.

OK, I read the "spec" in the man page, and this seems to be the
simplest implementation:

1. Find the first mismatching bytes. During this search, keep 1 bit of
   state that's set when a '0' byte is encountered now following a
   digit, and cleared when any non-digit byte is encountered.

2. Upon finding the first mismatching bytes, if either is a non-digit
   or '0' or if above-described flag is set, return the difference of
   the mismatching bytes.

3. Otherwise, count the number of consecutive digits beginning with
   the first mismatching bytes. If the count differs for the two
   strings, the string with the lower count compares less. Otherwise,
   return the difference of the first mismatching bytes.

The rules in step 2 cover both the leading-zeros rule and non-numeric
cases. The rules in step 3 cover the simple numeric comparison rule
that the longer number is greater, with ties broken by comparing the
most-significant position that differs.

Does this sound correct to you? I like the concept of the flag in the
first phase, which we could easily extend to treat other cases as
non-numeric if desired.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.