Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130217183944.GF20323@brightrain.aerifal.cx>
Date: Sun, 17 Feb 2013 13:39:44 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: strverscmp

Hey, sorry it's taken me so long to get to reading and reviewing this
well. Here are some comments, mostly stylistic, but one's about a
corner-case bug. See below. With that said, the algorithm looks right
to me, and I think we can commit soon.

Rich


On Fri, Feb 01, 2013 at 08:04:52PM -0800, Isaac Dunham wrote:
> On Fri, 1 Feb 2013 21:38:33 -0500
> Rich Felker <dalias@...ifal.cx> wrote:
> 
> > > Fix strverscmp (patch same as the last time I sent it)
> > 
> > I'm not sure whether we got it into a fully-working state or not; the
> > conversation kinda died out last time. I'll review it again too. I
> > remember it didn't look quite like the algorithm I described/proposed,
> > but that doesn't mean it's wrong. It looked like it could at least use
> > some streamlining though.
> 
> The last review was just before I got it working. Here's the final version.
> 
> Probably somewhere it could be optimized, though...
> As long as it doesn't end up looking like the GNU one.
> 
> -- 
> Isaac Dunham <idunham@...abit.com>

> diff --git a/src/string/strverscmp.c b/src/string/strverscmp.c
> index 7054967..8f3f11f 100644
> --- a/src/string/strverscmp.c
> +++ b/src/string/strverscmp.c
> @@ -1,7 +1,41 @@
> +#define _GNU_SOURCE
> +#include <ctype.h>
>  #include <string.h>
>  
>  int strverscmp(const char *l, const char *r)
>  {
> -	/* FIXME */
> -	return strcmp(l, r);
> +	int haszero=1;
> +	while (*l && *r && l[0]==r[0]){

I used to do a lot of while loops like this, but now I feel like it
makes it hard to follow the end-of-string logic -- a lot more code
runs after the null terminator is hit, and it's nontrivial to follow
through it all and make sure the right thing happens, as the
subsequent code after the loop is now handling two very-distinct
loop-exit conditions:

1. Loop exited due to mismatch.
2. Loop exited due to hitting end-of-string with no mismatch.

What about instead doing something like this:

	while (*l==*r) {
		if (!*l) return 0;
		/* ... */

That way, the "equal strings" case is trivially correct. It also might
generate better code.

Note that I also made a couple stylistic changes here; unless you
object, I prefer *p to p[0] when p is just being used as a moving
pointer within a string, always accessed with offset 0. It's just more
concise/less cluttered and avoids suggesting it's used as a string in
itself. Also added a space before the open brace, for consistency with
style elsewhere.


> +		if (l[0]=='0'){
> +			if (haszero==1) {
> +				haszero=0;
> +			}
> +		} else if (isdigit(l[0])) {
> +			if (haszero==1) {
> +				haszero=2;
> +			}
> +		} else {
> +			haszero=1;
> +		}
> +		l++; r++;
> +	}
> +	if (haszero==1 && (l[0]=='0' || r[0]=='0')) {
> +		haszero=0;
> +	}
> +	if ((isdigit(l[0]) && isdigit(r[0]) ) && haszero) {
> +		int lenl=0, lenr=0, firstl=l[0], firstr=r[0];

These variables have the wrong types. int is not sufficient to store a
character count in a string. You need size_t.

> +		while (isdigit(l++[0]) ) {
> +			lenl++;
> +		}
> +		while (isdigit(r++[0]) ) {
> +			lenr++;
> +		}
> +		if (lenl==lenr) {
> +			return (firstl -  firstr);
> +		} else {
> +			return (lenl - lenr);
> +		}

I see what's going on here, but I find saving the first mismatching
characters (btw, you could have just saved the difference) then
continuing to advance the l/r pointers a bit confusing at first. Why
not instead do:

		while (isdigit(l[lenl])) lenl++;

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.