Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210709112629.024302221844@gateway02.insomnia247.nl>
Date: Fri,  9 Jul 2021 13:26:28 +0200 (CEST)
From: jason <jason@...omnia247.nl>
To: musl@...ts.openwall.com
Cc: jason@...omnia247.nl
Subject: Improvements to qsort

I've been trying to understand ngmalloc, and it's a struggle, so for
relaxation I had a look at qsort, and have a few suggestions.

To avoid burying the lead, the following reduces comparisons by
about 4% with no downside:

--- a/qsort.c
+++ b/qsort.c
@@ -100,18 +100,16 @@ static void sift(unsigned char *head, size_t width, cmpfun cmp, int pshift, size
 		rt = head - width;
 		lf = head - width - lp[pshift - 2];
 
-		if((*cmp)(ar[0], lf) >= 0 && (*cmp)(ar[0], rt) >= 0) {
-			break;
-		}
 		if((*cmp)(lf, rt) >= 0) {
-			ar[i++] = lf;
 			head = lf;
 			pshift -= 1;
 		} else {
-			ar[i++] = rt;
 			head = rt;
 			pshift -= 2;
 		}
+		if((*cmp)(ar[0], head) >= 0)
+			break;
+		ar[i++] = head;
 	}
 	cycle(width, ar, i);
 }

Basically, instead of doing two comparisons to see if ar[0] is greater
than both the left and right children and then comparing the children
with each other to find the greater one, find the greater child and
compare ar[0] with that one.

I have several other less-significant changes that I wonder if musl is
interested in:

* I notice than pntz() is always used in the following context:
	trail = pntz(p);
	shr(p, trail);
	pshift += trail;
  both pntz() and shr() have branches conditional on whether the shift
  is more or less than 8*sizeof(size_t) bits.  I have replaced this
  by a single function, which allows the conditional branch to be
  shared:

/*
 * Re-normalize p[].  Ignore the low-order bit (which must be set!),
 * find the position of the next-least-significant bit, and shift p[]
 * down by that much.  Return the number of bits shifted.
 *
 * There must be a penultimate trailing bit; i.e. p[] must not be {1, 0}.
 */
static inline int normalize(size_t p[2])
{
	int s;
	size_t t = p[0] - 1;

	if (t == 0) {
		t = p[1];
		s = ntz(t);
		p[0] = t >> s;
		p[1] = 0;
		s += 8*sizeof(size_t);
	} else {
		s = ntz(t);
		p[0] = t >> s | p[1] << (8*sizeof(size_t) - s);
		p[1] >>= s;
	}
	return s;
}

  which is used simply as:
	pshift += normalize(p);

* As part of my learning about the code, I have added a lot of comments,
  particularly large block comments about the smoothsort algorithm
  in general, the meaning of p[] and pindex, what a "stepson" is,
  when it's okay to use sift() instead of trinkle(), what the "trusty"
  flag means, how cycle() is used, etc.  Would these be worth cleaning
  up and submitting?

* In my private copy, I have changed the type of "trusty" to bool,
  a data type I'm quite fond of as it concisely documents the limited
  range of values.  I notice, however, that musl is completely devoid of
  any use of <stdbool.h>, although it uses other C99 features.  Is this
  a conscious decision?

* Likewise, I notice that most musl code has a space in "keyword (condition)"
  like "if (foo != 0)", "while (bar < N)" and "for (i=0; i < N; i++)".
  But the qsort() code, with one single exception, omits the space.

  Likewise, most msul code is quite happy to "if (condition) break;"
  on a single line, with no braces, but the original sift() function
  above puts the break on a second line and a closing brace on a third.

  The musl code base is inconsistent enough that it's hard to infer code
  formatting conventions.  Are these discrepancies in qsort a problem
  worth fixing?

* The way the heap-building loop maintains its loop invariants is a trifle
  peculiar.  It begins with the element at *head included in p[] and pindex,
  but excluded from the heap ordering invariants.  Then it establishes
  the ordering invariants (with sift() or trinkle()) and immediately adds
  another (un-ordered) element to p[] and pindex.

  After the main loop exits, a standalone call to trinkle() places the
  final element in heap order.

  I have a more conventional (add, then sift) ordering working which
  seems to simplify the code:
 	while(head < high) {
+		bool notrinkle = false;
+
+		/* Append an element to the heap */
+		head += width;
 		if((p[0] & 3) == 3) {
-			sift(head, width, cmp, pshift, lp);
+			/* Two trees of adjacent sizes: merge with head */
 			shr(p, 2);
 			pshift += 2;
-		} else {
-			if(lp[pshift - 1] >= high - head) {
-				trinkle(head, width, cmp, p, pshift, 0, lp);
-			} else {
+			if((size_t)(high - head) > lp[pshift - 1]) {
 				sift(head, width, cmp, pshift, lp);
+				notrinkle = true;
 			}
-			
-			if(pshift == 1) {
-				shl(p, 1);
-				pshift = 0;
-			} else {
-				shl(p, pshift - 1);
-				pshift = 1;
-			}
+		} else if(pshift == 1) {
+			/* order-1 exists; new tree is order-0 */
+			shl(p, 1);
+			pshift = 0;
+			notrinkle = (high != head);
+		} else {
+			/* new tree is order-1 */
+			shl(p, pshift - 1);
+			pshift = 1;
+			notrinkle = ((size_t)(high - head) > lp[0]);
 		}
-		
 		p[0] |= 1;
-		head += width;
+
+		/* Re-establish the heap using trinkle */
+		if(!notrinkle)
+			trinkle(head, width, cmp, p, pshift, false, lp);
 	}
 
-	trinkle(head, width, cmp, p, pshift, 0, lp);

  If the original ordering is wanted, then the loop initialization
  can be changed to start with two elements (p[0] = 3, pindex = 0,
  head = base + width), since two elements automatically fulfil the
  required loop invariants.

  Does anyone have a preference?

* Since the first test in qsort() excludes width == 0, the loop over
  width in cycle() can be changed to "do { ... } while (width);".

* The ar[] array in sift() only has to be as large as the lp[]
  array in qsort(), not 1/6 larger.

* The ar[] array in trinkle() only has to be half as large, plus one.

* The pindex > 1 case in the heap-consuming loop contains some
  redundant left-then-right shifting:
			shl(p, 2);
			pshift -= 2;
			p[0] ^= 7;
			shr(p, 1);
  This can be optimized.

* There are several cases where variables can be moved into
  smaller local scopes, particularly the "stepson", "rt" and "lf"
  variables in trinkle().  This is another change I've made privately
  as I think it makes the code more reasonable.  Worth submitting?

* GCC emits a couple of signed/unsigned comparison warnings with
  -W -Wall.  Worth fixing?

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.