|
Message-ID: <20210710002015.GT13220@brightrain.aerifal.cx> Date: Fri, 9 Jul 2021 20:20:15 -0400 From: Rich Felker <dalias@...c.org> To: jason <jason@...omnia247.nl> Cc: musl@...ts.openwall.com Subject: Re: Improvements to qsort On Fri, Jul 09, 2021 at 01:26:28PM +0200, jason wrote: > I've been trying to understand ngmalloc, and it's a struggle, so for > relaxation I had a look at qsort, and have a few suggestions. > > To avoid burying the lead, the following reduces comparisons by > about 4% with no downside: > > --- a/qsort.c > +++ b/qsort.c > @@ -100,18 +100,16 @@ static void sift(unsigned char *head, size_t width, cmpfun cmp, int pshift, size > rt = head - width; > lf = head - width - lp[pshift - 2]; > > - if((*cmp)(ar[0], lf) >= 0 && (*cmp)(ar[0], rt) >= 0) { > - break; > - } > if((*cmp)(lf, rt) >= 0) { > - ar[i++] = lf; > head = lf; > pshift -= 1; > } else { > - ar[i++] = rt; > head = rt; > pshift -= 2; > } > + if((*cmp)(ar[0], head) >= 0) > + break; > + ar[i++] = head; > } > cycle(width, ar, i); > } > > Basically, instead of doing two comparisons to see if ar[0] is greater > than both the left and right children and then comparing the children > with each other to find the greater one, find the greater child and > compare ar[0] with that one. I think this (and addition of comments) are the most worthwhile to look at first. In particular, cmp is a caller-provided function that can be expensive, so reducing the number of calls is very helpful. As an aside, it may be nice to get qsort_r in before making changes so that qsort_r is available for backport without also backporting bleeding edge improvements. > I have several other less-significant changes that I wonder if musl is > interested in: > > * I notice than pntz() is always used in the following context: > trail = pntz(p); > shr(p, trail); > pshift += trail; > both pntz() and shr() have branches conditional on whether the shift > is more or less than 8*sizeof(size_t) bits. I have replaced this > by a single function, which allows the conditional branch to be > shared: > > /* > * Re-normalize p[]. Ignore the low-order bit (which must be set!), > * find the position of the next-least-significant bit, and shift p[] > * down by that much. Return the number of bits shifted. > * > * There must be a penultimate trailing bit; i.e. p[] must not be {1, 0}. > */ > static inline int normalize(size_t p[2]) > { > int s; > size_t t = p[0] - 1; > > if (t == 0) { > t = p[1]; > s = ntz(t); > p[0] = t >> s; > p[1] = 0; > s += 8*sizeof(size_t); > } else { > s = ntz(t); > p[0] = t >> s | p[1] << (8*sizeof(size_t) - s); > p[1] >>= s; > } > return s; > } > > which is used simply as: > pshift += normalize(p); I'm unclear whether it's an improvement open-coding shr rather than just calling it here. Seems like it introduced new duplication of that logic while eliminating other duplication. If you think a change is really better here, it might make sense to factor it as first moving the repeated pntz/shr/pshift+=... to a function without refactoring, then expanding pntz out into it (since pntz is not called anywhere else) in a second change, but without expanding out shr. > * As part of my learning about the code, I have added a lot of comments, > particularly large block comments about the smoothsort algorithm > in general, the meaning of p[] and pindex, what a "stepson" is, > when it's okay to use sift() instead of trinkle(), what the "trusty" > flag means, how cycle() is used, etc. Would these be worth cleaning > up and submitting? This sounds good. If you can submit them, I'll see if myself and the original author can review that they make sense. > * In my private copy, I have changed the type of "trusty" to bool, > a data type I'm quite fond of as it concisely documents the limited > range of values. I notice, however, that musl is completely devoid of > any use of <stdbool.h>, although it uses other C99 features. Is this > a conscious decision? Yes, the _Bool type isn't used. That's just a style choice (mainly because it has some mildly non-obvious behaviors and doesn't really help anything) but one I'd like to remain consistent on. > * Likewise, I notice that most musl code has a space in "keyword (condition)" > like "if (foo != 0)", "while (bar < N)" and "for (i=0; i < N; i++)". > But the qsort() code, with one single exception, omits the space. The qsort code was written with mildly different spacing style here than what I usually use and ask others to use, but I don't think it makes any difference. I'd rather be consistent, but I'd also rather not have "let's gratuitously change spaces" commits. > Likewise, most msul code is quite happy to "if (condition) break;" > on a single line, with no braces, but the original sift() function > above puts the break on a second line and a closing brace on a third. Really there is no strong preference here, except possibly to put it on its own line if it's parallel to an else clause that needs to be on its own line (e.g. for line length), or if there's a special risk of it being overlooked if it's on the same line. Mostly I'd aim for *local consistency* here. There is no compelling reason to need global consistency across the codebase on how this is done. > The musl code base is inconsistent enough that it's hard to infer code > formatting conventions. Are these discrepancies in qsort a problem > worth fixing? No. If someone wrote code for musl with mildly different style but that wasn't deemed offensively bad when it was contributed, there's no need to change it now/later. > * The way the heap-building loop maintains its loop invariants is a trifle > peculiar. It begins with the element at *head included in p[] and pindex, > but excluded from the heap ordering invariants. Then it establishes > the ordering invariants (with sift() or trinkle()) and immediately adds > another (un-ordered) element to p[] and pindex. > > After the main loop exits, a standalone call to trinkle() places the > final element in heap order. > > I have a more conventional (add, then sift) ordering working which > seems to simplify the code: > while(head < high) { > + bool notrinkle = false; > + > + /* Append an element to the heap */ > + head += width; > if((p[0] & 3) == 3) { > - sift(head, width, cmp, pshift, lp); > + /* Two trees of adjacent sizes: merge with head */ > shr(p, 2); > pshift += 2; > - } else { > - if(lp[pshift - 1] >= high - head) { > - trinkle(head, width, cmp, p, pshift, 0, lp); > - } else { > + if((size_t)(high - head) > lp[pshift - 1]) { > sift(head, width, cmp, pshift, lp); > + notrinkle = true; > } > - > - if(pshift == 1) { > - shl(p, 1); > - pshift = 0; > - } else { > - shl(p, pshift - 1); > - pshift = 1; > - } > + } else if(pshift == 1) { > + /* order-1 exists; new tree is order-0 */ > + shl(p, 1); > + pshift = 0; > + notrinkle = (high != head); > + } else { > + /* new tree is order-1 */ > + shl(p, pshift - 1); > + pshift = 1; > + notrinkle = ((size_t)(high - head) > lp[0]); > } > - > p[0] |= 1; > - head += width; > + > + /* Re-establish the heap using trinkle */ > + if(!notrinkle) > + trinkle(head, width, cmp, p, pshift, false, lp); > } > > - trinkle(head, width, cmp, p, pshift, 0, lp); > > If the original ordering is wanted, then the loop initialization > can be changed to start with two elements (p[0] = 3, pindex = 0, > head = base + width), since two elements automatically fulfil the > required loop invariants. > > Does anyone have a preference? I'll need to read this in a lot more detail to comment on it. > * Since the first test in qsort() excludes width == 0, the loop over > width in cycle() can be changed to "do { ... } while (width);". I really doubt this makes any difference. > * The ar[] array in sift() only has to be as large as the lp[] > array in qsort(), not 1/6 larger. > > * The ar[] array in trinkle() only has to be half as large, plus one. These observations may be true, but are very conspicuous high-risk changes from the perspective of someone not familiar with the code. Unless there is a compelling reason I would not touch them. Someone reviewing musl commits for new security bugs (aka code suspicious of being a backdoor) would sink a lot of time into analyzing that sort of change, and I do not want to waste their time. > * The pindex > 1 case in the heap-consuming loop contains some > redundant left-then-right shifting: > shl(p, 2); > pshift -= 2; > p[0] ^= 7; > shr(p, 1); > This can be optimized. I think this has been mentioned before by others, and at the time we were not sure why it was written this way either. It can/should probably be changed. > * There are several cases where variables can be moved into > smaller local scopes, particularly the "stepson", "rt" and "lf" > variables in trinkle(). This is another change I've made privately > as I think it makes the code more reasonable. Worth submitting? Probably not. There is no consistent rule about how this should be done, and as mentioned above i don't like "style conformance" commits unless there's something egregiously wrong. > * GCC emits a couple of signed/unsigned comparison warnings with > -W -Wall. Worth fixing? No, the set of warnings musl targets is in configure. Intentional use of signed/unsigned promotion semantics, including comparison, is used all over musl.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.