Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260412161743.GA1827@brightrain.aerifal.cx>
Date: Sun, 12 Apr 2026 12:17:43 -0400
From: Rich Felker <dalias@...c.org>
To: Luca Kellermann <mailto.luca.kellermann@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] scandir: fix qsort usage

On Sun, Apr 12, 2026 at 11:24:45AM -0400, Rich Felker wrote:
> On Sun, Apr 12, 2026 at 11:17:22AM -0400, Rich Felker wrote:
> > On Sun, Apr 12, 2026 at 01:43:08PM +0200, Luca Kellermann wrote:
> > > On Fri, Apr 10, 2026 at 01:30:27PM -0400, Rich Felker wrote:
> > > > On Fri, Feb 27, 2026 at 09:17:15PM +0100, Luca Kellermann wrote:
> > > > > calling qsort() with a pointer to a function whose type is not
> > > > > compatible with int(const void *, const void *) results in UB because
> > > > > qsort() would call this function with an incompatible type.
> > > > > 
> > > > > avoid this by using qsort_r(). this is similar to how qsort() is
> > > > > implemented on top of qsort_r().
> > > > > 
> > > > > the casts to void * in wrapper_cmp() are required because scandir()
> > > > > takes int (*)(const struct dirent **, const struct dirent **) and not
> > > > > int (*)(const struct dirent *const *, const struct dirent *const *).
> > > > > ---
> > > > > Since you haven't posted a patch yet, I went ahead and fixed this. The
> > > > > patch is based on the scandir() patch series I submitted in July.
> > > > > 
> > > > >  src/dirent/scandir.c | 12 +++++++++---
> > > > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/src/dirent/scandir.c b/src/dirent/scandir.c
> > > > > index 9e4e623f..993e0c8b 100644
> > > > > --- a/src/dirent/scandir.c
> > > > > +++ b/src/dirent/scandir.c
> > > > > @@ -7,9 +7,15 @@
> > > > >  #include <errno.h>
> > > > >  #include <stddef.h>
> > > > >  
> > > > > +typedef int (*cmpfun)(const struct dirent **, const struct dirent **);
> > > > > +
> > > > > +static int wrapper_cmp(const void *de1, const void *de2, void *cmp)
> > > > > +{
> > > > > +	return ((cmpfun)cmp)((void *)de1, (void *)de2);
> > > > > +}
> > > > > +
> > > > 
> > > > I was going to ask you what the (void *) casts are for, but now I see
> > > > it. The scandir comparison function was specified wrong and takes as
> > > > const T ** rather than a T *const * like it should.
> > > 
> > > I anticipated that this might raise some questions. That's why I tried
> > > to explain it in the commit message :)
> > > 
> > > > Unfortunately this also reveals an aliasing violation elsewhere. The
> > > > lvalue type the caller's comparison function will be using to access
> > > > elements of the names[] array are of type const struct dirent *, but
> > > > the actual elements of the names[] array have type struct dirent *. I
> > > > can prepare a separate patch to address this.
> > > 
> > > I'm curious what fix you have in mind for this. The only thing I can
> > > think of is to change the type of the names array to const struct
> > > dirent ** and after sorting copy it back to a new array of type struct
> > > dirent ** that can then be stored in *res.
> > 
> > Uhg, I missed that the array returned in *res mismatches the type of
> > the array to be sorted; I was thinking we could just adjust it to
> > const. Indeed copying would be gratuitously awful. Maybe this is
> > fixable just with some union punning. Or.. wrapper_cmp could just copy
> > the pointer values into local vars and pass the addresses of those
> > local vars to the cmp function. This is probably the most correct.
> 
> As in (untested but compiles without warnings):
> 
> static int wrapper_cmp(const void *de1, const void *de2, void *cmp)
> {
> 	const struct dirent *cd1 = *(struct dirent *const *)de1;
> 	const struct dirent *cd2 = *(struct dirent *const *)de2;
> 	return ((int (*)(const struct dirent **, const struct dirent **))cmp)(&cd1, &cd2);
> }
> 
> This is fully const-correct and lacks aliasing violations.

OK, digging into this more (thanks Haelwenn on Mastodon for help
finding this stuff):

Historically, glibc and the BSDs did not have these weird comparson
and function types with double-pointers. They did the right thing,
having the comparison function just take a pointer to the dirent,
which was rightly const-qualified to signify that the comparison
function is not to modify the data being sorted.

This was changed by glibc in commit
eee6b1432794967d4272394dfed1e2b5cca4be39 to align with the newly
redesigned-by-committee POSIX-2008 signature, which has the const
qualification in the wrong place. My only guess is that they made this
change in order to have it be "possible to implement with qsort by
casting the function pointer", which is gross UB.

Everything we're doing to fix this now is working around gratuitous
breakage the committee introduced to avoid just standardizing the
historical practice that was right but that required a qsort_r or
similar to implement.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.