|
|
Message-ID: <ad4tAkFzAsVvozQy@mail.gmail.com>
Date: Tue, 14 Apr 2026 14:03:14 +0200
From: Luca Kellermann <mailto.luca.kellermann@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] scandir: fix qsort usage
On Sun, Apr 12, 2026 at 06:20:29PM -0400, Rich Felker wrote:
> On Sun, Apr 12, 2026 at 11:21:58PM +0200, Luca Kellermann wrote:
> > On Sun, Apr 12, 2026 at 12:17:43PM -0400, Rich Felker wrote:
> > > On Sun, Apr 12, 2026 at 11:24:45AM -0400, Rich Felker wrote:
> > > > On Sun, Apr 12, 2026 at 11:17:22AM -0400, Rich Felker wrote:
> > > > > On Sun, Apr 12, 2026 at 01:43:08PM +0200, Luca Kellermann wrote:
> > > > > > On Fri, Apr 10, 2026 at 01:30:27PM -0400, Rich Felker wrote:
> > > > > > > On Fri, Feb 27, 2026 at 09:17:15PM +0100, Luca Kellermann wrote:
> > > > > > > > calling qsort() with a pointer to a function whose type is not
> > > > > > > > compatible with int(const void *, const void *) results in UB because
> > > > > > > > qsort() would call this function with an incompatible type.
> > > > > > > >
> > > > > > > > avoid this by using qsort_r(). this is similar to how qsort() is
> > > > > > > > implemented on top of qsort_r().
> > > > > > > >
> > > > > > > > the casts to void * in wrapper_cmp() are required because scandir()
> > > > > > > > takes int (*)(const struct dirent **, const struct dirent **) and not
> > > > > > > > int (*)(const struct dirent *const *, const struct dirent *const *).
> > > > > > > > ---
> > > > > > > > Since you haven't posted a patch yet, I went ahead and fixed this. The
> > > > > > > > patch is based on the scandir() patch series I submitted in July.
> > > > > > > >
> > > > > > > > src/dirent/scandir.c | 12 +++++++++---
> > > > > > > > 1 file changed, 9 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/src/dirent/scandir.c b/src/dirent/scandir.c
> > > > > > > > index 9e4e623f..993e0c8b 100644
> > > > > > > > --- a/src/dirent/scandir.c
> > > > > > > > +++ b/src/dirent/scandir.c
> > > > > > > > @@ -7,9 +7,15 @@
> > > > > > > > #include <errno.h>
> > > > > > > > #include <stddef.h>
> > > > > > > >
> > > > > > > > +typedef int (*cmpfun)(const struct dirent **, const struct dirent **);
> > > > > > > > +
> > > > > > > > +static int wrapper_cmp(const void *de1, const void *de2, void *cmp)
> > > > > > > > +{
> > > > > > > > + return ((cmpfun)cmp)((void *)de1, (void *)de2);
> > > > > > > > +}
> > > > > > > > +
> > > > > > >
> > > > > > > I was going to ask you what the (void *) casts are for, but now I see
> > > > > > > it. The scandir comparison function was specified wrong and takes as
> > > > > > > const T ** rather than a T *const * like it should.
> > > > > >
> > > > > > I anticipated that this might raise some questions. That's why I tried
> > > > > > to explain it in the commit message :)
> > > > > >
> > > > > > > Unfortunately this also reveals an aliasing violation elsewhere. The
> > > > > > > lvalue type the caller's comparison function will be using to access
> > > > > > > elements of the names[] array are of type const struct dirent *, but
> > > > > > > the actual elements of the names[] array have type struct dirent *. I
> > > > > > > can prepare a separate patch to address this.
> > > > > >
> > > > > > I'm curious what fix you have in mind for this. The only thing I can
> > > > > > think of is to change the type of the names array to const struct
> > > > > > dirent ** and after sorting copy it back to a new array of type struct
> > > > > > dirent ** that can then be stored in *res.
> > > > >
> > > > > Uhg, I missed that the array returned in *res mismatches the type of
> > > > > the array to be sorted; I was thinking we could just adjust it to
> > > > > const. Indeed copying would be gratuitously awful. Maybe this is
> > > > > fixable just with some union punning. Or.. wrapper_cmp could just copy
> > > > > the pointer values into local vars and pass the addresses of those
> > > > > local vars to the cmp function. This is probably the most correct.
> > > >
> > > > As in (untested but compiles without warnings):
> > > >
> > > > static int wrapper_cmp(const void *de1, const void *de2, void *cmp)
> > > > {
> > > > const struct dirent *cd1 = *(struct dirent *const *)de1;
> > > > const struct dirent *cd2 = *(struct dirent *const *)de2;
> > > > return ((int (*)(const struct dirent **, const struct dirent **))cmp)(&cd1, &cd2);
> > > > }
> > > >
> > > > This is fully const-correct and lacks aliasing violations.
> > >
> > > OK, digging into this more (thanks Haelwenn on Mastodon for help
> > > finding this stuff):
> > >
> > > Historically, glibc and the BSDs did not have these weird comparson
> > > and function types with double-pointers. They did the right thing,
> > > having the comparison function just take a pointer to the dirent,
> > > which was rightly const-qualified to signify that the comparison
> > > function is not to modify the data being sorted.
> > >
> > > This was changed by glibc in commit
> > > eee6b1432794967d4272394dfed1e2b5cca4be39 to align with the newly
> > > redesigned-by-committee POSIX-2008 signature, which has the const
> > > qualification in the wrong place. My only guess is that they made this
> > > change in order to have it be "possible to implement with qsort by
> > > casting the function pointer", which is gross UB.
> >
> > At least for glibc, I think this is not entirely true. Before the
> > mentioned commit, the declaration in the public header was equivalent
> > to:
> >
> > int scandir(const char *restrict, struct dirent ***restrict,
> > int (*)(const struct dirent *),
> > int (*)(const void *, const void *));
> >
> > (note that POSIX never specified those restrict qualifiers but glibc
> > and the Linux man page still have them today)
> >
> > The comparison function was directly passed to qsort() to sort an
> > array of type struct dirent **. Because the comparison function for
> > scandir() had the same type as qsort(), this was totally fine and free
> > of UB. The comparison function still had to deal with double-pointers
> > though. It also had to take care of the type conversion itself (which
> > glibc's alphasort() got wrong).
>
> Oh, ok. That disagrees with the old man page I saw which I assumed was
> accurately documenting old glibc:
>
> https://manned.org/man/debian-potato/scandir
>
> Maybe this was different at some point in distant history, or maybe
> the documentation was just wrong?
The oldest versions of scandir() I could find from glibc [1] and the
BSDs [2] (>30 years old) both pass double-pointers to the comparison
function. I doubt the interface was changed twice (to direct pointers
and then back to double-pointers) since then.
I guess the documentation was just wrong.
Luca
[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=dirent/scandir.c;hb=28f540f45bbacd939bfd07f213bcad2bf730b1bf
[2] https://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/gen/scandir.c?rev=1.1;content-type=text%2Fx-cvsweb-markup
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.