Message-ID: <20230525132557.GI4163@brightrain.aerifal.cx>
Date: Thu, 25 May 2023 09:25:57 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: getopt_long() can corrupt argv when an argument for a
 short option is missing

On Thu, May 25, 2023 at 10:53:09AM +0300, Alexey Izbyshev wrote:
> POSIX requires getopt() to set optind to argc + 1 in case of a
> missing argument[1], and musl follows it. This bites getopt_long()
> (which reuses getopt()) in two ways:
> 
> * getopt_long() moves argv[optind - 1] (NULL) when permuting argv to
> make all options precede other arguments, essentially corrupting
> argv.
> 
> * even when permuting is not required, getopt_long() is both
> incompatible with glibc (which doesn't increment optind past NULL)
> and inconsistent with itself (for a long option with a missing
> argument, musl doesn't increment optind past NULL either).
> 
> Example of the wrong NULL shifting:
> 
> #include <getopt.h>
> #include <stdio.h>
> 
> int main(int argc, char *argv[]) {
>     for (int i = 0; i < 2; i++) {
>         int r = getopt_long(argc, argv, "o:", NULL, NULL);
>         printf("r: %d\n", r);
>         printf("optind: %d\n", optind);
>         for (int i = 0; i <= argc; i++)
>             printf("%d: '%s'\n", i, argv[i]);
>     }
> }
> 
> With glibc:
> $ ./a.out arg -o
> ./a.out: option requires an argument -- 'o'
> r: 63
> optind: 3
> 0: './a.out'
> 1: 'arg'
> 2: '-o'
> 3: '(null)'
> r: -1
> optind: 2
> 0: './a.out'
> 1: '-o'
> 2: 'arg'
> 3: '(null)'
> 
> (Note that glibc permutes argv *before* parsing the next option,
> and even before comparing optind and argc, so argv is still permuted
> on the second invocation.)
> 
> With musl:
> $ ./a.out arg -o
> ./a.out: option requires an argument: o
> r: 63
> optind: 3
> 0: './a.out'
> 1: '-o'
> 2: '(null)'
> 3: 'arg'
> r: -1
> optind: 3
> 0: './a.out'
> 1: '-o'
> 2: '(null)'
> 3: 'arg'
> 
> Maybe we could just skip permuting and adjust optind if we detected
> a missing argument?
> 
>         resumed = optind;
>         ret = __getopt_long_core(argc, argv, optstring, longopts, idx, longonly);
> +       if (optind > argc)
> +               return optind--, ret;
>         if (resumed > skipped) {
> 
> On a subsequent invocation we won't permute, unlike glibc, but maybe
> this is a good thing, given that such permutation makes it look like
> there is no missing argument, essentially changing the command
> semantics.
> 
> Alexey
> 
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html

OK, this is indeed a mess. I think there's some inherent inconsistency
here, and in general the application should not be calling getopt*
again after a missing argument error, but argv[] should not be
clobbered and the application might semi-legitimately want to do
something with remaining non-option arguments.
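
As an illustration of the "do not call getopt* again" point, here is
a minimal sketch of a caller that stops at the first error. The
helper name parse_args and the leading ':' in the optstring (which
makes a missing argument come back as ':' with optopt set) are
choices made for this example, not something taken from the report.

```c
#include <getopt.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 on success, -1 on the first parse
 * error. It never calls getopt_long() again after an error. */
static int parse_args(int argc, char **argv, const char **out_value)
{
    /* Empty long-option table; only the short option -o is known. */
    static const struct option longopts[] = { {0, 0, 0, 0} };
    int c;

    *out_value = NULL;
    optind = 1; /* start scanning at the first argument */

    while ((c = getopt_long(argc, argv, ":o:", longopts, NULL)) != -1) {
        switch (c) {
        case 'o':
            *out_value = optarg;
            break;
        case ':': /* missing argument; optopt holds the option char */
            fprintf(stderr, "option -%c requires an argument\n", optopt);
            return -1;
        default:  /* unknown option */
            fprintf(stderr, "unknown option\n");
            return -1;
        }
    }
    return 0;
}
```

What optind and argv look like after the error return is exactly the
part that differs between implementations, so a caller written this
way should not rely on either.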

Just leaving optind indexing the end of the argv array is probably not
nice. It loses all information about where non-option arguments
started.

I think there are two "kinda reasonable" options aside from what you
proposed: 

1. We could leave optind where it was on invocation (so that it points
   to the first non-option arg) and not do any permutation. This will
   make subsequent calls to getopt_long repeat the same error over and
   over, but if the caller does not attempt further calls, it would
   tell the caller the start of the non-option args. However, the
   final option with missing argument would also appear in this list.

2. We could permute the option with missing argument before the
   remaining non-option args. I think this gives a final ordering
   matching glibc, and lets the application see all of the non-option
   args, without gratuitously including the option with missing arg.
   However, it does produce a result that re-running getopt_long from
   the start would misinterpret that option as having had an argument
   (repurposing the first non-option arg as its arg). Since glibc does
   this, though, apparently it's expected.
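
The misinterpretation mentioned under option 2 can be seen by
scanning an already-permuted argv from the start. This is a small
sketch; the helper name first_option is made up for the example.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper: scan argv from the beginning and report the
 * first option found, plus its argument if the option is -o. */
static int first_option(int argc, char **argv, const char **arg)
{
    static const struct option longopts[] = { {0, 0, 0, 0} };
    int c;

    optind = 1; /* start scanning at the first argument */
    c = getopt_long(argc, argv, "o:", longopts, NULL);
    *arg = (c == 'o') ? optarg : NULL;
    return c;
}
```

Called on { "./a.out", "-o", "arg", NULL }, which is the ordering
left behind after permuting "arg -o", this returns 'o' with "arg" as
its argument: the former non-option argument is consumed as the
option's value and no error is reported.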

My leaning is to do option 2. I think it's as easy as getting rid of
the return part of your patch:

+       if (optind > argc)
+               optind--;
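
To see what the decrement changes, here is a standalone model of the
permutation step. It is a sketch written for illustration, not the
musl source: the function name finish and the clamp flag are
hypothetical, and the values of skipped, resumed and optind are
passed in as plain integers instead of being produced by a real
parse.

```c
#include <stddef.h>

/* Move argv[src] down to argv[dest], shifting the entries in
 * between up by one slot. */
static void permute(char **argv, int dest, int src)
{
    char *tmp = argv[src];
    for (int i = src; i > dest; i--)
        argv[i] = argv[i - 1];
    argv[dest] = tmp;
}

/* Model of the step that runs after the core parser returns.
 *   skipped      index of the first skipped non-option argument
 *   resumed      index where option parsing resumed
 *   optind_after optind as left by the core parser
 *   clamp        nonzero to apply the proposed "optind--" fix
 * Returns the value optind would have on return to the caller. */
static int finish(char **argv, int argc, int skipped, int resumed,
                  int optind_after, int clamp)
{
    if (clamp && optind_after > argc)
        optind_after--; /* do not count the terminating NULL */

    if (resumed > skipped) {
        int cnt = optind_after - resumed; /* entries consumed */
        for (int i = 0; i < cnt; i++)
            permute(argv, skipped, optind_after - 1);
        return skipped + cnt;
    }
    return optind_after;
}
```

For "./a.out arg -o" (argc = 3, skipped = 1, resumed = 2, optind left
at 4), the model without the clamp yields argv = ./a.out, -o, NULL,
arg and optind = 3, which is the corrupted musl output shown in the
report. With the clamp it yields argv = ./a.out, -o, arg, NULL and
optind = 2, matching the final glibc ordering.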

Rich
