musl - Re: getopt_long permutation algorithm questions

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141204000434.GN4574@brightrain.aerifal.cx>
Date: Wed, 3 Dec 2014 19:04:34 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: getopt_long permutation algorithm questions

On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote:
> Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker:
> > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote:
> > > sounds all a bit complicated and fragile to me.  getopt should neither
> > > be performance nor memory critical, so there is not much need for
> > > optimization here, no?
> > > 
> > > Why don't you just keep track of the cases in an array on the stack,
> > > and then do all the moves after the processing in a second scan? You
> > > have argc, so you know the size of a VLA (`char[argc]` should suffice)
> > > that you would have to define.
> > 
> > I'm not trying to optimize anything for performance here, just get it
> > correct and simple. I don't quite understand what you're proposing,
> 
> so I have to explain better
> 
> > but doing a multi-pass,
> 
> actually two-pass
> 
> > out-of-place operation does not sound simple,
> 
> but in place
> 
> > and it's not robust since the kernel allows huge argument lists
> > whereby char[argc] would overflow the stack (and even if it didn't,
> > assuming that it doesn't allow them would be an unwarranted assumption
> > unless it actually bought us some simplicity, which I don't see).
> 
> My idea was just to do a first pass for the "real" argument processing
> and to note during that pass if an argument wasn't used. Then in a
> second pass reorder argv with that information.

I don't think you want to pre-process the whole argument list, for
various reasons. The GNU version, as I understand, does argument
permutation as it goes based on the state of the opt* vars rather than
permuting everything at once, and presumably some callers could rely
on this, for example as a way of conditionally suppressing the
permutation by incrementing optind rather than calling getopt when
optind points to a non-option argument.

It's also just a more complicated design and not compatible with the
existing design/implementation of getopt[_long] to preprocess
everything.

> For the storage of that information, this is one bit per argc, and if
> we waste one byte for it to have life simple, then `char[argc]` should
> suffice. argc can be greater than several K ?

My understanding is that the kernel imposes no hard limit, aside from
resource limits, on modern kernels... Yes this is a nasty and
completely unnecessary DoS vector.

> But thinking of it:
> 
> > > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker:
> > > > As part of resolving the rest of the dist-local changes Alpine is
> > > > applying to musl, I'm trying to figure out how to add GNU-style
> > > > argument permutation to getopt_long. The basic concept is simple: when
> > > > a non-option argument is encountered, skip forward until the next
> > > > option (argument beginning with '-') and move it (and possibly its
> > > > argument) before the non-option arguments. However, there are some
> > > > ugly corner cases like:
> > > > 
> > > > arg1 -ab foo arg2
> > > > 
> > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it
> > > > seems like, in order to perform the correct permutation, lookahead is
> > > > required to see that foo also needs to be moved. Is this correct?
> 
> I wouldn't call it "lookahead".
> 
> Why don't you
> 
>  - have k as the position where the next option argument should be
>    placed and p as the next position that has not been processed yet
>  - move p such that argv[p] is the next option argument
>  - process argv[p] in place, including an eventual argument
>    argv[p+1] that a long option or the last option character takes
>  - shuffle the found argv[p] (plus eventually argv[p+1]) to the
>    correct position argv[k] (and argv[k+1])
>  - increment k and p by 1 (or 2)
>  - iterate
> 
> The disadvantage of that general approach is that it can be
> quadratic. For very long argument lists, if you have argc more than
> several K, this could be prohibitive.

This is pretty close to what I described, except I avoided the
"in-place" processing which doesn't work because of examples like the
one I gave, where you would have to process more than one option at a
time, then rewind to return just the first one. The process I
described avoids the need for that rewinding. Note that part of the
complexity here in "processing ahead and then rewinding" is that the
internal state of getopt[_long] is not encapsulated but exists in
global variables that the caller can access.

By quadratic time, I assume you meant for processing the whole
argument list. The algorithm I described is linear-time in the number
of consecutive misordered non-option arguments for a single call to
getopt_long, and thus quadratic when you call it repeatedly. This
could be avoided by saving additional hidden state, but I suspect the
practical gains are small and the potential for incorrect behavior
exists if the argument list is modified between calls.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.