Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141204000434.GN4574@brightrain.aerifal.cx>
Date: Wed, 3 Dec 2014 19:04:34 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: getopt_long permutation algorithm questions

On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote:
> Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker:
> > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote:
> > > sounds all a bit complicated and fragile to me.  getopt should neither
> > > be performance nor memory critical, so there is not much need for
> > > optimization here, no?
> > > 
> > > Why don't you just keep track of the cases in an array on the stack,
> > > and then do all the moves after the processing in a second scan? You
> > > have argc, so you know the size of a VLA (`char[argc]` should suffice)
> > > that you would have to define.
> > 
> > I'm not trying to optimize anything for performance here, just get it
> > correct and simple. I don't quite understand what you're proposing,
> 
> so I have to explain better
> 
> > but doing a multi-pass,
> 
> actually two-pass
> 
> > out-of-place operation does not sound simple,
> 
> but in place
> 
> > and it's not robust since the kernel allows huge argument lists
> > whereby char[argc] would overflow the stack (and even if it didn't,
> > assuming that it doesn't allow them would be an unwarranted assumption
> > unless it actually bought us some simplicity, which I don't see).
> 
> My idea was just to do a first pass for the "real" argument processing
> and to note during that pass if an argument wasn't used. Then in a
> second pass reorder argv with that information.

I don't think you want to pre-process the whole argument list, for
various reasons. The GNU version, as I understand, does argument
permutation as it goes based on the state of the opt* vars rather than
permuting everything at once, and presumably some callers could rely
on this, for example as a way of conditionally suppressing the
permutation by incrementing optind rather than calling getopt when
optind points to a non-option argument.

It's also just a more complicated design and not compatible with the
existing design/implementation of getopt[_long] to preprocess
everything.

> For the storage of that information, this is one bit per argc, and if
> we waste one byte for it to have life simple, then `char[argc]` should
> suffice. argc can be greater than several K ?

My understanding is that the kernel imposes no hard limit, aside from
resource limits, on modern kernels... Yes this is a nasty and
completely unnecessary DoS vector.

> But thinking of it:
> 
> > > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker:
> > > > As part of resolving the rest of the dist-local changes Alpine is
> > > > applying to musl, I'm trying to figure out how to add GNU-style
> > > > argument permutation to getopt_long. The basic concept is simple: when
> > > > a non-option argument is encountered, skip forward until the next
> > > > option (argument beginning with '-') and move it (and possibly its
> > > > argument) before the non-option arguments. However, there are some
> > > > ugly corner cases like:
> > > > 
> > > > arg1 -ab foo arg2
> > > > 
> > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it
> > > > seems like, in order to perform the correct permutation, lookahead is
> > > > required to see that foo also needs to be moved. Is this correct?
> 
> I wouldn't call it "lookahead".
> 
> Why don't you
> 
>  - have k as the position where the next option argument should be
>    placed and p as the next position that has not been processed yet
>  - move p such that argv[p] is the next option argument
>  - process argv[p] in place, including an eventual argument
>    argv[p+1] that a long option or the last option character takes
>  - shuffle the found argv[p] (plus eventually argv[p+1]) to the
>    correct position argv[k] (and argv[k+1])
>  - increment k and p by 1 (or 2)
>  - iterate
> 
> The disadvantage of that general approach is that it can be
> quadratic. For very long argument lists, if you have argc more than
> several K, this could be prohibitive.

This is pretty close to what I described, except I avoided the
"in-place" processing which doesn't work because of examples like the
one I gave, where you would have to process more than one option at a
time, then rewind to return just the first one. The process I
described avoids the need for that rewinding. Note that part of the
complexity here in "processing ahead and then rewinding" is that the
internal state of getopt[_long] is not encapsulated but exists in
global variables that the caller can access.

By quadratic time, I assume you meant for processing the whole
argument list. The algorithm I described is linear-time in the number
of consecutive misordered non-option arguments for a single call to
getopt_long, and thus quadratic when you call it repeatedly. This
could be avoided by saving additional hidden state, but I suspect the
practical gains are small and the potential for incorrect behavior
exists if the argument list is modified between calls.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.