|
Message-ID: <20141204000434.GN4574@brightrain.aerifal.cx> Date: Wed, 3 Dec 2014 19:04:34 -0500 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: getopt_long permutation algorithm questions On Wed, Dec 03, 2014 at 09:47:53PM +0100, Jens Gustedt wrote: > Am Mittwoch, den 03.12.2014, 14:56 -0500 schrieb Rich Felker: > > On Wed, Dec 03, 2014 at 08:43:09PM +0100, Jens Gustedt wrote: > > > sounds all a bit complicated and fragile to me. getopt should neither > > > be performance nor memory critical, so there is not much need for > > > optimization here, no? > > > > > > Why don't you just keep track of the cases in an array on the stack, > > > and then do all the moves after the processing in a second scan? You > > > have argc, so you know the size of a VLA (`char[argc]` should suffice) > > > that you would have to define. > > > > I'm not trying to optimize anything for performance here, just get it > > correct and simple. I don't quite understand what you're proposing, > > so I have to explain better > > > but doing a multi-pass, > > actually two-pass > > > out-of-place operation does not sound simple, > > but in place > > > and it's not robust since the kernel allows huge argument lists > > whereby char[argc] would overflow the stack (and even if it didn't, > > assuming that it doesn't allow them would be an unwarranted assumption > > unless it actually bought us some simplicity, which I don't see). > > My idea was just to do a first pass for the "real" argument processing > and to note during that pass if an argument wasn't used. Then in a > second pass reorder argv with that information. I don't think you want to pre-process the whole argument list, for various reasons. The GNU version, as I understand, does argument permutation as it goes based on the state of the opt* vars rather than permuting everything at once, and presumably some callers could rely on this, for example as a way of conditionally suppressing the permutation by incrementing optind rather than calling getopt when optind points to a non-option argument. It's also just a more complicated design and not compatible with the existing design/implementation of getopt[_long] to preprocess everything. > For the storage of that information, this is one bit per argc, and if > we waste one byte for it to have life simple, then `char[argc]` should > suffice. argc can be greater than several K ? My understanding is that the kernel imposes no hard limit, aside from resource limits, on modern kernels... Yes this is a nasty and completely unnecessary DoS vector. > But thinking of it: > > > > Am Mittwoch, den 03.12.2014, 14:29 -0500 schrieb Rich Felker: > > > > As part of resolving the rest of the dist-local changes Alpine is > > > > applying to musl, I'm trying to figure out how to add GNU-style > > > > argument permutation to getopt_long. The basic concept is simple: when > > > > a non-option argument is encountered, skip forward until the next > > > > option (argument beginning with '-') and move it (and possibly its > > > > argument) before the non-option arguments. However, there are some > > > > ugly corner cases like: > > > > > > > > arg1 -ab foo arg2 > > > > > > > > where 'a' and 'b' are options, and 'b' takes an argument, foo. Here it > > > > seems like, in order to perform the correct permutation, lookahead is > > > > required to see that foo also needs to be moved. Is this correct? > > I wouldn't call it "lookahead". > > Why don't you > > - have k as the position where the next option argument should be > placed and p as the next position that has not been processed yet > - move p such that argv[p] is the next option argument > - process argv[p] in place, including an eventual argument > argv[p+1] that a long option or the last option character takes > - shuffle the found argv[p] (plus eventually argv[p+1]) to the > correct position argv[k] (and argv[k+1]) > - increment k and p by 1 (or 2) > - iterate > > The disadvantage of that general approach is that it can be > quadratic. For very long argument lists, if you have argc more than > several K, this could be prohibitive. This is pretty close to what I described, except I avoided the "in-place" processing which doesn't work because of examples like the one I gave, where you would have to process more than one option at a time, then rewind to return just the first one. The process I described avoids the need for that rewinding. Note that part of the complexity here in "processing ahead and then rewinding" is that the internal state of getopt[_long] is not encapsulated but exists in global variables that the caller can access. By quadratic time, I assume you meant for processing the whole argument list. The algorithm I described is linear-time in the number of consecutive misordered non-option arguments for a single call to getopt_long, and thus quadratic when you call it repeatedly. This could be avoided by saving additional hidden state, but I suspect the practical gains are small and the potential for incorrect behavior exists if the argument list is modified between calls. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.