Message-ID: <20220803224342.GF1320090@port70.net>
Date: Thu, 4 Aug 2022 00:43:42 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Mike Beattie <mike@...ernal.org>
Cc: musl@...ts.openwall.com
Subject: Re: Bug: BOL/EOL anchors in regex capture groups won't match EOL

* Mike Beattie <mike@...ernal.org> [2022-07-21 18:08:19 +1200]:
> FRRouting uses musl-libc in its docker container build, and it also appears
> to be in use in the GNS3 appliances for frr available online.
>
> BGP as-path matching is regex powered, and usage of a special token of '_'
> allows for the easy matching of the boundary of an ASN in an as-path.
> Internally, it's translated into the regex capture group of:
>
>   (^|[,{}() ]|$)
>
> A valid as-path is a sequence of integers such as:
>
>   100 200 300
>
> A BGP as-path filter might be specified as so:
>
>   bgp as-path access-list foo seq 20 permit _300_
>
> which would get expanded to:
>
>   (^|[,{}() ]|$)300(^|[,{}() ]|$)
>
> when checking for a match. The usage of the pattern "(^|$)" in musl's regex
> implementation will never match EOL, but it does match BOL. Removal of the
> circumflex will let the match succeed.

thanks for the report.

it seems to me regcomp does not handle assertions correctly
if there is a union (|) of multiple subexpressions that match
the empty string. it simply takes the assertion of the leftmost
subexpression, so e.g. '(|$)a' matches 'a' but '($|)a' does not,
because it is matched as '$a' and the $ assertion fails.

since posix does not allow an empty pattern such as '(|' in the
syntax, a conforming example is e.g. '(b*|$)a' vs '($|b*)a'

all supported assertions are affected (^, $, \b, \B, \<, \>).
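the conforming example above can be checked with a small program. this is only a sketch: the helper names ere_matches and run_demo are made up for illustration and are not part of musl or of the reporter's r.c. it compiles each pattern as a POSIX ERE and runs it against the string "a". going by the analysis above, an affected musl should report a match for '(b*|$)a' and no match for '($|b*)a', while an implementation without the problem should match both.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the POSIX ERE `pattern` matches
 * somewhere in `subject`, 0 if it does not, and -1 if regcomp
 * rejects the pattern. */
static int ere_matches(const char *pattern, const char *subject)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && subject != NULL);

	/* REG_NOSUB: only the match/no-match result is needed here. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, subject, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}

/* Hypothetical driver: tries both orderings of the alternation
 * against "a", prints each result, and returns how many matched. */
static int run_demo(void)
{
	static const char *const patterns[] = { "(b*|$)a", "($|b*)a" };
	int matched = 0;

	for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
		int r = ere_matches(patterns[i], "a");

		printf("%-10s on \"a\": %s\n", patterns[i],
		       r == 1 ? "match" : r == 0 ? "no match" : "regcomp error");
		if (r == 1)
			matched++;
	}
	return matched;
}
```

calling run_demo() from main and building once with musl-gcc and once with another libc makes the difference visible. the two patterns describe the same language, so any difference between the two printed lines comes from how the empty-matching branches are handled.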

the fix is not obvious: there is a regcomp step like

	tags, assertions = leftmost_empty_match(subexpr)
	process(tags, assertions)

which should be

	list = all_empty_match(subexpr)
	for tags, assertions in list:
		if assertions are weaker than previous ones:
			process(tags, assertions)

i think this can increase storage and computation requirements
significantly unless the algorithm is further optimized.

>
> Here is the output of a test program I've written to confirm this:
>
>   $ musl-gcc -o r r.c
>
>   $ ./r "_300_" "100 200 300"
>   regex: (^|[,{}() ]|$)300(^|[,{}() ]|$)
>   regexec on [100 200 300]: NOT Found
>
> Removal of "^|" from the beginning of the trailing capture group:
>
>   $ ./r "(^|[,{}() ]|$)300([,{}() ]|$)" "0000 1111 2222"
>   regex: (^|[,{}() ]|$)300([,{}() ]|$)
>   regexec on [100 200 300]: Found
>
> Thanks,
> Mike.
> --
> Mike Beattie <mike@...ernal.org>