|
Message-ID: <20130717053811.GA1719@newbook> Date: Tue, 16 Jul 2013 22:38:11 -0700 From: Isaac <idunham@...abit.com> To: musl@...ts.openwall.com Subject: Re: regex libs (was Re: embedded newbies site.) On Tue, Jul 16, 2013 at 11:32:20AM -0400, Rich Felker wrote: > On Tue, Jul 16, 2013 at 11:10:34AM -0400, LM wrote: > > On Tue, Jul 16, 2013 at 10:00 AM, Rich Felker <dalias@...ifal.cx> wrote: > > > > > The whole concept of regular expressions is that they're regular, > > > meaning they're matchable in O(n) time with O(1) space. PCRE (the > > > implementation) uses backtracking for everything, giving it > > > exponentially-bad performance (JIT cannot fix this), and PCRE (the > > > language) has a lot of features that are fundamentally not regular and > > > thus can't be implemented efficiently. Also, the behavior of some of > > > the features (e.g. greedy vs non-greedy matching) were not designed > > > intentionally but just arose out of the backtracking implementation, > > > and thus don't make a lot of sense unless you think from the > > > standpoint of such an implementation. > > > > > > > Went back and rechecked the documentation ( > > http://www.pcre.org/readme.txt). You're both right, PCRE is offering > > the Perl regular expressions > > implementation even when one uses the pcreposix interface. Would have been > > nice if they offered actual regular expressions handling if you only want > > to use the POSIX compatible part of the interface. The pcreposix stuff is only for those who want perl regexes with an existing POSIX codebase. > > So what are some good regex library solutions? I'm also wondering if there > > POSIX systems provide the regcomp and regexec api as part of libc. On > musl, glibc, uclibc, and (afaik) all the bsd libcs, this gives you a > good implementation. However musl's is the only one I know of with no > conformance bugs, but the conformance bugs in the others are isolated > to obscure subexpression match issues. > > > are some good cross-platform portable library solutions (or if PCRE is the > > best pattern matching solution from a portability standpoint even if it's > > not strictly regex compatible). > > TRE, on which musl's is based, is the closest to being good, in my > opinion. It has optimal big-O but fairly suboptimal constant. However > it's unmaintained and has at least two bugs that aren't fixed > upstream, one of which may be exploitable if an untrusted party can > supply arbitrary regular expressions, and the other which just causes > some POSIX BRE expressions (the legacy style used by sed and non-e > grep) to be misinterpreted. Have these been mentioned to NetBSD? I ask because they use TRE for their libc (the relicense happened at their request). Thanks, Isaac Dunham
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.