Message-ID: <20130115134244.GW20323@brightrain.aerifal.cx>
Date: Tue, 15 Jan 2013 08:42:44 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: REG_STARTEND (regex)

On Tue, Jan 15, 2013 at 11:34:59AM +0100, Daniel Cegiełka wrote:
> Hi,
> Is there a chance that musl will support REG_STARTEND? It is used
> quite often in *BSD.
> 
> http://www.sourceware.org/ml/libc-alpha/2004-03/msg00038.html

Probably not, at least not in the immediate future. The original TRE
code actually worked with strings as a base+length rather than
null-terminated internally, which meant a lot of things were a lot
more expensive than they should be; if I remember correctly, even searches
for text guaranteed to be found near the beginning of the string
required strlen for the whole string, i.e. the whole operation was
needlessly O(n). In one of the cleanup rounds, I changed it to use
null termination, which simplified a lot of the tests; many checks
collapsed away since \0 was automatically not in the set being checked
against and thus no second check was required.
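
To illustrate the kind of check that collapses, here is a minimal sketch; it is not the actual TRE or musl code, and is_digit, span_len, and span_nul are hypothetical names made up for this example. With base+length, every step needs a bounds check as well as the set test. With null termination the set test alone is enough, because \0 is never in the set.

```c
#include <stddef.h>

/* Hypothetical set-membership test; stands in for any bracket/class check. */
static int is_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Base+length style: two checks per byte (bounds, then set membership). */
static size_t span_len(const char *s, size_t n)
{
    size_t i = 0;
    while (i < n && is_digit((unsigned char)s[i]))
        i++;
    return i;
}

/* Null-terminated style: '\0' is not in the set, so the set test
 * also stops the loop at the end of the string. */
static size_t span_nul(const char *s)
{
    size_t i = 0;
    while (is_digit((unsigned char)s[i]))
        i++;
    return i;
}
```

Note that the base+length caller also has to obtain n from somewhere, typically via strlen over the whole string, which is the up-front O(n) cost mentioned above.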

If/when we overhaul regex again, I'll certainly consider this request
and see if the design can be made such that it's not expensive. But I
don't see any easy way to do it right now short of making a temp copy
of the string. That _would_ be possible; \0 could be replaced with
\xff, and \xff replaced with \xfe, and special logic added to allow
\xff (which is otherwise an invalid byte and never matchable) while
still rejecting \xfe and other invalid bytes. This would require no
changes to the internals, but it would have the property of requiring
an O(n) malloc/memcpy, which is certainly not very appealing.
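
For reference, the plain temp-copy part can be sketched on the caller side using only standard regexec. This is a rough sketch under stated assumptions, not a proposed musl implementation: regexec_range is a hypothetical helper name, the range is passed as explicit so/eo arguments rather than through pmatch[0] as REG_STARTEND does, and it does not do the \0 -> \xff remapping, so an embedded \0 in the range still ends the subject early.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of musl or POSIX): match only the bytes
 * s[so..eo) by copying them into a null-terminated temporary buffer.
 * Costs an O(n) malloc/memcpy in the length of the range.
 * If re was compiled with REG_NOSUB, pass nmatch == 0. */
static int regexec_range(const regex_t *re, const char *s,
                         size_t nmatch, regmatch_t pmatch[], int eflags,
                         regoff_t so, regoff_t eo)
{
    size_t len = (size_t)(eo - so);
    char *tmp = malloc(len + 1);
    if (!tmp)
        return REG_ESPACE;

    memcpy(tmp, s + so, len);
    tmp[len] = '\0';

    int r = regexec(re, tmp, nmatch, pmatch, eflags);
    free(tmp);

    if (r == 0) {
        /* Report offsets relative to s rather than to the copy. */
        for (size_t i = 0; i < nmatch; i++) {
            if (pmatch[i].rm_so != -1) {
                pmatch[i].rm_so += so;
                pmatch[i].rm_eo += so;
            }
        }
    }
    return r;
}
```

As in the BSD interface, a nonzero start offset does not imply REG_NOTBOL here; a caller who does not want ^ to match at so has to pass REG_NOTBOL in eflags.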

Rich
