Message-ID: <20130115184513.GZ20323@brightrain.aerifal.cx>
Date: Tue, 15 Jan 2013 13:45:13 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: REG_STARTEND (regex)

On Tue, Jan 15, 2013 at 04:16:29PM +0100, Daniel Cegiełka wrote:
> Thank you for your reply. It's terribly sad that there are so many
> portability problems. There are a lot of high-quality tools in
> the *BSDs that could be used on Linux. But rather than sticking to
> POSIX, people still create barriers like REG_STARTEND, 'sed -i',
> bison (instead of POSIX yacc), perl in the Makefile(!!!), etc.
> 
> 'sed -i' is used in many programs (even Linux, e2fsprogs, old libcap,
> etc.) and there is no way to avoid it. So I'm looking for an
> alternative to GNU sed + gnulib. I found that FreeBSD's sed
> supports -i and is much smaller than GNU sed:
> 
> http://svnweb.freebsd.org/base/release/9.1.0/usr.bin/sed/
> 
> ls -lh /bin/sed ./sed
> -rwxr-xr-x 1 root root 143K Jun 22  2012 /bin/sed
> -rwxr-xr-x 1 root root  35K Jan 15 14:32 ./sed
> 
> (compiled on Linux with glibc)
> 
> Now I want to use it with musl, but FreeBSD's sed (and grep) use
> REG_STARTEND, and I don't really know how to solve this problem.
> 
> 
> http://svnweb.freebsd.org/base/release/9.1.0/usr.bin/sed/process.c?revision=243808&view=markup
> 
> 	/* Set anchors */
> 	match[0].rm_so = 0;
> 	match[0].rm_eo = slen;
> 
> 	eval = regexec(defpreg, string,
> 	    nomatch ? 0 : maxnsub + 1, match, eflags | REG_STARTEND);
> 
> 
> Does anyone have suggestions for how this can be modified so that it
> works with musl?

If the start position is 0, as it appears to be here, nothing needs to
be done except removing REG_STARTEND. All it does in this case is let
you process data with embedded NUL bytes (the end of the data is taken
from match[0].rm_eo rather than from the first NUL), which is not
required by the standard and not useful for any meaningful use of sed.
Nobody will notice the difference without it unless they're attempting
hideous hacks like patching binary files with sed...
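One way to do that removal while leaving the FreeBSD call site mostly untouched is sketched below. It is a minimal sketch, not code from FreeBSD or musl: the REG_STARTEND fallback definition and the helper name match_whole are assumptions for illustration. Defining the missing flag as 0 turns "eflags | REG_STARTEND" into a no-op, so regexec() falls back to standard NUL-terminated semantics, which is equivalent only when the start is 0 and the data contains no NUL bytes.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Assumed compatibility shim (not part of FreeBSD sed): where the libc
 * lacks the BSD REG_STARTEND extension, define it as 0 so that
 * "eflags | REG_STARTEND" has no effect. */
#ifndef REG_STARTEND
#define REG_STARTEND 0
#endif

/* Hypothetical helper modelled on the sed snippet quoted above: search
 * all of string[0..slen). match must have room for at least one entry. */
static int match_whole(const regex_t *re, const char *string, size_t slen,
                       size_t nmatch, regmatch_t match[], int eflags)
{
    /* Without real REG_STARTEND, the search ends at the first NUL, so
     * the fallback is only equivalent when the data has no NUL bytes. */
    assert(REG_STARTEND != 0 || strlen(string) == slen);

    /* Set anchors, as sed does; ignored when REG_STARTEND is 0. */
    match[0].rm_so = 0;
    match[0].rm_eo = (regoff_t)slen;
    return regexec(re, string, nmatch, match, eflags | REG_STARTEND);
}
```

On a libc that provides REG_STARTEND this behaves like the original call. On one that does not, the only capability lost is matching past an embedded NUL.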

If the start position were not zero, you could compensate by adding
the start offset to the pointer you pass in, then adding that same
offset to all the match offsets after regexec returns.
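That compensation could look roughly like the following minimal sketch. The name regexec_from is hypothetical. It assumes the region to search runs to the string's terminating NUL, because without REG_STARTEND there is no way to pass an explicit end. Offsets are shifted back so they are relative to the original string, which is the convention REG_STARTEND callers expect.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: emulate a REG_STARTEND search beginning at byte
 * offset `start`, assuming the region ends at the terminating NUL. */
static int regexec_from(const regex_t *re, const char *string, size_t start,
                        size_t nmatch, regmatch_t pmatch[], int eflags)
{
    assert(start <= strlen(string));

    /* Search from the offset pointer instead of using REG_STARTEND. */
    int rc = regexec(re, string + start, nmatch, pmatch, eflags);
    if (rc != 0)
        return rc; /* REG_NOMATCH or an error; pmatch is not adjusted */

    /* Make offsets relative to the original string again. Unmatched
     * subexpressions are reported as -1 and are left untouched. */
    for (size_t i = 0; i < nmatch; i++) {
        if (pmatch[i].rm_so != -1) {
            pmatch[i].rm_so += (regoff_t)start;
            pmatch[i].rm_eo += (regoff_t)start;
        }
    }
    return 0;
}
```

Starting mid-string lets `^` match at the offset. The BSD regex(3) page notes that a nonzero rm_so with REG_STARTEND does not imply REG_NOTBOL either, so pass REG_NOTBOL in eflags if `^` should not match there.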

Rich

