Message-ID: <20161005162305.GI19318@brightrain.aerifal.cx>
Date: Wed, 5 Oct 2016 12:23:05 -0400
From: Rich Felker <dalias@...c.org>
To: Julien Ramseier <j.ramseier@...il.com>
Cc: musl@...ts.openwall.com, Johannes.Schindelin@....de
Subject: Re: [PATCH] regex: REG_STARTEND support

On Wed, Oct 05, 2016 at 02:19:35PM +0200, Julien Ramseier wrote:
> Here's my REG_STARTEND patch, mostly copied from the original tre[1]
> implementation.
> It's only lightly tested.
>
> [...]
>
> diff --git a/src/regex/regexec.c b/src/regex/regexec.c
> index 16c5d0a..ae65726 100644
> --- a/src/regex/regexec.c
> +++ b/src/regex/regexec.c
> @@ -29,6 +29,7 @@
>
>  */
>
> +#include <sys/types.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <wchar.h>
> @@ -51,11 +52,15 @@ tre_fill_pmatch(size_t nmatch, regmatch_t pmatch[], int cflags,
>
>  #define GET_NEXT_WCHAR() do { \
>      prev_c = next_c; pos += pos_add_next; \
> -    if ((pos_add_next = mbtowc(&next_c, str_byte, MB_LEN_MAX)) <= 0) { \
> -      if (pos_add_next < 0) { ret = REG_NOMATCH; goto error_exit; } \
> -      else pos_add_next++; \
> +    if (len >= 0 && pos >= len) \
> +      next_c = L'\0'; \

As caught discussing this on #musl yesterday, pos (int) here has the
wrong type, int, which is a big problem. I'm going to work on a test
case to show it and confirm that changing the type fixes it.

> +    else { \
> +      if ((pos_add_next = mbtowc(&next_c, str_byte, MB_LEN_MAX)) <= 0) { \
> +        if (pos_add_next < 0) { ret = REG_NOMATCH; goto error_exit; } \
> +        else pos_add_next++; \
> +      } \
> +      str_byte += pos_add_next; \
>      } \
> -    str_byte += pos_add_next; \

There also seems to be a bug, which was also present in the original
TRE I think, whereby read past len can happen if the buffer up to len
ends with a partial multibyte character. Avoiding this seems rather
costly.

Otherwise this doesn't look too bad. I'll see if we can get some
figures for how it affects performance.

Rich
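
To make the read-past-len concern concrete: the quoted macro always passes MB_LEN_MAX to mbtowc, so a multibyte sequence that starts before len may be decoded using bytes at or beyond len. One possible way to avoid that, sketched below, is to clamp the byte count to what remains before len on every call. This is only an illustration, not the patch's code or musl's eventual fix: bounded_next_wchar and its parameters are hypothetical names, it uses size_t offsets rather than the macro's int, and it uses mbrtowc with a fresh state instead of mbtowc. The extra work per character may be part of what makes this approach costly.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of musl or TRE): decode one character
 * from s[pos..len) without ever reading a byte at or beyond len.
 *
 * Returns  1 and sets *out / *adv when a character was decoded,
 *          0 (with *out = L'\0', *adv = 0) when pos has reached len,
 *         -1 on an invalid sequence or one cut off by len.
 */
static int bounded_next_wchar(const char *s, size_t pos, size_t len,
                              wchar_t *out, size_t *adv)
{
    assert(s != NULL && out != NULL && adv != NULL);

    if (pos >= len) {              /* end of the REG_STARTEND range */
        *out = L'\0';
        *adv = 0;
        return 0;
    }

    /* Never offer the decoder more bytes than remain before len. */
    size_t avail = len - pos;
    if (avail > MB_LEN_MAX)
        avail = MB_LEN_MAX;

    mbstate_t st;
    memset(&st, 0, sizeof st);     /* fresh conversion state per call */

    size_t n = mbrtowc(out, s + pos, avail, &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;                 /* invalid, or truncated by len */

    *adv = n ? n : 1;              /* an embedded NUL is one byte long */
    return 1;
}
```

A caller would loop on this, adding *adv to pos after each successful call, and treat -1 the way the macro treats a negative mbtowc result (REG_NOMATCH).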