Message-ID: <6D612B6AC5DCDA4580AF97B1068118AD2DC49A@DGGEML501-MBX.china.huawei.com>
Date: Sat, 18 Apr 2020 08:44:50 +0000
From: "liheng (P)" <liheng40@...wei.com>
To: Rich Felker <dalias@...c.org>
CC: "musl@...ts.openwall.com" <musl@...ts.openwall.com>, "Xiangrui (Euler)"
<rui.xiang@...wei.com>, Lizefan <lizefan@...wei.com>
Subject: regex Back reference matching result not same as glibc and
tre.
Rich Felker:
Hello, I've noticed that musl's regex matching result differs from that of glibc and TRE.
Back references do not seem to be well supported in the latest version.
Here is a simple test case:
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define str "aba"
#define N 2

static const char *expected[N] =
{
    str, "a"
};
static const char pat[] = "(.?).?\\1";

int test_regex(void)
{
    regex_t rbuf;
    int err = regcomp(&rbuf, pat, REG_EXTENDED);
    if (err != 0) {
        char errstr[300];
        regerror(err, &rbuf, errstr, sizeof (errstr));
        puts (errstr);
        return err;
    }
    regmatch_t m[N];
    err = regexec(&rbuf, str, N, m, 0);
    if (err != 0) {
        puts ("regexec failed");
        return 1;
    }
    int result = 0;
    int i;
    for (i = 0; i < N; ++i) {
        if (m[i].rm_so == -1) {
            printf ("m[%d] unused\n", i);
            result = 1;
        }
        else {
            int len = m[i].rm_eo - m[i].rm_so;
            printf ("m[%d] = \"%.*s\"\n", i, len, str + m[i].rm_so);
            if (strlen (expected[i]) != len
                || memcmp (expected[i], str + m[i].rm_so, len) != 0)
                result = 1;
        }
    }
    return result;
}

int main (void)
{
    int result = 0;
    result = test_regex();
    if (result != 0) {
        printf("test regex failed\n");
    } else {
        printf("test regex success\n");
    }
    return result;
}
musl:
# ./test
regexec failed
test regex failed
glibc:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success
tre:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success
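The musl run above prints only "regexec failed", which does not show whether regexec() returned REG_NOMATCH or some other error. A minimal sketch of a helper that would make the distinction visible, using only the POSIX regerror() interface; the name exec_status is hypothetical and is not part of the test case:

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as an ERE, run it against `text`,
   print the regerror() text for any failure, and return the raw status:
   0 on a match, REG_NOMATCH when nothing matched, otherwise an error code. */
static int exec_status(const char *pattern, const char *text)
{
    regex_t re;
    char msg[256];

    int err = regcomp(&re, pattern, REG_EXTENDED);
    if (err != 0) {
        regerror(err, &re, msg, sizeof msg);
        printf("regcomp: %s\n", msg);
        return err;
    }

    /* No submatches are requested here; only the status matters. */
    err = regexec(&re, text, 0, NULL, 0);
    if (err != 0) {
        regerror(err, &re, msg, sizeof msg);
        printf("regexec: %s (%s)\n", msg,
               err == REG_NOMATCH ? "REG_NOMATCH" : "other error");
    }

    regfree(&re);
    return err;
}
```

Calling exec_status("(.?).?\\1", "aba") against each libc would show which of the two cases applies there.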
I noticed that Rich Felker changed the back reference handling in the commit below, which suppresses back reference processing in ERE regcomp:
commit 7c8c86f6308c7e0816b9638465a5917b12159e8f
Author: Rich Felker <dalias@...ifal.cx>
Date: Fri Mar 20 18:25:01 2015 -0400
suppress backref processing in ERE regcomp
one of the features of ERE is that it's actually a regular language
and does not admit expressions which cannot be matched in linear time.
introduction of \n backref support into regcomp's ERE parsing was
unintentional.
diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index bce6bc15..4d80cb1c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -839,7 +839,7 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
break;
default:
- if (isdigit(*s)) {
+ if (!ere && isdigit(*s)) {
/* back reference */
This commit tells me that if I want to use back references I should not pass REG_EXTENDED, but the test case still fails to match.
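Note that dropping REG_EXTENDED alone leaves the pattern written in ERE syntax; in a POSIX BRE, "(" and "?" are ordinary characters. For comparison, here is a sketch of the same idea spelled as a BRE, with \( \) for the group and \{0,1\} in place of "?". It assumes that this BRE is an acceptable equivalent of the ERE in the test; the pattern and the name bre_backref_match are hypothetical and do not come from the test case:

```c
#include <regex.h>
#include <stdio.h>

/* Assumed BRE equivalent of the ERE "(.?).?\\1":
   \( \) delimit the group, \{0,1\} replaces ?, \1 is the back reference. */
static const char bre_pat[] = "\\(.\\{0,1\\}\\).\\{0,1\\}\\1";

/* Hypothetical helper: match `text` against bre_pat with cflags 0 (BRE)
   and store the whole match in m[0] and group 1 in m[1].
   Returns 0 on a match, otherwise the regcomp()/regexec() status code. */
static int bre_backref_match(const char *text, regmatch_t m[2])
{
    regex_t re;

    int err = regcomp(&re, bre_pat, 0);
    if (err != 0)
        return err;

    err = regexec(&re, text, 2, m, 0);
    if (err == 0)
        printf("m[0] = \"%.*s\", m[1] = \"%.*s\"\n",
               (int)(m[0].rm_eo - m[0].rm_so), text + m[0].rm_so,
               (int)(m[1].rm_eo - m[1].rm_so), text + m[1].rm_so);

    regfree(&re);
    return err;
}
```

Running bre_backref_match("aba", m) on musl, glibc and TRE would show whether the three agree once the pattern is a real BRE.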
I then tried to support back references in ERE regcomp with the modification below, after which musl's regex matching succeeds, the same as glibc and TRE.
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
default:
- if (!ere && isdigit(*s)) {
+ if (ere && isdigit(*s)) {
/* back reference */
Thank you for considering this.
Li Heng