Message-ID: <6D612B6AC5DCDA4580AF97B1068118AD2DC49A@DGGEML501-MBX.china.huawei.com>
Date: Sat, 18 Apr 2020 08:44:50 +0000
From: "liheng (P)" <liheng40@...wei.com>
To: Rich Felker <dalias@...c.org>
CC: "musl@...ts.openwall.com" <musl@...ts.openwall.com>, "Xiangrui (Euler)" <rui.xiang@...wei.com>, Lizefan <lizefan@...wei.com>
Subject: regex Back reference matching result not same as glibc and tre.

Rich Felker:

Hello,

I've noticed that musl's regex matching result differs from glibc and tre.
Back references may not be well supported in the latest version.
Here is a simple test case:

#include <regex.h>
#include <stdio.h>
#include <string.h>

#define str "aba"
#define N 2

static const char *expected[N] = { str, "a" };
static const char pat[] = "(.?).?\\1";

int test_regex(void)
{
	regex_t rbuf;
	int err = regcomp(&rbuf, pat, REG_EXTENDED);
	if (err != 0) {
		char errstr[300];
		regerror(err, &rbuf, errstr, sizeof (errstr));
		puts (errstr);
		return err;
	}

	regmatch_t m[N];
	err = regexec(&rbuf, str, N, m, 0);
	if (err != 0) {
		puts ("regexec failed");
		return 1;
	}

	int result = 0;
	int i;
	for (i = 0; i < N; ++i) {
		if (m[i].rm_so == -1) {
			printf ("m[%d] unused\n", i);
			result = 1;
		} else {
			int len = m[i].rm_eo - m[i].rm_so;
			printf ("m[%d] = \"%.*s\"\n", i, len, str + m[i].rm_so);
			if (strlen (expected[i]) != len
			    || memcmp (expected[i], str + m[i].rm_so, len) != 0)
				result = 1;
		}
	}
	return result;
}

int main (void)
{
	int result = 0;
	result = test_regex();
	if (result != 0) {
		printf("test regex failed\n");
	} else {
		printf("test regex success\n");
	}
	return result;
}

musl:
# ./test
regexec failed
test regex failed

glibc:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success

tre:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success

I noticed that Rich Felker changed back reference handling in the commit
below, which suppresses back reference processing in ERE regcomp:

commit 7c8c86f6308c7e0816b9638465a5917b12159e8f
Author: Rich Felker <dalias@...ifal.cx>
Date:   Fri Mar 20 18:25:01 2015 -0400

    suppress backref processing in ERE regcomp

    one of the features of ERE is that it's actually a regular language
    and does not admit expressions which cannot be matched in linear
    time. introduction of \n backref support into regcomp's ERE parsing
    was unintentional.

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index bce6bc15..4d80cb1c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -839,7 +839,7 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
 		break;
 	default:
-		if (isdigit(*s)) {
+		if (!ere && isdigit(*s)) {
 			/* back reference */

This commit suggests that to use back references I should not set
REG_EXTENDED, but the test case still fails to match without that flag.
I then tried enabling back references in ERE regcomp with the change
below, and with it the musl result is the same as glibc and tre:

--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
 	default:
-		if (!ere && isdigit(*s)) {
+		if (ere && isdigit(*s)) {
 			/* back reference */

Thank you for considering this.

Li Heng
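
A note on why simply dropping REG_EXTENDED does not make the test case
pass: in POSIX basic regular expressions (BRE) the characters "(", ")"
and "?" are ordinary, so in "(.?).?\1" the "\1" has no group to refer
to. Below is a minimal sketch of the same match written in BRE syntax,
where back references are defined. The translated pattern and the names
bre_pat and match_bre are assumptions for illustration and do not come
from the original report; the pattern should match all of "aba" with
"a" as the first group on implementations that support BRE back
references.

```c
#include <regex.h>
#include <stdio.h>

/* Assumed BRE translation of the ERE "(.?).?\1":
   \( \) delimit the group and \{0,1\} stands in for "?". */
static const char bre_pat[] = "\\(.\\{0,1\\}\\).\\{0,1\\}\\1";

/* Hypothetical helper (not part of the original test case): compile
   `pat` as a BRE and match it against `s`, filling m[0..n-1].
   Returns 0 on a match, a nonzero regex error code otherwise. */
static int match_bre(const char *pat, const char *s, size_t n, regmatch_t *m)
{
	regex_t re;
	int err = regcomp(&re, pat, 0);	/* no REG_EXTENDED: BRE syntax */
	if (err != 0) {
		char errstr[128];
		regerror(err, &re, errstr, sizeof errstr);
		fprintf(stderr, "regcomp: %s\n", errstr);
		return err;
	}
	err = regexec(&re, s, n, m, 0);
	regfree(&re);			/* release the compiled pattern */
	return err;
}
```

Calling match_bre(bre_pat, "aba", 2, m) with a regmatch_t m[2] gives a
test that stays within what POSIX specifies, so it can be compared
across musl, glibc and tre without relying on ERE back references.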