Message-ID: <5e13e61f0a963e4ffc074be0bd6432b4@smtp.hushmail.com>
Date: Wed, 22 May 2013 21:00:18 +0200
From: magnum <john.magnum@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: Fuzzing with regular expressions

On 22 May, 2013, at 12:40, Jan Starke <jan.starke@...ofbed.org> wrote:
> 2013/5/22 magnum <john.magnum@...hmail.com>
>> I do not quite understand the section about Unicode. And it does not seem
>> to work (my terminal is UTF-8):
>>
>> $ rexgen "M[üö]ller"
>> Mller
>> Mller
>> Mller
>> $ rexgen -u8 n "M[üö]ller"
>> Mller
>> Mller
>> Mller
>>
>> -DUTF_VARIANT=8 does not change the above, in case it was supposed to.
>
> rexgen currently cannot use Unicode strings as input, due to limitations of
> the lexer (GNU flex). flex ignores any characters which are not known to
> it. If you want to generate unicode characters, you must specify them with
> the \uxxxx syntax, e.g.
>
> rexgen 'M(ue|oe|\u00fc|\u00f6)ller'

This contradicts the Unicode section on http://code.google.com/p/rexgen/
so you might want to revise that. Or better, make the code work like the
docs say :-)

> The aim of the options u8, u16 and u32 are to enforce the output encoding.
> To verify this, you could create a hexdump of the output:
>
> rexgen 'test' | od -x

OK, I see it now. This also contradicts the web docs: the default is UTF-8
and not UTF-32. And in this case the actual behavior is better - defaulting
to UTF-32 would be very odd!

magnum
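
For anyone repeating the hexdump check, it may help to know what the bytes should look like in each encoding. The following is only a rough Python sketch and does not use rexgen itself: hexdump() is a hypothetical helper made up for this illustration, and the little-endian byte order for UTF-16/UTF-32 is an assumption, since the thread does not say which byte order rexgen emits.

```python
# Illustration only: hexdump() is a hypothetical helper, not part of rexgen.
def hexdump(text, encoding):
    """Return the encoded bytes of text as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")

# "\u00fc" is the same code point that rexgen's \u00fc syntax names (u-umlaut).
word = "M\u00fcller"

# Byte order (little-endian) is assumed here for UTF-16 and UTF-32.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(f"{enc:10} {hexdump(word, enc)}")
```

In UTF-8 the u-umlaut takes two bytes (c3 bc) while the ASCII letters take one each; in UTF-32 every character takes four bytes. Note that od -x groups the dump into two-byte words in host byte order, so on a little-endian machine the bytes appear pairwise swapped; od -t x1 prints them one byte at a time.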