Message-ID: <CAHv4kXg1VW2Ym+X0GEH4LezMMuREKAJe-DEsGq_xTEdjxYAvrQ@mail.gmail.com>
Date: Wed, 22 May 2013 21:45:56 +0200
From: Jan Starke <jan.starke@...ofbed.org>
To: john-users@...ts.openwall.com
Subject: Re: Fuzzing with regular expressions

> This contradicts the Unicode section on http://code.google.com/p/rexgen/
> so you might want to revise that. Or better, make the code work like the
> docs say :-)

This is a really cool challenge, as flex only supports single-byte
character sets (if not only ASCII). There are some really weird approaches
around the web. Maybe I will take a look at it. Until then, I have changed
the spec to match the code ;-)

> OK, I see it now. This also contradicts the web docs: the default is UTF-8
> and not UTF-32. And in this case the actual behavior is better - defaulting
> to UTF-32 would be very odd!

I updated the docs; thank you for the advice. The original approach of
using UTF-32 internally by default was driven by performance concerns:
handling UTF-32 is simpler than handling UTF-8. The current approach is
faster with UTF-8, which seems to be the better way...

Regards,
Jan
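
To illustrate the remark that handling UTF-32 is simpler than handling
UTF-8, here is a minimal Python sketch. It is not taken from rexgen; the
function names are hypothetical and chosen only for this example. It shows
that the n-th character of UTF-32 data sits at a fixed byte offset, while
UTF-8 data has to be walked lead byte by lead byte.

```python
def utf8_seq_len(lead_byte: int) -> int:
    """Length in bytes of a UTF-8 sequence, derived from its lead byte."""
    if lead_byte < 0x80:            # 0xxxxxxx: ASCII
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 lead byte: %#x" % lead_byte)


def nth_char_utf32(data: bytes, n: int) -> str:
    """Fixed width: character n always starts at byte offset 4 * n."""
    return data[4 * n:4 * n + 4].decode("utf-32-le")


def nth_char_utf8(data: bytes, n: int) -> str:
    """Variable width: skip n sequences to find where character n starts."""
    pos = 0
    for _ in range(n):
        pos += utf8_seq_len(data[pos])
    return data[pos:pos + utf8_seq_len(data[pos])].decode("utf-8")


# One character each of 1, 2, 3 and 4 bytes in UTF-8.
sample = "a\u00e9\u20ac\U0001F600"
as_utf8 = sample.encode("utf-8")
as_utf32 = sample.encode("utf-32-le")
```

The trade-off is size: the four sample characters above take 10 bytes in
UTF-8 but 16 bytes in UTF-32, so the simpler indexing is paid for with more
data to move around.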