Message-ID: <5e13e61f0a963e4ffc074be0bd6432b4@smtp.hushmail.com>
Date: Wed, 22 May 2013 21:00:18 +0200
From: magnum <john.magnum@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: Fuzzing with regular expressions

On 22 May, 2013, at 12:40 , Jan Starke <jan.starke@...ofbed.org> wrote:
> 2013/5/22 magnum <john.magnum@...hmail.com>
>> I do not quite understand the section about Unicode. And it does not seem
>> to work (my terminal is UTF-8):
>> 
>> $ rexgen "M[üö]ller"
>> Mller
>> Mller
>> Mller
>> $ rexgen -u8 n "M[üö]ller"
>> Mller
>> Mller
>> Mller
>>  
>> -DUTF_VARIANT=8 does not change the above, in case it was supposed to. 
> 
> rexgen currently cannot use Unicode strings as input, due to limitations of
> the lexer (GNU flex). flex ignores any characters which are not known to
> it. If you want to generate unicode characters, you must specify them with
> the \uxxxx syntax, e.g.
> 
> rexgen 'M(ue|oe|\u00fc|\u00f6)ller'

This contradicts the Unicode section on http://code.google.com/p/rexgen/ so you might want to revise that. Or better, make the code work like the docs say :-)
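
To illustrate why the output above reads "Mller": on a UTF-8 terminal, ü (U+00FC) reaches the program as the two bytes 0xC3 0xBC, neither of which is ASCII. If the lexer drops bytes it does not know, as described above, only the ASCII letters survive. The sketch below just dumps the bytes with standard tools; hex_bytes is a hypothetical helper name, not part of rexgen, and the idea that exactly these two bytes are discarded is an assumption based on the explanation quoted above.

```shell
# Hypothetical helper: print the bytes produced by a printf format string
# as space-separated hex. The argument is used as a printf format, so it
# must not contain '%'.
hex_bytes() {
    printf "$1" | od -An -tx1 | xargs
}

# "Müller" with ü written as its UTF-8 bytes (octal 303 274 = hex c3 bc)
hex_bytes 'M\303\274ller'    # 4d c3 bc 6c 6c 65 72

# What remains if both non-ASCII bytes are dropped
hex_bytes 'Mller'            # 4d 6c 6c 65 72
```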

> The aim of the options u8, u16 and u32 are to enforce the output encoding.
> To verify this, you could create a hexdump of the output:
> 
> rexgen 'test' | od -x

OK, I see it now. This also contradicts the web docs: the default is UTF-8 and not UTF-32. And in this case the actual behavior is better - defaulting to UTF-32 would be very odd!
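
A quick way to tell the two encodings apart without reading the hexdump is to count bytes: in UTF-8 each ASCII character is one byte, in UTF-32 it is four. The sketch below fakes both encodings of "test" plus a newline with printf, so it runs without rexgen; byte_count is a hypothetical helper name, and the UTF-32 sample assumes little-endian order with no byte order mark, which may not match what rexgen actually emits.

```shell
# Hypothetical helper: count the bytes produced by a printf format string.
byte_count() {
    printf "$1" | wc -c | tr -d ' '
}

# "test\n" as UTF-8: one byte per character
byte_count 'test\n'    # 5

# "test\n" as UTF-32LE without BOM: four bytes per character
byte_count 't\000\000\000e\000\000\000s\000\000\000t\000\000\000\n\000\000\000'    # 20
```

Piping the real output through `wc -c` in the same way should give 5 for a UTF-8 default.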

magnum
