Date: Tue, 3 Nov 2020 20:16:42 +0100
From: Solar Designer <>
Subject: Re: Rules characters unicode support.

In addition to what magnum wrote:

On Tue, Nov 03, 2020 at 03:48:57PM +0100, François wrote:
> character substitutions from ASCII to Unicode were hitting some results (a
> few hits on a large leak) for example:
> seé
> suü
> scç
> soö
> saã
> soø
> snñ
> saå

> should I just try to use the A"..." command for my niche finding ?

BTW, you can also use a rule like this:

/e Dp Ap"é"

This is three commands: search for one character, delete the found
character, insert a possibly multi-character string (in our case, just
a multi-byte character) in the former character's place.
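
To make the three steps concrete, here is a rough Python sketch of what the rule does to one candidate word. It is only an emulation for illustration, not how john implements rules; the helper name find_delete_insert is made up, and it assumes the searched-for character is a single byte.

```python
def find_delete_insert(word: bytes, target: bytes, replacement: bytes):
    # Hypothetical helper emulating: /X Dp Ap"STR"
    p = word.find(target)            # /X: find the first match, remember position p
    if p < 0:
        return None                  # /X rejects words that lack the character
    word = word[:p] + word[p + 1:]   # Dp: delete the character at position p
    return word[:p] + replacement + word[p:]  # Ap"STR": insert the string at p


# The replacement is the UTF-8 encoding of e-acute (two bytes).
print(find_delete_insert(b"secret", b"e", b"\xc3\xa9"))
```

Only the first "e" is affected, and a word without any "e" yields no candidate at all.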

You can also specify the multi-byte character via hex escapes for its
bytes, which makes the .conf file itself character set agnostic (so you
can have any character set active in your text editor, and it won't matter):

/e Dp Ap"\xc3\xa9"

However, the rules are indeed not character set agnostic: as written
above, the rule produces UTF-8.
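
As a quick check of which bytes to put in the rule, the sketch below prints the escaped form of a character under a given encoding. The helper name rule_escape is made up for this example. Whether your target hashes were computed from UTF-8 or from some legacy 8-bit input is something you have to know or guess; ISO-8859-1 is shown here only as one possible alternative.

```python
def rule_escape(char: str, encoding: str) -> str:
    # Hypothetical helper: render each encoded byte as a \xNN escape.
    return "".join(f"\\x{b:02x}" for b in char.encode(encoding))


e_acute = "\u00e9"  # written as a code point so this file's own encoding doesn't matter

print(rule_escape(e_acute, "utf-8"))    # two bytes in UTF-8
print(rule_escape(e_acute, "latin-1"))  # one byte in ISO-8859-1
```

The UTF-8 form is the \xc3\xa9 used above; under ISO-8859-1 the same character is the single byte \xe9.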

A difference from the "s" command is that the above rule will find and
replace only the first match, whereas "s" would find and replace all.

You can reduce this difference by writing multiple rules like this:

/e Dp Ap"\xc3\xa9"
/e Dp Ap"\xc3\xa9" /e Dp Ap"\xc3\xa9"
/e Dp Ap"\xc3\xa9" /e Dp Ap"\xc3\xa9" /e Dp Ap"\xc3\xa9"
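
A small Python emulation of these three rules, compared against a replace-all in the style of "s", may help show what each one generates. This is a sketch under my own assumptions (the function names are made up, and it works on raw bytes), not john's actual code.

```python
E_ACUTE = b"\xc3\xa9"  # UTF-8 bytes for e-acute


def replace_first_n(word: bytes, n: int):
    # Emulates one rule made of n repetitions of: /e Dp Ap"\xc3\xa9"
    for _ in range(n):
        p = word.find(b"e")
        if p < 0:
            return None  # a repetition found no "e", so the rule rejects the word
        word = word[:p] + E_ACUTE + word[p + 1:]
    return word


def replace_all(word: bytes) -> bytes:
    # What a replace-every-instance command would produce.
    return word.replace(b"e", E_ACUTE)


word = b"eleven"  # contains three "e" characters
for n in (1, 2, 3):
    print(n, replace_first_n(word, n))
print("all", replace_all(word))
```

Note that the rule with n repetitions only produces a candidate for words with at least n instances of "e", and it matches the replace-all result exactly when the word has n instances.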

You can also choose which instances of the character you replace, e.g.
to replace only the second:

%2e Dp Ap"\xc3\xa9"
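
The same idea in Python, for illustration only: find the N-th instance, reject the word if there are fewer than N, otherwise replace just that one. The helper name replace_nth is made up, and it assumes a single-byte target character.

```python
def replace_nth(word: bytes, target: bytes, n: int, replacement: bytes):
    # Hypothetical helper emulating: %NX Dp Ap"STR"
    p = -1
    for _ in range(n):
        p = word.find(target, p + 1)  # advance to the next instance
        if p < 0:
            return None               # fewer than n instances: word is rejected
    return word[:p] + replacement + word[p + 1:]


print(replace_nth(b"secret", b"e", 2, b"\xc3\xa9"))
```

Here the first "e" is left alone and only the second one becomes the two UTF-8 bytes.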

