Date: Tue, 6 Nov 2012 01:06:36 +0100
From: magnum <>
Subject: Re: Croatian charset support (was: KoreLogic rules on Openwall)

On 4 Nov, 2012, at 8:31 , magnum <> wrote:

> On 3 Nov, 2012, at 11:23 , Vlatko Kosturjak <> wrote:
>> On 11/02/2012 11:49 AM, magnum wrote:
>>>>> I have also made localized/Croatian version of the rules (only parts
>>>>> which are relevant):
>>>> Cool. On that subject, would you like me to add some codepage support? It's extremely easy to do so with the toolchain we made when adding the codepage support. What would be right for Croatian? CP852 and ISO-8859-2 perhaps?
>>> I found this:
>>> So I suppose CP852, CP1250 and ISO-8859-2? Having these in place will make the rules engine able to, for example, upper/lowercase Croatian non-ascii letters. The UTF-8 support does not include the Rules engine when it comes to single letter manipulations.
>> Thanks magnum!
>> Yes, it's CP852, CP1250, ISO-8859-2 and of course UTF-8.
>> Also, does it support stripping of special characters, so for example š
>> becomes s, č becomes c, etc..?
>> Also, I'm interested - in what charset I should write rules? So, they
>> can be automatically converted?

The mentioned codepages are now added to unstable-jumbo. Have a look at doc/ENCODINGS and try it out!

If you really want "stripping" implemented in the rules engine, I have found a canonical way to do it: using Unicode's decomposition feature, the script we use for adding new codepages can be extended to generate the needed lookup table for any codepage. É would become E, ü would become u, and so on. The hardest part is probably deciding which rule command letter to use.
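To illustrate the decomposition idea, here is a minimal Python sketch. It is not the actual script from the jumbo tree, and the function names build_strip_table and strip_accents are made up for this example. It assumes the table only needs to cover the high half (0x80-0xFF) of a single-byte codepage and that only characters whose base letter is plain ASCII should be mapped.

```python
import unicodedata

def build_strip_table(codepage):
    """Map each high byte of a single-byte codepage to its ASCII base letter.

    Hypothetical helper for illustration only. Bytes with no canonical
    decomposition, or whose base is not a single ASCII character, are
    left out of the table.
    """
    table = {}
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codepage)
        except UnicodeDecodeError:
            continue  # byte is undefined in this codepage
        # Canonical decomposition: 'š' becomes 's' + U+030C (combining caron)
        decomposed = unicodedata.normalize("NFD", ch)
        # Drop the combining marks, keep the base character(s)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base != ch and len(base) == 1 and base.isascii():
            table[byte] = ord(base)
    return table

def strip_accents(data, table):
    """Apply the lookup table to a byte string in the same codepage."""
    return bytes(table.get(b, b) for b in data)

if __name__ == "__main__":
    for cp in ("cp852", "cp1250", "iso8859_2"):
        table = build_strip_table(cp)
        print(cp, strip_accents("šč Éü".encode(cp), table))
```

One caveat that matters for Croatian: đ (U+0111) has no Unicode decomposition, so a table built this way leaves it untouched. Mapping it to d would need a manual entry on top of the generated table.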

