Message-ID: <20110809120231.GA27064@openwall.com>
Date: Tue, 9 Aug 2011 16:02:31 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: "valid character" class

On Tue, Aug 09, 2011 at 01:00:31PM +0200, magnum wrote:
> OK, I think we'll go for ?y for 'valid' then.

Sounds good.

> Question to *all*: There are some characters that are truly invalid for
> a codepage, like 0x98 in cp1251. There are also characters that are not
> really invalid per the Unicode spec, but control characters. For
> example, in most (all?) ISO-8859-xx codepages, the characters
> 0x80..0x9F. Should we treat the latter as invalid? There are pros and
> cons. My personal vote is that we should treat them as invalid, i.e. the
> rule !?Y would drop any candidate that contains 0x80..0x9F if we're
> using --enc=iso-8859-1 but only 0x98 if using -enc=cp1251.

I concur. We may also want to introduce a class for control chars, though. By default, it'd cover whatever chars are usually the control ones on terminals; see the DumbForce sample. However, --encoding=cp1251, for example, will turn most chars in the 0x80 to 0x9f range into non-control ones, even though they will remain risky to the terminal...

In practice, I'd expect the complement of this class (non-control) to be more useful. We'll get that one automatically. So we'll have ?y for valid and ?O for non-control: similar, but different (as you explained above).

Oh, and we may want to allocate a consecutive range of character class letters (maybe a very small range) for user-defined classes. Maybe we could use digits rather than letters, but then there won't be automatic complements.

> One effect of doing so is the ability to reject/accept any UTF-8 encoded
> words (from a mixed wordlist like RockYou.txt) using such rules because
> *all* non-ascii characters in UTF-8 contain octets in that range.

In what range? Sorry, I don't understand what you mean here.
There are non-ASCII UTF-8 characters that do not contain any octets in the 0x80 to 0x9f range, so perhaps you meant something else.

Thanks,
Alexander
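As a rough illustration only (this is not John the Ripper's implementation), the classes discussed in this thread can be modeled in Python using the standard library's codec tables, which may differ in detail from John's own tables. The helper names, the use of ?o for the control class, and the choice to keep ASCII controls (0x00-0x1F, 0x7F) inside ?y are assumptions made for this sketch. It reproduces the cases from the thread (only 0x98 invalid in cp1251, 0x80..0x9F rejected under iso-8859-1) and checks the UTF-8 point: é encodes as C3 A9, with no octet in 0x80..0x9F, while € encodes as E2 82 AC and does have one.

```python
import unicodedata

# Hypothetical helpers for illustration only; not John the Ripper's code.
# Byte meanings come from Python's built-in codec tables.


def byte_kind(b: int, encoding: str) -> str:
    """Classify one byte of an 8-bit codepage."""
    try:
        ch = bytes([b]).decode(encoding)
    except UnicodeDecodeError:
        return "undefined"  # e.g. 0x98 in cp1251 has no mapping
    if unicodedata.category(ch) == "Cc":
        return "control"  # C0 (0x00-0x1F, 0x7F) or C1 (0x80-0x9F) controls
    return "other"


def valid_class(encoding: str) -> frozenset:
    """Sketch of ?y: defined bytes, minus control characters at 0x80 and up.

    Assumption: ASCII controls (0x00-0x1F, 0x7F) are left in the class.
    """
    members = set()
    for b in range(256):
        kind = byte_kind(b, encoding)
        if kind == "undefined":
            continue
        if b >= 0x80 and kind == "control":
            continue
        members.add(b)
    return frozenset(members)


def control_class(encoding: str) -> frozenset:
    """Sketch of a control-character class (assumed ?o): bytes decoding to Cc."""
    return frozenset(b for b in range(256) if byte_kind(b, encoding) == "control")


def complement(cls: frozenset) -> frozenset:
    """Uppercase class letter: complement over all 256 byte values."""
    return frozenset(range(256)) - cls


def rejected_by_not_Y(word: bytes, encoding: str) -> bool:
    """Rule !?Y: reject a candidate containing any byte outside ?y."""
    invalid = complement(valid_class(encoding))
    return any(b in invalid for b in word)


def c1_range_octets(s: str) -> list:
    """Return the UTF-8 octets of s that fall in 0x80-0x9F, as hex strings."""
    return [hex(b) for b in s.encode("utf-8") if 0x80 <= b <= 0x9F]


if __name__ == "__main__":
    print(sorted(complement(valid_class("cp1251"))))  # [152], i.e. only 0x98
    print(len(complement(valid_class("iso-8859-1"))))  # 32, i.e. 0x80-0x9F
    print(len(control_class("iso-8859-1")), len(control_class("cp1251")))  # 65 33
    # "café" in UTF-8 is 63 61 66 C3 A9: no octet in 0x80-0x9F, so it passes.
    print(rejected_by_not_Y("café".encode("utf-8"), "iso-8859-1"))  # False
    print(c1_range_octets("é"), c1_range_octets("€"))  # [] ['0x82']
```

Because a 2-byte UTF-8 sequence has a lead byte of 0xC2..0xDF and a continuation byte of 0x80 | (code point & 0x3F), roughly half of such characters avoid the 0x80..0x9F range entirely, so a !?Y-style filter under iso-8859-1 cannot reliably reject all UTF-8 words.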