Message-ID: <00fb01cc5931$5dec2b50$19c481f0$@net>
Date: Fri, 12 Aug 2011 15:49:44 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: Unicode, casing, obtaining data, and some real-world MSSQL (2000) data.

>From: magnum [mailto:rawsmooth@...dband.net]
>> What I found here, is several things. First, if the _wsetlocale() was not
>> called, then the only upcasing/lowcasing was A..Z <-> a..z. Then, if
>> _wsetlocale() was called (with a valid locale), then the exact same casing
>> was happening, NO MATTER WHAT locale is used. Remember, we are in Unicode,
>> so the OS simply turns on the above 0x7F casing rules, but they are the same
>> for the OS.
>
>Are you saying that if you set a locale it would go from just a-z to
>complete Unicode - BUT using the system locale instead of the one you
>specified? That weird, kinda defeats the whole purpose of wsetlocale().

No. What I meant to say is that if you do NOT call wsetlocale(), then the
_wcsupr() C function (which simply drills down to the Win32 API) only
upcases the lower 128 (7-bit ASCII) values. It may be that _wcsupr()
checks whether setlocale() has been called for LC_CTYPE or LC_ALL (the
second one sets 'all' the locale items), and if it has not, it does not
drill down to the API but simply falls back to the built-in clib
strupr-style functionality. I do not know for sure. However, no matter
what locale information I fed into the wsetlocale() function, the casing
changes that showed up in the call to _wcsupr() affected the exact same
characters.

>> Thus, when I do release this, it will likely be an initial release, and need
>> some work tweaking it. Also, I had some problems with magnums recent UTF-32
>> changes. I need to work through some of that with him, as I do not fully
>> understand all of that code.
>
>Do you mean the reinstated "third case" in utf8towcs()?

I believe so.
There were a couple of if blocks which printed error codes and 'tried'
to correct their location within the data stream. I have commented both
of those out for now. I know that is not right, and we will have to work
through what 'is' right, but it allowed the format to process every data
point from U+0 to U+FFFF. It is likely that I was simply feeding in
invalid nonsense data, and that the code was correct in 'expecting'
another UTF-16 character which was not present. However, I think this is
simply garbage avoidance code. We just have to get it to where it keeps
the process image 'safe' and does not output unneeded warnings.

Like I said, what I initially publish will likely need some tuning.
However, I do not think this would cause anyone's cracking job any
heartburn at all right now.

Jim.
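
For reference, a minimal sketch of the kind of 'garbage avoidance' described
above: decoding UTF-16 so that a lone or truncated surrogate is skipped
quietly, without reading past the end of the buffer and without printing a
warning. This is NOT the utf8towcs() code from John the Ripper. The names
utf16_next() and utf16_count() are hypothetical, and substituting U+FFFD for
bad input is an assumption about what 'safe' should mean here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from John the Ripper).
 * Decodes one code point from s[*pos], where s holds len UTF-16 units.
 * Precondition: *pos < len.
 * Returns 1 and stores the code point in *cp on success.
 * Returns 0 and stores U+FFFD in *cp for a lone or truncated surrogate.
 * Always consumes at least one unit and never reads s[len] or beyond. */
int utf16_next(const uint16_t *s, size_t len, size_t *pos, uint32_t *cp)
{
    uint16_t hi, lo;

    assert(*pos < len);
    hi = s[(*pos)++];

    if (hi < 0xD800 || hi > 0xDFFF) {
        *cp = hi;                       /* ordinary BMP character */
        return 1;
    }

    /* A high surrogate is only valid if a low surrogate really follows. */
    if (hi <= 0xDBFF && *pos < len) {
        lo = s[*pos];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*pos)++;
            *cp = 0x10000u + ((uint32_t)(hi - 0xD800) << 10)
                           + (uint32_t)(lo - 0xDC00);
            return 1;
        }
    }

    *cp = 0xFFFD;                       /* garbage in: stay silent, stay in bounds */
    return 0;
}

/* Counts decoded code points in s and reports in *bad how many units
 * had to be replaced because they were not valid UTF-16. */
size_t utf16_count(const uint16_t *s, size_t len, size_t *bad)
{
    size_t pos = 0, n = 0;
    uint32_t cp;

    *bad = 0;
    while (pos < len) {
        if (!utf16_next(s, len, &pos, &cp))
            (*bad)++;
        n++;
    }
    return n;
}
```

A test loop that feeds every value from U+0 to U+FFFF as a single unit would
hit the "lone surrogate" path for 0xD800..0xDFFF. With handling like this,
those inputs are counted in *bad rather than triggering warnings or reads
past the buffer.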