Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fb94f729ce1157cf5326e8bf9937057d@smtp.hushmail.com>
Date: Tue, 15 May 2012 02:08:06 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: RAR early reject (was: New RAR OpenCL kernel)

Milen,

I now tested on a couple of thousand files more. Unfortunately this test
is pointless. We're trading false negatives for performance.

I have seen up to 61 same-in-a-row from valid streams (jpeg files). In
the same test set, the worst number for an *invalid* stream was a third
of that.

Still, you made me realise that "if (n >= decode->MaxNum)" thing (that
*is* a safe early reject, though not as effective as I was hoping) as
well as lots of memory leaks I had lurking :)

magnum


On 05/09/2012 02:13 AM, magnum wrote:
> OK, then I might have got equivalent code now. I made 326 test rar files
> and used a threshold of 3. All but 9 was cracked. Raised threshold to 4
> and got 4 more. Raised to 5 and got all but one. That last one need a
> threshold of 10. All these were pretty small files (*.[ch] from john's
> src dir, the worst file was md5_eq.c rar'ed with -ptest)
> 
> magnum
> 
> 
> On 05/08/2012 09:26 AM, Milen Rangelov wrote:
>> I've never seen more than 3 of them in a row for a correct stream, yet
>> having 2 or 3 in a row is quite common. I am doing all my checks in
>> unpack29 though.
>>
>> On Mon, May 7, 2012 at 10:00 PM, magnum <john.magnum@...hmail.com> wrote:
>>
>>> On 05/07/2012 09:27 AM, Milen Rangelov wrote:
>>>> I think we can narrow the case further (that's what I did, still not sure
>>>> it's completely safe).
>>>>
>>>> To understand why a bad stream returns long series of same values, you
>>> need
>>>> to have a look at the end of rar_decode_number() function. You can do one
>>>> more stricter check: if rar_decode_number() returns repeatedly
>>>> decode->DecodeNum[0], then it's "likely" a bad stream. Still I think it
>>>> might happen for some valid stream too. But I guess there must be some
>>>> threshold that can be considered relatively safe.
>>>
>>> I totally missed that one. Even if decode->DecodeNum[0] is valid several
>>> times, I would *guess* this code is safe:
>>>
>>> @@ -387,7 +387,7 @@ int rar_decode_number(unpack_data_t *unpack_data,
>>> struct Decode *decode)
>>>        rar_addbits(unpack_data, bits);
>>>
>>>
>>> n=decode->DecodePos[bits]+((bit_field-decode->DecodeLen[bits-1])>>(16-bits));
>>>        if (n >= decode->MaxNum) {
>>> -               n=0;
>>> +               return -1;
>>>        }
>>>        rar_dbgmsg("rar_decode_number return(%d)\n", decode->DecodeNum[n]);
>>>
>>> ...and add code in read_tables() and rar_unpack29() so this -1
>>> propagates to an immediate bailout. Seems to work fine but this alone
>>> doesn't help performance as much as the threshold for repeated numbers.
>>> BTW forget what I wrote seeing 26 of them, I did not reset the counter
>>> properly. Besides, it depends on how/where you count them. I had the
>>> check in read_tables() but it should be inside rar_decode_number()
>>> itself (again returning -1 for bailout) because there are more callers.
>>>
>>> magnum
>>>
>>>
>>>> I think with ZIP we have somewhat similar situation - it is possible
>>>> (though very unlikely) that for a small file the verifier matches, file
>>> is
>>>> inflated correctly and checksum matches, yet password candidate is wrong.
>>>>
>>>>
>>>>
>>>> On Mon, May 7, 2012 at 2:59 AM, magnum <john.magnum@...hmail.com> wrote:
>>>>
>>>>> On 05/05/2012 02:09 AM, Milen Rangelov wrote:
>>>>>> Well, there are two cases here, a PPM block or a LZ block. The PPM case
>>>>>> would very quickly give you an error if stream is invallid and unpack29
>>>>>> would switch back to the pure LZSS case.
>>>>>
>>>>> Do you mean you modified something in unpack29 for this case? When I
>>>>> debugged it I got the impression it already rejects properly in this
>>> case.
>>>>>
>>>>>> Then for Lempel-Ziv  it's highly
>>>>>> unlikely that you'd get the same result from rar_decode_number()
>>>  several
>>>>>> times at a row unless the stream is bad. The key point here being
>>>>> "several"
>>>>>> and that's where  my concerns are.
>>>>>
>>>>> Nice. I tested a little and I've seen the same result at most 26 times
>>>>> in a row for a valid stream while bad streams can have several hundreds.
>>>>> The question is what threshold would be safe...
>>>>>
>>>>> I think I'll have a look at Jim's pkzip code again, maybe something in
>>>>> there is usable for the LZ case.
>>>>>
>>>>> magnum
>>>>>
>>>>>
>>>>>
>>>>>> On Sat, May 5, 2012 at 2:45 AM, magnum <john.magnum@...hmail.com>
>>> wrote:
>>>>>>
>>>>>>> On 05/04/2012 08:19 PM, Milen Rangelov wrote:
>>>>>>>> OK, my tests are all successful until now. Speed is relatively the
>>>>> same,
>>>>>>>> doesn't matter whether the compressed file is a 4KB text file, or a
>>>>> 50MB
>>>>>>>> avi movie :) Unfortunately there is still overhead as compared to the
>>>>> HE
>>>>>>>> case, speed being somewhat 20% slower. I am still not 100% whether
>>> this
>>>>>>>> would hold true in all cases and if I am wrong, it might happen that
>>>>> for
>>>>>>>> some archives we can get false negatives.
>>>>>>>>
>>>>>>>> I am not yet 100% convinced I am correct yet though. The approach is
>>>>>>> based
>>>>>>>> on 2 assumptions, first one can easily be proved, the second one is
>>>>>>> related
>>>>>>>> to the LZ algorithm and I cannot yet prove what I check is safe and
>>>>> can't
>>>>>>>> occur in a valid LZ stream.
>>>>>>>
>>>>>>> This is good news Milen, just what I hoped for. I am SURE there are
>>> such
>>>>>>> short-cuts because cRARk runs full speed no matter what you throw at
>>> it.
>>>>>>> Please do share your findings, you know I would.
>>>>>>>
>>>>>>> magnum
> 
> 


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.