|
Message-ID: <20120923051447.GC6940@openwall.com> Date: Sun, 23 Sep 2012 09:14:47 +0400 From: Solar Designer <solar@...nwall.com> To: George Argyros <argyros.george@...il.com> Cc: oss-security@...ts.openwall.com, Aggelos Kiayias <aggelos@...yias.com>, Vladimir Vorontsov <vladimir.vorontsov@...ec.ru>, gifts <gifts.antichat@...il.com>, Anthony Ferrara <ircmaxell@...il.com>, Pierre Joye <pierre.php@...il.com> Subject: Re: Randomness Attacks Against PHP Applications George, I watched a video of your USENIX Security talk the other day (after I was done with my experiments with mt_rand() seed cracking, though): http://www.youtube.com/watch?v=yE0qTTi-_iQ Thanks. Shortly before that, I released a faster version of my PHP mt_rand() seed cracker, now down to 1 minute worst case on CPU: http://www.openwall.com/lists/john-users/2012/09/20/2 http://download.openwall.net/pub/projects/php_mt_seed/ $ time ./php_mt_seed 1328851649 Found 0, trying 637534208 - 671088639, speed 63310249 seeds per second seed = 658126103 Found 1, trying 1207959552 - 1241513983, speed 63343447 seeds per second seed = 1234567890 Found 2, trying 4261412864 - 4294967295, speed 63347894 seeds per second Found 2 real 1m7.798s user 9m1.942s sys 0m0.008s In one minute of real time (on an FX-8120 CPU, using vectorized fused multiply-add instructions), it found the original seed, another seed that also produces the same mt_rand() output, and it searched the rest of the 32-bit seed space (not finding other matches in this case). Gifts implemented the same thing in OpenCL: https://github.com/Gifts/pyphp_rand_ocl https://github.com/Gifts/pyphp_rand_ocl/tree/SpeedyOCL His SpeedyOCL branch achieves similar performance (and even 10% better than mine in his test of both on a Core i5-2500) on CPUs (with Intel's OpenCL SDK) and several times better performance on GPUs. On Thu, Sep 20, 2012 at 02:31:19PM -0400, George Argyros wrote: > FYI, we have also released some code to exploit such vulnerabilities > about a month ago in github > (https://github.com/GeorgeArgyros/Snowflake). We hope that this > framework will allow the easier development of exploits for such > vulnerabilities. > The main difference with the code mentioned in the previous posts, is > that it may not always be possible to obtain the first output from > mt_rand() after seeding. For example, many applications only leak > truncated mt_rand() outputs in which case we should really compare > bits of different mt_rand outputs to test whether we found the correct > seed. Yes. Your implementation is far more generic. The way I'd approach this is by limiting the set of possible seeds based on the first mt_rand() output (perhaps with a min-max range, so _many_ seeds will be potentially valid - could be millions, yet significantly fewer than the full 4 billion set). This can be accomplished almost as quickly as cracking the seed based on an exact mt_rand() output (full 31 bits of it, no min-max range) - that is, in one minute on CPU. Then slower, but more generic code may be used to filter out the impossible seeds based on further mt_rand() outputs, until there's just one seed value left. The slowness of that second cracker would not matter much because it'd only need to search a much smaller seed space. A limitation, though, is that the very first mt_rand() output after seeding must be among those available, even if in truncated form. If it is not, then more of the state has to be maintained in the initial cracking pass, thereby making it slightly slower. When the first output is (at least partially) available, we only need 3 state elements, so they're kept in registers nicely. > For this reason, in our tool the user defines a "hash" function > that expresses any mt_rand dependent output and then we search for a > preimage for this hash function. That's a generic and curious approach, but I think it's slower than what I described above while not being _more_ generic. > We also include a python API for this functionalities as well as a > sample exploit for mediawiki 1.18.1. Some POCs might indeed help... > > We will also release the gaussian solver based tool we developed in > order to recover the internal state of mt_rand from its (truncated) > outputs in the following days. Cool! > I agree too that education is important. This is something that we > came to an agreement with the PHP team (for example that additional > information is needed on the mt_rand manual). However, as pointed out > nothing has changed yet (the conversations between us and the PHP team > took place in March/April). Did PHP 5.4's change of session IDs (vs. 5.3's) occur before or after your conversations with them? > However I think that the specific issue goes beyond education. First > of all, We believe that adding simple exploit mitigations such as > secure seeding in these functions is something that should definitely > happen. Secure seeding is easy to add using randomness from the > operating system and furthermore it will incur a negligible > performance overhead since it would happen only once in the process > lifetime. Agreed. > For a more concrete solution I think that the existence of secure > PRNGs is not enough. Even if mt_rand() was producing secure random > numbers we would still be having a lot of vulnerable applications just > from the way this function is used. Right. > Just as PHP devs use > session_start() to start a new PHP session there is a need for a > generate_token() function that will return a random token and take > care of providing the necessary level of security for this token. Perhaps, and this is in line with Anthony Ferrara's work on a new password hashing API for PHP 5.5 (which will eventually obsolete phpass, I guess). Simpler interfaces that are easier to use correctly than not. However, a reason why many PHP app devs use mt_rand() and the like is because they want to generate tokens of a certain format, such as to meet a standard. One such use is crypt() salts, which we're helping eliminate with phpass and then with Anthony Ferrara's new API. But I guess other uses will remain. So perhaps that generate_token() function would need to accept arguments and return tokens of different target formats accordingly. Maybe it should have a format argument similar to that of pack(). Its advantage over separate calls to a PRNG and then to pack() on the PRNG's output would be that it'd keep the distribution uniform. It is curious how it is often overlooked that simple modulo division may turn a uniform distribution into non-uniform. Here's an illustration (in case this is news to someone on oss-security): $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand(); $n += ($x < 0x40000000); } echo "$n\n";' 499567 $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand(); $n += ($x < 0x40000000); } echo "$n\n";' 500558 $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand() % 500000000; $n += ($x < 250000000); } echo "$n\n";' 535117 $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand() % 500000000; $n += ($x < 250000000); } echo "$n\n";' 534124 First two runs show uniform distribution of mt_rand() outputs themselves (assuming the 0 to 0x7fffffff range). The other two runs show non-uniform distribution over the 0 to 500 million minus 1 range, after the modulo division by 500 million. (Values lower than half the maximum suddenly became more common than higher values. It was 50/50 before the modulo division, but became 53.5/46.5 afterwards. The reason is obvious: the original space is not divisible into a whole number of 500 million sub-spaces.) This is why having separate functions that return a random number and those that process it to the desired format might not be good enough. A single function that does both jobs at once (avoiding the above problem by using a smarter approach) could work better. Alexander
Powered by blists - more mailing lists
Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.