Message-ID: <20111126194543.GA16734@openwall.com>
Date: Sat, 26 Nov 2011 23:45:43 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: 1.7.9's --external + OpenMP fails on Cygwin

JimF -

Here's a really weird bug that I spent several hours on today, without
much luck.  Maybe you'll be able to figure it out?

I built 1.7.9 on latest Cygwin (with gcc 4.5.3), including with OpenMP
(this required a fix to the Makefile line for john.exe to pass LDFLAGS).
Then I went to test it.  While most things worked fine, I unexpectedly
got the program to lock up with --external=LanMan.  Then I reproduced
the same with -e=Double.  Then with other hash types as well.

In my testing, the problem occurs only with OpenMP builds on Cygwin
running more than one thread, but only when --external is the main
cracking mode.  Hash type does not matter (I tested DES, MD5, BF from
pw-fake-unix available on the wiki).  The problem does not occur with
--incremental, not even when I add an external filter to that.  It also
does not occur with OMP_NUM_THREADS=1.

I tried to reproduce it on Linux by compiling without OS_TIMER (more
similar to the build with Cygwin) - no luck (that is, the program ran
fine on Linux no matter what).

Also, the problem does not occur with 1.7.8 built in a similar fashion
(tested on BF only, obviously).  It is new with 1.7.9.

I did not test any -jumbo, I did not try moving my build to another
machine yet, and I did not try using a third-party build of recent JtR.

I debugged the problem in OllyDbg a little bit.  (It's my first time
using this debugger, by the way.)  On a dual-core, there are three
threads - two are running, one is mostly waiting.  When the problem is
triggered - which happens just a few seconds after program start - only
one running thread remains, and it is looping in cyggomp-1's calls to
cygwin1.dll's sem_wait().
Specifically, per gcc/libgomp sources, libgomp appears to assume that if
sem_wait() returns an error, that error must be EINTR because of a
signal, so it simply repeats the call.  In my case, the error is instead
EINVAL (yes, I did locate and check errno).  Why the semaphore is
invalid I don't know.  There's code that checks the semaphore struct at
offset +4 for magic values 0xdf0df04c (PTHREAD_MUTEX_MAGIC) and
0xdf0df046 (PTHREAD_RWLOCK_MAGIC).  When the EINVAL looping occurs, the
value is the latter.  Changing it to the former (which the underlying
code checks for first) made the EINVAL go away and the program continue
working for a while longer (even cracking some more passwords), but
that's just black magic.

I don't see relevant changes between 1.7.8 and 1.7.9.  While I did
change the external mode code a little bit, none of those changes look
like a likely culprit.  I've tested that int_word[] is not being
overflowed on the copy from ext_word[] - it is not.  I suppose we can
try bisecting the changes between 1.7.8 and 1.7.9 anyway, but I am not
sure if this will help much.

Would you try to reproduce this?  I am really not into Windows.  Even
for OllyDbg, I am running it over VNC from a Linux desktop.

Thanks,

Alexander
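
For reference, the retry behaviour described above (repeat sem_wait() on
any failure, on the assumption that the failure is EINTR) can be
sketched roughly as follows.  This is an illustration of the pattern
only, not the libgomp source; the function names wait_assuming_eintr and
wait_checking_errno are hypothetical, and the second function is just
one possible defensive variant.

```c
#include <errno.h>
#include <semaphore.h>

/* Pattern as described in the message: every failure is treated as an
 * interrupted wait and retried.  If sem_wait() keeps failing with a
 * persistent error such as EINVAL, this loop never terminates. */
static void wait_assuming_eintr(sem_t *sem)
{
    while (sem_wait(sem) != 0)
        continue;
}

/* Hypothetical defensive variant: retry only when errno is EINTR and
 * hand any other error back to the caller.
 * Returns 0 on success, or the errno value of the failure. */
static int wait_checking_errno(sem_t *sem)
{
    while (sem_wait(sem) != 0) {
        if (errno != EINTR)
            return errno;   /* e.g. EINVAL for an invalid semaphore */
    }
    return 0;
}
```

With the second form, a semaphore that the library considers invalid
would surface as an EINVAL return value rather than as a thread spinning
inside the wait loop.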