Message-ID: <20120315234631.GA10059@openwall.com>
Date: Fri, 16 Mar 2012 03:46:31 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: SSH thread-safety

Dhiru, magnum, all -

It was reported to me off-list that the "SSH" format in 1.7.9-jumbo-5 crashes on self-test on a 64-way machine running RHEL 6.2 on x86-64. I managed to reproduce similar crashes on an 8-core machine by increasing OMP_NUM_THREADS:

$ for n in {1..10000}; do OMP_NUM_THREADS=$n GOMP_SPINCOUNT=1000000 ./john -te -fo=ssh; done &> sshout
*** glibc detected *** double free or corruption (!prev): 0x0000000013d9ac50 ***
*** glibc detected *** realloc(): invalid next size: 0x0000000000ba0600 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000003e80f50 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000001bcff70 ***
*** glibc detected *** realloc(): invalid next size: 0x000000000de36c20 ***
*** glibc detected *** realloc(): invalid next size: 0x000000001c12f010 ***
*** glibc detected *** realloc(): invalid next size: 0x0000000004df17c0 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000001ad3ed0 ***
*** glibc detected *** realloc(): invalid next size: 0x0000000006974160 ***
*** glibc detected *** double free or corruption (!prev): 0x000000001798c2e0 ***
*** glibc detected *** realloc(): invalid next size: 0x0000000002e73d50 ***
*** glibc detected *** double free or corruption (!prev): 0x00000000135b0650 ***
*** glibc detected *** realloc(): invalid next size: 0x00000000098041f0 ***
*** glibc detected *** double free or corruption (!prev): 0x000000001144d830 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000015636440 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000005962d20 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000001caf160 ***
*** glibc detected *** realloc(): invalid next size: 0x000000001c654eb0 ***
*** glibc detected *** double free or corruption (!prev): 0x000000001f5b6fa0 ***

These crashes correspond to these thread counts:

$ fgrep Aborted sshout
Benchmarking: ssh [32/64]... (44xOMP) Aborted
Benchmarking: ssh [32/64]... (202xOMP) Aborted
Benchmarking: ssh [32/64]... (523xOMP) Aborted
Benchmarking: ssh [32/64]... (664xOMP) Aborted
Benchmarking: ssh [32/64]... (765xOMP) Aborted
Benchmarking: ssh [32/64]... (884xOMP) Aborted
Benchmarking: ssh [32/64]... (1041xOMP) Aborted
Benchmarking: ssh [32/64]... (1073xOMP) Aborted
Benchmarking: ssh [32/64]... (1090xOMP) Aborted
Benchmarking: ssh [32/64]... (1315xOMP) Aborted
Benchmarking: ssh [32/64]... (1771xOMP) Aborted
Benchmarking: ssh [32/64]... (2027xOMP) Aborted
Benchmarking: ssh [32/64]... (2045xOMP) Aborted
Benchmarking: ssh [32/64]... (2538xOMP) Aborted
Benchmarking: ssh [32/64]... (3450xOMP) Aborted
Benchmarking: ssh [32/64]... (3725xOMP) Aborted
Benchmarking: ssh [32/64]... (4243xOMP) Aborted
Benchmarking: ssh [32/64]... (4528xOMP) Aborted
Benchmarking: ssh [32/64]... (4699xOMP) Aborted

Additionally, john went into an infinite loop two times during the above run - I had to kill those john processes. That was for 103 and 4773 threads. In both cases, the gdb backtrace looked like:

(gdb) bt
#0  0x00002b4af3dbc591 in gomp_team_barrier_wait_end () from /usr/lib64/libgomp.so.1
#1  0x00002b4af3dbb62e in gomp_team_end () from /usr/lib64/libgomp.so.1

Perhaps I could find something more informative by looking at per-thread backtraces, but I did not bother.

BTW, for 4773 threads, the process consumed over 46 GB of address space:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
solar 682459 799 0.2 48915460 43696 pts/3 Sl+ Mar15 1524:08 ./john -te -fo=ssh

I did not proceed to test even higher thread counts (I interrupted the "for" loop in the shell) - I felt the above was enough info.
I also repeated the experiment with ASLR disabled:

$ for n in {1..10000}; do OMP_NUM_THREADS=$n GOMP_SPINCOUNT=1000000 ./john -te -fo=ssh; done &> sshout-nonrand
*** glibc detected *** double free or corruption (!prev): 0x00000000008dac50 ***
*** glibc detected *** realloc(): invalid next size: 0x000000000091d4d0 ***
*** glibc detected *** double free or corruption (!prev): 0x000000000092c650 ***
*** glibc detected *** realloc(): invalid next size: 0x000000000092ec20 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000000935b30 ***
*** glibc detected *** double free or corruption (!prev): 0x0000000000941db0 ***
*** glibc detected *** realloc(): invalid next size: 0x0000000000920c80 ***
*** glibc detected *** realloc(): invalid next size: 0x00000000009135d0 ***
$ fgrep -i abort sshout-nonrand
Benchmarking: ssh [32/64]... (44xOMP) Aborted
Benchmarking: ssh [32/64]... (642xOMP) Aborted
Benchmarking: ssh [32/64]... (803xOMP) Aborted
Benchmarking: ssh [32/64]... (826xOMP) Aborted
Benchmarking: ssh [32/64]... (897xOMP) Aborted
Benchmarking: ssh [32/64]... (1024xOMP) Aborted
Benchmarking: ssh [32/64]... (1027xOMP) Aborted
Benchmarking: ssh [32/64]... (1272xOMP) Aborted

Got an infinite loop for 1532 threads this time, with the same kind of backtrace:

(gdb) bt
#0  0x00002aaaabb8a591 in gomp_team_barrier_wait_end () from /usr/lib64/libgomp.so.1
#1  0x00002aaaabb8962e in gomp_team_end () from /usr/lib64/libgomp.so.1

I similarly did not proceed to try higher thread counts.

GCC 4.6.2, OpenSSL 1.0.0d. (The RHEL 6.2 system where the problem was initially detected had slightly different versions, though.)

My guess is that the OpenSSL functions we're calling are still not entirely thread-safe even in these recent versions of OpenSSL. We may want to look into this and maybe end up submitting a patch to OpenSSL.
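One possible way to test that guess (not something proposed in this message) would be to serialize the suspect calls and check whether the crashes go away. The sketch below assumes OpenMP; unsafe_lib_call(), check_all_keys() and NUM_KEYS are hypothetical placeholders standing in for the real OpenSSL calls made by the SSH format, which are not shown here. The placeholder mimics a non-thread-safe routine by returning a shared static buffer.

```c
#include <stdio.h>
#include <string.h>

#define NUM_KEYS 64 /* hypothetical number of candidate keys per batch */

/* Hypothetical stand-in for a library routine that is not thread-safe:
 * it formats its result into one shared static buffer. */
static const char *unsafe_lib_call(int key_index)
{
    static char buf[32];
    snprintf(buf, sizeof(buf), "key-%d", key_index);
    return buf;
}

/* Calls unsafe_lib_call() for every key from an OpenMP parallel loop,
 * serializing the call and the copy of its result with a named critical
 * section. Returns 1 if every result matched what was expected. */
static int check_all_keys(void)
{
    static int ok[NUM_KEYS];
    int i, all_ok = 1;

#pragma omp parallel for
    for (i = 0; i < NUM_KEYS; i++) {
        char expected[32], got[32];

        snprintf(expected, sizeof(expected), "key-%d", i);
#pragma omp critical(unsafe_lib)
        {
            /* Only one thread at a time touches the shared buffer. */
            strcpy(got, unsafe_lib_call(i));
        }
        ok[i] = (strcmp(got, expected) == 0);
    }

    for (i = 0; i < NUM_KEYS; i++)
        all_ok &= ok[i];
    return all_ok;
}
```

Built without OpenMP support, the pragmas are ignored and the loop simply runs serially. If serializing the real calls this way made the crashes disappear at high OMP_NUM_THREADS, that would point at the library rather than at the format's own code.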
Additionally, the element type of the has_been_cracked[] array should be changed from char to int (or maybe even sig_atomic_t), because at least the original Alpha lacked instructions to update individual bytes in memory. To update a byte, it would have to read a 32- or 64-bit word, update it in a register, and write the entire word back. This may undo a change being made to a nearby byte by another thread at about the same time. For Alpha, this was corrected with BWX:

http://en.wikipedia.org/wiki/DEC_Alpha#Byte-Word_Extensions_.28BWX.29

I don't know if there are any other archs (on which JtR may reasonably be run) that have a similar limitation, and Alpha is history, yet I think we want to correct this. We'll also need to adjust the memset() call to use sizeof() instead of MAX_KEYS_PER_CRYPT, since with a wider element type MAX_KEYS_PER_CRYPT bytes would no longer cover the whole array.

Alexander
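A minimal C sketch of both points, assuming 32-bit words and a hypothetical MAX_KEYS_PER_CRYPT of 8 (the format's real value is not given in the message). simulate_lost_byte_update() replays deterministically, on one thread, the read-modify-write interleaving described for pre-BWX Alpha; count_set_after_reset() is a hypothetical helper showing why the memset() size must be sizeof(has_been_cracked) once the elements are int.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS_PER_CRYPT 8 /* hypothetical value, for illustration only */

/* Replays on one thread, in a fixed order, what two threads could do on a
 * CPU without byte stores (such as Alpha before BWX): to set one byte flag,
 * each loads the whole 32-bit word, patches its byte in a register, and
 * stores the whole word back. */
static uint32_t simulate_lost_byte_update(void)
{
    uint32_t word = 0;          /* four one-byte flags packed in one word */

    uint32_t reg_a = word;      /* "thread A" loads the word */
    uint32_t reg_b = word;      /* "thread B" loads the same word */

    reg_a |= UINT32_C(1) << 0;  /* A sets its flag in byte 0 */
    reg_b |= UINT32_C(1) << 8;  /* B sets its flag in byte 1 */

    word = reg_a;               /* A writes the whole word back */
    word = reg_b;               /* B does too, undoing A's change */
    return word;                /* 0x100: only B's flag survived */
}

/* One int per flag, so each thread's store covers only its own element
 * (sig_atomic_t is the other option mentioned in the message). */
static int has_been_cracked[MAX_KEYS_PER_CRYPT];

/* Hypothetical helper: sets every flag, clears the first nbytes bytes of
 * the array with memset(), and returns how many flags are still set. */
static int count_set_after_reset(size_t nbytes)
{
    int i, n = 0;

    for (i = 0; i < MAX_KEYS_PER_CRYPT; i++)
        has_been_cracked[i] = 1;
    memset(has_been_cracked, 0, nbytes);
    for (i = 0; i < MAX_KEYS_PER_CRYPT; i++)
        n += has_been_cracked[i];
    return n;
}
```

With a 4-byte int, count_set_after_reset(MAX_KEYS_PER_CRYPT) leaves 6 of the 8 flags set, while count_set_after_reset(sizeof(has_been_cracked)) leaves none, which is the memset() adjustment described above.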