Message-ID: <20140308060621.GA29594@openwall.com>
Date: Sat, 8 Mar 2014 10:06:21 +0400
From: Solar Designer <solar@...nwall.com>
To: cve-assign@...re.org
Cc: oss-security@...ts.openwall.com
Subject: Re: Linux-PAM pam_unix/unix_chkpwd is fail-open

Hi,

FWIW, my posting was not a CVE request.  It was a suggestion for making
Linux-PAM safer to use.  That said, I appreciate your comments, and I am
happy to clarify:

On Fri, Mar 07, 2014 at 11:53:11PM -0500, cve-assign@...re.org wrote:
> (1) part of the PAM software needs to run a helper program by using
> execve
>
> (2) the purpose of the helper program is to check whether a password
> is correct
>
> (3) the helper program is inherently a trusted program (it is under
> the same administrative control as the PAM software)

Correct so far.

> (4) the helper program could use a simple programming model in which
> a zero exit status confirms that the password is correct, and no
> other exit status confirms that

Linux-PAM's helper program does use this programming model, yes.  As to
whether this is an acceptable choice, opinions may vary.

> (5) if the helper program is correctly written, and the operating
> system is behaving normally, this programming model is
> sufficient

Sort of.  The issue here is that this programming model turns
non-security bugs/peculiarities in multiple parts of the operating
system into security holes, or it may be more appropriate to say that
if/when this happens, this programming model itself is a security hole.

Process termination (or failed startup) may happen for a lot of
reasons, and with this programming model it is sufficient for any one
of those to look like normal termination with zero exit code, for
there to be a vulnerability.  For example, does the dynamic linker
always exit non-zero if it fails to start the program up (in one of
many ways)?  Do all libc functions that might happen to terminate the
program guarantee a non-zero exit code?  And indeed, does the kernel
guarantee indication of abnormal process termination in all the many
cases that it may have to refuse to start or to kill a process?

I'd expect bugs of this nature to be introduced once in a while, and
it'd be a pity (and unjustified risk) for them to be escalated to
security bugs via a poorly chosen programming model in Linux-PAM (or
elsewhere, for that matter).

> (6) however, some people feel that this is not good enough.
> Specifically, they feel that the PAM software must have a
> defense against the possibility that the helper program has a
> minor logic error in which it sometimes has an unintended zero
> exit status.

Yes, but mostly not against errors in the (tiny) program itself, but
against errors in the (much larger, more complicated, and changing)
system components that the program's exit code also depends upon.

> (7) there are two examples of ways to have this defense: (A) the
> exit status of the helper program is not used, and instead the
> helper program must print "authorized" or (B) the helper program
> must exit with the status 0x0a00ff7f, which is less likely to
> occur with a logic error

No, (A) and (B) are actually the same.  The magic value 0x0a00ff7f is
passed via a file descriptor, just like the "authorized" word is.  This
is in addition to the exit status check.  (We couldn't pass a value
this large via the exit code.)
> Is (5) above inaccurate?  In other words, is the threat model that the
> PAM software is realistically sometimes used on systems in which
> waitpid determines that WIFEXITED was true and WEXITSTATUS was zero,
> even though the actual code path of the helper program provided a
> nonzero exit status?  Are we, for example, anticipating kernel bugs or
> hardware bugs that cause this?

Kernel and dynamic linker and libc bugs mostly.  With such bugs, the
problem may appear before control would reach one of the helper
program's normal exit() calls.
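For illustration only, here is a minimal parent-side sketch in C of the
two models being contrasted above.  This is not the actual pam_unix or
pam_tcb code; the function names and the pipe file descriptor are made
up, but the logic follows the description above (a magic value read from
a file descriptor, in addition to the exit status check):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAGIC_OK 0x0a00ff7fU	/* the magic value discussed above */

/* Fail-open-prone model: trust WIFEXITED/WEXITSTATUS alone.  Anything
 * in the kernel, dynamic linker, or libc that makes the helper appear
 * to exit(0) is read as "password correct". */
int helper_ok_exit_only(pid_t child)
{
	int status;

	if (waitpid(child, &status, 0) != child)
		return 0;	/* treat wait errors as failure */
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Hardened model: the helper must also write the magic value to a file
 * descriptor (datafd, e.g. a pipe) before exiting 0; the exit status
 * check is kept in addition to that. */
int helper_ok_exit_and_magic(pid_t child, int datafd)
{
	int status;
	unsigned int magic = 0;

	if (read(datafd, &magic, sizeof(magic)) != (ssize_t)sizeof(magic))
		magic = 0;	/* short read or error: fail closed */
	if (waitpid(child, &status, 0) != child)
		return 0;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return 0;
	return magic == MAGIC_OK;
}

In the hardened variant, a helper or system failure that never writes to
the descriptor leaves magic at zero, so the check fails closed.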
> If not, then why is 0x0a00ff7f implemented only for this
> interprocess-communication case, and not for in-process function
> calls?  In other words, any time that a C program calls a
> security-critical function and tests for a return value of zero,
> shouldn't this be changed to a return value of, for example,
> 0x0a00ff7f?  Any function might have a minor logic error in which it
> calls "return;" or reaches the end, even though "return -1" was
> intended.

We obviously need to draw the paranoia line somewhere.  The problem
needs to be somewhat likely to occur, the defense likely to be
effective, and the complexity increase affordable and not likely to
cause additional bugs (especially not security bugs).

This approach might be reasonable for some especially critical
functions in a C program as well, but it is less obviously the right
thing to do.  If used within a program, it'd protect mostly against
different risks - such as improper use of APIs within the program
itself, out of bounds writes, and/or hardware errors that would
otherwise result in fail-open or otherwise risky behavior.  Magic
values are in fact sometimes used within (production builds of)
programs.

Also, a function is very unlikely to be made to return control back to
the caller other than through reaching a return statement or the end
of its body.  This isn't something the kernel or a libc function would
be likely to cause an application's function to do.  In contrast, the
kernel and libc are likely to terminate a process on many conditions.

Here's a test question: was this communication channel meant to carry
security-critical information (as well as possibly other information)?
I think that for C function return values, the answer is "yes", whereas
for process exit status the answer might be "no".
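For comparison, a minimal sketch of what the in-process variant of the
same idea could look like (made-up names, not taken from any particular
codebase): a security-critical function returns a magic constant on
success rather than 0, and its caller tests for that exact value, so an
accidental "return 0;" or falling off the end of the function does not
read as success.

#include <string.h>

#define AUTH_OK   0x0a00ff7f
#define AUTH_FAIL 1

int check_secret(const char *supplied, const char *expected)
{
	/* a real implementation would hash and compare in constant time */
	if (strcmp(supplied, expected) == 0)
		return AUTH_OK;
	return AUTH_FAIL;
}

/* The caller checks for the exact magic value, never merely for
 * "zero means success" or "nonzero means success". */
int is_authorized(const char *supplied, const char *expected)
{
	return check_secret(supplied, expected) == AUTH_OK;
}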
> Going back to the execve case, one downside of the Owl change is that
> a custom helper program designed for another distribution apparently
> has to be modified before it is used on Owl.  In other words,
> maintainability is reduced a little, apparently in favor of a
> defense-in-depth security improvement.

Theoretically, yes, although this protocol was and remains internal to
each implementation (Linux-PAM's pam_unix and its helper, or our
pam_tcb and its helper).

> This is not the type of scenario that would typically have a CVE ID.

Yes, I didn't expect it would be.  (I guess a CVE ID would need to be
assigned to some piece of software if this problem is demonstrated in
practice on a specific combination of software versions later.)

Thanks,

Alexander