Message-ID: <20200707154738.GA2846@openwall.com>
Date: Tue, 7 Jul 2020 17:47:38 +0200
From: Solar Designer <solar@...nwall.com>
To: lkrg-users@...ts.openwall.com
Subject: user-triggerable Oops on Linux 4.17+ (64-bit only)

Hi,

This is a heads-up that there's an important bug fix commit now in the LKRG repo:

commit b3a499e7f6071e00338f173b2c614227e810397e
Author: Adam_pi3 <pi3@....com.pl>
Date:   Sat Jul 4 16:11:17 2020 -0400

    Fix user-triggerable Oops (dereference of a near-NULL pointer) on newer kernels with new syscall implementation.
    Found by Jason A. Donenfeld.

We recommend all users of LKRG on Linux 4.17 or newer on x86_64 or arm64 to update to a revision of LKRG with the above fix included. We intend to release LKRG 0.8.1 including this fix shortly.

Bug impact:

As long as the kernel's mmap_min_addr works as intended, the impact of near-NULL pointer dereference bugs is limited. On systems that don't set panic_on_oops and don't use lockdown, this is just a nuisance and some information getting in the logs. On systems that set panic_on_oops (most notably, RHEL and its clones), this is a DoS (kernel panic). Finally, as Jason A. Donenfeld pointed out, there's a shortcoming in the kernel's lockdown mechanism where root may disable mmap_min_addr thus making many of the (near-)NULL pointer dereference bugs exploitable into lockdown bypasses by root (thus, for escalation from root to ring 0). We didn't evaluate whether this particular bug is usable as a lockdown bypass or not. For this to matter, LKRG would need to be signed and used along with lockdown, which we think is currently unusual.

Bug origin:

Linux 4.17+ includes a major change to how syscalls are handled within the kernel (see the patch series starting with "[PATCH 000/109] remove in-kernel calls to syscalls"), in particular introducing CONFIG_ARCH_HAS_SYSCALL_WRAPPER and enabling it on x86_64 and arm64. This change matters to modules like LKRG where we hook syscalls and need to retrieve their arguments.
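
For readers unfamiliar with the two conventions, here is a minimal user-space sketch of the difference. It is not LKRG or kernel code: struct sketch_regs and all three function names are hypothetical, and the struct models only the two x86_64 registers (di, si) that carry the first two syscall arguments. It assumes the delete_module(name, flags) argument order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved user registers on x86_64.
 * Only the first two syscall argument registers are modeled. */
struct sketch_regs {
    unsigned long di;   /* 1st syscall argument */
    unsigned long si;   /* 2nd syscall argument */
};

/* Old convention: the handler receives the syscall arguments directly. */
unsigned int flags_old_convention(const char *name, unsigned int flags)
{
    (void)name;         /* unused in this sketch */
    return flags;
}

/* Wrapper convention: the handler receives a single pointer to the saved
 * registers and has to pick each argument out of them itself. */
const char *name_wrapper_convention(const struct sketch_regs *regs)
{
    assert(regs != NULL);
    return (const char *)(uintptr_t)regs->di;
}

unsigned int flags_wrapper_convention(const struct sketch_regs *regs)
{
    assert(regs != NULL);
    return (unsigned int)regs->si;
}
```

A hook written for the first form that is attached to a handler of the second form would treat the register-block pointer as if it were the first syscall argument, so both what it reads and what it writes end up in the wrong place.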
Thus, LKRG needed to be updated to support Linux 4.17+ on those architectures, which Adam did with the corresponding major update on August 14, 2018. Unfortunately, the delete_module() syscall hooks were overlooked, and continued to use the old convention, which Linux 4.17+ on those architectures no longer uses.

Bug detail:

The affected code in LKRG is only reached when the delete_module() syscall fails, which it normally does not. This is what allowed the bug to go unnoticed for this long. The specific discrepancy in calling conventions results in LKRG setting an unintended register to -1, which the kernel later uses as a pointer and tries to read from an offset relative to that pointer, resulting in a read from a near-NULL address (in our testing, from address 0x6f). Since nothing can normally be mapped at that address due to mmap_min_addr, this results in an instant kernel Oops, killing the process that attempted the failed delete_module() call.

Reminder to users:

As we have written on the LKRG homepage from the very beginning, and now also in CONCEPTS since LKRG 0.8:

"Like any software, LKRG may contain bugs and some of those might even be new security vulnerabilities. You need to weigh the benefits vs. risks of using LKRG, considering that LKRG is most useful on systems that realistically, despite of this being a best practice for security, won't be promptly rebooted into new kernels (nor live-patched) whenever a new kernel vulnerability is discovered.

LKRG is currently in an experimental stage. We expect occasional false positives [...]"

Luckily, the bug's impact is typically limited to what could have been the impact of some LKRG false positives (kernel panic if that response to certain issues is enabled in the configuration), which are unfortunately the expected kind of occasional issues when using LKRG. The only additional impact we're currently aware this bug might have is lockdown bypass by root.
Thus, this is more of a near-miss (or near-hit if you like) than a full-blown LKRG vulnerability. Regardless, this is a reminder to LKRG users of the risks associated with its use, and of the need to weigh the benefits against such risks.

Lessons to learn for developers:

This is also an opportunity for us to try and see what we could possibly have done to avoid this bug or to detect it promptly, so that we're more likely to avoid or promptly detect other bugs.

The bug, and the fact that it was overlooked, are in part a result of LKRG trying to support multiple and changing kernel versions while needing to be aware of those kernels' specifics. This is unavoidable without hurting LKRG's usefulness. Nevertheless, here are some points we identified:

1. Fuzzing. So far, we've been stress-testing and benchmarking LKRG with valid inputs, and we've been testing kernel vulnerability exploits, but we haven't been deliberately throwing arbitrary invalid inputs against systems with LKRG loaded. We should. Simply running Trinity as non-root might have caught this bug. Does anyone in the community possibly want to help with this going forward?

2. Limit symbol visibility. If a symbol isn't currently used from outside of a source file, we should actively break such unexpected uses. For .c files, this means use of the "static" keyword where possible (something I've been telling Adam before). For .h files (like in this case), this means either moving stuff to .c files or using a (re)naming convention where we'd indicate header-internal symbol names e.g. by the new "ph_" prefix instead of Adam's usual "p_".

3. Reduce source code duplication. Mariusz Zaborski started work on this, with some changes already included in 0.8, and we should do more.

4. Reduce source code size by other means as well, so that we'd have a better chance to notice issues in what's left.
I am suggesting to Adam what functionality we might better drop from LKRG while only minimally reducing its usefulness and effectiveness against attacks. (I think a primary candidate for dropping is validation of waking-up tasks. Such validation was originally an idea I shared while we were brainstorming, but I no longer liked it when Adam started to implement it and ran into some complications.)

5. Knowledge transfer on LKRG internals and development conventions from Adam to another capable developer, so that Adam wouldn't be the only one who could have noticed a bug like this from a look at the buggy code without needing further context.

We'd appreciate any comments from the lkrg-users community.

Thanks,

Alexander