Message-ID: <20190726163142.GA6757@pi3.com.pl>
Date: Fri, 26 Jul 2019 18:31:42 +0200
From: Adam Zabrocki <pi3@....com.pl>
To: lkrg-users@...ts.openwall.com
Subject: Re: LKRG 0.7 CI & ED bypass

Hi,

I managed to fix the PoC and reproduce the issue. The original PoC generates a 
fatal exception (on my VMs), most likely because of a #PF during a user-mode 
page reference. Since the int3 instruction generates a kprobe exception, the 
#PF happens inside int3 handling and we get a fatal exception. Nevertheless, I 
fixed the PoC so that the #PF is not generated at all, and then I reproduced 
the entire scenario. Moreover, I've improved the PoC in various ways so that 
it works on SMEP machines as well. However, this PoC does not leave the 
machine in a stable state and has some limitations:

 - If SMEP is enabled, it works around 60%-70% of the time (at least on my 
various test machines). LKRG has a chance to detect it, or the PoC may 
generate other types of crashes. The 60%-70% figure may differ depending on 
the environment, so I would not make strong assumptions based on it. However, 
the PoC does not work reliably every time.
 - 'text_mutex' is never released (to block CI) and the machine is very slow:
    a. All of my machines are stuck with 99.9+% CPU usage, e.g. %Cpu(s):  0.0 
us,100.0 sys
    b. Some of my machines are spitting OOM, depending on how overloaded the 
machine is
    c. You can't unload any kernel module
    d. If you try to load any kernel module, the machine will freeze
    e. None of the kernel functionality which relies on that lock will work, 
e.g. tracing, perf, etc.
 - The kernel tries to recover from the 'bad state' and to kill 'stuck' 
threads. The logs are spammed with messages such as:

    Jul 25 12:10:47 pi3-ubuntu kernel: INFO: task kworker/u480:1:47 blocked for more than 120 seconds.
    Jul 25 12:10:47 pi3-ubuntu kernel:       Tainted: G           OE   4.8.0-53-generic #56~16.04.1-Ubuntu
    Jul 25 12:10:47 pi3-ubuntu kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jul 25 12:10:47 pi3-ubuntu kernel: kworker/u480:1  D ffff8a2dff777cf8     0    47      2 0x00000000
    Jul 25 12:10:47 pi3-ubuntu kernel: Workqueue: events_unbound p_check_integrity [p_lkrg]
    Jul 25 12:10:47 pi3-ubuntu kernel:  ffff8a2dff777cf8 ffff8a2dff4d56c0 ffffffff8d60d500 ffff8a2dff4d4c40
    Jul 25 12:10:47 pi3-ubuntu kernel:  0000000000000286 ffff8a2dff778000 ffffffff8d649da4 ffff8a2dff4d4c40
    Jul 25 12:10:47 pi3-ubuntu kernel:  00000000ffffffff ffffffff8d649da8 ffff8a2dff777d10 ffffffff8d096045
    Jul 25 12:10:47 pi3-ubuntu kernel: Call Trace:
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d096045>] schedule+0x35/0x80
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d0962ee>] schedule_preempt_disabled+0xe/0x10
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d097f49>] __mutex_lock_slowpath+0xb9/0x130
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d097fdf>] mutex_lock+0x1f/0x30
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffffc06d9c52>] p_check_integrity+0xe2/0x1360 [p_lkrg]
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89d89b>] process_one_work+0x16b/0x4a0
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89dc1b>] worker_thread+0x4b/0x500
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89dbd0>] ? process_one_work+0x4a0/0x4a0
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89dbd0>] ? process_one_work+0x4a0/0x4a0
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c8a3fb8>] kthread+0xd8/0xf0
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d09aa9f>] ret_from_fork+0x1f/0x40
    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c8a3ee0>] ? kthread_create_on_node+0x1e0/0x1e0

   a. Depending on the kernel configuration, it might happen more or less 
often. You can configure the machine not to generate these messages.
   b. The machine can also be configured to invoke panic() if a task is 
'stuck' / hung, as in this situation. It is controlled by the 
"/proc/sys/kernel/hung_task_panic" interface. Some distros enable panic on 
hung tasks by default.

 - If you do not restore the mutexes to a valid state, your machine will 
eventually crash (it is on the slow DoS path). You can also see it in the 
process list (a lot of tasks):
    2176 root      20   0       0      0      0 R   8.6  0.0   2:29.08 kworker/u480:5
    2185 root      20   0       0      0      0 R   8.6  0.0   1:06.75 kworker/u480:11
       6 root      20   0       0      0      0 R   8.3  0.0   2:46.26 kworker/u480:0
    2178 root      20   0       0      0      0 R   8.3  0.0   2:16.42 kworker/u480:6
    2182 root      20   0       0      0      0 R   8.3  0.0   1:38.66 kworker/u480:8
    2190 root      20   0       0      0      0 R   8.3  0.0   0:54.86 kworker/u480:15
    2200 root      20   0       0      0      0 R   8.3  0.0   0:46.68 kworker/u480:25
    2207 root      20   0       0      0      0 R   8.3  0.0   0:36.62 kworker/u480:27
    2212 root      20   0       0      0      0 R   8.3  0.0   0:17.43 kworker/u480:32
    2213 root      20   0       0      0      0 R   8.3  0.0   0:27.97 kworker/u480:33
    ...
    ...
    2221 root      20   0       0      0      0 R   7.0  0.0   0:14.28 kworker/u480:41
    2233 root      20   0       0      0      0 R   7.0  0.0   0:10.17 kworker/u480:43 
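
The hung-task behaviour described in the list above can be inspected from user 
space. The following is a minimal sketch, assuming the standard procfs files 
hung_task_panic and hung_task_timeout_secs under /proc/sys/kernel; the helper 
names are hypothetical and the script is not part of LKRG.

```python
from pathlib import Path

# Standard procfs location of the hung-task sysctls on Linux.
SYSCTL_DIR = Path("/proc/sys/kernel")


def read_sysctl_int(name, base=SYSCTL_DIR):
    """Return the integer value of a sysctl file, or None if it is unavailable."""
    try:
        return int((Path(base) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_hung_task_policy(base=SYSCTL_DIR):
    """Summarise how the kernel reacts to tasks that stay blocked too long."""
    panic = read_sysctl_int("hung_task_panic", base)
    timeout = read_sysctl_int("hung_task_timeout_secs", base)
    if panic is None or timeout is None:
        return "hung-task detector not available"
    if timeout == 0:
        # A timeout of 0 turns the check (and its log messages) off.
        return "hung-task detection disabled"
    action = "panic" if panic else "log a warning"
    return f"{action} after {timeout}s"


if __name__ == "__main__":
    print(describe_hung_task_policy())
```

The `base` parameter exists only so the logic can be exercised against a 
scratch directory instead of the live /proc tree.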

We were aware of the possibility of attacking the synchronization mechanism, 
as it is documented (e.g. here 
https://www.openwall.com/presentations/CONFidence2018-LKRG-Under-The-Hood/slide-39.html). 
How the machine reacts to that type of attack matches what I've seen during 
early LKRG development.
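
As a rough user-space analogy of this class of attack (not kernel code, and 
not how LKRG is implemented), the sketch below shows a periodic checker being 
starved when another party takes a shared lock and never releases it. All 
names are hypothetical. Unlike the kernel's mutex_lock() seen in the trace 
above, the checker here uses a timed acquire so that the demo terminates.

```python
import threading

# Stand-in for a kernel lock such as text_mutex; purely illustrative.
guard_lock = threading.Lock()


def integrity_check(results, timeout=0.2):
    """Try to take the lock the way a periodic checker would."""
    if guard_lock.acquire(timeout=timeout):
        try:
            results.append("checked")
        finally:
            guard_lock.release()
    else:
        # In the kernel case there is no timeout: the worker just stays blocked.
        results.append("blocked")


def run_checkers(count, lock_held):
    """Run `count` checker threads, optionally while an 'attacker' holds the lock."""
    results = []
    if lock_held:
        guard_lock.acquire()  # the "attacker" takes the lock and keeps it
    try:
        threads = [threading.Thread(target=integrity_check, args=(results,))
                   for _ in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        if lock_held:
            guard_lock.release()  # released only so the demo can end
    return results


if __name__ == "__main__":
    print(run_checkers(3, lock_held=False))
    print(run_checkers(3, lock_held=True))
```

With the lock held, every checker gives up without running, which mirrors the 
blocked p_check_integrity workers in the log above.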

LKRG's CI should verify the SMEP / WP CPU bits, but currently it does not. 
That is wrong, so I've prepared a simple patch which verifies the critical 
CPU bits on every CPU core, whenever CI is invoked and before any 
mutex/spinlock is taken:
 
https://bitbucket.org/Adam_pi3/lkrg-main/commits/13a9b5c3a93549b5f0ac1f8317ced3baefbfa501

This patch always stops the current PoC (on machines with SMEP).
As a workaround you can also enable /proc/sys/kernel/hung_task_panic and tune 
the timeout value.
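
The idea behind the patch, checking the critical CPU bits per core before any 
lock is taken, can be illustrated with a small sketch. This is not the code 
from the linked commit. It assumes only the architectural bit positions 
(CR0.WP is bit 16, CR4.SMEP is bit 20); the function names and the sample 
register values are hypothetical, and since user space cannot read CR0/CR4 
the values are passed in as plain integers.

```python
# Architectural bit positions in the x86 control registers.
X86_CR0_WP = 1 << 16    # CR0.WP: write protect
X86_CR4_SMEP = 1 << 20  # CR4.SMEP: supervisor mode execution prevention


def missing_critical_bits(cr0, cr4, smep_supported=True):
    """Return the names of critical protection bits that are cleared."""
    missing = []
    if not cr0 & X86_CR0_WP:
        missing.append("WP")
    # Only expect SMEP on CPUs that actually support it.
    if smep_supported and not cr4 & X86_CR4_SMEP:
        missing.append("SMEP")
    return missing


def check_all_cpus(per_cpu_regs, smep_supported=True):
    """Map cpu id -> missing bits, listing only CPUs where a bit is cleared."""
    report = {}
    for cpu, (cr0, cr4) in per_cpu_regs.items():
        missing = missing_critical_bits(cr0, cr4, smep_supported)
        if missing:
            report[cpu] = missing
    return report


if __name__ == "__main__":
    # Illustrative register values, not taken from a real machine.
    regs = {0: (0x80050033, 0x1006F0), 1: (0x80050033, 0x0006F0)}
    print(check_all_cpus(regs))
```

Running the check before taking any lock matters here because a check placed 
after the lock would never run once the lock is held by the attacker.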

Thanks,
Adam


On Thu, Jul 25, 2019 at 03:25:37PM +0400, Ilya Matveychikov wrote:
> 
> 
> > On Jul 22, 2019, at 11:40 PM, Adam Zabrocki <pi3@....com.pl> wrote:
> > 
> >> CI timer is a periodic job with 15 seconds period by default so I don't see the reason why
> >> it isn't possible to launch the exploit when CI is not yet started. Lucky you, but it works
> >> well on my VM :-)
> > 
> > CI is not only triggered on timer. I've made a test where I've completely 
> > disabled timer, and still LKRG's CI was able to catch that. Mostly, because 
> > LKRG's CI can also be executed on the random events in the system which are 
> > generated by the nature of the bug.
> > 
> > Nevertheless, I've tried to reproduce your environment by disabling SMEP, 
> > disabling CI timer and also disabling CI on random events in the system. I 
> > still was not able to reproduce your bypass instead I'm getting critical kernel 
> > panic (usually fatal exception in interrupt). Can you share a screenshot from 
> > your tests where LKRG is running?
> 
> Here is a demo:
> https://mega.nz/#!g6gnzK4B!5VEgZA3JgnZeCwmjkhJcyf45RTDWM_yOcgW6WAqAUa8
> 
> > 
> > Thanks,
> > Adam
> > 
> > -- 
> > pi3 (pi3ki31ny) - pi3 (at) itsec pl
> > http://pi3.com.pl
> > 
> 

-- 
pi3 (pi3ki31ny) - pi3 (at) itsec pl
http://pi3.com.pl
