Message-ID: <605e9763.18b8.177ccde87d7.Coremail.hyouyan@126.com>
Date: Tue, 23 Feb 2021 11:11:11 +0800 (CST)
From: youyan <hyouyan@....com>
To: lkrg-users@...ts.openwall.com
Subject: Re:Re: feedback about lkrg porting to qcm2150&SL8541E android 10
Hi Adam,
Sorry for the late reply. Feb 05, 2021 was my last working day before the Chinese Spring Festival, and I came back to work today.
1: Freezing userspace times out, which leads to app ANRs. The following is the diff:
 #include "p_lkrg_main.h"
+#include <linux/suspend.h>
+
 unsigned int log_level = 3;
 unsigned int heartbeat = 0;
@@ -41,12 +43,14 @@ unsigned int smap_enforce = 2;
 unsigned int profile_validate = 3;
 unsigned int profile_enforce = 2;
+static struct hrtimer freeze_timer;
+int freeze_successful=0;
+static enum hrtimer_restart freeze_timeout_timer_func(struct hrtimer *timer)
+{
+ int ret =0;
+ if(freeze_successful==0)
+ {
+ p_print_log(P_LKRG_CRIT, "freeze process has some problem 00...\n");
+
+ pm_system_wakeup();
+ hrtimer_forward_now(timer, ms_to_ktime(500));
+ ret = HRTIMER_RESTART;
+ }
+ else
+ {
+ p_print_log(P_LKRG_CRIT, "freeze process successful 11...\n");
+ ret = HRTIMER_NORESTART;
+ }
+
+ return ret;
+}
+
+
/*
* Main entry point for the module - initialization.
*/
@@ -388,13 +413,16 @@ static int __init p_lkrg_register(void) {
-
- // Freeze all non-kernel processes
- while (P_SYM(p_freeze_processes)())
- schedule();
-
+#endif
+ freeze_successful=0;
+
+ hrtimer_init(&freeze_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ freeze_timer.function = freeze_timeout_timer_func;
+ hrtimer_start(&freeze_timer, ktime_set(0, 500*1000*1000), HRTIMER_MODE_REL);
+
+ // Freeze all non-kernel processes
+ while (P_SYM(p_freeze_processes)())
+ {
+ schedule();
+ }
+ freeze_successful=1;
}
* This function normally should never be called - unloading module cleanup
*/
static void __exit p_lkrg_deregister(void) {
+ freeze_successful=0;
+ hrtimer_start(&freeze_timer, ktime_set(0, 500*1000*1000),HRTIMER_MODE_REL);
- // Freeze all non-kernel processes
- while (P_SYM(p_freeze_processes)())
- schedule();
+ // Freeze all non-kernel processes
+ while (P_SYM(p_freeze_processes)())
+ {
+ //msleep(500);
+ p_print_log(P_LKRG_CRIT, "p_freeze_processes failed\n");
+ schedule();
+
+ }
+ freeze_successful=1;
+ hrtimer_cancel(&freeze_timer);
>When this situation happens, do you retry freezing or are you bailing out?
I cancel the current freeze attempt and restart freezing after schedule().
>Did you consider synchronizing TEE state before enforcing 'freeze'? E.g. if
>you know when it is 'safe' to execute 'freeze', only then perform initialization?
This solution does not fit my platform, for two reasons:
1: Even if the TEE state is checked and found idle before freezing, it may become busy while the freeze is in progress.
2: It is Qualcomm source code that I cannot modify, and Qualcomm does not want to modify their code for LKRG.
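The retry loop in the diff above arms a 500 ms hrtimer before calling p_freeze_processes(); while no attempt has succeeded, the timer handler calls pm_system_wakeup() so the in-progress freeze is aborted, and the loop tries again after schedule(). The sketch below is a userspace model of that watchdog-abort-retry pattern using POSIX threads and C11 atomics, not kernel code. The 50 ms period is shortened from 500 ms for the model, and `tee_busy`, `try_freeze()`, `watchdog()`, `tee_app()` and `freeze_with_watchdog()` are hypothetical stand-ins for the TEE application, the freezer, the hrtimer handler and the LKRG init path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Userspace model only; the patch itself uses an hrtimer with a 500 ms period. */
#define WATCHDOG_PERIOD_MS 50L

static atomic_bool freeze_successful; /* set once a freeze attempt succeeds */
static atomic_bool wakeup_pending;    /* models pm_system_wakeup() aborting a freeze */
static atomic_bool tee_busy;          /* hypothetical task that cannot be frozen yet */

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* One freeze attempt: waits for the busy task to become freezable, but gives
 * up as soon as a wakeup is pending. Returns 0 on success, -1 if aborted. */
static int try_freeze(void)
{
    while (atomic_load(&tee_busy)) {
        if (atomic_exchange(&wakeup_pending, false))
            return -1;
        sleep_ms(1);
    }
    return 0;
}

/* Watchdog: while no attempt has succeeded, request a wakeup and re-arm
 * (like HRTIMER_RESTART); stop once freezing succeeded (HRTIMER_NORESTART). */
static void *watchdog(void *arg)
{
    (void)arg;
    for (;;) {
        sleep_ms(WATCHDOG_PERIOD_MS);
        if (atomic_load(&freeze_successful))
            return NULL;
        atomic_store(&wakeup_pending, true);
    }
}

/* Hypothetical TEE application: unfreezable for the given time, then idle. */
static void *tee_app(void *arg)
{
    sleep_ms(*(const long *)arg);
    atomic_store(&tee_busy, false);
    return NULL;
}

/* Freeze loop guarded by the watchdog. Returns the number of attempts,
 * or -1 if a helper thread could not be created. */
static int freeze_with_watchdog(long tee_busy_ms)
{
    pthread_t wd, tee;
    int attempts = 0;

    assert(tee_busy_ms >= 0);
    atomic_store(&freeze_successful, false);
    atomic_store(&wakeup_pending, false);
    atomic_store(&tee_busy, true);

    if (pthread_create(&tee, NULL, tee_app, &tee_busy_ms) != 0)
        return -1;
    if (pthread_create(&wd, NULL, watchdog, NULL) != 0) {
        pthread_join(tee, NULL);
        return -1;
    }

    do {
        attempts++;            /* the kernel loop calls schedule() between attempts */
    } while (try_freeze() != 0);

    atomic_store(&freeze_successful, true);
    pthread_join(wd, NULL);    /* wait for the watchdog to stop, like hrtimer_cancel() */
    pthread_join(tee, NULL);
    return attempts;
}
```

For example, `freeze_with_watchdog(175)` keeps the hypothetical TEE task busy for 175 ms, so the first attempts are aborted by the watchdog before one succeeds; the exact attempt count depends on scheduling.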
2: The number of exiting threads exceeds 40, causing a kernel crash
Thanks for the information. I will try the latest LKRG code, but this situation is hard to reproduce. If I get a result, I will report back to you.
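The temporary fix in the original report (raising kretprobe maxactive) relates to how kretprobes work: each probe has a fixed pool of `maxactive` instances, every invocation still inside the probed function holds one, and when the pool is empty the kernel increments `nmissed` and also skips the entry handler for that invocation. The sketch below is a userspace model of that pool, not the kernel API and not LKRG's code. It assumes that the per-task cleanup happens in the entry handler of a probe on do_exit and that 40 is the configured `maxactive`; `struct kretprobe_model`, `task_tracked` and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TASKS 128

/* Hypothetical model of a kretprobe instance pool (not the kernel API). */
struct kretprobe_model {
    int maxactive; /* instances preallocated for the probe */
    int in_use;    /* instances held by invocations still in flight */
    int nmissed;   /* invocations that found the pool empty */
};

/* Hypothetical per-task table that a checker walks, like p_cmp_tasks(). */
static bool task_tracked[MAX_TASKS];

/* Entry of the probed function (think do_exit) for task `pid`. The entry
 * handler, which drops the task from the table, only runs if an instance
 * is free; otherwise the miss is only counted. */
static void probe_enter(struct kretprobe_model *rp, int pid)
{
    if (rp->in_use == rp->maxactive) {
        rp->nmissed++;
        return;
    }
    rp->in_use++;
    task_tracked[pid] = false;
}

/* `exiting` tasks enter the probed function at overlapping times. Returns
 * how many of them the table still (wrongly) tracks afterwards. */
static int simulate_mass_exit(int maxactive, int exiting)
{
    struct kretprobe_model rp = { .maxactive = maxactive };
    int stale = 0;

    assert(exiting >= 0 && exiting <= MAX_TASKS);
    for (int pid = 0; pid < exiting; pid++)
        task_tracked[pid] = true;
    for (int pid = 0; pid < exiting; pid++)
        probe_enter(&rp, pid);
    for (int pid = 0; pid < exiting; pid++)
        stale += task_tracked[pid];
    assert(stale == rp.nmissed); /* every miss leaves one stale entry */
    return stale;
}
```

With `maxactive` set to 40, `simulate_mass_exit(40, 50)` leaves 10 exiting tasks tracked, while `simulate_mass_exit(64, 50)` leaves none, which is consistent with why raising `maxactive` hid the crash rather than removing its cause.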
4: Calculating kernel text and read-only data with IRQs locked may prevent some threads or interrupts from being handled in time
>Would you be able to elaborate on how you do it?
I changed the default kint_validate value to 0:
-unsigned int kint_validate = 3;
+unsigned int kint_validate = 0;
At 2021-02-06 02:49:18, "Adam Zabrocki" <pi3@....com.pl> wrote:
>Hi Ethan,
>
>Thanks for your report and feedback! Please find some of my comments inline.
>
>On Fri, Feb 05, 2021 at 05:06:13PM +0800, youyan wrote:
>> Hi admins
>> Thanks, admins, for supporting me in porting LKRG to Android; special thanks to Adam. After a few months of stability testing, LKRG now runs well on my Android device.
>> Now I want to report the issues I met while porting and stability testing, along with some fixes, which may not be the best approach and are just for reference.
>> 1: Freezing userspace times out, which leads to app ANRs
>> (1) In some situations, a thread blocks all signals (for example, the qcom TEE driver uses sigprocmask(SIG_SETMASK, &new_sigset, &old_sigset)).
>> (2) The qcom TEE driver must wait for the qcom TEE user application to send a notification before it restores the signals (sigprocmask(SIG_SETMASK, &old_sigset, NULL);).
>> (3) When the lkrg module is loaded with insmod, the code runs P_SYM(p_freeze_processes)(), which freezes the qcom TEE user application.
>> (4) The situation above makes freezing processes time out, and the timeout is 20s. However, the Android ANR timeout is 5s, so the freeze timeout causes some processes to crash.
>>
>>
>> My fix solution:
>> Before freezing processes, start an hrtimer whose handler runs 500ms later. The handler checks whether freezing processes has succeeded; if not, it cancels the freeze.
>> if(freeze_successful==0)
>> {
>> p_print_log(P_LKRG_CRIT, "freeze process has some problem 00...\n");
>> pm_system_wakeup();
>> hrtimer_forward_now(timer, ms_to_ktime(500));
>> ret = HRTIMER_RESTART;
>> }
>>
>
>When this situation happens, do you retry freezing or are you bailing out?
>Did you consider synchronizing TEE state before enforcing 'freeze'? E.g. if
>you know when it is 'safe' to execute 'freeze', only then perform initialization?
>
>>
>> 2: The number of exiting threads exceeds 40, causing a kernel crash
>> (1) During system boot, or when the system is in an abnormal state, many threads exit at the same time.
>> (2) The number of exiting threads exceeds 40.
>> (3) While a thread is running the exit code (do_exit), LKRG may at the same time be checking all processes (p_cmp_tasks), which may cause a kernel NULL pointer dereference crash.
>>
>>
>> My fix solution:
>> Temporarily increase the kretprobe maxactive value.
>
>
>In fact, I've changed the exit() logic. If you have the opportunity, can you try
>the latest LKRG from the GitHub repo and verify whether you have the same issue?
>
>> 3: CONFIG_OPTPROBES=y makes loading the lkrg module slower
>> When the kernel config has CONFIG_OPTPROBES=y, insmod of the lkrg module takes more time to finish.
>> My fix solution:
>> Before running insmod for lkrg, turn off kprobe optimization by writing 0 to /proc/sys/debug/kprobes-optimization.
>>
>
>Right, optimized kprobes were broken in the Linux kernel for some time. We've
>managed to report and fix OPT kprobes in mainline. You can read more about that
>here:
>
>http://blog.pi3.com.pl/?p=831
>
>It is worth adding that if you don't have FTRACE compiled in, you shouldn't
>have OPT kprobes. I'm not sure if that is something acceptable from your point
>of view.
>
>>
>> 4: Calculating kernel text and read-only data with IRQs locked may prevent some threads or interrupts from being handled in time
>> Calculating kernel text and read-only data takes about 100ms on qcm2150&SL8541E, and keeping IRQs locked for 100ms may prevent some threads or interrupts from being handled in time.
>> My fix solution:
>> Temporarily disable the calculation of kernel text and read-only data.
>
>Would you be able to elaborate on how you do it?
>
>> 5: mutex_lock() leads to a kernel BUG report and crash
>> When the kernel config has CONFIG_DEBUG_ATOMIC_SLEEP=y, scheduling in atomic context may cause the kernel to report a BUG and crash, for example when turning off SELinux (setenforce 0).
>> My fix solution:
>> Not a very good approach for now: I wrote some mutex code myself whose functions simply do not schedule in atomic context.
>>
>
>Thanks for all the useful information.
>I'm wondering if it is possible to share your diff; maybe some of the
>solutions can be merged into the LKRG repo.
>
>Thanks,
>Adam
>
>
>>
>> thanks and best regards
>> ethan
>
>--
>pi3 (pi3ki31ny) - pi3 (at) itsec pl
>http://pi3.com.pl
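Regarding item 5 in the report quoted above: code running in atomic context must not sleep, which is why mutex_lock() there triggers the CONFIG_DEBUG_ATOMIC_SLEEP report. A lock that busy-waits instead of sleeping avoids this; in the kernel that role is normally filled by spinlock_t (for example spin_lock_irqsave()) rather than a hand-written lock. The sketch below is a userspace C11 model of such a busy-waiting lock, built on atomic_flag; `spin_model_t`, `run_workers()` and the other names are hypothetical and are not the mutex code used in the report.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical busy-waiting lock: acquiring it never sleeps or schedules. */
typedef struct {
    atomic_flag held;
} spin_model_t;

static void spin_model_lock(spin_model_t *l)
{
    /* Spin until the flag was previously clear; acquire ordering keeps the
     * critical section from being reordered above the lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait: no sleeping call here, unlike mutex_lock() */
}

static void spin_model_unlock(spin_model_t *l)
{
    /* Release ordering publishes the critical section's writes. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static spin_model_t counter_lock = { ATOMIC_FLAG_INIT };
static long counter;

static void *worker(void *arg)
{
    long iterations = *(const long *)arg;

    for (long i = 0; i < iterations; i++) {
        spin_model_lock(&counter_lock);
        counter++; /* short critical section with no sleeping calls */
        spin_model_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs up to 16 workers that share the lock; returns the final counter. */
static long run_workers(int nthreads, long iterations)
{
    pthread_t threads[16];
    int created = 0;

    assert(nthreads > 0 && nthreads <= 16);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&threads[created], NULL, worker, &iterations) == 0)
            created++;
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

Because no increment is lost while the lock is held, `run_workers(4, 100000)` returns 400000 when all four threads start; the trade-off is that waiters burn CPU, so the critical section must stay short.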