Message-ID: <2d99ae4a.2cff.17b3e6e48e6.Coremail.hyouyan@126.com>
Date: Fri, 13 Aug 2021 15:33:40 +0800 (CST)
From: youyan <hyouyan@....com>
To: lkrg-users@...ts.openwall.com
Subject: Re:Re: Re:deadlock happen on p_rb_hash[i].p_lock.lock
Hi Adam,
The deadlock is hard to reproduce: it takes dozens of machines and several weeks. The machine is also already in mass production, so I cannot switch to the new LKRG code before a full verification test.
On my machine I captured the following function ftrace. Could you help me review it? Could this situation cause a deadlock: p_cmp_tasks already holds the rwlock, and another CPU wants the rwlock for writing? Thanks!
1) awbctrl-3361 => kworker-3331
------------------------------------------
1) | p_cmp_tasks [sidkm]() {
1) ==========> |
1) | gic_handle_irq() {
1) | handle_IPI() {
1) | irq_enter() {
1) 0.808 us | rcu_irq_enter();
1) 0.230 us | preempt_count_add();
1) 6.307 us | }
1) | __wake_up() {
1) | __wake_up_common_lock() {
1) | _raw_spin_lock_irqsave() {
1) 0.539 us | preempt_count_add();
1) 0.307 us | do_raw_spin_lock();
1) 4.731 us | }
1) | __wake_up_common() {
1) | autoremove_wake_function() {
1) | default_wake_function() {
1) | try_to_wake_up() {
1) | _raw_spin_lock_irqsave() {
1) 0.230 us | preempt_count_add();
1) 0.462 us | do_raw_spin_lock();
1) 4.461 us | }
1) | select_task_rq_fair() {
1) 0.231 us | __rcu_read_lock();
1) 0.270 us | idle_cpu();
1) 0.269 us | target_load();
1) 0.269 us | source_load();
1) 0.346 us | task_h_load();
1) 0.231 us | idle_cpu();
1) 0.385 us | idle_cpu();
1) 0.269 us | idle_cpu();
1) 0.385 us | idle_cpu();
1) 0.230 us | __rcu_read_unlock();
1) 0.230 us | __rcu_read_lock();
1) 0.230 us | __rcu_read_unlock();
1) 0.231 us | nohz_balance_exit_idle();
1) + 31.231 us | }
1) 0.308 us | cpus_share_cache();
1) | _raw_spin_lock() {
1) 0.230 us | preempt_count_add();
1) 0.231 us | do_raw_spin_lock();
1) 4.346 us | }
1) 0.423 us | update_rq_clock();
1) | ttwu_do_activate() {
1) | activate_task() {
1) | psi_task_change() {
1) 0.539 us | record_times();
1) 3.154 us | }
1) | enqueue_task_fair() {
1) | update_curr() {
1) 0.269 us | update_min_vruntime();
1) | cpuacct_charge() {
1) 0.577 us | __rcu_read_lock();
1) 0.231 us | __rcu_read_unlock();
1) 5.346 us | }
1) 9.885 us | }
1) 0.346 us | __update_load_avg_se();
1) 0.385 us | __update_load_avg_cfs_rq();
1) 0.231 us | update_cfs_shares();
1) 0.346 us | account_entity_enqueue();
1) 0.269 us | check_spread();
1) 0.231 us | __rcu_read_lock();
1) 0.231 us | __rcu_read_unlock();
1) 0.231 us | hrtick_update();
1) + 30.462 us | }
1) + 37.692 us | }
1) | optimized_callback() {
1) | opt_pre_handler() {
1) | pre_handler_kretprobe() {
1) | _raw_spin_lock_irqsave() {
1) 0.231 us | preempt_count_add();
1) 0.461 us | do_raw_spin_lock();
1) 4.693 us | } /* _raw_spin_lock_irqsave */
1) | _raw_spin_unlock_irqrestore() {
1) 0.307 us | do_raw_spin_unlock();
1) 0.270 us | preempt_count_sub();
1) 4.461 us | }
1) | p_ttwu_do_wakeup_entry [sidkm]() {
1) | _raw_read_trylock() {
1) 0.231 us | preempt_count_add();
1) 0.539 us | do_raw_read_trylock();
1) 4.769 us | }
1) | p_ed_validate_from_running [sidkm]() {
1) | p_validate_task_from_running [sidkm]() {
1) 0.231 us | __rcu_read_lock();
1) 0.538 us | p_rb_find_ed_pid [sidkm]();
1) | p_cmp_tasks [sidkm]() {
1) 0.577 us | p_ed_pcfi_validate_sp [sidkm]();
1) | p_cmp_creds [sidkm]() {
thanks and best regards
ethan
At 2021-07-16 01:39:45, "Adam Zabrocki" <pi3@....com.pl> wrote:
>Can you try LKRG from git TOT ?
>
>On Thu, Jul 15, 2021 at 08:52:49PM +0800, youyan wrote:
>> Hi all
>> I am sorry, I did not notice that pictures cannot be displayed directly on the mailing list. Here is the same information in words.
>> cpu0 and cpu1 wait for a lock that is held on cpu2.
>> cpu2 waits for kretprobe_table_locks[hash].lock, which is held by cpu3.
>> cpu3 waits for p_rb_hash[i].p_lock.lock.
>> The value of p_rb_hash[i].p_lock.lock is 0x01, which means the lock is held through a read lock.
>>
>> At 2021-07-15 20:20:50, "youyan" <hyouyan@....com> wrote:
>>
>> Hi all
>> I hit a deadlock: p_rb_hash[i].p_lock.lock is never unlocked. The LKRG version is 0.8, the software is Android 10, and the hardware is Unisoc SL8541E.
>> The following pictures show the Trace32 call stacks and registers.
>> 1: cpu0
>> 2: cpu1
>> 3: cpu2
>> 4: cpu3
>>
>> Given the above, I think that somewhere read_lock is taken on p_rb_hash[i].p_lock.lock but never released, or that after locking some code may cause a schedule. Going through the LKRG code, I cannot find such code.
>> Reproducing this issue takes at least two weeks.
>> Has anybody met a similar issue?
>>
>>
>> thanks and best regards
>> ethan
>
>--
>pi3 (pi3ki31ny) - pi3 (at) itsec pl
>http://pi3.com.pl