Message-ID: <20210816184105.GA2071@pi3.com.pl>
Date: Mon, 16 Aug 2021 20:41:05 +0200
From: Adam Zabrocki <pi3@....com.pl>
To: lkrg-users@...ts.openwall.com
Subject: Re: Re:Re: Re:deadlock happen
 on p_rb_hash[i].p_lock.lock

Hi Ethan,

I took a look at the stack traces. Since LKRG 0.9 we do not have 
p_ttwu_do_wakeup_entry. The hook for this function was removed. In fact, that's 
one of the reasons I would suggest updating LKRG...
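
As a rough aid for reading the report quoted below, the wait chain it describes 
can be written down as a wait-for graph and checked for a cycle. This is a 
plain Python sketch, not LKRG or kernel code. The "reader?" node stands for the 
unidentified holder of the read side of p_rb_hash[i].p_lock.lock, and the case 
where that holder is an interrupted context on cpu2 is purely an assumption for 
illustration, not something the traces confirm.

```python
def find_cycle(waits_for, start):
    """Follow waits-for edges from `start`; return the cycle as a list, or None."""
    seen = []
    node = start
    while node in waits_for:
        if node in seen:
            # We came back to a node already on the path: that suffix is the cycle.
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for[node]
    return None  # the chain ends at a party that is not itself waiting


# Edges exactly as described in the report: waiter -> holder of the lock it needs.
reported = {
    "cpu0": "cpu2",
    "cpu1": "cpu2",
    "cpu2": "cpu3",     # kretprobe_table_locks[hash].lock
    "cpu3": "reader?",  # p_rb_hash[i].p_lock.lock, holder unknown
}

# ASSUMPTION (hypothetical): the read lock was taken by a context on cpu2
# that was interrupted before it could release the lock.
hypothetical = dict(reported, cpu3="cpu2")

print(find_cycle(reported, "cpu0"))
print(find_cycle(hypothetical, "cpu0"))
```

With only the reported edges no cycle is found, so the open question is who 
holds the read side. The second call shows how a single assumed edge would 
close the loop between cpu2 and cpu3.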

- Adam

On Fri, Aug 13, 2021 at 03:33:40PM +0800, youyan wrote:
> Hi Adam,
>    The deadlock is hard to reproduce: it takes dozens of machines and weeks. The machine is also already in mass production, so
> I cannot switch to the new LKRG code before a full verification test.
>    The following function-graph ftrace was captured on my machine. Could you help me review it? Could some situation cause a deadlock, for example when the rwlock is already locked before p_cmp_tasks and another CPU wants the rwlock for writing? Thanks!
>  
> 1)  awbctrl-3361  =>  kworker-3331 
>  ------------------------------------------
> 
> 
>  1)               |  p_cmp_tasks [sidkm]() {
>  1)   ==========> |
>  1)               |  gic_handle_irq() {
>  1)               |    handle_IPI() {
>  1)               |      irq_enter() {
>  1)   0.808 us    |        rcu_irq_enter();
>  1)   0.230 us    |        preempt_count_add();
>  1)   6.307 us    |      }
>  1)               |      __wake_up() {
>  1)               |        __wake_up_common_lock() {
>  1)               |          _raw_spin_lock_irqsave() {
>  1)   0.539 us    |            preempt_count_add();
>  1)   0.307 us    |            do_raw_spin_lock();
>  1)   4.731 us    |          }
>  1)               |          __wake_up_common() {
>  1)               |            autoremove_wake_function() {
>  1)               |              default_wake_function() {
>  1)               |                try_to_wake_up() {
>  1)               |                  _raw_spin_lock_irqsave() {
>  1)   0.230 us    |                    preempt_count_add();
>  1)   0.462 us    |                    do_raw_spin_lock();
>  1)   4.461 us    |                  }
>  1)               |                  select_task_rq_fair() {
>  1)   0.231 us    |                    __rcu_read_lock();
>  1)   0.270 us    |                    idle_cpu();
>  1)   0.269 us    |                    target_load();
>  1)   0.269 us    |                    source_load();
>  1)   0.346 us    |                    task_h_load();
>  1)   0.231 us    |                    idle_cpu();
>  1)   0.385 us    |                    idle_cpu();
>  1)   0.269 us    |                    idle_cpu();
>  1)   0.385 us    |                    idle_cpu();
>  1)   0.230 us    |                    __rcu_read_unlock();
>  1)   0.230 us    |                    __rcu_read_lock();
>  1)   0.230 us    |                    __rcu_read_unlock();
>  1)   0.231 us    |                    nohz_balance_exit_idle();
>  1) + 31.231 us   |                  }
>  1)   0.308 us    |                  cpus_share_cache();
>  1)               |                  _raw_spin_lock() {
>  1)   0.230 us    |                    preempt_count_add();
>  1)   0.231 us    |                    do_raw_spin_lock();
>  1)   4.346 us    |                  }
>  1)   0.423 us    |                  update_rq_clock();
>  1)               |                  ttwu_do_activate() {
>  1)               |                    activate_task() {
>  1)               |                      psi_task_change() {
>  1)   0.539 us    |                        record_times();
>  1)   3.154 us    |                      }
>  1)               |                      enqueue_task_fair() {
>  1)               |                        update_curr() {
>  1)   0.269 us    |                          update_min_vruntime();
>  1)               |                          cpuacct_charge() {
>  1)   0.577 us    |                            __rcu_read_lock();
>  1)   0.231 us    |                            __rcu_read_unlock();
>  1)   5.346 us    |                          }
>  1)   9.885 us    |                        }
>  1)   0.346 us    |                        __update_load_avg_se();
>  1)   0.385 us    |                        __update_load_avg_cfs_rq();
>  1)   0.231 us    |                        update_cfs_shares();
>  1)   0.346 us    |                        account_entity_enqueue();
>  1)   0.269 us    |                        check_spread();
>  1)   0.231 us    |                        __rcu_read_lock();
>  1)   0.231 us    |                        __rcu_read_unlock();
>  1)   0.231 us    |                        hrtick_update();
>  1) + 30.462 us   |                      }
>  1) + 37.692 us   |                    }
>  1)               |                    optimized_callback() {
>  1)               |                      opt_pre_handler() {
>  1)               |                        pre_handler_kretprobe() {
>  1)               |                          _raw_spin_lock_irqsave() {
>  1)   0.231 us    |                            preempt_count_add();
>  1)   0.461 us    |                            do_raw_spin_lock();
>  1)   4.693 us    |                          } /* _raw_spin_lock_irqsave */
>  1)               |                          _raw_spin_unlock_irqrestore() {
>  1)   0.307 us    |                            do_raw_spin_unlock();
>  1)   0.270 us    |                            preempt_count_sub();
>  1)   4.461 us    |                          }
>  1)               |                          p_ttwu_do_wakeup_entry [sidkm]() {
>  1)               |                            _raw_read_trylock() {
>  1)   0.231 us    |                              preempt_count_add();
>  1)   0.539 us    |                              do_raw_read_trylock();
>  1)   4.769 us    |                            }
>  1)               |                            p_ed_validate_from_running [sidkm]() {
>  1)               |                              p_validate_task_from_running [sidkm]() {
>  1)   0.231 us    |                                __rcu_read_lock();
>  1)   0.538 us    |                                p_rb_find_ed_pid [sidkm]();
>  1)               |                                p_cmp_tasks [sidkm]() {
>  1)   0.577 us    |                                  p_ed_pcfi_validate_sp [sidkm]();
>  1)               |                                  p_cmp_creds [sidkm]() {
> 
> thanks and best regards
> ethan
> 
> At 2021-07-16 01:39:45, "Adam Zabrocki" <pi3@....com.pl> wrote:
> >Can you try LKRG from git TOT?
> >
> >On Thu, Jul 15, 2021 at 08:52:49PM +0800, youyan wrote:
> >> Hi all
> >>      I am sorry, I did not notice that pictures cannot be displayed directly on the mailing list. Here is the same information in words.
> >>   cpu0 and cpu1 wait for the lock, which is held by cpu2.
> >>   cpu2 waits for kretprobe_table_locks[hash].lock, which is held by cpu3.
> >>   cpu3 waits for p_rb_hash[i].p_lock.lock.
> >>   The value of p_rb_hash[i].p_lock.lock is 0x01, which means the lock is held as a read lock.
> >>    
> >> At 2021-07-15 20:20:50, "youyan" <hyouyan@....com> wrote:
> >> 
> >> Hi all
> >>        I hit a deadlock: p_rb_hash[i].p_lock.lock is never unlocked. The LKRG version is 0.8, the software is Android 10, and the hardware is a Unisoc SL8541E.
> >>  The following pictures are the TRACE32 call stacks and registers.
> >>  1: cpu0
> >>  2: cpu1
> >>  3: cpu2
> >>  4: cpu3
> >> 
> >> 
> >>      Given the situation above, I think that somewhere read_lock is taken on p_rb_hash[i].p_lock.lock but never released, or that some code after the lock is taken may cause a schedule. Going through the LKRG code, I cannot find such a case.
> >> Reproducing this issue takes at least two weeks.
> >>     Has anybody met a similar issue?
> >> 
> >> 
> >> thanks and best regards
> >> ethan
> >> 
> >
> >-- 
> >pi3 (pi3ki31ny) - pi3 (at) itsec pl
> >http://pi3.com.pl

-- 
pi3 (pi3ki31ny) - pi3 (at) itsec pl
http://pi3.com.pl
