Message-ID: <20210816184105.GA2071@pi3.com.pl>
Date: Mon, 16 Aug 2021 20:41:05 +0200
From: Adam Zabrocki <pi3@....com.pl>
To: lkrg-users@...ts.openwall.com
Subject: Re: Re: Re: Re: deadlock happen on p_rb_hash[i].p_lock.lock

Hi Ethan,

I took a look at the stack traces. Since LKRG 0.9 we no longer have
p_ttwu_do_wakeup_entry - the hook for this function was removed. In fact,
that's one of the reasons why I would suggest updating LKRG...

- Adam

On Fri, Aug 13, 2021 at 03:33:40PM +0800, youyan wrote:
> Hi Adam,
> The deadlock issue is hard to reproduce; it needs dozens of machines and
> weeks. At the same time, the machine is already in mass production, so I
> cannot switch to new LKRG code before a full verification test.
> My machine has the following function ftrace. Could you help me review it?
> Is there some situation that may cause a deadlock when p_cmp_tasks has
> already taken the rwlock for reading and another CPU wants the rwlock for
> writing? Thanks!!!
>
> 1) awbctrl-3361 => kworker-3331
> ------------------------------------------
>
> 1)               |  p_cmp_tasks [sidkm]() {
> 1)   ==========> |
> 1)               |    gic_handle_irq() {
> 1)               |      handle_IPI() {
> 1)               |        irq_enter() {
> 1)   0.808 us    |          rcu_irq_enter();
> 1)   0.230 us    |          preempt_count_add();
> 1)   6.307 us    |        }
> 1)               |        __wake_up() {
> 1)               |          __wake_up_common_lock() {
> 1)               |            _raw_spin_lock_irqsave() {
> 1)   0.539 us    |              preempt_count_add();
> 1)   0.307 us    |              do_raw_spin_lock();
> 1)   4.731 us    |            }
> 1)               |            __wake_up_common() {
> 1)               |              autoremove_wake_function() {
> 1)               |                default_wake_function() {
> 1)               |                  try_to_wake_up() {
> 1)               |                    _raw_spin_lock_irqsave() {
> 1)   0.230 us    |                      preempt_count_add();
> 1)   0.462 us    |                      do_raw_spin_lock();
> 1)   4.461 us    |                    }
> 1)               |                    select_task_rq_fair() {
> 1)   0.231 us    |                      __rcu_read_lock();
> 1)   0.270 us    |                      idle_cpu();
> 1)   0.269 us    |                      target_load();
> 1)   0.269 us    |                      source_load();
> 1)   0.346 us    |                      task_h_load();
> 1)   0.231 us    |                      idle_cpu();
> 1)   0.385 us    |                      idle_cpu();
> 1)   0.269 us    |                      idle_cpu();
> 1)   0.385 us    |                      idle_cpu();
> 1)   0.230 us    |                      __rcu_read_unlock();
> 1)   0.230 us    |                      __rcu_read_lock();
> 1)   0.230 us    |                      __rcu_read_unlock();
> 1)   0.231 us    |                      nohz_balance_exit_idle();
> 1) + 31.231 us   |                    }
> 1)   0.308 us    |                    cpus_share_cache();
> 1)               |                    _raw_spin_lock() {
> 1)   0.230 us    |                      preempt_count_add();
> 1)   0.231 us    |                      do_raw_spin_lock();
> 1)   4.346 us    |                    }
> 1)   0.423 us    |                    update_rq_clock();
> 1)               |                    ttwu_do_activate() {
> 1)               |                      activate_task() {
> 1)               |                        psi_task_change() {
> 1)   0.539 us    |                          record_times();
> 1)   3.154 us    |                        }
> 1)               |                        enqueue_task_fair() {
> 1)               |                          update_curr() {
> 1)   0.269 us    |                            update_min_vruntime();
> 1)               |                            cpuacct_charge() {
> 1)   0.577 us    |                              __rcu_read_lock();
> 1)   0.231 us    |                              __rcu_read_unlock();
> 1)   5.346 us    |                            }
> 1)   9.885 us    |                          }
> 1)   0.346 us    |                          __update_load_avg_se();
> 1)   0.385 us    |                          __update_load_avg_cfs_rq();
> 1)   0.231 us    |                          update_cfs_shares();
> 1)   0.346 us    |                          account_entity_enqueue();
> 1)   0.269 us    |                          check_spread();
> 1)   0.231 us    |                          __rcu_read_lock();
> 1)   0.231 us    |                          __rcu_read_unlock();
> 1)   0.231 us    |                          hrtick_update();
> 1) + 30.462 us   |                        }
> 1) + 37.692 us   |                      }
> 1)               |                      optimized_callback() {
> 1)               |                        opt_pre_handler() {
> 1)               |                          pre_handler_kretprobe() {
> 1)               |                            _raw_spin_lock_irqsave() {
> 1)   0.231 us    |                              preempt_count_add();
> 1)   0.461 us    |                              do_raw_spin_lock();
> 1)   4.693 us    |                            } /* _raw_spin_lock_irqsave */
> 1)               |                            _raw_spin_unlock_irqrestore() {
> 1)   0.307 us    |                              do_raw_spin_unlock();
> 1)   0.270 us    |                              preempt_count_sub();
> 1)   4.461 us    |                            }
> 1)               |                            p_ttwu_do_wakeup_entry [sidkm]() {
> 1)               |                              _raw_read_trylock() {
> 1)   0.231 us    |                                preempt_count_add();
> 1)   0.539 us    |                                do_raw_read_trylock();
> 1)   4.769 us    |                              }
> 1)               |                              p_ed_validate_from_running [sidkm]() {
> 1)               |                                p_validate_task_from_running [sidkm]() {
> 1)   0.231 us    |                                  __rcu_read_lock();
> 1)   0.538 us    |                                  p_rb_find_ed_pid [sidkm]();
> 1)               |                                  p_cmp_tasks [sidkm]() {
> 1)   0.577 us    |                                    p_ed_pcfi_validate_sp [sidkm]();
> 1)               |                                    p_cmp_creds [sidkm]() {
>
> thanks and best regards
> ethan
>
> At 2021-07-16 01:39:45, "Adam Zabrocki" <pi3@....com.pl> wrote:
> >Can you try LKRG from git TOT?
> >
> >On Thu, Jul 15, 2021 at 08:52:49PM +0800, youyan wrote:
> >> Hi all,
> >> I am sorry, I did not notice that pictures cannot be displayed directly
> >> on the mailing list. I will also describe it in words:
> >> cpu0 and cpu1 wait for the lock, which is held on cpu2.
> >> cpu2 waits for kretprobe_table_locks[hash].lock, which is held by cpu3.
> >> cpu3 waits for p_rb_hash[i].p_lock.lock.
> >> The value of p_rb_hash[i].p_lock.lock is 0x01, which means the lock is
> >> held through a read lock.
> >>
> >> On 2021-07-15 20:20:50, "youyan" <hyouyan@....com> wrote:
> >>
> >> Hi all,
> >> I met a deadlock issue: p_rb_hash[i].p_lock.lock is never unlocked. The
> >> LKRG version is 0.8, the software is Android 10, and the hardware is
> >> Unisoc SL8541E. The following pictures were the Trace32 stack callbacks
> >> and registers:
> >> 1: cpu0
> >> 2: cpu1
> >> 3: cpu2
> >> 4: cpu3
> >>
> >> In the situation above, I think something takes read_lock on
> >> p_rb_hash[i].p_lock.lock but never unlocks it, or some code after taking
> >> the lock may cause a schedule. Going through the LKRG code, I cannot
> >> find code for either situation.
> >> Reproducing this issue needs at least two weeks.
> >> Has anybody met a similar issue?
> >>
> >> thanks and best regards
> >> ethan
> >
> >--
> >pi3 (pi3ki31ny) - pi3 (at) itsec pl
> >http://pi3.com.pl

--
pi3 (pi3ki31ny) - pi3 (at) itsec pl
http://pi3.com.pl
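A minimal userspace sketch of the hazard ethan asks about: one context
read-holds an rwlock, a writer on another CPU queues for it, and the
read-holder then tries to take the read lock again (the role the kprobe
hook would play). On a writer-preferring, non-recursive rwlock the second
read acquisition waits behind the queued writer and the system deadlocks.
This is illustrative pthreads code, not LKRG or kernel source; the
glibc-specific PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP attribute
stands in for a kernel rwlock's writer-queueing behaviour.

/* Build: gcc -o rwdead rwdead.c -lpthread
 * WARNING: deadlocks by design. */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock;

static void *writer(void *arg)
{
        (void)arg;
        sleep(1);                      /* let main() win the read lock */
        printf("writer: queueing for the write lock...\n");
        pthread_rwlock_wrlock(&lock);  /* blocks: a reader holds the lock */
        pthread_rwlock_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_rwlockattr_t attr;
        pthread_t w;

        /* Writer-preferring, non-recursive: new readers queue behind
         * waiting writers, mimicking the kernel rwlock in question. */
        pthread_rwlockattr_init(&attr);
        pthread_rwlockattr_setkind_np(&attr,
                        PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
        pthread_rwlock_init(&lock, &attr);

        pthread_rwlock_rdlock(&lock);  /* the "p_cmp_tasks read side" */
        pthread_create(&w, NULL, writer, NULL);
        sleep(2);                      /* writer is queued by now */

        printf("reader: re-taking the read lock (the hook's step)...\n");
        pthread_rwlock_rdlock(&lock);  /* deadlock: queued writer blocks us */

        printf("never reached\n");
        return 0;
}

Note that the ftrace log above shows p_ttwu_do_wakeup_entry going through
_raw_read_trylock rather than a plain read_lock: a trylock that bails out
on failure is the standard defence against exactly this self-deadlock, and
swapping the second pthread_rwlock_rdlock() above for
pthread_rwlock_tryrdlock() makes the sketch terminate.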
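The four-CPU wait chain ethan decoded from the Trace32 dumps (cpu0/cpu1
wait for a lock held by cpu2, cpu2 waits for
kretprobe_table_locks[hash].lock held by cpu3, cpu3 waits for
p_rb_hash[i].p_lock.lock, which is read-held elsewhere) reduces to a
classic lock-ordering cycle. A compact pthreads sketch of such a cycle,
with hypothetical stand-ins for the three locks (again, not the kernel or
LKRG sources):

/* Build: gcc -o cycle cycle.c -lpthread
 * WARNING: deadlocks by design. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t  lock_a = PTHREAD_MUTEX_INITIALIZER;  /* "held on cpu2"        */
static pthread_mutex_t  lock_b = PTHREAD_MUTEX_INITIALIZER;  /* "kretprobe table"     */
static pthread_rwlock_t lock_c = PTHREAD_RWLOCK_INITIALIZER; /* "p_rb_hash[i].p_lock" */

static void *cpu1(void *arg)
{
        pthread_rwlock_rdlock(&lock_c);  /* read-holds C */
        sleep(1);
        pthread_mutex_lock(&lock_a);     /* waits: "cpu2" holds A */
        return arg;
}

static void *cpu2(void *arg)
{
        pthread_mutex_lock(&lock_a);
        sleep(1);
        pthread_mutex_lock(&lock_b);     /* waits: "cpu3" holds B */
        return arg;
}

static void *cpu3(void *arg)
{
        pthread_mutex_lock(&lock_b);
        sleep(1);
        pthread_rwlock_wrlock(&lock_c);  /* waits: C is read-held; cycle closes */
        return arg;
}

int main(void)
{
        pthread_t t1, t2, t3;
        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_create(&t2, NULL, cpu2, NULL);
        pthread_create(&t3, NULL, cpu3, NULL);
        pthread_join(t1, NULL);          /* never returns: A -> B -> C -> A */
        puts("unreachable");
        return 0;
}

Once every CPU in the chain is spinning, the read holder of
p_rb_hash[i].p_lock.lock never runs again to drop it, which is consistent
with the 0x01 lock word ethan decoded as "held through a read lock".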