Message-ID: <CAFrh3J9EfO3GLOvfNCSG7Ru=Q4goiVXXSO-vL83ukfH_3d1_tg@mail.gmail.com>
Date: Wed, 16 Feb 2022 13:37:07 -0500
From: Satadru Pramanik <satadru@...il.com>
To: Rich Felker <dalias@...ifal.cx>
Cc: musl@...ts.openwall.com
Subject: Re: Re: musl getaddr info breakage on older kernels

> - Whether any network traffic occurs when it fails (in the real
>   environment not a replicated one elsewhere).
>
>
There is no network traffic in the real environment.


> - Whether it fails or succeeds under strace (in the real
>   environment not a replicated one elsewhere).
>
It succeeds under strace (in the real environment).



> - Whether the real environment involves Docker or not.
>
The real environment does not involve Docker.



> - What's in resolv.conf (in the real environment not a replicated one
>   elsewhere) and what nameserver software (if known) is running on the
>   nameserver(s) listed in there.
>
The nameserver is picked up from DHCP. The contents of the file are as
follows:
nameserver 192.168.0.1
search lan.
options single-request timeout:1 attempts:5


> - Anything else that might be relevant.
>
The DNS server is dnsmasq running on a current OpenWrt device.


> It's really hard to offer any productive advice when the problem is
> unclear.
>
Apologies for the confusion.
I'm really just trying to debug this getaddrinfo breakage on this older
hardware. The Docker container setup is what we use to build packages
for this hardware, and our frustration is that the software works perfectly
fine in the Docker containers but not on the hardware.

> > Any other suggestions on how to track down this issue?
>
> Rather than stepping through, I would put a single breakpoint at a
> place you want to see whether execution reaches before running the
> test program, then start it and see if the breakpoint fires or not.
> Then remove the breakpoint, add a different one, and repeat. For
> example, see if __res_msend is ever called, and if so, whether
> particular lines of it are reached (or just put breakpoints on some of
> the functions it calls, like socket, bind, recvfrom, poll, etc. to see
> if they're called).
>
> It might also be useful to put a breakpoint on clock_gettime and then
> 'finish' to see what it returns (in case the problem is something
> time64-related).
>
>
The only breakpoint that changed the outcome was the one on line 20 (the
line that invokes getaddrinfo). Stepping through the __kernel_vsyscall call
there and then continuing is the only sequence that does not end in failure.

With any later breakpoint, the lookup fails.

I went through the other breakpoints as requested.
clock_gettime did not fire, although the breakpoint was set:

Breakpoint 1 at 0x5c2f7: file ../src_musl/compat/time32/clock_gettime32.c, line 9.

__res_msend and setsockopt also did not fire.
The ones that did fire were: socket, bind, recvfrom, poll, __res_msend_rc,
memset, sendto, __get_resolv_conf, pthread_setcancelstate,
__pthread_setcancelstate, __lookup_serv, __lookup_name, memcpy

When breaking on socket, stepping through the __kernel_vsyscall call after
socket and then continuing succeeds.
Breaking, stepping, and continuing on every other function above fails.

Is it possible that the socket is not waiting long enough for a response
from __kernel_vsyscall? Has that changed?
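One way to probe the "not waiting long enough" question without a debugger is to time the getaddrinfo call itself. The sketch below is an illustration only: elapsed_ms and timed_lookup are hypothetical helper names, and the hints (AF_UNSPEC, SOCK_STREAM) are assumptions rather than what the original test program uses. A failure that returns almost immediately, well short of the timeout configured in resolv.conf, would suggest the wait is being cut short.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: milliseconds between two CLOCK_MONOTONIC readings. */
static long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000L
         + (end->tv_nsec - start->tv_nsec) / 1000000L;
}

/* Hypothetical helper: resolve `host` once, store the elapsed time in
 * *ms_out, print the outcome, and return getaddrinfo's return code. */
static int timed_lookup(const char *host, long *ms_out)
{
    struct addrinfo hints, *res = NULL;
    struct timespec t0, t1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* assumption: allow both A and AAAA */
    hints.ai_socktype = SOCK_STREAM; /* assumption: avoid duplicate results */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *ms_out = elapsed_ms(&t0, &t1);
    fprintf(stderr, "%s: %s after %ld ms\n",
            host, rc ? gai_strerror(rc) : "ok", *ms_out);

    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}
```

Calling timed_lookup from main with a name that fails on the hardware, both in a plain run and under strace, would give two elapsed times to compare against the resolv.conf settings.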

The gdb log is attached.

Regards,

Satadru


Download attachment "gdb.out.txt.gz" of type "application/x-gzip" (11679 bytes)
