Message-ID: <20220207210223.GZ7074@brightrain.aerifal.cx>
Date: Mon, 7 Feb 2022 16:02:23 -0500
From: Rich Felker <dalias@...ifal.cx>
To: Satadru Pramanik <satadru@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: Re: musl getaddr info breakage on older kernels

On Mon, Feb 07, 2022 at 02:19:05PM -0500, Satadru Pramanik wrote:
> The test programs are being run from...
> glibc 2.23 -> bash (crosh shell)
> crosh shell -> invokes ruby -> invokes bash to run the test programs.
> 
> tcpdump on the router shows no network activity at all when running
> the test program with tcpdump -i any -vvv host (IP address)

Is there reliably no network traffic when you run the test program
without strace? Is there any difference in how you invoke it, other
than strace being absent? I'm running out of possible explanations,
unless there are hidden details we don't know about.
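
For anyone following along, here is a minimal sketch of what a test
program of this kind might look like. The actual musl_getaddrinfo_test
source is not shown in this thread, so the function name
resolve_and_print and the meaning of the set_ai_family argument
(assumed here to restrict hints.ai_family to AF_INET) are assumptions
made purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical stand-in for the test program's core logic.
 * Resolves `host`, prints one line per address, and returns the
 * getaddrinfo() result code (0 on success). */
int resolve_and_print(const char *host, int set_ai_family)
{
    struct addrinfo hints, *res, *p;
    char buf[INET6_ADDRSTRLEN];
    int rc;

    assert(host != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_socktype = SOCK_STREAM; /* avoid one entry per socket type */
    /* ASSUMPTION: "set_ai_family" means "ask for IPv4 only". */
    hints.ai_family = set_ai_family ? AF_INET : AF_UNSPEC;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* Same "getaddrinfo: <message>" shape as the quoted logs. */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return rc;
    }

    for (p = res; p; p = p->ai_next) {
        const void *addr;
        const char *label;

        if (p->ai_family == AF_INET) {
            addr = &((struct sockaddr_in *)p->ai_addr)->sin_addr;
            label = "AF_INET";
        } else if (p->ai_family == AF_INET6) {
            addr = &((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
            label = "AF_INET6";
        } else {
            continue; /* ignore any other address family */
        }
        if (inet_ntop(p->ai_family, addr, buf, sizeof buf))
            printf("%s: %s\n", label, buf);
    }
    freeaddrinfo(res);
    return 0;
}
```

Calling resolve_and_print(argv[1], argc > 2) from main would give the
two invocation styles seen in the logs. With AF_UNSPEC the resolver
has to obtain both A and AAAA answers, which is the case where the
"Try again" failures first show up in the quoted output.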

> When I run the test program with strace though I see this:
> 14:06:24.617860 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
>     192.168.0.121.46846 > office.lan.53: [udp sum ok] 16051+ A? google.com. (28)
> 14:06:24.622352 IP (tos 0x0, ttl 64, id 15884, offset 0, flags [DF], proto UDP (17), length 72)
>     office.lan.53 > 192.168.0.121.46846: [bad udp cksum 0x8210 -> 0x7bc1!] 16051 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
> 14:06:24.688610 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
>     192.168.0.121.42267 > office.lan.53: [udp sum ok] 35406+ A? google.com. (28)
> 14:06:24.688931 IP (tos 0x0, ttl 64, id 15887, offset 0, flags [DF], proto UDP (17), length 72)
>     office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x8210 -> 0x4209!] 35406 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
> 14:06:24.689018 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
>     192.168.0.121.42267 > office.lan.53: [udp sum ok] 13657+ AAAA? google.com. (28)
> 14:06:24.689186 IP (tos 0x0, ttl 64, id 15888, offset 0, flags [DF], proto UDP (17), length 84)
>     office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x821c -> 0xc77e!] 13657 q: AAAA? google.com. 1/0/0 google.com. [20s] AAAA 2607:f8b0:4006:80b::200e (56)
> 
> On Sun, Feb 6, 2022 at 9:40 PM Rich Felker <dalias@...ifal.cx> wrote:
> 
> > On Sun, Feb 06, 2022 at 08:29:16PM -0500, Satadru Pramanik wrote:
> > > Here are illustrative logs of output and strace logs.
> > >
> > > Note that while the musl toolchain is built in a container on a much more
> > > powerful machine, this "musl_getaddrinfo_test" app is built locally on the
> > > machine with the 3.8 kernel.
> > >
> > > I ran the following to get the output on the smaller i686 machine
> > > immediately after the app is built.
> > > Apologies for the ruby code wrapping the shell commands.
> > >
> > >     @musl_ver = `#{CREW_MUSL_PREFIX}/lib/libc.so 2>&1 >/dev/null | head -2 | tail -1 | awk '{print $2}'`.chomp
> > >     puts 'Testing the musl resolver to see if it can resolve google.com: '.lightblue
> > >     system "./musl_getaddrinfo_test google.com set_ai_family 2>&1 |tee -a /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_set_ai_family.txt "
> > >     system "./musl_getaddrinfo_test google.com 2>&1 |tee -a /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com.txt"
> > >     system "strace -o /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_set_ai_family_STRACE.txt ../musl_getaddrinfo_test google.com set_ai_family"
> > >     system "strace -o /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_STRACE.txt ../musl_getaddrinfo_test google.com"
> > >
> > > And here is the output for each run before running again via strace. Note
> > > how IPv6 addresses show up sporadically, and for 1.2.2 nothing at all shows
> > > up, but everything works fine according to the strace logs. (Strace is
> > > built against glibc 2.23.)
> > >
> > > ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > AF_INET: 142.251.40.110
> > >
> > > ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com.txt <==
> > > AF_INET: 142.251.40.110
> > >
> > > ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > AF_INET: 142.251.40.142
> > >
> > > ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com.txt <==
> > > getaddrinfo: Try again
> > >
> > > ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > AF_INET: 142.251.40.206
> > >
> > > ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com.txt <==
> > > AF_INET6: 2607:f8b0:4006:81f::200e
> > > AF_INET: 142.251.40.206
> > >
> > > ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > AF_INET: 142.250.65.206
> > >
> > > ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com.txt <==
> > > AF_INET: 142.250.65.206
> > >
> > > ==> musl_1.2.1_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > AF_INET: 142.251.40.110
> > >
> > > ==> musl_1.2.1_getaddrinfo_test_google.com.txt <==
> > > getaddrinfo: Try again
> > >
> > > ==> musl_1.2.2_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > getaddrinfo: Try again
> > >
> > > ==> musl_1.2.2_getaddrinfo_test_google.com.txt <==
> > > getaddrinfo: Try again
> > >
> > > Regards,
> >
> > OK, I don't see anything in the strace suggesting a cause. The kernel
> > version (or whether a container was used) present on the system where
> > you built musl or the test programs should make no difference
> > whatsoever; musl has no build dependencies on the host kernel or
> > kernel headers or anything like that (and doesn't even need to be
> > built on a Linux host).
> >
> > A couple questions:
> >
> > Are the test programs on the i686 machine running under Docker or any
> > other container environment?
> >
> > Can you tcpdump the traffic between the test program and the dnsmasq
> > during a failing run, with verbose display of the packet contents
> > (-vvv or something like that)?
> >
> > I don't see any plausible explanation for the result varying between
> > runs and with timing like this unless dnsmasq is doing something
> > odd/wrong. I thought it might be related to something blocking time64
> > syscalls but that doesn't seem to be the case -- according to the
> > strace logs they're getting ENOSYS as expected with fallback to the
> > legacy 32-bit clock_gettime etc. which is fine.
> >
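
To illustrate the time64 fallback mentioned in the quoted text, here
is a rough sketch of the general pattern: try the 64-bit-time syscall
first, and if the kernel reports ENOSYS, retry with the legacy
syscall. This is not musl's actual code. The struct ts64 type and the
function name clock_gettime_with_fallback are made up for this
example, and it assumes a target where the legacy syscall's timespec
consists of two longs (true on i686 and x86_64, but not on x32).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical 64-bit-time result type for this sketch. */
struct ts64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Returns 0 on success, -1 on failure (errno set by syscall()).
 * *used_legacy, if non-NULL, reports which path produced the result. */
int clock_gettime_with_fallback(clockid_t clk, struct ts64 *out,
                                int *used_legacy)
{
    long legacy[2];

    assert(out != NULL);
    if (used_legacy)
        *used_legacy = 0;

#ifdef SYS_clock_gettime64
    /* Only 32-bit targets define this syscall number. A pre-5.1
     * kernel, such as the 3.8 one in this thread, answers ENOSYS. */
    if (syscall(SYS_clock_gettime64, (long)clk, out) == 0)
        return 0;
    if (errno != ENOSYS)
        return -1; /* a real error, not a missing syscall */
#endif

    /* Legacy path: the kernel fills in two longs (sec, nsec). */
    if (syscall(SYS_clock_gettime, (long)clk, legacy) != 0)
        return -1;
    out->tv_sec = legacy[0];
    out->tv_nsec = legacy[1];
    if (used_legacy)
        *used_legacy = 1;
    return 0;
}
```

In strace output, this pattern appears as a clock_gettime64 call
failing with ENOSYS immediately followed by a successful
clock_gettime, which is the harmless sequence described above.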
