Message-ID: <CAFrh3J-L-i4ZnXfSTEe0HRQsmVF8rtCbKZwVmyfuU_PuJHv_vA@mail.gmail.com>
Date: Mon, 14 Feb 2022 14:00:24 -0500
From: Satadru Pramanik <satadru@...il.com>
To: Rich Felker <dalias@...ifal.cx>
Cc: musl@...ts.openwall.com
Subject: Re: Re: musl getaddr info breakage on older kernels

> Are you sure the "running under strace" has no differences in how the
> test program is invoked aside from just using strace?

I'm running strace & ltrace from the command prompt.

For what it is worth, the process tree is a little weird on ChromeOS:
[image: Screenshot 2022-02-14 at 13.58.03.png]

> Rather than
> running it via the ruby machinery, can you just test under plain
> manual execution from the same shell instance for both?

The previous email showed it running via manual execution from the same
shell instance, invoked without and with ltrace. i.e. I have the same issue.

> Can you report what Docker version you're using, and try executing
> with Docker's seccomp sandboxing disabled?

docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:41 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:44:05 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

This is the docker command I used, which may disable the seccomp sandboxing?

#!/bin/bash
docker pull satmandu/crewbuild:alex-i686.m58
docker pull tonistiigi/binfmt
docker run --privileged --rm tonistiigi/binfmt --install all
docker run --security-opt seccomp=unconfined --platform linux/386 \
  --cap-add SYS_PTRACE --rm \
  -v $(pwd)/pkg_cache:/usr/local/tmp/packages -v $(pwd):/output \
  -h $(hostname)-i686 -it satmandu/crewbuild:alex-i686.m58 \
  /usr/local/bin/setarch i686 sudo -i -u chronos /usr/local/bin/bash -i

../musl_getaddrinfo_test google.com
AF_INET: 142.250.80.46
AF_INET6: 2607:f8b0:4006:80b::200e

Commands inside that docker invocation to test this:

crew upgrade ; yes | crew install ltrace
CREW_TESTING_REPO=https://github.com/satmandu/chromebrew.git CREW_TESTING_BRANCH=musl_testing CREW_TESTING=1 crew update
crew upgrade
cd /usr/local/tmp
yes | crew build -k musl_getaddrinfo_test
cd crew/musl_getaddrinfo_test.*
../musl_getaddrinfo_test google.com

When I ran that, I got this:

../musl_getaddrinfo_test google.com
AF_INET: 142.250.80.46
AF_INET6: 2607:f8b0:4006:80b::200e

> This shouldn't happen, but
> it's plausible that your old kernel has bugs where seccomp filtering
> gets bypassed when the process is running under strace, thereby
> working around a buggy seccomp filter in Docker.

Is it possible there are seccomp issues with the old kernel that just
weren't triggered by an older version of musl?

> If you know how to use gdb, you could also try setting some
> breakpoints to see what code is or isn't reached.

This would be ideal. I'm still trying to get gdb built with musl on this
older setup. The gdb I have built against glibc doesn't seem to be
particularly helpful when I'm running it with this program built against
the musl libc.

Is there a static build of i686 gdb built against musl someone has
available which might be helpful here?

Satadru

> > On Mon, Feb 7, 2022 at 4:02 PM Rich Felker <dalias@...ifal.cx> wrote:
> >
> > > On Mon, Feb 07, 2022 at 02:19:05PM -0500, Satadru Pramanik wrote:
> > > > The test programs are being run from...
> > > > glibc 2.23 -> bash (crosh shell)
> > > > crosh shell -> invokes ruby -> invokes bash to run the test programs.
> > > >
> > > > tcpdump on the router shows no network activity at all when running
> > > > the test program with tcpdump -i any -vvv host (IP address)
> > >
> > > There's reliably no network traffic when you run the test program not
> > > under strace? Is there any difference in how you're invoking it other
> > > than strace not being there? I'm running out of possible explanations
> > > unless there's some hidden details we don't know about.
> > >
> > > > When I run the test program with strace though I see this:
> > > > 14:06:24.617860 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > >     192.168.0.121.46846 > office.lan.53: [udp sum ok] 16051+ A? google.com. (28)
> > > > 14:06:24.622352 IP (tos 0x0, ttl 64, id 15884, offset 0, flags [DF], proto UDP (17), length 72)
> > > >     office.lan.53 > 192.168.0.121.46846: [bad udp cksum 0x8210 -> 0x7bc1!] 16051 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
> > > > 14:06:24.688610 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > >     192.168.0.121.42267 > office.lan.53: [udp sum ok] 35406+ A? google.com. (28)
> > > > 14:06:24.688931 IP (tos 0x0, ttl 64, id 15887, offset 0, flags [DF], proto UDP (17), length 72)
> > > >     office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x8210 -> 0x4209!] 35406 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
> > > > 14:06:24.689018 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > >     192.168.0.121.42267 > office.lan.53: [udp sum ok] 13657+ AAAA? google.com. (28)
> > > > 14:06:24.689186 IP (tos 0x0, ttl 64, id 15888, offset 0, flags [DF], proto UDP (17), length 84)
> > > >     office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x821c -> 0xc77e!] 13657 q: AAAA? google.com. 1/0/0 google.com. [20s] AAAA 2607:f8b0:4006:80b::200e (56)
> > > >
> > > > On Sun, Feb 6, 2022 at 9:40 PM Rich Felker <dalias@...ifal.cx> wrote:
> > > >
> > > > > On Sun, Feb 06, 2022 at 08:29:16PM -0500, Satadru Pramanik wrote:
> > > > > > Here are illustrative logs of output and strace logs.
> > > > > >
> > > > > > Note that while the musl toolchain is built in a container on a
> > > > > > much more powerful machine, this "musl_getaddrinfo_test" app is
> > > > > > built locally on the machine with the 3.8 kernel.
> > > > > >
> > > > > > I ran the following to get the output on the smaller i686 machine
> > > > > > immediately after the app is built.
> > > > > > Apologies for the ruby code wrapping the shell commands.
> > > > > >
> > > > > > @musl_ver = `#{CREW_MUSL_PREFIX}/lib/libc.so 2>&1 >/dev/null | head -2 | tail -1 | awk '{print $2}'`.chomp
> > > > > > puts 'Testing the musl resolver to see if it can resolve google.com: '.lightblue
> > > > > > system "./musl_getaddrinfo_test google.com set_ai_family 2>&1 |tee -a /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_set_ai_family.txt"
> > > > > > system "./musl_getaddrinfo_test google.com 2>&1 |tee -a /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com.txt"
> > > > > > system "strace -o /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_set_ai_family_STRACE.txt ../musl_getaddrinfo_test google.com set_ai_family"
> > > > > > system "strace -o /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_STRACE.txt ../musl_getaddrinfo_test google.com"
> > > > > >
> > > > > > And here is the output for each run before running again via
> > > > > > strace. Note how IPv6 addresses show up sporadically, and for
> > > > > > 1.2.2 nothing at all shows up, but everything works fine
> > > > > > according to the strace logs. (Strace is built against glibc 2.23.)
> > > > > >
> > > > > > ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.110
> > > > > >
> > > > > > ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com.txt <==
> > > > > > AF_INET: 142.251.40.110
> > > > > >
> > > > > > ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.142
> > > > > >
> > > > > > ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.206
> > > > > >
> > > > > > ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com.txt <==
> > > > > > AF_INET6: 2607:f8b0:4006:81f::200e
> > > > > > AF_INET: 142.251.40.206
> > > > > >
> > > > > > ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.250.65.206
> > > > > >
> > > > > > ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com.txt <==
> > > > > > AF_INET: 142.250.65.206
> > > > > >
> > > > > > ==> musl_1.2.1_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.110
> > > > > >
> > > > > > ==> musl_1.2.1_getaddrinfo_test_google.com.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > ==> musl_1.2.2_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > ==> musl_1.2.2_getaddrinfo_test_google.com.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > Regards,
> > > > >
> > > > > OK, I don't see anything in the strace suggesting a cause.
> > > > > The kernel version (or whether a container was used) present on
> > > > > the system where you built musl or the test programs should make
> > > > > no difference whatsoever; musl has no build dependencies on the
> > > > > host kernel or kernel headers or anything like that (and doesn't
> > > > > even need to be built on a Linux host).
> > > > >
> > > > > A couple questions:
> > > > >
> > > > > Are the test programs on the i686 machine running under Docker or
> > > > > any other container environment?
> > > > >
> > > > > Can you tcpdump the traffic between the test program and the
> > > > > dnsmasq during a failing run, with verbose display of the packet
> > > > > contents (-vvv or something like that)?
> > > > >
> > > > > I don't see any plausible explanation for the result varying
> > > > > between runs and with timing like this unless dnsmasq is doing
> > > > > something odd/wrong. I thought it might be related to something
> > > > > blocking time64 syscalls but that doesn't seem to be the case --
> > > > > according to the strace logs they're getting ENOSYS as expected
> > > > > with fallback to the legacy 32-bit clock_gettime etc. which is
> > > > > fine.

Content of type "text/html" skipped

Download attachment "Screenshot 2022-02-14 at 13.58.03.png" of type "image/png" (94083 bytes)
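
Note for readers of the thread: the check requested above (running the test program with and without a tracer from the same shell instance) could be scripted roughly as follows. This is only a sketch and not something posted in the thread. The function name `compare_runs` and its calling convention are made up for illustration, and it assumes the tracer can be told to keep its own log out of the program's output (for strace, by passing `-o FILE`).

```shell
#!/bin/sh
# Sketch only: run one command directly and again under a tracer, from the
# same shell, then report whether the output or exit status differs.
# compare_runs is a hypothetical helper name, not part of any tool.

# Usage: compare_runs TRACER CMD [ARGS...]
compare_runs() {
    tracer=$1
    shift

    # Direct run: capture stdout+stderr and the exit status.
    plain=$("$@" 2>&1) && plain_rc=0 || plain_rc=$?

    # Traced run: $tracer is deliberately unquoted so that a string such
    # as "strace -o /tmp/trace.log" splits into the command and its options.
    traced=$($tracer "$@" 2>&1) && traced_rc=0 || traced_rc=$?

    printf 'plain  (exit %d): %s\n' "$plain_rc" "$plain"
    printf 'traced (exit %d): %s\n' "$traced_rc" "$traced"

    # Succeed only if both runs behaved identically.
    [ "$plain_rc" -eq "$traced_rc" ] && [ "$plain" = "$traced" ]
}
```

It might be invoked as `compare_runs "strace -o /tmp/trace.log" ./musl_getaddrinfo_test google.com`. A mismatch is not conclusive on its own, since a DNS server may legitimately return different addresses on consecutive queries; the interesting case is one run printing addresses and the other printing "getaddrinfo: Try again".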