Message-ID: <CAFrh3J-L-i4ZnXfSTEe0HRQsmVF8rtCbKZwVmyfuU_PuJHv_vA@mail.gmail.com>
Date: Mon, 14 Feb 2022 14:00:24 -0500
From: Satadru Pramanik <satadru@...il.com>
To: Rich Felker <dalias@...ifal.cx>
Cc: musl@...ts.openwall.com
Subject: Re: Re: musl getaddr info breakage on older kernels

>
>
> Are you sure the "running under strace" has no differences in how the
> test program is invoked aside from just using strace?


I'm running strace & ltrace from the command prompt. For what it is worth,
the process tree is a little weird on ChromeOS:
[image: Screenshot 2022-02-14 at 13.58.03.png]


> Rather than
> running it via the ruby machinery, can you just test under plain
> manual execution from the same shell instance for both?
>
The previous email showed it running via manual execution from the same
shell instance, invoked without and with ltrace,
i.e. I have the same issue.
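
For anyone wanting to repeat that comparison, here is a rough sketch of running the same command plain and under a tracer from one shell instance. The function name compare_runs is made up for illustration and is not part of any tool mentioned in this thread; the only assumption about the tracer is that it accepts -o to send its own output elsewhere, which strace and ltrace both do.

```shell
# Run a command twice from the current shell: once directly and once under
# a wrapper (for example "strace -o /dev/null"), then compare the results.
# compare_runs is a hypothetical helper name used only for this sketch.
compare_runs() {
    wrapper="$1"
    shift
    plain_out=$("$@" 2>&1)
    plain_rc=$?
    # $wrapper is deliberately unquoted so "strace -o /dev/null" splits into words.
    traced_out=$($wrapper "$@" 2>&1)
    traced_rc=$?
    printf 'plain  (exit %s):\n%s\n' "$plain_rc" "$plain_out"
    printf 'traced (exit %s):\n%s\n' "$traced_rc" "$traced_out"
    if [ "$plain_out" = "$traced_out" ] && [ "$plain_rc" -eq "$traced_rc" ]; then
        echo "same"
        return 0
    fi
    echo "DIFFERENT"
    return 1
}
```

Usage would be something like `compare_runs "strace -o /dev/null" ./musl_getaddrinfo_test google.com`. A "DIFFERENT" result is not conclusive on its own, since DNS answers for a name like google.com can legitimately change between two runs.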


> Can you report what Docker version you're using, and try executing
> with Docker's seccomp sandboxing disabled?

docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:41 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:44:05 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

This is the docker command I used. It already passes
--security-opt seccomp=unconfined, which I believe disables the seccomp
sandboxing?

#!/bin/bash
docker pull satmandu/crewbuild:alex-i686.m58
docker pull tonistiigi/binfmt
docker run --privileged --rm tonistiigi/binfmt --install all
docker run --security-opt seccomp=unconfined --platform linux/386 \
  --cap-add SYS_PTRACE --rm \
  -v $(pwd)/pkg_cache:/usr/local/tmp/packages \
  -v $(pwd):/output \
  -h $(hostname)-i686 -it satmandu/crewbuild:alex-i686.m58 \
  /usr/local/bin/setarch i686 sudo -i -u chronos /usr/local/bin/bash -i

../musl_getaddrinfo_test  google.com
AF_INET: 142.250.80.46
AF_INET6: 2607:f8b0:4006:80b::200e
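
One way to confirm from inside the container whether a seccomp filter is actually in effect is to read the Seccomp field of /proc/self/status, where 0 means disabled, 1 means strict mode and 2 means filter mode. The sketch below assumes a kernel that exposes this field; the function name seccomp_mode is made up for illustration.

```shell
# Report the seccomp mode recorded in a /proc/<pid>/status style file.
# With no argument this reads /proc/self/status, which here describes the
# awk child process; seccomp state is inherited, so it matches the shell.
# seccomp_mode is a hypothetical helper name used only for this sketch.
seccomp_mode() {
    status_file="${1:-/proc/self/status}"
    mode=$(awk '/^Seccomp:/ {print $2}' "$status_file")
    case "$mode" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        "") echo "unknown (no Seccomp field)" ;;
        *) echo "unrecognized ($mode)" ;;
    esac
}
```

Running `seccomp_mode` in the container shell started by the docker command above should print "disabled" if seccomp=unconfined took effect. Note that this only reports the state of the process being inspected; it says nothing about how the kernel behaves when a tracer is attached.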

Commands inside that docker invocation to test this:
crew upgrade ; yes | crew install ltrace
CREW_TESTING_REPO=https://github.com/satmandu/chromebrew.git \
  CREW_TESTING_BRANCH=musl_testing CREW_TESTING=1 crew update
crew upgrade
cd /usr/local/tmp
yes | crew build -k musl_getaddrinfo_test
cd crew/musl_getaddrinfo_test.*
../musl_getaddrinfo_test  google.com

When I ran that, I got this:
../musl_getaddrinfo_test  google.com
AF_INET: 142.250.80.46
AF_INET6: 2607:f8b0:4006:80b::200e


> This shouldn't happen, but
> it's plausible that your old kernel has bugs where seccomp filtering
> gets bypassed when the process is running under strace, thereby
> working around a buggy seccomp filter in Docker.
>
Is it possible there are seccomp issues with the old kernel that just
weren't triggered by an older version of musl?


> If you know how to use gdb, you could also try setting some
> breakpoints to see what code is or isn't reached.
>
This would be ideal. I'm still trying to get gdb built with musl on this
older setup. The gdb I have built against glibc doesn't seem to be
particularly helpful when I'm running it with this program built against
the musl libc.

Does anyone have a static i686 build of gdb, linked against musl, that
might be helpful here?

Satadru


> > On Mon, Feb 7, 2022 at 4:02 PM Rich Felker <dalias@...ifal.cx> wrote:
> >
> > > On Mon, Feb 07, 2022 at 02:19:05PM -0500, Satadru Pramanik wrote:
> > > > The test programs are being run from...
> > > > glibc 2.23 -> bash (crosh shell)
> > > > crosh shell -> invokes ruby -> invokes bash to run the test programs.
> > > >
> > > > tcpdump on the router shows no network activity at all when running
> > > > the test program with tcpdump -i any -vvv host (IP address)
> > >
> > > There's reliably no network traffic when you run the test program not
> > > under strace? Is there any difference in how you're invoking it other
> > > than strace not being there? I'm running out of possible explanations
> > > unless there's some hidden details we don't know about.
> > >
> > > > When I run the test program with strace though I see this:
> > > > 14:06:24.617860 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > >     192.168.0.121.46846 > office.lan.53: [udp sum ok] 16051+ A? google.com. (28)
> > > > 14:06:24.622352 IP (tos 0x0, ttl 64, id 15884, offset 0, flags [DF], proto UDP (17), length 72)
> > > >     office.lan.53 > 192.168.0.121.46846: [bad udp cksum 0x8210 -> 0x7bc1!] 16051 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
> > > > 14:06:24.688610 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > >     192.168.0.121.42267 > office.lan.53: [udp sum ok] 35406+ A? google.com. (28)
> > > > 14:06:24.688931 IP (tos 0x0, ttl 64, id 15887, offset 0, flags [DF], proto UDP (17), length 72)
> > > >     office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x8210 -> 0x4209!] 35406 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
> > > > 14:06:24.689018 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > >     192.168.0.121.42267 > office.lan.53: [udp sum ok] 13657+ AAAA? google.com. (28)
> > > > 14:06:24.689186 IP (tos 0x0, ttl 64, id 15888, offset 0, flags [DF], proto UDP (17), length 84)
> > > >     office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x821c -> 0xc77e!] 13657 q: AAAA? google.com. 1/0/0 google.com. [20s] AAAA 2607:f8b0:4006:80b::200e (56)
> > > >
> > > > On Sun, Feb 6, 2022 at 9:40 PM Rich Felker <dalias@...ifal.cx> wrote:
> > > >
> > > > > On Sun, Feb 06, 2022 at 08:29:16PM -0500, Satadru Pramanik wrote:
> > > > > > Here are illustrative logs of output and strace logs.
> > > > > >
> > > > > > Note that while the musl toolchain is built in a container on a much more
> > > > > > powerful machine, this "musl_getaddrinfo_test" app is built locally on the
> > > > > > machine with the 3.8 kernel.
> > > > > >
> > > > > > I ran the following to get the output on the smaller i686 machine
> > > > > > immediately after the app is built.
> > > > > > Apologies for the ruby code wrapping the shell commands.
> > > > > >
> > > > > >     @musl_ver = `#{CREW_MUSL_PREFIX}/lib/libc.so 2>&1 >/dev/null | head -2 | tail -1 | awk '{print $2}'`.chomp
> > > > > >     puts 'Testing the musl resolver to see if it can resolve google.com: '.lightblue
> > > > > >     system "./musl_getaddrinfo_test google.com set_ai_family 2>&1 |tee -a /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_set_ai_family.txt"
> > > > > >     system "./musl_getaddrinfo_test google.com 2>&1 |tee -a /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com.txt"
> > > > > >     system "strace -o /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_set_ai_family_STRACE.txt ../musl_getaddrinfo_test google.com set_ai_family"
> > > > > >     system "strace -o /tmp/musl_#{@...l_ver}_getaddrinfo_test_google.com_STRACE.txt ../musl_getaddrinfo_test google.com"
> > > > > >
> > > > > > And here is the output for each run before running again via strace. Note
> > > > > > how IPv6 addresses show up sporadically, and for 1.2.2 nothing at all shows
> > > > > > up, but everything works fine according to the strace logs. (Strace is
> > > > > > built against glibc 2.23.)
> > > > > >
> > > > > > ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.110
> > > > > >
> > > > > > ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com.txt <==
> > > > > > AF_INET: 142.251.40.110
> > > > > >
> > > > > > ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.142
> > > > > >
> > > > > > ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.206
> > > > > >
> > > > > > ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com.txt <==
> > > > > > AF_INET6: 2607:f8b0:4006:81f::200e
> > > > > > AF_INET: 142.251.40.206
> > > > > >
> > > > > > ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.250.65.206
> > > > > >
> > > > > > ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com.txt <==
> > > > > > AF_INET: 142.250.65.206
> > > > > >
> > > > > > ==> musl_1.2.1_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > AF_INET: 142.251.40.110
> > > > > >
> > > > > > ==> musl_1.2.1_getaddrinfo_test_google.com.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > ==> musl_1.2.2_getaddrinfo_test_google.com_set_ai_family.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > ==> musl_1.2.2_getaddrinfo_test_google.com.txt <==
> > > > > > getaddrinfo: Try again
> > > > > >
> > > > > > Regards,
> > > > >
> > > > > OK, I don't see anything in the strace suggesting a cause. The kernel
> > > > > version (or whether a container was used) present on the system where
> > > > > you built musl or the test programs should make no difference
> > > > > whatsoever; musl has no build dependencies on the host kernel or
> > > > > kernel headers or anything like that (and doesn't even need to be
> > > > > built on a Linux host).
> > > > >
> > > > > A couple questions:
> > > > >
> > > > > Are the test programs on the i686 machine running under Docker or any
> > > > > other container environment?
> > > > >
> > > > > Can you tcpdump the traffic between the test program and the dnsmasq
> > > > > during a failing run, with verbose display of the packet contents
> > > > > (-vvv or something like that)?
> > > > >
> > > > > I don't see any plausible explanation for the result varying between
> > > > > runs and with timing like this unless dnsmasq is doing something
> > > > > odd/wrong. I thought it might be related to something blocking time64
> > > > > syscalls but that doesn't seem to be the case -- according to the
> > > > > strace logs they're getting ENOSYS as expected with fallback to the
> > > > > legacy 32-bit clock_gettime etc. which is fine.
> > > > >
> > >
>
>
>


Download attachment "Screenshot 2022-02-14 at 13.58.03.png" of type "image/png" (94083 bytes)
