Message-ID: <CAFrh3J8EPezoUEkmCkAYGC3OOcdA8eoWH-SA7v0qtZtjK=nSvA@mail.gmail.com>
Date: Thu, 17 Feb 2022 11:36:31 -0500
From: Satadru Pramanik <satadru@...il.com>
To: Rich Felker <dalias@...ifal.cx>
Cc: musl@...ts.openwall.com
Subject: Re: Re: musl getaddr info breakage on older kernels
This machine is an EOL Samsung Series 5 Chromebook
<https://www.chromium.org/chromium-os/developer-information-for-chrome-os-devices/samsung-series-5-chromebook/>,
code named Alex
<https://www.chromium.org/chromium-os/developer-information-for-chrome-os-devices/#:~:text=Series%205%20Chromebook-,Alex,-x86%2Dalex%20%26%20x86>.
It is the target device for our i686 builds for Chromebrew.
It is running a 3.8.11 kernel, and I believe the kernel source for that is
here:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-3.8
Getting a signed kernel update for an EOL kernel for an EOL machine is
close to impossible from Google, so we're just trying to work around these
issues in userspace to maintain some functionality for any users who may
still be using the device.
The simplest workaround possible would be ideal. It is interesting though
that the sample program works fine when built against near-stock glibc
2.23, no?
Satadru
On Thu, Feb 17, 2022 at 11:05 AM Rich Felker <dalias@...ifal.cx> wrote:
> On Thu, Feb 17, 2022 at 10:53:52AM -0500, Rich Felker wrote:
> > On Thu, Feb 17, 2022 at 09:49:45AM -0500, Satadru Pramanik wrote:
> > > Apologies for not being as familiar with gdb as I ought to be.
> > > I used the __clock_gettime64 breakpoint and did a backtrace and finish
> > > repeatedly.
> > > I couldn't figure out how to best get the timespec struct info.
> > >
> > > Alternately if you want to throw out a sample test program for me to build
> > > and run, and what gdb commands to run to get the right info, happy to do
> > > that too.
> > >
> > > gdb output is attached.
> >
> > If gdb reported it correctly, clock_gettime returned 403, which should
> > be impossible. It can only return 0 or -1. Incidentally, 403 is the
> > syscall number for SYS_clock_gettime64, which suggests your kernel is
> > simply *returning the syscall number* instead of -ENOSYS for syscalls
> > that don't exist on it. Is this a stock kernel (3.8 IIRC) or does it
> > have any sort of weird vendor patching? Any LSMs loaded?
> >
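For illustration, the diagnosis above can be written as a small check. This is a sketch, not musl code; `classify_raw_result` and the enum names are hypothetical. It assumes the raw kernel convention that a call like clock_gettime64 returns exactly 0 on success or a negated errno value in the range -4095..-1 on failure, so any other value (such as an echoed syscall number like 403) is not a valid result.

```c
#include <errno.h>

/* Hypothetical classification of a raw (unwrapped) kernel return value
 * for a syscall, such as clock_gettime64, whose only success value is 0. */
enum raw_result {
    RAW_OK,          /* 0: success */
    RAW_UNSUPPORTED, /* -ENOSYS: the kernel does not implement the syscall */
    RAW_ERROR,       /* some other negated errno in -4095..-1 */
    RAW_BOGUS        /* anything else, e.g. the syscall number echoed back */
};

static enum raw_result classify_raw_result(long ret)
{
    if (ret == 0)
        return RAW_OK;
    if (ret == -ENOSYS)
        return RAW_UNSUPPORTED;
    if (ret < 0 && ret >= -4095)
        return RAW_ERROR;
    return RAW_BOGUS;
}
```

With this convention, the 403 seen in gdb falls into the last category rather than being mistaken for success or for a normal error.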
> > If you'd like to run a test just to make sure we're accurately seeing
> > what's happening, the attached should work. It should print 0 followed
> > by the current time in seconds and nanoseconds.
>
> It looks like you hit the bug introduced in commit
> 554086d85e71f30abe46fc014fea31929a7c6a8a and fixed in commit
> 8142b215501f8b291a108a202b3a053a265b03dd. It looks like, since the
> former was a CVE fix, somebody backported it to the kernel you're
> using, but they failed to backport the fix-for-the-fix, so you have a
> kernel that operates dangerously incorrectly for syscall numbers it's
> unaware of.
>
> This really needs to be fixed in the kernel if you can. On our side
> (musl) we probably need to find out if such kernels are actually out
> in the wild, and if so, whether there's any reasonable way to detect
> the false success and treat it as failure.
>
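One possible shape for such a check, as a sketch only: this is not musl's code and not necessarily what musl would adopt. The `struct ts64` type and the function-pointer parameters `try64` and `try32` are hypothetical stand-ins for the raw time64 and legacy syscalls, so the logic can be exercised without a broken kernel. The assumption is that a successful raw call returns exactly 0 and a failed one returns a negated errno, so a positive value is treated the same as -ENOSYS.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified 64-bit timestamp used only by this sketch. */
struct ts64 { int64_t sec; long nsec; };

/* Hypothetical raw syscall stand-in: returns 0 or a negated errno on a
 * sane kernel; an affected kernel may return the syscall number instead. */
typedef long (*raw_gettime_fn)(int clk, struct ts64 *ts);

/* Try the time64 call first. Fall back to the legacy call if the result
 * is -ENOSYS or is positive, which is never a valid result for this call. */
static long gettime_with_fallback(raw_gettime_fn try64, raw_gettime_fn try32,
                                  int clk, struct ts64 *ts)
{
    long r = try64(clk, ts);
    if (r == -ENOSYS || r > 0)
        r = try32(clk, ts);
    return r;
}

/* Fake for demonstration: a kernel that echoes syscall number 403
 * and leaves the output structure untouched. */
static long fake_broken64(int clk, struct ts64 *ts)
{
    (void)clk;
    (void)ts;
    return 403;
}

/* Fake for demonstration: a working legacy call that fills in an
 * arbitrary fixed time. */
static long fake_legacy32(int clk, struct ts64 *ts)
{
    (void)clk;
    ts->sec = 1645115791;
    ts->nsec = 0;
    return 0;
}
```

Calling `gettime_with_fallback(fake_broken64, fake_legacy32, 0, &ts)` ends up using the legacy result. A check like this only works for syscalls that never legitimately return a positive value, so it could not be applied blindly to every syscall.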
> > > On Thu, Feb 17, 2022 at 8:46 AM Rich Felker <dalias@...ifal.cx> wrote:
> > >
> > > > On Thu, Feb 17, 2022 at 08:30:47AM -0500, Satadru Pramanik wrote:
> > > > > *This is a failure:*
> > > > > tcpdump -i any -vvv host 192.168.0.115
> > > > > tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
> > > > > 08:29:38.043849 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > > >     192.168.0.115.60625 > office.lan.53: [udp sum ok] 0+ A? google.com. (28)
> > > > > 08:29:38.044237 IP (tos 0x0, ttl 64, id 11463, offset 0, flags [DF], proto UDP (17), length 72)
> > > > >     office.lan.53 > 192.168.0.115.60625: [bad udp cksum 0x820a -> 0x5c7d!] 0 q: A? google.com. 1/0/0 google.com. [2m15s] A 142.250.80.110 (44)
> > > > > 08:29:38.047754 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > > >     192.168.0.115.60625 > office.lan.53: [udp sum ok] 0+ AAAA? google.com. (28)
> > > > > 08:29:38.048078 IP (tos 0x0, ttl 64, id 11464, offset 0, flags [DF], proto UDP (17), length 84)
> > > > >     office.lan.53 > 192.168.0.115.60625: [bad udp cksum 0x8216 -> 0xb42f!] 0 q: AAAA? google.com. 1/0/0 google.com. [4m26s] AAAA 2607:f8b0:4006:80d::200e (56)
> > > > > 08:29:38.048955 IP (tos 0xc0, ttl 64, id 59728, offset 0, flags [none], proto ICMP (1), length 112)
> > > > >     192.168.0.115 > office.lan: ICMP 192.168.0.115 udp port 60625 unreachable, length 92
> > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > OK, this shows that the client has requested both answers and the
> > > > nameserver replied almost immediately (about 0.5ms later), but when
> > > > the second reply arrives (to the AAAA), the client has already closed
> > > > the listening port, despite only a few ms having passed. The only way
> > > > I see this could happen is by "timing out". This suggests that
> > > > something is wrong with telling time.
> > > >
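To make the reasoning above concrete, here is a sketch of how a bad clock reading can turn into an immediate timeout. It is loosely modelled on a generic resolver wait loop and is not taken from musl; `struct ts64`, `to_ms` and `expired` are hypothetical names, and the timestamps are passed in directly so the effect is reproducible.

```c
#include <stdint.h>

/* Simplified timestamp used only by this sketch. */
struct ts64 { int64_t sec; long nsec; };

/* Convert a timestamp to milliseconds since the epoch. */
static uint64_t to_ms(struct ts64 t)
{
    return (uint64_t)t.sec * 1000 + (uint64_t)t.nsec / 1000000;
}

/* A query counts as expired once timeout_ms have elapsed since start.
 * The subtraction is unsigned, so a "now" that is earlier than "start"
 * wraps around to a huge elapsed time instead of a negative one. */
static int expired(struct ts64 start, struct ts64 now, uint64_t timeout_ms)
{
    return to_ms(now) - to_ms(start) >= timeout_ms;
}
```

With a sane clock, a reply arriving 4 ms after the query is well inside a 5000 ms timeout. If the second clock read yields garbage (for example a timestamp that was never filled in), the elapsed time can appear enormous and the socket is closed before the reply is read.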
> > > > Can you either put a breakpoint in __clock_gettime64 (this is the name
> > > > you have to use for a breakpoint -- sorry I messed it up last time)
> > > > and then see what it returns when you "finish" it and what's in the
> > > > timespec struct after that? Or just write a test program to call
> > > > clock_gettime(CLOCK_REALTIME, &ts) (note: you do NOT need or want to
> > > > use the time64 symbol name here) and print the results (return value
> > > > and contents of the timespec struct).
> > > >
> > > >
> > > >
> > > > > IP (tos 0x0, ttl 64, id 11464, offset 0, flags [DF], proto UDP (17), length 84)
> > > > >     office.lan.53 > 192.168.0.115.60625: [udp sum ok] 0 q: AAAA? google.com. 1/0/0 google.com. [4m26s] AAAA 2607:f8b0:4006:80d::200e (56)
> > > > > 08:29:39.476101 IP (tos 0x0, ttl 64, id 12690, offset 0, flags [DF], proto TCP (6), length 52)
> > > > >     192.168.0.115.51204 > lga34s35-in-f3.1e100.net.80: Flags [.], cksum 0xa666 (correct), seq 1466707759, ack 3358943837, win 115, options [nop,nop,TS val 198422160 ecr 2351261566], length 0
> > > > > 08:29:39.478914 IP (tos 0x80, ttl 122, id 6227, offset 0, flags [none], proto TCP (6), length 52)
> > > > >     lga34s35-in-f3.1e100.net.80 > 192.168.0.115.51204: Flags [.], cksum 0xa5b7 (correct), seq 1, ack 1, win 282, options [nop,nop,TS val 2351306585 ecr 198377148], length 0
> > > > > ^C
> > > > > 7 packets captured
> > > > > 7 packets received by filter
> > > > > 0 packets dropped by kernel
> > > >
> >
> >
>
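> > #include <time.h>
> > #include <stdio.h>
> >
> > int main(void)
> > {
> >     struct timespec ts;
> >
> >     /* Expected output: 0, then the current time in seconds and nanoseconds. */
> >     printf("%d", clock_gettime(CLOCK_REALTIME, &ts));
> >     printf(" %lld %.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
> >     return 0;
> > }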
>
>