Message-ID: <CAASffouyCGH0SRF2MB8724nyVSBxyGQHT_Danh76Ouen-6JY0A@mail.gmail.com>
Date: Fri, 11 Jul 2025 07:25:26 +1000
From: Stephen Von Takach <steve@...ce.technology>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com, Viv Briffa <viv@...ce.technology>
Subject: Re: unlink on NFS volume fails silently

Given that the unlink code is exactly the same, it must be an issue with
readdir rather than with unlink, as described in the Alpine issue
https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960
So on musl the problem is that we never attempt to remove all of the files.
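A minimal reproducer of the kind Rich asks for below might look like the following C sketch. It assumes the directory passed in is an empty directory on the affected NFS mount. The function names (`create_files`, `unlink_pass`, `count_entries`, `repro`) and the zero-padded numeric filenames are hypothetical. They are chosen only so that the file count and name length, which the Alpine ticket suggests matter, can be varied. The sketch unlinks entries while a single readdir stream is still open, the way recursive-delete code typically does, and then counts what is left using a fresh stream.

```c
#define _POSIX_C_SOURCE 200809L /* for dirfd(), unlinkat() */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create nfiles empty files in dir, named as
   zero-padded numbers namelen digits wide (e.g. "00000042"). */
static int create_files(const char *dir, int nfiles, int namelen)
{
    char path[4096];
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof path, "%s/%0*d", dir, namelen, i);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror(path); return -1; }
        close(fd);
    }
    return 0;
}

static int is_dot(const char *name)
{
    return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Count entries other than "." and ".." using a fresh directory stream. */
static int count_entries(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d) { perror(dir); return -1; }
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (!is_dot(e->d_name)) n++;
    closedir(d);
    return n;
}

/* One pass: unlink each entry while the stream is still open.
   Returns the number of files unlinked, or -1 on a readdir error. */
static int unlink_pass(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d) { perror(dir); return -1; }
    int removed = 0;
    for (;;) {
        errno = 0;                 /* distinguish end-of-stream from error */
        struct dirent *e = readdir(d);
        if (!e) break;
        if (is_dot(e->d_name)) continue;
        if (unlinkat(dirfd(d), e->d_name, 0) == 0)
            removed++;
        else if (errno != ENOENT)  /* a stale, already-deleted entry is allowed */
            perror(e->d_name);
    }
    int err = errno;               /* errno as left by the final readdir() */
    closedir(d);
    if (err) { errno = err; perror("readdir"); return -1; }
    return removed;
}

/* Returns how many files survived one pass: 0 is the correct result;
   anything above 0 means readdir skipped entries that still exist. */
static int repro(const char *dir, int nfiles, int namelen)
{
    if (create_files(dir, nfiles, namelen) != 0) return -1;
    if (unlink_pass(dir) < 0) return -1;
    return count_entries(dir);
}
```

Calling `repro(dir, nfiles, namelen)` from a small `main` over a range of counts and name lengths, with musl and glibc builds run against the same mount, could help show whether the skipped entries depend on those parameters. `ENOENT` from `unlinkat` is tolerated on purpose, since returning already-deleted entries during an in-progress iteration is expected behaviour.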


Stephen von Takach Dukai

Engineering Lead

PlaceOS

Australia, Hong Kong, London, New York

p: +61 408 419 954

e: steve@...ce.technology


On Fri, 11 Jul 2025 at 01:44, Rich Felker <dalias@...c.org> wrote:

> On Thu, Jul 10, 2025 at 02:58:30PM +1000, Stephen Von Takach wrote:
> > Yeah I see your point and this was closed as a kernel issue:
> > https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960
>
> OK, is your issue unlink falsely succeeding, or readdir skipping
> entries? The latter is a known bug in the kernel NFS client. One of my
> comments on the tracker suggests:
>
>   "The nordirplus option mentioned in one of those tracker threads
>   might be a workaround."
>
> I'm not sure if this is the case, but it might be worth trying.
>
> Note that it's *expected* that an already-in-progress iteration of a
> directory may return entries that were already deleted. The
> unacceptable thing is the opposite: when it skips some entries that
> have not been deleted as a consequence of other things being deleted.
>
> > We're running these two containers on the same kernel and seeing the same
> > behaviour as that Alpine issue.
> > Happy to continue working around the issue by using debian userspace to
> > build our service.
> >
> > It does seem crazy that there is clearly an issue, possibly a kernel issue,
> > that is being handwaved away by all parties
>
> It's not "handwaved away" by us. We have determined that there is a
> bug in a component we have no control over, and one that we have no
> sound means of working around.
>
> I'm happy to work together on tracking down the cause to get it fixed,
> but that requires cooperation from someone who's able to reproduce it,
> documenting the exact circumstances under which it occurs (NFS server
> vendor/version, NFS mount options) and either producing a minimal test
> program to reproduce the issue under those conditions, or being
> willing to run a proposed test by someone else.
>
> Even if using Debian/glibc *seems* to make things work for you, I
> think it would be beneficial for you to try to get to the root cause
> of the problem and get it fixed. What we previously found on the
> above-linked ticket was that glibc is not doing anything special that
> should rule out that bug, only that the particular filename
> sizes/counts in the test didn't trigger the bug with glibc.
>
> Again, I don't know if this is the same bug you're hitting (this is
> the first time in the thread you've mentioned readdir if I'm not
> mistaken, as opposed to just unlink) or if there's a second bug in
> play here. If you could at least clarify that, it would be a big help
> to anyone investigating it in the future.
>
> Rich
>
