Message-ID: <20190720044840.GZ1506@brightrain.aerifal.cx>
Date: Sat, 20 Jul 2019 00:48:40 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: time_t progress/findings

On Thu, Jul 18, 2019 at 04:52:37PM -0400, Rich Felker wrote:
> On Thu, Jul 18, 2019 at 12:37:45PM -0400, Rich Felker wrote:
> > Second bit of progress here: stat. First change can be done before any
> > actual time64 work is done or even decided upon: changing all the
> > stat-family functions to use fstatat (possibly with AT_EMPTY_PATH) as
> > their backend, and changing fstatat to do proper fallbacks if
> > SYS_fstatat is missing. Now there's a single point of potential stat
> > conversion rather than 4 functions.
> > 
> > Next, add an internal, arch-provided kstat type and make fstatat
> > translate from this to the public stat type. This eliminates the need
> > for all the mips*/syscall_arch.h hacks.
> 
> This step admits a few questions about how to do it best, inspired in
> part by a related question:
> 
> What should the new time64 stat structures look like?
> 
> There are at least three possible goals:
> 
> 1. Make them as clean and uniform as possible, same for all archs.
> 
> 2. Avoid increasing the size at all cost so as to maximize
>    memory-safety of mismatched interfaces between libc consumers
>    defined in terms of struct stat.
> 
> 3. Make the start of the new struct match the old struct to minimize
>    behavioral errors under mismatched interfaces between libc
>    consumers defined in terms of struct stat.
> 
> Choice 2 is pretty much out because I think it's impossible on at
> least one arch, and would impose really ugly constraints (making
> timespec 24-byte, relying on non-64bit-alignment) on others. In many
> ways choice 3 is actually more appealing, because when third-party
> libraries *do* use stat in public interfaces, it's usually understood
> that the same party both allocates and fills it in, and shares the
> contents with the other party.
> 
> There are actually 2 subvariants of choice 3: either keep exposing the
> 32-bit time in the old locations so that mismatched consumers just
> work, or fill it in with something like INT_MIN (year~=1902) so that
> breakage is caught quickly.
> 
> Now, back to kstat and the above-quoted text. If we go with choice 3,
> we don't actually need a kstat struct. The existing stat syscalls just
> write into the beginning of the buffer, and then we copy the result to
> the time64 timespecs at the end that make up the new public interface.
> This results in the smallest code, and the least amount of new
> per-arch definitions. But it doesn't clean up the existing mips
> stat-translation hell (currently buried in mips*/syscall_arch.h), and
> it imposes assumptions about the relationship between kernel types and
> public libc types.
> 
> On the other hand, if we make archs define a struct kstat and always
> translate everything, the code is a bit larger, but we:
> 
> - don't impose any particular choice 1/2/3 above.
> - make it easy to clean up the mips brokenness.
> - facilitate future musl archs/ABIs (e.g. a ".2 ABI") where userspace
>   stat has nothing to do with the legacy kernel stat structs.
> 
> So I'm leaning strongly towards just always doing the translation,
> even though I'm also leaning towards choice 3 above that won't require
> it. If nothing else, it allows me to do the prep work that will set
> the stage for time64 transition now, without having to finalize the
> decisions about how time64 will look.
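
The first step in the quoted plan, routing every stat-family call
through fstatat, might look roughly like the sketch below. The
sketch_* names are hypothetical and this is not musl's code; it only
shows the flag mapping and leaves out the fallbacks for kernels that
lack SYS_fstatat. AT_EMPTY_PATH is Linux-specific, and the fallback
definition of it is an assumption for builds where the headers hide it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes AT_EMPTY_PATH on glibc and musl */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* Linux value, used if headers hide it */
#endif

/* stat: resolve path relative to the cwd, following symlinks. */
int sketch_stat(const char *path, struct stat *st)
{
    assert(path && st);
    return fstatat(AT_FDCWD, path, st, 0);
}

/* lstat: same, but do not follow a trailing symlink. */
int sketch_lstat(const char *path, struct stat *st)
{
    assert(path && st);
    return fstatat(AT_FDCWD, path, st, AT_SYMLINK_NOFOLLOW);
}

/* fstat: an empty path plus AT_EMPTY_PATH means "the fd itself". */
int sketch_fstat(int fd, struct stat *st)
{
    assert(st);
    return fstatat(fd, "", st, AT_EMPTY_PATH);
}
```

With this shape, any later conversion between a kernel-side struct and
the public struct stat would only have to live behind the one fstatat
call.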

Another data point in favor of choice 3: libc actually has some
functions of its own that pass stat structures to callbacks: ftw and
nftw. With choice 3, these don't need any change; a legacy binary
calling them will get back stat structures it can read (with some
extra 64-bit timespecs afterwards that it's not aware of). With any
other choice, these functions would need painful replacements, and
just wrapping them is not easy because they lack a context argument to
pass through.
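
To make the ftw/nftw point concrete, here is a minimal sketch of
choice 3 using made-up struct layouts (legacy_stat and new_stat are
hypothetical and do not match any real arch's struct stat). The new
struct keeps the legacy fields at the same offsets and appends 64-bit
times at the end; the memcpy in the callback stands in for a legacy
binary that only knows the old, shorter definition. The poison_old
flag illustrates the two subvariants mentioned in the quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical legacy layout with 32-bit time fields. */
struct legacy_stat {
    uint64_t st_ino;
    uint32_t st_mode;
    int32_t  st_mtime_sec;
    int32_t  st_mtime_nsec;
};

/* Hypothetical choice-3 layout: legacy prefix kept, 64-bit time appended. */
struct new_stat {
    uint64_t st_ino;
    uint32_t st_mode;
    int32_t  old_mtime_sec;   /* same offset as legacy st_mtime_sec */
    int32_t  old_mtime_nsec;  /* same offset as legacy st_mtime_nsec */
    int64_t  st_mtime_sec;    /* new public 64-bit time */
    int64_t  st_mtime_nsec;
};

/* Checks that the start of new_stat matches legacy_stat field by field. */
void check_prefix_layout(void)
{
    assert(offsetof(struct new_stat, st_ino) == offsetof(struct legacy_stat, st_ino));
    assert(offsetof(struct new_stat, st_mode) == offsetof(struct legacy_stat, st_mode));
    assert(offsetof(struct new_stat, old_mtime_sec) == offsetof(struct legacy_stat, st_mtime_sec));
    assert(offsetof(struct new_stat, old_mtime_nsec) == offsetof(struct legacy_stat, st_mtime_nsec));
    assert(sizeof(struct new_stat) >= sizeof(struct legacy_stat));
}

/* Stand-in for a time64-aware libc filling in a result.
 * poison_old == 0: keep exposing the 32-bit time in the old location
 *                  (assumes sec fits in 32 bits).
 * poison_old != 0: store INT32_MIN there so stale consumers fail loudly. */
void fill_new_stat(struct new_stat *st, uint64_t ino, uint32_t mode,
                   int64_t sec, int64_t nsec, int poison_old)
{
    memset(st, 0, sizeof *st);
    st->st_ino = ino;
    st->st_mode = mode;
    st->old_mtime_sec = poison_old ? INT32_MIN : (int32_t)sec;
    st->old_mtime_nsec = (int32_t)nsec;
    st->st_mtime_sec = sec;
    st->st_mtime_nsec = nsec;
}

/* Stand-in for a callback built against the legacy definition: it reads
 * only the first sizeof(struct legacy_stat) bytes of whatever it is given. */
int32_t legacy_callback_mtime(const void *st)
{
    struct legacy_stat view;
    memcpy(&view, st, sizeof view);
    return view.st_mtime_sec;
}
```

Under these assumptions the legacy callback keeps working on a buffer
that is really a new_stat, which is the property that would let ftw
and nftw stay unchanged.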

Since similar usage is likely common in third-party library code, I
think this is a really strong argument in favor of choice 3. FWIW the
existing glibc proposal looks like choice 1, and they weren't aware of
this problem until I reported it just now.

Rich
