Message-ID: <20190720214602.GA1506@brightrain.aerifal.cx>
Date: Sat, 20 Jul 2019 17:46:02 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: time_t progress/findings

On Sat, Jul 20, 2019 at 12:48:40AM -0400, Rich Felker wrote:
> On Thu, Jul 18, 2019 at 04:52:37PM -0400, Rich Felker wrote:
> > On Thu, Jul 18, 2019 at 12:37:45PM -0400, Rich Felker wrote:
> > > Second bit of progress here: stat. First change can be done before any
> > > actual time64 work is done or even decided upon: changing all the
> > > stat-family functions to use fstatat (possibly with AT_EMPTY_PATH) as
> > > their backend, and changing fstatat to do proper fallbacks if
> > > SYS_fstatat is missing. Now there's a single point of potential stat
> > > conversion rather than 4 functions.
> > > 
> > > Next, add an internal, arch-provided kstat type and make fstatat
> > > translate from this to the public stat type. This eliminates the need
> > > for all the mips*/syscall_arch.h hacks.
> > 
> > This step admits a few questions about how to do it best, inspired in
> > part by a related question:
> > 
> > What should the new time64 stat structures look like?
> > 
> > There are at least three possible goals:
> > 
> > 1. Make them as clean and uniform as possible, same for all archs.
> > 
> > > 2. Avoid increasing the size at all costs so as to maximize
> >    memory-safety of mismatched interfaces between libc consumers
> >    defined in terms of struct stat.
> > 
> > 3. Make the start of the new struct match the old struct to minimize
> >    behavioral errors under mismatched interfaces between libc
> >    consumers defined in terms of struct stat.
> > 
> > Choice 2 is pretty much out because I think it's impossible on at
> > least one arch, and would impose really ugly constraints (making
> > timespec 24-byte, relying on non-64bit-alignment) on others. In many
> > ways choice 3 is actually more appealing, because when third-party
> > libraries *do* use stat in public interfaces, it's usually understood
> > that the same party both allocates and fills it in, and shares the
> > contents with the other party.
> > 
> > There are actually 2 subvariants of choice 3: either keep exposing the
> > 32-bit time in the old locations so that mismatched consumers just
> > work, or fill it in with something like INT_MIN (year~=1902) so that
> > breakage is caught quickly.
> > 
> > > Now, back to kstat and the above-quoted text. If we go with choice 3,
> > we don't actually need a kstat struct. The existing stat syscalls just
> > write into the beginning of the buffer, and then we copy the result to
> > the time64 timespecs at the end that make up the new public interface.
> > This results in the smallest code, and the least amount of new
> > per-arch definitions. But it doesn't clean up the existing mips
> > stat-translation hell (currently buried in mips*/syscall_arch.h), and
> > it imposes assumptions about the relationship between kernel types and
> > public libc types.
> > 
> > On the other hand, if we make archs define a struct kstat and always
> > translate everything, the code is a bit larger, but we:
> > 
> > - don't impose any particular choice 1/2/3 above.
> > > - make it easy to clean up the mips brokenness.
> > - facilitate future musl archs/ABIs (e.g. a ".2 ABI") where userspace
> >   stat has nothing to do with the legacy kernel stat structs.
> > 
> > So I'm leaning strongly towards just always doing the translation,
> > even though I'm also leaning towards choice 3 above that won't require
> > it. If nothing else, it allows me to do the prep work that will set
> > > the stage for time64 transition now, without having to finalize the
> > decisions about how time64 will look.
> 
> Another data point in favor of choice 3: libc actually has some
> functions of its own that pass stat structures to callbacks: ftw and
> nftw. With choice 3, these don't need any change; a legacy binary
> calling them will get back stat structures it can read (with some
> extra 64-bit timespecs afterwards that it's not aware of). With any
> other choice, these functions would need painful replacements, and
> just wrapping them is not easy because they lack a context argument to
> pass through.
> 
> Since similar usage is likely common in third-party library code, I
> think this is a really strong argument in favor of choice 3. FWIW the
> existing glibc proposal looks like choice 1, and they weren't aware of
> this problem until I reported it just now.
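
To make the quoted choice 3 concrete, here is a minimal sketch of the
layout and of the kstat-to-stat translation. It is only an illustration
under stated assumptions: every type, field, and function name below
(ts32, ts64, kstat_sketch, old_stat_sketch, new_stat_sketch,
kstat_to_stat, poison_legacy) is hypothetical, the structs are trimmed
to three fields, and none of it is taken from musl or the kernel
headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit timespec layouts (not musl's). */
struct ts32 { int32_t tv_sec; int32_t tv_nsec; };
struct ts64 { int64_t tv_sec; int64_t tv_nsec; };

/* Hypothetical arch-provided kernel struct, trimmed to three fields. */
struct kstat_sketch {
    uint32_t st_mode;
    int64_t st_size;
    struct ts32 st_mtim;
};

/* What a legacy binary believes struct stat looks like. */
struct old_stat_sketch {
    uint32_t st_mode;
    int64_t st_size;
    struct ts32 st_mtim;
};

/* Choice 3: same prefix as the old struct, 64-bit time appended. */
struct new_stat_sketch {
    uint32_t st_mode;
    int64_t st_size;
    struct ts32 legacy_mtim;  /* old location, read by legacy callers */
    struct ts64 st_mtim;      /* new public field */
};

static_assert(offsetof(struct new_stat_sketch, legacy_mtim) ==
              offsetof(struct old_stat_sketch, st_mtim),
              "legacy time must stay at its old offset");
static_assert(sizeof(struct new_stat_sketch) >
              sizeof(struct old_stat_sketch),
              "choice 3 grows the struct");

/* Translate kernel data into the public struct. If poison_legacy is
 * nonzero, the old time field is set to INT32_MIN (around year 1902)
 * so that mismatched consumers are noticed quickly; otherwise the
 * 32-bit time is still exposed at the old location. */
static void kstat_to_stat(const struct kstat_sketch *k,
                          struct new_stat_sketch *s, int poison_legacy)
{
    s->st_mode = k->st_mode;
    s->st_size = k->st_size;
    s->st_mtim.tv_sec = k->st_mtim.tv_sec;
    s->st_mtim.tv_nsec = k->st_mtim.tv_nsec;
    if (poison_legacy) {
        s->legacy_mtim.tv_sec = INT32_MIN;
        s->legacy_mtim.tv_nsec = 0;
    } else {
        s->legacy_mtim = k->st_mtim;
    }
}
```

A legacy ftw/nftw callback handed a pointer to new_stat_sketch would
read only the prefix, which is why no replacement functions would be
needed under this layout.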

Related find: for struct rusage, we can satisfy both (2) and (3)
simultaneously. This means no new structures or symbols are needed for
getrusage, wait3, and wait4. The existing 32-bit musl structs left 16
slots for extensibility at the end, so we can just put the new 64-bit
time fields there, and still fill in the 32-bit ones too for legacy
callers.
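
A rough sketch of the rusage idea, assuming a 32-bit arch where long is
32 bits. The names (tv32, tv64, old_rusage_sketch, new_rusage_sketch,
set_times) and the exact field grouping are made up for illustration
and are not the real musl definitions; the point is only that the
64-bit times fit in the reserved slots, so the size does not change
and the old fields stay where they were.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit and 64-bit timeval layouts. */
struct tv32 { int32_t tv_sec; int32_t tv_usec; };
struct tv64 { int64_t tv_sec; int64_t tv_usec; };

/* Legacy layout: two 32-bit times, the counters, 16 spare slots. */
struct old_rusage_sketch {
    struct tv32 ru_utime, ru_stime;
    int32_t counters[14];
    int32_t reserved[16];
};

/* New layout: the 64-bit times occupy 8 of the 16 spare slots. */
struct new_rusage_sketch {
    struct tv32 legacy_utime, legacy_stime;  /* still filled in */
    int32_t counters[14];
    struct tv64 ru_utime, ru_stime;          /* new public fields */
    int32_t reserved[8];
};

static_assert(sizeof(struct new_rusage_sketch) ==
              sizeof(struct old_rusage_sketch),
              "goal (2): size must not grow");
static_assert(offsetof(struct new_rusage_sketch, ru_utime) ==
              offsetof(struct old_rusage_sketch, reserved),
              "new times start where the spare slots were");

/* Store the 64-bit times and mirror them, narrowed to 32 bits, into
 * the old locations for callers built against the legacy layout. */
static void set_times(struct new_rusage_sketch *ru,
                      struct tv64 user, struct tv64 sys)
{
    ru->ru_utime = user;
    ru->ru_stime = sys;
    ru->legacy_utime.tv_sec = (int32_t)user.tv_sec;
    ru->legacy_utime.tv_usec = (int32_t)user.tv_usec;
    ru->legacy_stime.tv_sec = (int32_t)sys.tv_sec;
    ru->legacy_stime.tv_usec = (int32_t)sys.tv_usec;
}
```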

This is basically what I was already planning for utmp, except that
it's less interesting for utmp because the functions are stubs.

Rich
