Message-ID: <87h7nva74t.fsf@oldenburg2.str.redhat.com>
Date: Tue, 05 Jan 2021 13:41:06 +0100
From: Florian Weimer <fweimer@...hat.com>
To: libc-coord@...ts.openwall.com
Subject: Future directions for *_r functions

POSIX defines a few *_r functions like getgrgid_r.  glibc and other
libcs implement many more such functions.

I dislike these interfaces, for a couple of reasons.

They are difficult to use.  The caller has to perform an ERANGE dance,
retrying with a larger buffer each time the call fails with ERANGE.  Not
all callers do this correctly.
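
To make the ERANGE dance concrete, here is a minimal C sketch for
getgrgid_r.  The helper name lookup_group_r and the buffer-doubling
policy are assumptions for illustration, not an existing interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <grp.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: look up a group with getgrgid_r, doubling the
   buffer each time the call fails with ERANGE.  On success the strings
   and the gr_mem array in *grp point into *bufp, which the caller must
   free() once it is done with *grp.  Returns 0 on success, ENOENT if
   the group does not exist, or another errno value on failure. */
int lookup_group_r(gid_t gid, struct group *grp, char **bufp,
                   size_t initial_size)
{
    size_t size = initial_size > 0 ? initial_size : 1024;

    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return ENOMEM;

        struct group *result = NULL;
        int err = getgrgid_r(gid, grp, buf, size, &result);
        if (err == 0 && result != NULL) {
            *bufp = buf;  /* keep the buffer alive for the caller */
            return 0;
        }
        free(buf);
        if (err == 0)
            return ENOENT;  /* lookup succeeded but found no entry */
        if (err != ERANGE)
            return err;

        /* Buffer too small: the next iteration redoes the whole lookup,
           which may mean repeating network traffic for remote sources. */
        if (size > SIZE_MAX / 2)
            return ERANGE;
        size *= 2;
    }
}
```

A caller passes 0 for the default initial size, must keep the returned
buffer alive for as long as it uses the struct group, and must free it
afterwards; forgetting either step is an easy mistake.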

The ERANGE dance can be very costly if discovering that the buffer is
too small involves network activity, because each retry repeats the
lookup.
(Particularly relevant to large response sets with gethostbyname2_r,
which is not in POSIX: the large response is only visible after TCP
fallback.)

Adding ERANGE support to *implementations* which currently lack it
(e.g., ones that simply drop overly long input) may cause callers to
discard even more data, or result in infinite loops (because an ERANGE
error does not advance the read pointer).

The passed-in buffer lets callers avoid malloc, but most implementations
need to call malloc internally anyway.  The buffer needs to be untyped
memory for functions like getgrgid_r, which need storage space for
something that is not a char array (see gr_mem in struct group).  So it
is not clear whether one can actually avoid malloc in strictly
conforming applications.

Some of the _r functions are not obviously thread-safe because they have
a hidden file pointer (think getgrent_r, not POSIX, but also widely
implemented).


I see a couple of ways forward here.

We could make the non-_r variants thread-safe and document that,
including a way to determine thread safety of those functions.  In this
case, it may make sense to add matching dup*ent and free*ent functions,
to help programmers extend the lifetime of a function result.
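
A rough C sketch of what such helpers could look like for struct group.
The names dupgrent and freegrent are hypothetical, following the
dup*ent/free*ent pattern above.  The copy is packed into one allocation
so it survives later calls that overwrite the original result; it
deep-copies the glibc fields (gr_name, gr_passwd, gr_gid, gr_mem) and
copies any other fields shallowly.

```c
#include <grp.h>
#include <stdlib.h>
#include <string.h>

/* Copy S to *CURSOR, advance the cursor, and return the copy.
   A NULL input stays NULL. */
static char *copy_into(char **cursor, const char *s)
{
    if (s == NULL)
        return NULL;
    size_t len = strlen(s) + 1;
    char *dst = memcpy(*cursor, s, len);
    *cursor += len;
    return dst;
}

/* Hypothetical dupgrent: deep-copy *SRC into a single malloc'd block.
   Returns NULL on allocation failure. */
struct group *dupgrent(const struct group *src)
{
    size_t nmem = 0, strbytes = 0;

    if (src->gr_name != NULL)
        strbytes += strlen(src->gr_name) + 1;
    if (src->gr_passwd != NULL)
        strbytes += strlen(src->gr_passwd) + 1;
    if (src->gr_mem != NULL)
        for (; src->gr_mem[nmem] != NULL; nmem++)
            strbytes += strlen(src->gr_mem[nmem]) + 1;

    /* Layout: struct group, then the NULL-terminated gr_mem pointer
       array, then all strings.  sizeof(struct group) is a multiple of
       its alignment, which is at least that of char *, so the pointer
       array that follows it is suitably aligned. */
    struct group *dst = malloc(sizeof(struct group)
                               + (nmem + 1) * sizeof(char *) + strbytes);
    if (dst == NULL)
        return NULL;

    char **mem = (char **)(dst + 1);
    char *cursor = (char *)(mem + nmem + 1);

    *dst = *src;  /* copies gr_gid and any extra scalar fields */
    dst->gr_name = copy_into(&cursor, src->gr_name);
    dst->gr_passwd = copy_into(&cursor, src->gr_passwd);
    for (size_t i = 0; i < nmem; i++)
        mem[i] = copy_into(&cursor, src->gr_mem[i]);
    mem[nmem] = NULL;
    dst->gr_mem = mem;
    return dst;
}

/* Hypothetical freegrent: release a copy made by dupgrent. */
void freegrent(struct group *grp)
{
    free(grp);
}
```

With thread-safe non-_r variants, a caller could write
copy = dupgrent(gr) right after a successful gr = getgrgid(gid), and
release it later with freegrent(copy).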

We could add new function variants that use malloc, similar to how
getaddrinfo replaced gethostbyname in POSIX.  This probably needs
free*ent functions at least.
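
A minimal sketch of such a malloc-based variant, layered on getgrgid_r
for illustration.  The name getgrgid_alloc and its return convention
are hypothetical.  Because it sits on top of getgrgid_r it still retries
internally; an implementation inside libc could instead size the
allocation once the data has arrived.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <grp.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical getgrgid_alloc: the struct group and the storage its
   pointers refer to live in one malloc'd block, so the caller never
   sizes a buffer.  Returns 0 and stores the entry (or NULL if the group
   does not exist) in *resultp, or returns an errno value. */
int getgrgid_alloc(gid_t gid, struct group **resultp)
{
    size_t bufsize = 1024;  /* initial guess, grown internally on ERANGE */

    *resultp = NULL;
    for (;;) {
        struct group *block = malloc(sizeof(struct group) + bufsize);
        if (block == NULL)
            return ENOMEM;

        /* getgrgid_r's scratch buffer starts right after the struct. */
        char *buf = (char *)(block + 1);
        struct group *found = NULL;
        int err = getgrgid_r(gid, block, buf, bufsize, &found);
        if (err == 0) {
            if (found != NULL)
                *resultp = block;
            else
                free(block);  /* no such group */
            return 0;
        }
        free(block);
        if (err != ERANGE)
            return err;
        if (bufsize > (SIZE_MAX - sizeof(struct group)) / 2)
            return ERANGE;
        bufsize *= 2;
    }
}

/* Hypothetical freegrent: the whole result is one block. */
void freegrent(struct group *grp)
{
    free(grp);
}
```

Since the struct and its strings share one block, freegrent reduces to
free here; exporting it anyway would leave implementations free to
change the layout later.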

For the get*ent iteration functions, we could make the file stream (or
other handle) explicit, in an argument, then thread safety could be
achieved by storing iteration data and buffers inside that file stream
object.  This will not work for interfaces where there is no such
natural file stream argument, obviously.
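
glibc already has a partial precedent: the nonstandard fgetgrent_r takes
the FILE * explicitly, so the iteration position lives in the stream
rather than in hidden library state.  It only parses group(5)-format
data and does not go through NSS.  In the sketch below the function
scan_group_stream is hypothetical; it reports ERANGE instead of
retrying, because retrying would assume the read position was not
advanced.

```c
#define _GNU_SOURCE  /* fgetgrent_r is a glibc extension */
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the entries in a group(5)-format STREAM
   and the members of the group named WANTED.  All iteration state lives
   in STREAM, so threads using separate streams do not interfere.
   Returns 0 or an errno value (ERANGE is reported, not retried). */
int scan_group_stream(FILE *stream, const char *wanted,
                      size_t *entries, size_t *members)
{
    size_t buflen = 4096;
    char *buf = malloc(buflen);  /* malloc'd, so aligned for gr_mem */
    struct group grbuf, *gr;
    int err;

    if (buf == NULL)
        return ENOMEM;
    *entries = 0;
    *members = 0;
    while ((err = fgetgrent_r(stream, &grbuf, buf, buflen, &gr)) == 0) {
        ++*entries;
        if (strcmp(gr->gr_name, wanted) == 0)
            for (char **m = gr->gr_mem; *m != NULL; m++)
                ++*members;
    }
    free(buf);
    return err == ENOENT ? 0 : err;  /* ENOENT marks end of stream */
}
```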

We could add the explicit handle argument to the non-iterating lookup
functions, too.  But I think we'd still need dup*ent and free*ent.  The
advantage would be that we could keep file descriptors open
across calls, something that is not possible with the traditional
functions (thread-safe or not) because too many applications assume that
C libraries do not do that, and only fopen etc. keep descriptors open.
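
A sketch of the handle idea for non-iterating lookups, under explicit
assumptions: the grdb type and the grdb_open, grdb_getgrgid and
grdb_close functions are hypothetical, and the only backend is a single
group-format file read with glibc's fgetgrent_r.  The descriptor stays
open between lookups because the application owns the handle, and a
result remains valid until the next call on the same handle.

```c
#define _GNU_SOURCE  /* fgetgrent_r is a glibc extension */
#include <errno.h>
#include <grp.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical handle: an open stream plus storage for one result. */
struct grdb {
    FILE *stream;
    struct group entry;
    char *buf;
    size_t buflen;
};

struct grdb *grdb_open(const char *path)
{
    struct grdb *db = calloc(1, sizeof *db);
    if (db == NULL)
        return NULL;
    db->buflen = 4096;
    db->buf = malloc(db->buflen);
    db->stream = fopen(path, "re");  /* "e" (O_CLOEXEC) is a glibc extension */
    if (db->buf == NULL || db->stream == NULL) {
        if (db->stream != NULL)
            fclose(db->stream);
        free(db->buf);
        free(db);
        return NULL;
    }
    return db;
}

/* Scan the already-open stream for GID.  Returns 0 and sets *result
   (NULL if not found), or returns an errno value such as ERANGE. */
int grdb_getgrgid(struct grdb *db, gid_t gid, struct group **result)
{
    struct group *gr;
    int err;

    *result = NULL;
    rewind(db->stream);  /* reuse the open descriptor, restart the scan */
    while ((err = fgetgrent_r(db->stream, &db->entry, db->buf,
                              db->buflen, &gr)) == 0) {
        if (gr->gr_gid == gid) {
            *result = gr;
            return 0;
        }
    }
    return err == ENOENT ? 0 : err;
}

void grdb_close(struct grdb *db)
{
    fclose(db->stream);
    free(db->buf);
    free(db);
}
```

Each lookup here is a linear rescan of the open stream; a real
implementation might cache or index the data instead.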

Maybe there are other options.


Personally, I'm leaning towards the first option (thread-safe non-_r
variants plus dup*ent and free*ent helpers).  That's largely based on my
exposure to the current glibc implementation and the interfaces it
provides to programmers.

For implementations that support a Name Service Switch with loadable
service modules, there is a separate question of what the backing API
for those modules should look like.  But that can be a separate
discussion, I think.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
