Message-ID: <87h7nva74t.fsf@oldenburg2.str.redhat.com>
Date: Tue, 05 Jan 2021 13:41:06 +0100
From: Florian Weimer <fweimer@...hat.com>
To: libc-coord@...ts.openwall.com
Subject: Future directions for *_r functions

POSIX defines a few *_r functions like getgrgid_r.  glibc and other
libcs implement many more such functions.

I dislike these interfaces, for a couple of reasons.

* They are difficult to use.  The caller has to perform an ERANGE dance
  to increase the buffer size.  Not all callers do this correctly.

* The ERANGE dance can be very costly if getting to the point where the
  code discovers that the buffer is too small involves network
  activity.  (Particularly relevant to large response sets with
  gethostbyname2_r, which is not in POSIX: the large response is only
  visible after TCP fallback.)

* Adding ERANGE support for *implementations* which currently lack it
  (e.g., they simply drop overly long input) may cause callers to
  discard even more data, or result in infinite loops (because an
  ERANGE error won't result in an advanced read pointer).

* The passed-in buffer allows callers to avoid malloc, but most
  implementations need to call malloc internally anyway.  The buffer
  needs to be untyped memory for functions like getgrgid_r, which need
  storage space for something that is not a char array (see gr_mem in
  struct group).  So it is not clear if one can actually avoid malloc
  in strictly conforming applications.

* Some of the _r functions are not obviously thread-safe because they
  have a hidden file pointer (think getgrent_r, not POSIX, but also
  widely implemented).

I see a couple of ways forward here.

* We could make the non-_r variants thread-safe and document that,
  including a way to determine thread safety of those functions.  In
  this case, it may make sense to add matching dup*ent and free*ent
  functions, to help programmers extend the lifetime of a function
  result.

* We could add new function variants that use malloc, similar to how
  getaddrinfo replaced gethostbyname in POSIX.  This probably needs
  free*ent functions at least.

* For the get*ent iteration functions, we could make the file stream
  (or other handle) explicit, in an argument.  Thread safety could then
  be achieved by storing iteration data and buffers inside that file
  stream object.  This will not work for interfaces where there is no
  such natural file stream argument, obviously.

  We could add the explicit handle argument to the non-iterating lookup
  functions, too.  But I think we'd still need dup*ent and free*ent as
  well.  The advantage would be that we could keep file descriptors
  open across calls, something that is not possible with the
  traditional functions (thread-safe or not) because too many
  applications assume that C libraries do not do that, and only fopen
  etc. keep descriptors open.

Maybe there are other options.

Personally, I'm leaning towards the first option (thread-safe non-_r
variants plus dup*ent and free*ent helpers).  That's largely based on
my exposure to the current glibc implementation and the interfaces it
provides to programmers.

For implementations that support a Name Service Switch with loadable
service modules, there is a separate question of what the backing API
for those modules should look like.  But that can be a separate
discussion, I think.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
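
For readers unfamiliar with the "ERANGE dance" mentioned in the message,
here is a minimal sketch in C.  It is not taken from the original post.
It assumes a POSIX system; lookup_group is a hypothetical helper name,
and the 1024-byte starting size is an arbitrary choice.  Each retry
repeats the entire lookup, which is where the cost described in the
message comes from when the lookup involves network activity.

```c
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not part of any libc): look up a group by GID,
   growing the scratch buffer until getgrgid_r stops failing with
   ERANGE.  Returns 0 on success or an error number on failure.  On
   success, *found tells whether an entry exists, and *bufp holds the
   malloc'ed scratch buffer that backs the pointers inside *grp; the
   caller must free it once it is done with *grp.  */
int
lookup_group (gid_t gid, struct group *grp, char **bufp, int *found)
{
  size_t size = 1024;   /* Arbitrary starting size (assumption).  */
  char *buf = NULL;

  for (;;)
    {
      char *newbuf = realloc (buf, size);
      if (newbuf == NULL)
        {
          free (buf);
          return ENOMEM;
        }
      buf = newbuf;

      /* getgrgid_r returns the error number directly.  */
      struct group *result;
      int ret = getgrgid_r (gid, grp, buf, size, &result);
      if (ret == 0)
        {
          *found = result != NULL;
          *bufp = buf;
          return 0;
        }
      if (ret != ERANGE)
        {
          free (buf);
          return ret;
        }

      /* Buffer too small: double it and redo the whole lookup.  */
      if (size > (size_t) -1 / 2)
        {
          free (buf);
          return ENOMEM;
        }
      size *= 2;
    }
}
```

A caller would declare a struct group, a char * and an int, pass their
addresses to lookup_group, read fields such as gr_name only while the
buffer is still allocated, and then free the buffer.  That the result
cannot outlive the caller-managed buffer is the lifetime problem that
dup*ent and free*ent helpers would address.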