|
Message-ID: <30426668-6555-4150-b6ec-e85c7e7f8ba4@linaro.org> Date: Wed, 25 Sep 2024 14:06:22 -0300 From: Adhemerval Zanella Netto <adhemerval.zanella@...aro.org> To: libc-coord@...ts.openwall.com, Schrodinger ZHU Yifan <i@...yi.fan>, Florian Weimer <fweimer@...hat.com>, "Jason A. Donenfeld" <Jason@...c4.com> Subject: Re: getrandom via vDSO On 23/09/24 18:18, Schrodinger ZHU Yifan wrote: > Hi, > > It's Yifan, a regular committer to the LLVM-libc project. > > We have recently merged the vDSO functionality into our codebase so I am checking existing syscall wrappers to apply vDSO if related symbols are feasible. > > __vdso_getrandom/__kernel_getrandom has been recently introduced into the linux kernel, I wonder what should be a proper plan to utilize it. The additional userspace state structure looks a bit scary to me. It seems to me that the state requires proper maintenance across fork and clone, and has potential security implications. This reminds me of history of pid caching, so I am hesitating on whether I should go ahead to implement getrandom optimization when the vDSO symbols are available. (P.S. this time, kernel provides desired flags that can help us to keep the region safe to some extend: https://lore.kernel.org/linux-mm/20240703183115.1075219-2-Jason@zx2c4.com/T/ <https://lore.kernel.org/linux-mm/20240703183115.1075219-2-Jason@zx2c4.com/T/>) > > Given that glibc is already working on a patch, I wonder if we have consensus on the following questions: Keep in mind that the mostly of proposed glibc [1] implementation arise from the implicit requirement to keep getrandom() async-signal-safe and fork handling. This why we have used a mmap allocator for the state tracker itself (since glibc malloc is not async-signal-safe). Both the vdso selftest [2] and a proposed Go implementation [3] uses a way simple allocation scheme. If async-signal-safeness and fork handling is really required (as I saw on your llvm-libc proposal [4]) you need to take care of some cases: 1. You need to handle function reentracy (for the SA_NODEFER case). The glibc proposal handles by 'grabbing' the per-thread allocated buffer to a local variable, and setting a sentinel value. A reentrant call will then check the per-thread buffer and fallback to syscall if it is already 'taken'. 2. Even though the state allocation uses mmap and has the reentracy handling above, for glibc we still need to handle of concurrent _Fork() (where one thread calls where another thread is allocating a state). Since _Fork() is suppose the be async-signal-safe, taking a lock if trick, and getrandom is also expected to work on the child process. We handle it with a hard hammer of blocking all signals (including internal ones). We block even internal signal for avoid further issues, although I think we can only block user-visible ones. 3. You can use either a maximum number of states or grow the state list as we did. However, you need to keep in mind that reallocate the state list in place (with mremap) is not fully fork-safe: if a fork()/_Fork() is called concurrent just before the mremap returned buffer is write on the global allocator, the program will see an inconsistent state. We handle it with by reallocate the buffer with a new mmap call, writing the state atomically, and unmap the old state tracker. 4. The thread exit state release must be done with the signal blocked, to avoid creating a new free-state block during thread release. The glibc pthread implementation does it to fix a potential race condition on pthread_kill [5]. You can not simply implement it with __cxa_thread_atexit_impl, since afaik POSIX state that signal should not be blocked when the pthread implementation call it. I think some of the requirements might be hard to implement with an overlay libc design as llvm-libc (specially the item 4.). [1] https://patchwork.sourceware.org/project/glibc/patch/20240918140500.26365-2-Jason@zx2c4.com/ [2] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/vDSO/vdso_test_getrandom.c [3] https://go-review.googlesource.com/c/go/+/614835 [4] https://github.com/llvm/llvm-project/pull/109870 [5] https://sourceware.org/bugzilla/show_bug.cgi?id=12889 > > 1. Is there a clear performance demand such that it becomes necessarily to avoid syscall inside |getrandom|? > 2. Is it certain that such performance demand will pay off given all the additional complexity required to maintain per-thread/per-process state (with async-signal-safety)? > 3. Is it clear that unwrapped clone/fork and other edge cases will not complicate the situation? > Besides Florian points, the performance difference is really abysmal for small buffer (like uint32_t or uint64_t), where syscall overhead dominates. Also, the vgetrandom provides the same security guarantees as the syscall ones, so we can be used a CSRNG (Jason can give you more details). So you need to balance the runtime requirements, along on where to deploy. Besides the code complexity of state and opaque state management, the vDSO requires more memory than syscall (although the VM_DROPPABLE is optimized for memory pressure situations, it still requires page tables entries). > > Best, > Yifan > > image > Schrodinger ZHU Yifan, Ph.D. Student > Computer Science Department, University of Rochester > > *Personal Email:* i@...yi.fan > *Work Email:* yifanzhu@...hester.edu > *Website:* https://www.cs.rochester.edu/~yzhu104/Main.html > *Github:* SchrodingerZhu > *GPG Fingerprint:* BA02CBEB8CB5D8181E9368304D2CC545A78DBCC3
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.