Message-ID: <CALCETrVzugn9vUjQfHcPrHcRUJk+wgBtvhUz7U0=H2tWxZzAWg@mail.gmail.com>
Date: Mon, 27 Jul 2015 18:38:08 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Rich Felker <dalias@...c.org>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>, Alexander Larsson <alexander.larsson@...il.com>
Subject: Re: Re: Using direct socket syscalls on x86_32 where available?

On Mon, Jul 27, 2015 at 6:21 PM, Rich Felker <dalias@...c.org> wrote:
> On Mon, Jul 27, 2015 at 06:04:11PM -0700, Andy Lutomirski wrote:
>> On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@...c.org> wrote:
>> > On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
>> >> On 07/26/2015 09:59 AM, Rich Felker wrote:
>> >> > On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> >> >> On x86_32, the only way to call socket(2), etc. is using socketcall.
>> >> >> This is slated to change in Linux 4.3:
>> >> >>
>> >> >> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>> >> >>
>> >> >> If userspace adapts by preferring the direct syscalls when available,
>> >> >> it'll make it easier for seccomp to filter new userspace programs
>> >> >> (and, ideally, eventually disallow socketcall for sandbox-aware code).
>> >> >>
>> >> >> Would musl be willing to detect these syscalls and use them if
>> >> >> available?
>> >> >>
>> >> >> (Code to do this probably shouldn't be committed until that change
>> >> >> lands in Linus' tree, just in case the syscall numbers change in the
>> >> >> meantime.)
>> >> >
>> >> > My preference would be not to do this, since it seems to be enlarging
>> >> > the code and pessimizing normal usage for the sake of a very special
>> >> > usage scenario. At the very least there would be at least one extra
>> >> > syscall to probe at first usage, and that probe could generate a
>> >> > termination on existing seccomp setups. :-p
>> >>
>> >> There will be some tiny performance benefit on newer kernels: it
>> >> avoids a silly indirection through a switch statement, along with six
>> >> stores into memory, validation of the userspace address, and then
>> >> six loads to pull the syscall args back out of memory. It's not a
>> >> big deal, but the new syscalls really will be slightly faster.
>> >
>> > Unless you're going to try the new syscalls first and fall back on
>> > ENOSYS every time...
>> >
>> >> > So far we don't probe and
>> >> > store results for any fallbacks though; we just do the fallback on
>> >> > error every time. This is because all of the existing fallbacks are in
>> >> > places where we actually want the new functionality a new syscall
>> >> > offers, and the old ones are not able to provide it precisely but
>> >> > require poor emulation, and in these cases it's expected that the user
>> >> > not be using old kernels that can't give correct semantics. But in the
>> >> > case of these socket calls there's no semantic difference or reason
>> >> > for us to be preferring the 'new' calls. It's just a duplicate API for
>> >> > the same thing.
>> >>
>> >> One way to implement it would be to favor the new syscalls but to
>> >> set some variable the first time one of them returns ENOSYS. Once
>> >> that happens, either all of them could fall back to socketcall, or
>> >> just that one syscall could.
>> >
>> > ...right, a global. Which requires a barrier to access it. A barrier
>> > costs a lot more than a few loads or a switch.
>>
>> Not on x86, and this is as x86-specific as it gets. In fact, I bet
>
> Is x86 really the only arch that needs socketcall multiplexing? If so,
> that makes transitioning more attractive. I thought at least a few
> others needed it too.

I'll try to figure out whether there are others and submit patches.

>> the totally untested code below is actually safe on pretty much any
>> architecture that has free C11-style relaxed loads (and this code
>> could even be switched to use actual C11 relaxed loads):
>>
>> volatile int socket_is_okay = true;
>>
>> if (socket_is_okay) {
>>     ret = socket(...);
>>     if (ret < 0) {
>>         if (ret == -ENOSYS) {
>>             socket_is_okay = false;
>>         } else {
>>             errno = -ret;
>>             return -1;
>>         }
>>     } else {
>>         return ret;
>>     }
>> }
>>
>> /* usual socketcall code here */
>
> This is probably workable with volatile there. Without volatile the
> x86 memory model does not help you; the compiler can make
> transformations that would make it unsafe even if the machine code you
> expected the compiler to generate would be safe. But I still don't
> like hacks like this. It's a big mess to keep it from getting used on
> non-x86 where it would be invalid/unsafe.

Why's it unsafe on non-x86? I think it's safe if all those volatile
accesses are replaced with standard C11 relaxed accesses. The only
thing that code requires for correctness is that a relaxed read never
returns a result that never was nor will be written.
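For concreteness, the C11 variant being discussed might look like the
sketch below. It is an untested illustration, not musl code: the wrapper
name my_socket is made up, and it assumes compilation for x86_32 with
headers new enough to define SYS_socket (Linux 4.3+); SYS_SOCKET is the
socketcall sub-call number from <linux/net.h>.

    #include <errno.h>
    #include <stdatomic.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/net.h>   /* SYS_SOCKET: socketcall() sub-call number */

    /* 1 until the direct syscall first fails with ENOSYS. */
    static atomic_int socket_is_okay = 1;

    int my_socket(int domain, int type, int protocol)
    {
        if (atomic_load_explicit(&socket_is_okay, memory_order_relaxed)) {
            long ret = syscall(SYS_socket, domain, type, protocol);
            if (ret >= 0 || errno != ENOSYS)
                return ret;
            /* Old kernel: remember that, fall back to socketcall. */
            atomic_store_explicit(&socket_is_okay, 0,
                                  memory_order_relaxed);
        }
        unsigned long args[3] = { domain, type, protocol };
        return syscall(SYS_socketcall, SYS_SOCKET, args);
    }

The relaxed load and store impose no ordering, which is the point of the
argument above: the only property the algorithm needs is that the load
observes some value that was actually stored (1 initially, 0 after a
failed probe).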
>
>> >> Or you could just avoid implementing it and see if anyone complains.
>> >> It's plausible that xdg-app might start requiring the new syscalls
>> >> (although it would presumably not kill you if it tried to use
>> >> socketcall).
>> >>
>> >> Alex, if glibc started using the new syscalls, would you want to
>> >> require them inside xdg-app?
>> >
>> > I don't see any reason to require them except forcing policy. And I
>> > don't see any reason for adding them to the kernel to begin with.
>> > While we would have been better off with proper syscalls for each one
>> > rather than this multiplexed mess if it had been done right from the
>> > beginning, having to support both is even worse than the existing
>> > multiplexed socketcall.
>>
>> Worse for libc implementations, certainly. On the other hand, the
>> ability to cleanly limit address families and such is genuinely
>> useful, and deployed software does it on x86_64. It's not really
>> possible with current kernels on x86_32, but, with these patches, it
>> becomes possible on x86_32 as long as libc implementations play along
>> and sandbox implementations are willing to force their payloads to use
>> new enough libc implementations.
>>
>> If I were porting something like Sandstorm to x86_32 and glibc
>> supported the new syscalls, this would be a no-brainer for me. I'd
>> simply block socketcall entirely (returning -ENOSYS) in the container,
>> and anyone providing an app that wants to use sockets would have to
>> link against a new enough glibc.
>
> Doing that would create a hard dependency on the latest glibc and
> latest kernel, which would be a show-stopper for use on Debian,
> etc. :-)

It only requires the payload to depend on the latest glibc, though,
and the payload might be a binary from elsewhere.

--Andy
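The container policy described above — failing socketcall with ENOSYS
while leaving the direct syscalls alone — could be expressed as a
seccomp-bpf filter roughly like this untested sketch. It is illustrative
only (a real sandbox would confine far more than this one syscall) and
assumes compilation for x86_32.

    #include <errno.h>
    #include <stddef.h>
    #include <linux/audit.h>
    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>

    /* Fail the legacy socketcall() multiplexer with ENOSYS; allow
     * everything else, including the direct socket syscalls that
     * Linux 4.3+ provides on x86_32. */
    static int deny_socketcall(void)
    {
        struct sock_filter filter[] = {
            /* Kill the process if somehow run on another arch. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_I386, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            /* Return ENOSYS for socketcall; allow everything else. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socketcall, 0, 1),
            BPF_STMT(BPF_RET | BPF_K,
                     SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof filter / sizeof filter[0],
            .filter = filter,
        };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
            return -1;
        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    }

With a filter like this installed, a libc that probes the direct
syscalls (as sketched earlier) works unmodified, while one that only
knows socketcall sees every socket operation fail with ENOSYS.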