Message-ID: <CALCETrWpKWv2dJkGO_Wz0nFVKAnD+=N_iP8Y8iPFgFPMhyTY-g@mail.gmail.com>
Date: Mon, 27 Jul 2015 18:04:11 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Rich Felker <dalias@...c.org>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>,
 Alexander Larsson <alexander.larsson@...il.com>
Subject: Re: Re: Using direct socket syscalls on x86_32 where available?

On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@...c.org> wrote:
> On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
>> On 07/26/2015 09:59 AM, Rich Felker wrote:
>> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> >>On x86_32, the only way to call socket(2), etc is using socketcall.
>> >>This is slated to change in Linux 4.3:
>> >>
>> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>> >>
>> >>If userspace adapts by preferring the direct syscalls when available,
>> >>it'll make it easier for seccomp to filter new userspace programs
>> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
>> >>
>> >>Would musl be willing to detect these syscalls and use them if available?
>> >>
>> >>(Code to do this probably shouldn't be committed until that change
>> >>lands in Linus' tree, just in case the syscall numbers change in the
>> >>mean time.)
>> >
>> >My preference would be not to do this, since it seems to be enlarging
>> >the code and pessimizing normal usage for the sake of a very special
>> >usage scenario. At the very least there would be at least one extra
>> >syscall to probe at first usage, and that probe could generate a
>> >termination on existing seccomp setups. :-p
>>
>> There will be some tiny performance benefit for newer kernels: it
>> avoids a silly indirection that has a switch statement along six
>> stores into memory, validation of the userspace address, and then
>> six loads to pull the syscall args back out of memory.
>> It's not a big deal, but the new syscalls really will be slightly faster.
>
> Unless you're going to try the new syscalls first and fallback on
> ENOSYS every time...
>
>> >So far we don't probe and
>> >store results for any fallbacks though; we just do the fallback on
>> >error every time. This is because all of the existing fallbacks are in
>> >places where we actually want new functionality a new syscall offers,
>> >and the old ones are not able to provide it precisely but require poor
>> >emulation, and in these cases it's expected that the user not be using
>> >old kernels that can't give correct semantics. But in the case of
>> >these socket calls there's no semantic difference or reason for us to
>> >be preferring the 'new' calls. It's just a duplicate API for the same
>> >thing.
>>
>> One way to implement it would be to favor the new syscalls but to
>> set some variable the first time one of them returns ENOSYS. Once
>> that happens, either all of them could fall back to socketcall or
>> just that one syscall could.
>
> ...right, a global. Which requires a barrier to access it. A barrier
> costs a lot more than a few loads or a switch.

Not on x86, and this is as x86-specific as it gets.

In fact, I bet the totally untested code below is actually safe on
pretty much any architecture that has free C11-style relaxed loads
(and this code could even be switched to use actual C11 relaxed
loads):

    volatile int socket_is_okay = true;

    if (socket_is_okay) {
        ret = socket(...);    /* direct syscall, returns -errno on failure */
        if (ret >= 0)
            return ret;
        if (ret != -ENOSYS) {
            errno = -ret;
            return -1;
        }
        /* No direct syscall on this kernel: remember that, fall through. */
        socket_is_okay = false;
    }
    usual socketcall code here;

>
>> Or you could just avoid implementing it and see if anyone complains.
>> It's plausible that xdg-app might start requiring the new syscalls
>> (although it would presumably not kill you if tried to use
>> socketcall).
>>
>> Alex, if glibc started using the new syscalls, would you want to
>> require them inside xdg-app?
>
> I don't see any reason to require them except forcing policy. And I
> don't see any reason for adding them to the kernel to begin with.
> While we would have been better off with proper syscalls for each one
> rather than this multiplexed mess if it had been done right from the
> beginning, having to support both is even worse than the existing
> multiplexed socketcall.

Worse for libc implementations, certainly. On the other hand, the
ability to cleanly limit address families and such is genuinely
useful, and deployed software does it on x86_64. It's not really
possible with current kernels on x86_32, but, with these patches, it
becomes possible on x86_32 as long as libc implementations play along
and sandbox implementations are willing to force their payloads to use
new enough libc implementations.

If I were porting something like Sandstorm to x86_32 and glibc
supported the new syscalls, this would be a no-brainer for me. I'd
simply block socketcall entirely (returning -ENOSYS) in the container,
and anyone providing an app that wants to use sockets has to link
against new glibc.

Keep in mind that socket(2) with unrestricted address family is a big
attack surface and is historically full of nasty vulnerabilities.

--Andy
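
Editorial sketch (not part of the original message): the message above
suggests that the fallback flag could be read with actual C11 relaxed
loads instead of a volatile int. A minimal version of that idea might
look like the following. Everything named here is an assumption made for
illustration: fake_direct_socket and fake_socketcall_socket are
hypothetical stand-ins for the raw kernel entry points (they only count
calls and return canned values), and sketch_socket is not musl or glibc
code. Two threads racing on the flag can at worst each probe the direct
syscall once, which is harmless because both paths give the same result.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Call counters so the behaviour of the sketch can be observed. */
static int direct_calls, legacy_calls;

/* Hypothetical stand-in for the direct socket syscall. Like a raw
 * syscall it returns a negated errno on failure; here it always
 * pretends the kernel is too old to know the syscall. */
static long fake_direct_socket(int domain, int type, int protocol)
{
    (void)domain; (void)type; (void)protocol;
    direct_calls++;
    return -ENOSYS;
}

/* Hypothetical stand-in for the multiplexed socketcall path. It always
 * pretends to succeed and hands back file descriptor 3. */
static long fake_socketcall_socket(int domain, int type, int protocol)
{
    (void)domain; (void)type; (void)protocol;
    legacy_calls++;
    return 3;
}

/* Set to false the first time the direct syscall reports ENOSYS. */
static atomic_bool direct_socket_ok = true;

int sketch_socket(int domain, int type, int protocol)
{
    long ret;

    /* Relaxed load: no barrier, just an ordinary load on x86. */
    if (atomic_load_explicit(&direct_socket_ok, memory_order_relaxed)) {
        ret = fake_direct_socket(domain, type, protocol);
        if (ret != -ENOSYS)
            goto done;
        /* Remember the result so later calls skip the probe. */
        atomic_store_explicit(&direct_socket_ok, false,
                              memory_order_relaxed);
    }
    ret = fake_socketcall_socket(domain, type, protocol);

done:
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return (int)ret;
}
```

With these stubs, the first call to sketch_socket probes the direct path
once and falls back; every later call goes straight to the socketcall
path. Note that under a seccomp policy that returns -ENOSYS for
socketcall, as proposed above, the fallback itself would fail and the
caller would simply see ENOSYS.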