Message-ID: <def16bfb-369f-865d-5c45-d3368415765d@efficios.com>
Date: Fri, 16 Sep 2022 16:36:46 +0200
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Florian Weimer <fw@...eb.enyo.de>, "carlos@...hat.com" <carlos@...hat.com>,
 libc-alpha <libc-alpha@...rceware.org>, szabolcs.nagy@....com,
 libc-coord@...ts.openwall.com
Subject: RSEQ symbols: __rseq_size, __rseq_flags vs __rseq_feature_size

Hi Florian,

I wanted to clarify by email what we each have in mind with respect to
exposing the RSEQ feature set available to the outside world through
libc symbols.

I have 3 different possible approaches in mind, shown below with 3
examples:

#include <stdint.h>

#undef likely
#define likely(x)	__builtin_expect(!!(x), 1)

#undef __aligned
#define __aligned(x)	__attribute__((__aligned__(x)))

#undef offsetof
#define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)

#undef sizeof_field
#define sizeof_field(TYPE, MEMBER)	sizeof((((TYPE *)0)->MEMBER))

#undef offsetofend
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

#define __RSEQ_FLAG_FEATURE_EXTENDED	0x2
#define __RSEQ_FLAG_FEATURE_VM_VCPU_ID	0x4

typedef uint32_t __u32;
typedef uint64_t __u64;

/* Original: size=32 bytes */
struct rseq_orig {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t padding[3];
} __aligned(32);

/* Extended */
struct rseq_ext {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	/* New */
	uint32_t node_id;
	uint32_t vm_vcpu_id;
	uint32_t padding[1];
} __aligned(32);

unsigned int __rseq_flags;
unsigned int __rseq_size;
unsigned int __rseq_feature_size;

/* A) Check extended feature flag and size. One mask and two comparisons. */
void fA(void)
{
	if (likely((__rseq_flags & __RSEQ_FLAG_FEATURE_EXTENDED) &&
		   __rseq_size >= offsetofend(struct rseq_ext, vm_vcpu_id))) {
		/* Use rseq with vcpu_id. */
		asm volatile ("ud2\n\t");
	} else {
		/* Fallback. */
		asm volatile ("int3\n\t");
	}
}

/*
 * B) Check rseq feature size. Feature number only limited by size of
 *    uint32_t. One comparison.
 */
void fB(void)
{
	if (likely(__rseq_feature_size >= offsetofend(struct rseq_ext, vm_vcpu_id))) {
		/* Use rseq with vcpu_id. */
		asm volatile ("ud2\n\t");
	} else {
		/* Fallback. */
		asm volatile ("int3\n\t");
	}
}

/*
 * C) Check only rseq flags. 32 features at most. One mask and one
 *    comparison.
 */
void fC(void)
{
	if (likely(__rseq_flags & __RSEQ_FLAG_FEATURE_VM_VCPU_ID)) {
		/* Use rseq with vcpu_id. */
		asm volatile ("ud2\n\t");
	} else {
		/* Fallback. */
		asm volatile ("int3\n\t");
	}
}

Here is the resulting objdump:

rseq-flags.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <fA>:
   0:	f6 05 00 00 00 00 02 	testb  $0x2,0x0(%rip)        # 7 <fA+0x7>
   7:	74 0f                	je     18 <fA+0x18>
   9:	83 3d 00 00 00 00 1b 	cmpl   $0x1b,0x0(%rip)        # 10 <fA+0x10>
  10:	76 06                	jbe    18 <fA+0x18>
  12:	0f 0b                	ud2
  14:	c3                   	retq
  15:	0f 1f 00             	nopl   (%rax)
  18:	cc                   	int3
  19:	c3                   	retq
  1a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

0000000000000020 <fB>:
  20:	83 3d 00 00 00 00 1b 	cmpl   $0x1b,0x0(%rip)        # 27 <fB+0x7>
  27:	76 07                	jbe    30 <fB+0x10>
  29:	0f 0b                	ud2
  2b:	c3                   	retq
  2c:	0f 1f 40 00          	nopl   0x0(%rax)
  30:	cc                   	int3
  31:	c3                   	retq
  32:	66 66 2e 0f 1f 84 00 	data16 nopw %cs:0x0(%rax,%rax,1)
  39:	00 00 00 00
  3d:	0f 1f 00             	nopl   (%rax)

0000000000000040 <fC>:
  40:	f6 05 00 00 00 00 04 	testb  $0x4,0x0(%rip)        # 47 <fC+0x7>
  47:	74 07                	je     50 <fC+0x10>
  49:	0f 0b                	ud2
  4b:	c3                   	retq
  4c:	0f 1f 40 00          	nopl   0x0(%rax)
  50:	cc                   	int3
  51:	c3                   	retq

I can think of 4 approaches that applications will use to detect
availability of their specific rseq feature for each rseq critical
section:

1) Dynamically check whether the feature is implemented at runtime with
   conditional branches. Those using this approach will probably not
   want to have the overhead of the two comparisons in approach (A)
   above. Applications and libraries should probably use their own copy
   of the glibc symbols for speed purposes.

2) Implement the entire function as IFUNC and select whether a rseq or
   non-rseq implementation should be used at C startup. The tradeoff
   here is code size vs speed, and using IFUNC for things like malloc
   may add additional constraints on the startup order.

3) Code rewrite (dynamic code patching) between rseq and non-rseq code.
   This may be frowned upon in the security area and may not always be
   possible depending on the context.

4) JIT compilation of specialized rseq vs non-rseq code. Not generally
   available in C.

I suspect that glibc may rely on approaches 1+2 depending on the
situation, and many applications may use approach (1) for simplicity
reasons.

Ideally I would like to keep approach (1) fast, so I'd prefer to keep
the check to one single conditional branch. This eliminates approach (A)
and leaves approaches (B) and (C).

Approach (B) has the advantage of not limiting us to 32 features, but
its downside is that we need to introduce a new __rseq_feature_size
symbol to the libc ABI.

Approach (C) has the advantage of using __rseq_flags which is already
exposed, but limits us to 32 features.

Did you have in mind an approach like (A), (B) or (C) for exposing the
rseq feature set, or something else entirely?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
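
For illustration, here is a minimal sketch of approach (B) used in the style of detection approach (1), where the application keeps its own copy of the feature size. It assumes the rseq_ext layout shown in the email. The names app_rseq_feature_size, app_rseq_init and app_rseq_has_vm_vcpu_id are hypothetical and not part of any libc ABI. The static assertion also shows where the 0x1b immediate in the objdump comes from: vm_vcpu_id ends at byte 28 (0x1c), so "size >= 28" compiles to "cmpl $0x1b; jbe fallback".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same helper as in the email, built on the standard offsetof(). */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Extended layout copied from the email. */
struct rseq_ext {
	uint32_t cpu_id_start;
	uint32_t cpu_id;
	uint64_t rseq_cs;
	uint32_t flags;
	uint32_t node_id;
	uint32_t vm_vcpu_id;
	uint32_t padding[1];
} __attribute__((__aligned__(32)));

/* vm_vcpu_id spans bytes 24..27, so the feature needs a size >= 28 (0x1c). */
static_assert(offsetofend(struct rseq_ext, vm_vcpu_id) == 28,
	      "unexpected rseq_ext layout");

/* Hypothetical application-local copy of the proposed __rseq_feature_size. */
static unsigned int app_rseq_feature_size;

/* Hypothetical: call once at startup with the value published by libc. */
void app_rseq_init(unsigned int libc_feature_size)
{
	app_rseq_feature_size = libc_feature_size;
}

/* Approach (B): a single comparison against the end offset of the field. */
int app_rseq_has_vm_vcpu_id(void)
{
	return app_rseq_feature_size >=
		offsetofend(struct rseq_ext, vm_vcpu_id);
}
```

A caller would run app_rseq_init() once, then branch on app_rseq_has_vm_vcpu_id() in front of each critical section that reads vm_vcpu_id, taking the non-rseq fallback otherwise.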