|
Message-ID: <20201217002348.GO534@brightrain.aerifal.cx> Date: Wed, 16 Dec 2020 19:23:49 -0500 From: Rich Felker <dalias@...c.org> To: Jesse DeGuire <jesse.a.deguire@...il.com> Cc: musl@...ts.openwall.com Subject: Re: Musl on Cortex-M Devices On Wed, Dec 16, 2020 at 06:43:15PM -0500, Jesse DeGuire wrote: > Hey everyone, > > I'm working on putting together a Clang-based toolchain to use with > Microchip PIC32 (MIPS32) and SAM (Cortex-M) microcontrollers as an > alternative to their paid XC32 toolchain and I'd like to use Musl as > the C library. Currently, I'm trying to get it to build for a few > different Cortex-M devices and have found that Musl builds fine for > ARMv7-M, but not for ARMv6-M or v8-M Baseline because Musl uses > instructions not supported on the Thumb ISA subset used by v6-M and > v8-M Baseline devices. I'm using the v1.2.1 tag version of Musl, but > can easily switch to HEAD if needed. I am using a Python script to > build Musl (and eventually the rest of the toolchain), which you can > see on GitHub at the following link. It's a bit of a mess at the > moment, but the build_musl() function is what I'm currently using to > build Musl. I had assumed the thumb1[-ish?] subset wouldn't be interesting, but if it is, let's have a look. > https://github.com/jdeguire/buildPic32Clang/blob/master/buildPic32Clang.py > > Anyway, I have managed to get Musl to build for v6-M, v7-M, and v8-M > Baseline and have attached a diff to this email. If you like, I can go > into more detail as to why I made the changes I made; however, many > changes were merely the result of my attempts to correct errors > reported by Clang due to it encountering instruction sequences not > supported on ARMv6-M. Are there places where clang's linker is failing to make substitutions that the GNU one can do, that would make this simpler? For example I know the GNU one can replace bx rn by mov pc,rn if needed (based on a relocation the assembler emits on the insn). > A number of errors were simply the result of > ARMv6-M requiring one to use the "S" variant of an instruction that > sets status flags (such as "ADDS" vs "ADD" or "MOVS" vs "MOV"). A few > files I had to change from a "lower case s" to a "capital-S" file so > that I could use macros to check for either the Thumb1 ISA > ("__thumb2__ || !__thumb__") or for an M-Profile device > ("!__ARM_ARCH_ISA_ARM"). Is __ARM_ARCH_ISA_ARM universally available (even on old compilers)? If not this may need an alternate detection. But I'd like to drop as much as possible and just make the code compatible rather than having 2 versions of it. I don't think there are any places where the performance or size is at all relevant. > The changes under > "src/thread/arm/__set_thread_area.c" are different in that I made > those because I don't believe Cortex-M devices could handle what was > there (no M-Profile device has Coprocessor 15, for example) and so I Unless this is an ISA level that can't be executed on a normal (non-M) ARM profile, it still needs all the backends that might be needed and runtime selection of which to use. This is okay. I believe Linux for nommu ARM has a syscall for get_tp, which is rather awful but probably needs to be added as a backend. The right way to do this would have been with trap-and-emulate (of cp15) I think... > sort of punted for now and figured that a user's fault handler could > handle those "_kuser" calls given that there probably won't be a > kernel to do so. I don't think you can use the kuser_helper versions here because (1) they're at ARM addresses, not thumb ones, and (2) they're in the address range that Cortex M uses for special stuff. > Unfortunately, I'm not far enough yet in my project to build code that > would actually run on one of the micros I have with me, so I haven't > yet been able to properly try out these changes. Also, I should > mention that my eventual plan is to make a "baremetal" layer that > would provide stubs to handle syscalls or thread-related things that > you wouldn't normally have on a microcontroller, but I want to get > Musl building without that first because I think I'll be able to > utilize a lot of what's already present when I get to that point. > Hopefully, what's here is still somewhat useful as a starting point > should Musl want to support Cortex-M. I'll try to update this diff as > a result of feedback or my own eventual testing (I'm a long way from > that, though!). Thanks for the update. Please try to attach as UTF-8 rather than UTF-16 next time so it's more readable/accessible. I had to guess it was UTF-16 and switch the mime encoding headers to read it. Further comments inline below: > diff --git a/arch/arm/atomic_arch.h b/arch/arm/atomic_arch.h > index 9e3937cc..21f102ad 100644 > --- a/arch/arm/atomic_arch.h > +++ b/arch/arm/atomic_arch.h > @@ -64,11 +64,18 @@ static inline int a_cas(volatile int *p, int t, int s) > > #ifndef a_barrier > #define a_barrier a_barrier > +#if __ARM_ARCH_ISA_ARM > static inline void a_barrier() > { > register uintptr_t ip __asm__("ip") = __a_barrier_ptr; > __asm__ __volatile__( BLX " ip" : "+r"(ip) : : "memory", "cc", "lr" ); > } > +#else > +static inline void a_barrier() > +{ > + __asm__ __volatile__ ("dmb" : : : "memory"); > +} > +#endif > #endif There's a condition above with this code already; you just need to make it used for an additional case. > #define a_crash a_crash > diff --git a/arch/arm/crt_arch.h b/arch/arm/crt_arch.h > index 99508b1d..c48b14b0 100644 > --- a/arch/arm/crt_arch.h > +++ b/arch/arm/crt_arch.h > @@ -3,13 +3,25 @@ __asm__( > ".global " START " \n" > ".type " START ",%function \n" > START ": \n" > +#if defined(__thumb2__) || !defined(__thumb__) > " mov fp, #0 \n" > " mov lr, #0 \n" > +#else > +" movs a3, #0 \n" > +" mov fp, a3 \n" > +" mov lr, a3 \n" > +#endif > " ldr a2, 1f \n" > " add a2, pc, a2 \n" > " mov a1, sp \n" > +#if defined(__thumb2__) || !defined(__thumb__) > "2: and ip, a1, #-16 \n" > " mov sp, ip \n" > +#else > +"2: subs a3, #16 \n" > +" ands a1, a3 \n" > +" mov sp, a1 \n" > +#endif > " bl " START "_c \n" > ".weak _DYNAMIC \n" > ".hidden _DYNAMIC \n" This can just use the new code unconditionally. > diff --git a/arch/arm/pthread_arch.h b/arch/arm/pthread_arch.h > index e689ea21..a89fb554 100644 > --- a/arch/arm/pthread_arch.h > +++ b/arch/arm/pthread_arch.h > @@ -1,5 +1,5 @@ > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \ > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7 > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \ > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7) That's probably not the right condition. > static inline pthread_t __pthread_self() > { > diff --git a/configure b/configure > old mode 100755 > new mode 100644 > diff --git a/crt/arm/crtn.S b/crt/arm/crtn.S > index dc020f92..83e20da2 100644 > --- a/crt/arm/crtn.S > +++ b/crt/arm/crtn.S > @@ -1,9 +1,17 @@ > .syntax unified > > .section .init > +#if __ARM_ARCH_ISA_ARM > pop {r0,lr} > bx lr > +#else > + pop {r0,pc} > +#endif > > .section .fini > +#if __ARM_ARCH_ISA_ARM > pop {r0,lr} > bx lr > +#else > + pop {r0,pc} > +#endif Ideally the linker could handle this. > diff --git a/src/ldso/arm/tlsdesc.S b/src/ldso/arm/tlsdesc.S > index 3ae133c9..1843d97d 100644 > --- a/src/ldso/arm/tlsdesc.S > +++ b/src/ldso/arm/tlsdesc.S > @@ -12,13 +12,19 @@ __tlsdesc_static: > .hidden __tlsdesc_dynamic > .type __tlsdesc_dynamic,%function > __tlsdesc_dynamic: > +#if defined(__thumb2__) || !defined(__thumb__) > push {r2,r3,ip,lr} > +#else > + push {r2-r4,lr} > + mov r2,ip > + push {r2} > +#endif AFAICT ip is not special here. If it's a pain to push/pop, just use a different register. > ldr r1,[r0] > ldr r2,[r1,#4] // r2 = offset > ldr r1,[r1] // r1 = modid > > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \ > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7 > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \ > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7) > mrc p15,0,r0,c13,c0,3 > #else > ldr r0,1f > @@ -36,19 +42,34 @@ __tlsdesc_dynamic: > bx r0 > #endif > #endif > +#if defined(__thumb2__) || !defined(__thumb__) > ldr r3,[r0,#-4] // r3 = dtv > ldr ip,[r3,r1,LSL #2] > sub r0,ip,r0 > +#else > + mov r4,r0 > + subs r4,#4 > + ldr r3,[r4] > + lsls r4,r1,#2 > + ldr r4,[r3,r4] > + subs r0,r4,r0 > +#endif > add r0,r0,r2 // r0 = r3[r1]-r0+r2 > +#if defined(__thumb2__) || !defined(__thumb__) > #if __ARM_ARCH >= 5 > pop {r2,r3,ip,pc} > #else > pop {r2,r3,ip,lr} > bx lr > #endif > +#else > + pop {r2} > + mov ip,r2 > + pop {r2-r4,pc} > +#endif > > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \ > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7 > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \ > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7) > #else > .align 2 > 1: .word __a_gettp_ptr - 2b > diff --git a/src/process/arm/vfork.s b/src/process/arm/vfork.s > index d7ec41b3..b6f0260e 100644 > --- a/src/process/arm/vfork.s > +++ b/src/process/arm/vfork.s > @@ -3,7 +3,7 @@ > .type vfork,%function > vfork: > mov ip, r7 > - mov r7, 190 > + movs r7, 190 > svc 0 > mov r7, ip > .hidden __syscall_ret OK. > diff --git a/src/setjmp/arm/longjmp.S b/src/setjmp/arm/longjmp.S > index 8df0b819..745c9ba2 100644 > --- a/src/setjmp/arm/longjmp.S > +++ b/src/setjmp/arm/longjmp.S > @@ -5,6 +5,7 @@ > .type longjmp,%function > _longjmp: > longjmp: > +#if defined(__thumb2__) || !defined(__thumb__) > mov ip,r0 > movs r0,r1 > moveq r0,#1 > @@ -16,7 +17,7 @@ longjmp: > ldr r2,1f > ldr r1,[r1,r2] > > -#if __ARM_ARCH < 8 > +#if __ARM_ARCH_ISA_ARM && __ARM_ARCH < 8 > tst r1,#0x260 > beq 3f > // HWCAP_ARM_FPA > @@ -31,7 +32,7 @@ longjmp: > .fpu softvfp > .eabi_attribute 10, 0 > .eabi_attribute 27, 0 > -#if __ARM_ARCH < 8 > +#if __ARM_ARCH_ISA_ARM && __ARM_ARCH < 8 > // HWCAP_ARM_IWMMXT > 2: tst r1,#0x200 > beq 3f > @@ -42,6 +43,36 @@ longjmp: > ldcl p1, cr14, [ip], #8 > ldcl p1, cr15, [ip], #8 > #endif > +#else > + mov ip,r0 > + movs r0,r1 > + bne 4f > + movs r0,#1 > +4: mov r1,ip > + adds r1,#16 > + ldmia r1!, {r2-r7} > + mov lr,r7 > + mov sp,r6 > + mov r11,r5 > + mov r10,r4 > + mov r9,r3 > + mov r8,r2 > + mov ip,r1 > + subs r1,#40 > + ldmia r1!, {r4-r7} > + > + adr r1,1f > + ldr r2,1f > + ldr r1,[r1,r2] > + movs r2,#0x40 > + tst r1,r2 > + beq 2f > + .fpu vfp > + vldmia ip!, {d8-d15} > + .fpu softvfp > + .eabi_attribute 10, 0 > + .eabi_attribute 27, 0 > +#endif > 2: > 3: bx lr This should probably be unified somehow. > diff --git a/src/setjmp/arm/setjmp.S b/src/setjmp/arm/setjmp.S > index 45731d22..440c166d 100644 > --- a/src/setjmp/arm/setjmp.S > +++ b/src/setjmp/arm/setjmp.S > @@ -8,6 +8,7 @@ > __setjmp: > _setjmp: > setjmp: > +#if defined(__thumb2__) || !defined(__thumb__) > mov ip,r0 > stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp} > mov r2,sp > @@ -18,7 +19,7 @@ setjmp: > ldr r2,1f > ldr r1,[r1,r2] > > -#if __ARM_ARCH < 8 > +#if __ARM_ARCH_ISA_ARM && __ARM_ARCH < 8 > tst r1,#0x260 > beq 3f > // HWCAP_ARM_FPA > @@ -33,7 +34,7 @@ setjmp: > .fpu softvfp > .eabi_attribute 10, 0 > .eabi_attribute 27, 0 > -#if __ARM_ARCH < 8 > +#if __ARM_ARCH_ISA_ARM && __ARM_ARCH < 8 > // HWCAP_ARM_IWMMXT > 2: tst r1,#0x200 > beq 3f > @@ -44,6 +45,30 @@ setjmp: > stcl p1, cr14, [ip], #8 > stcl p1, cr15, [ip], #8 > #endif > +#else > + stmia r0!,{r4-r7} > + mov r1,r8 > + mov r2,r9 > + mov r3,r10 > + mov r4,r11 > + mov r5,sp > + mov r6,lr > + stmia r0!,{r1-r6} > + mov ip,r0 > + movs r0,#0 > + > + adr r1,1f > + ldr r2,1f > + ldr r1,[r1,r2] > + movs r2,#0x40 > + tst r1,r2 > + beq 2f > + .fpu vfp > + vstmia ip!, {d8-d15} > + .fpu softvfp > + .eabi_attribute 10, 0 > + .eabi_attribute 27, 0 > +#endif > 2: > 3: bx lr Same here. > diff --git a/src/signal/arm/restore.s b/src/signal/arm/restore.s > index fb086d9b..2b7621b1 100644 > --- a/src/signal/arm/restore.s > +++ b/src/signal/arm/restore.s > @@ -4,12 +4,12 @@ > .hidden __restore > .type __restore,%function > __restore: > - mov r7,#119 > + movs r7,#119 > swi 0x0 > > .global __restore_rt > .hidden __restore_rt > .type __restore_rt,%function > __restore_rt: > - mov r7,#173 > + movs r7,#173 > swi 0x0 Probably OK, unles gdb does something ridiculous wanting the exact insn sequence... > diff --git a/src/signal/arm/sigsetjmp.S b/src/signal/arm/sigsetjmp.S > index 69ebbf49..85a71666 100644 > --- a/src/signal/arm/sigsetjmp.S > +++ b/src/signal/arm/sigsetjmp.S > @@ -9,16 +9,36 @@ __sigsetjmp: > bne 1f > b setjmp > > +#if defined(__thumb2__) || !defined(__thumb__) > 1: str lr,[r0,#256] > str r4,[r0,#260+8] > +#else > +1: mov ip,r2 > + mov r2,lr > + adds r0,#200 > + str r2,[r0,#56] > + str r4,[r0,#60+8] > + subs r0,#200 > + mov r2,ip > +#endif > mov r4,r0 > > bl setjmp > > mov r1,r0 > mov r0,r4 > +#if defined(__thumb2__) || !defined(__thumb__) > ldr lr,[r0,#256] > ldr r4,[r0,#260+8] > +#else > + mov ip,r2 > + adds r0,#200 > + ldr r2,[r0,#56] > + mov lr,r2 > + ldr r4,[r0,#60+8] > + subs r0,#200 > + mov r2,ip > +#endif Ideally should be unified. > .hidden __sigsetjmp_tail > b __sigsetjmp_tail > diff --git a/src/string/arm/memcpy.S b/src/string/arm/memcpy.S > index 869e3448..c5d17533 100644 > --- a/src/string/arm/memcpy.S > +++ b/src/string/arm/memcpy.S > @@ -43,6 +43,8 @@ > * building as thumb 2 and big-endian. > */ > > +#if defined(__thumb2__) || !defined(__thumb__) > + > .syntax unified > > .global memcpy > @@ -477,3 +479,4 @@ copy_last_3_and_return: > ldmfd sp!, {r0, r4, lr} > bx lr > > +#endif // defined(__thumb2__) || !defined(__thumb__) The logic to enable the C memcpy seems to be missing. > diff --git a/src/thread/arm/__set_thread_area.c b/src/thread/arm/__set_thread_area.c > index 09de65aa..cb5d757d 100644 > --- a/src/thread/arm/__set_thread_area.c > +++ b/src/thread/arm/__set_thread_area.c > @@ -26,7 +26,11 @@ extern hidden uintptr_t __a_barrier_ptr, __a_cas_ptr, __a_gettp_ptr; > > int __set_thread_area(void *p) > { > -#if !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7 > +#if !__ARM_ARCH_ISA_ARM > + __a_gettp_ptr = __a_gettp_kuser; > + __a_cas_ptr = __a_cas_kuser; > + __a_barrier_ptr = __a_barrier_kuser; > +#elif !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7 > if (__hwcap & HWCAP_TLS) { > size_t *aux; > __a_cas_ptr = __a_cas_v7; HWCAP needs to be honored here. You also need to add the new logic to use the syscall for getting thread pointer, I think. > diff --git a/src/thread/arm/__unmapself.s b/src/thread/arm/__unmapself.s > index 29c2d07b..fe0406e3 100644 > --- a/src/thread/arm/__unmapself.s > +++ b/src/thread/arm/__unmapself.s > @@ -3,7 +3,7 @@ > .global __unmapself > .type __unmapself,%function > __unmapself: > - mov r7,#91 > + movs r7,#91 > svc 0 > - mov r7,#1 > + movs r7,#1 > svc 0 OK, but this file can't be used on nommu anyway. Instead the generic C version has to be used. > diff --git a/src/thread/arm/atomics.S b/src/thread/arm/atomics.S > index da50508d..d49809d9 100644 > --- a/src/thread/arm/atomics.S > +++ b/src/thread/arm/atomics.S > @@ -1,3 +1,5 @@ > +#if __ARM_ARCH_ISA_ARM > + > .syntax unified > .text > > @@ -104,3 +106,5 @@ __a_cas_ptr: > .hidden __a_gettp_ptr > __a_gettp_ptr: > .word __a_gettp_cp15 > + > +#endif This file should not be conditional. If it can't be assembled in default mode it just needs ISA level override directives. Which functions are actually called is selected at runtime. > diff --git a/src/thread/arm/clone.S b/src/thread/arm/clone.S > index bb0965da..70d29f70 100644 > --- a/src/thread/arm/clone.S > +++ b/src/thread/arm/clone.S > @@ -4,24 +4,30 @@ > .hidden __clone > .type __clone,%function > __clone: > - stmfd sp!,{r4,r5,r6,r7} > - mov r7,#120 > + push {r4,r5,r6,r7} > + movs r7,#120 > mov r6,r3 > mov r5,r0 > mov r0,r2 > +#if defined(__thumb2__) || !defined(__thumb__) > and r1,r1,#-16 > +#else > + movs r2,#0 > + subs r2,#16 > + ands r1,r2 > +#endif > ldr r2,[sp,#16] > ldr r3,[sp,#20] > ldr r4,[sp,#24] > svc 0 > tst r0,r0 > beq 1f > - ldmfd sp!,{r4,r5,r6,r7} > + pop {r4,r5,r6,r7} > bx lr > > 1: mov r0,r6 > bl 3f > -2: mov r7,#1 > +2: movs r7,#1 > svc 0 > b 2b I think this can all be unconditional, no #if's. > diff --git a/src/thread/arm/syscall_cp.S b/src/thread/arm/syscall_cp.S > index e607dd42..f53cbb3a 100644 > --- a/src/thread/arm/syscall_cp.S > +++ b/src/thread/arm/syscall_cp.S > @@ -11,7 +11,7 @@ > .type __syscall_cp_asm,%function > __syscall_cp_asm: > mov ip,sp > - stmfd sp!,{r4,r5,r6,r7} > + push {r4,r5,r6,r7} > __cp_begin: > ldr r0,[r0] > cmp r0,#0 > @@ -19,11 +19,16 @@ __cp_begin: > mov r7,r1 > mov r0,r2 > mov r1,r3 > +#if defined(__thumb2__) || !defined(__thumb__) > ldmfd ip,{r2,r3,r4,r5,r6} > +#else > + mov r2,ip > + ldmfd r2,{r2,r3,r4,r5,r6} > +#endif > svc 0 > __cp_end: > - ldmfd sp!,{r4,r5,r6,r7} > + pop {r4,r5,r6,r7} > bx lr > __cp_cancel: > - ldmfd sp!,{r4,r5,r6,r7} > + pop {r4,r5,r6,r7} > b __cancel > diff --git a/tools/install.sh b/tools/install.sh > old mode 100755 > new mode 100644 This can be unconditional, and maybe a register other than ip can be used to make it simpler..? Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.