musl - Re: [RFC v3 1/1] xtensa: add port

Message-ID: <20240507162707.GP10433@brightrain.aerifal.cx>
Date: Tue, 7 May 2024 12:27:07 -0400
From: Rich Felker <dalias@...c.org>
To: Max Filippov <jcmvbkbc@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [RFC v3 1/1] xtensa: add port

On Tue, May 07, 2024 at 08:30:57AM -0700, Max Filippov wrote:
> On Mon, May 6, 2024 at 6:37 PM Rich Felker <dalias@...c.org> wrote:
> >
> > On Mon, May 06, 2024 at 05:40:06PM -0700, Max Filippov wrote:
> > > On Mon, May 6, 2024 at 4:58 PM Rich Felker <dalias@...c.org> wrote:
> > > >
> > > > On Mon, May 06, 2024 at 04:28:18PM -0700, Max Filippov wrote:
> > > > > On Mon, May 6, 2024 at 3:55 PM Rich Felker <dalias@...c.org> wrote:
> > > > > >
> > > > > > On Mon, May 06, 2024 at 03:40:49PM -0700, Max Filippov wrote:
> > > > > > > On Mon, May 6, 2024 at 3:15 PM Rich Felker <dalias@...c.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, May 06, 2024 at 02:47:45PM -0700, Max Filippov wrote:
> > > > > > > > > On Mon, May 6, 2024 at 1:57 PM Rich Felker <dalias@...c.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, May 06, 2024 at 11:01:12AM -0700, Max Filippov wrote:
> > > > > > > > > > > diff --git a/arch/xtensa/reloc.h b/arch/xtensa/reloc.h
> > > > > > > > > > > new file mode 100644
> > > > > > > > > > > index 000000000000..cd7a455a2d9c
> > > > > > > > > > > --- /dev/null
> > > > > > > > > > > +++ b/arch/xtensa/reloc.h
> > > > > > > > > > > @@ -0,0 +1,32 @@
> > > > > > > > > > > +#if __FDPIC__
> > > > > > > > > > > +#define ABI_SUFFIX "-fdpic"
> > > > > > > > > > > +#else
> > > > > > > > > > > +#define ABI_SUFFIX ""
> > > > > > > > > > > +#endif
> > > > > > > > > > > +
> > > > > > > > > > > +#define LDSO_ARCH "xtensa" ABI_SUFFIX
> > > > > > > > > >
> > > > > > > > > > The ldso name is still missing endianness, if it's intended that both
> > > > > > > > > > be supported. It needs to completely identify the ABI whenever there
> > > > > > > > > > are incompatible ABI variants.
> > > > > > > > >
> > > > > > > > > For each xtensa core there's only one fixed endianness and code
> > > > > > > > > built for one xtensa core is not supposed to be used for any other
> > > > > > > > > core, so it's not an issue, right?
> > > > > > > >
> > > > > > > > Yes, it is an issue. The ldsonames for ABIs must be globally unique.
> > > > > > > > They are intended to be installable in a filesystem shared between
> > > > > > > > multiple archs, possibly even unrelated archs executed via qemu-user
> > > > > > > > or similar.
> > > > > > >
> > > > > > > That means an unbounded number of libraries, one per xtensa core
> > > > > > > configuration, and the solution that comes to mind is using the xtensa
> > > > > > > core name as part of the ABI name. This is a bit complicated by the
> > > > > > > fact that core names are not guaranteed to be globally unique, but
> > > > > > > does that sound reasonable in general?
> > > > > >
> > > > > > Can you describe what the parameter space of core configurations is?
> > > > >
> > > > > The extensible set of architectural options plus the extensible core
> > > > > instruction set plus variable instruction encoding.
> > > > > As I said earlier, Tensilica's own approach is not to try to figure
> > > > > out which configurations are compatible with each other but to treat each
> > > > > configuration as a separate base ABI and have things like call0/windowed
> > > > > be the variations of that base ABI.
> > > > >
> > > > > > Does it actually make mutually incompatible ABIs? If they have the
> > > > > > same instruction encoding, endianness, calling convention, etc. they
> > > > > > should not be incompatible, but maybe I'm missing something unique to
> > > > > > how xtensa works..?
> > > > >
> > > > > No, most of them fall into one of the big groups of ABIs compatible with
> > > > > each other, but it is usually hard to say which ones, especially with the
> > > > > little information that we have as the end users. And the number of
> > > > > groups grows over time and is not limited.
> > > >
> > > > That kind of thing doesn't need different ldso names. The name needs
> > > > to identify the linkage boundary, not any ISA extensions the
> > > > application or libc/ldso might be using.
> > >
> > > I'm not sure I understand what "linkage boundary" means. A barrier that
> > > would prevent linking two pieces of code that cannot work together by
> > > design?
> > >
> > > > As an analogy, you could
> > > > build i386 musl with -march for intel/sse2 or for amd/k6/3dnow, and
> > > > these would be mutually incompatible ISA extensions, but there's
> > > > nothing incompatible about the ABI/linkage.
> > >
> > > I'm not sure how this is compatible with the
> > >
> > > > > > > > They are intended to be installable in a filesystem shared between
> > > > > > > > multiple archs, possibly even unrelated archs executed via qemu-user
> > > > > > > > or similar.
> > >
> > > If a library built for sse2 cannot run on k6 but may still have the
> > > same name, that breaks the filesystem where it is installed, either
> > > for intel or for amd, right? In that case that filesystem can only function
> > > on intel or on amd. If intel and amd are analogues for two specific
> > > xtensa core configurations there's no need to differentiate on the
> > > endianness, because each core configuration has single fixed
> > > endianness.
> >
> > It means you can install a libc that is compatible with either one by
> > refraining from building it with extensions that preclude using it
> > with both.
> 
> This part is not feasible for xtensa: there's no single base instruction
> set compatible with all xtensa cores of the same endianness.

Is there any basic primer on this I could read? I've searched around
Wikipedia, the QEMU wiki, and various other sources a bit but haven't found
answers to any of the questions I would consider relevant.

Some exploratory questions I'd like to be able to answer:

- If I use objdump, is it able to decode a baseline set of
  instructions without being given a specific cpu type? If not, what
  keeps it from doing so? (Policy or some kind of ambiguity?)

- If I use gas to assemble an assembly source file, does the output
  depend on a particular configured cpu model? If so, in what way?

I'm sure there are others like this I'm not thinking of at the moment,
that would help clarify.

> > > > Likewise, on arm you might
> > > > have some chips that don't support thumb and others that don't support
> > > > 32-bit arm instructions, but either way the linkage is compatible and
> > > > you can call between them on any environment that supports both.
> > >
> > > On xtensa systems one cannot choose to build little- or big-endian code
> > > for the given core like it is possible to choose whether to build FDPIC or
> > > non-FDPIC code.
> >
> > Indeed. Little- and big-endian are incompatible ABIs via how they
> > define the representation of types differently. FDPIC and non-FDPIC
> > are incompatible ABIs via how the calling convention and
> > representation of function pointers differ.
> >
> > Do these answers help clarify what linkage boundary means above?
> 
> I believe that in accordance with how Tensilica treats xtensa cores,
> core configuration should be one of the linkage boundaries, along with
> the FDPIC/non-FDPIC and call0/windowed. So ldso names would look
> like xtensa-dc233c-fdpic.

Is there actually anything about the dc233c cpu variant that makes up
part of the linkage boundary? If not, it doesn't belong in the ldso
name.
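To make the distinction concrete, here is a minimal sketch (not a
proposed patch) of an ldso name built only from properties that cross
the linkage boundary. The struct, the helper names, and the "eb" and
"-call0" suffix spellings are hypothetical; only "-fdpic" comes from
the patch's ABI_SUFFIX, and the __XTENSA_CALL0_ABI__ macro name is an
assumption about what the compiler predefines. The core name is
carried along but never consulted, so dc233c and any other core with
the same byte order and calling convention get the same name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one xtensa build configuration. */
struct xtensa_abi {
	int big_endian;   /* byte order: changes how types are represented */
	int fdpic;        /* FDPIC: changes calling convention and function pointers */
	int call0;        /* call0 vs. windowed ABI: changes calling convention */
	const char *core; /* core config name (e.g. "dc233c"): ISA only, not linkage */
};

/* Describe the configuration this file is compiled for.
 * __BYTE_ORDER__ and __FDPIC__ are compiler-predefined; the
 * __XTENSA_CALL0_ABI__ spelling is an assumption. */
static struct xtensa_abi current_abi(void)
{
	struct xtensa_abi abi = { 0, 0, 0, "unknown" };
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	abi.big_endian = 1;
#endif
#ifdef __FDPIC__
	abi.fdpic = 1;
#endif
#ifdef __XTENSA_CALL0_ABI__
	abi.call0 = 1;
#endif
	return abi;
}

/* Build the ldso arch string from linkage-boundary properties only.
 * abi->core is deliberately ignored: cores that differ only in ISA
 * extensions share one name. "eb" and "-call0" are made-up spellings;
 * "-fdpic" matches the patch's ABI_SUFFIX. */
static int ldso_arch_name(char *buf, size_t n, const struct xtensa_abi *abi)
{
	return snprintf(buf, n, "xtensa%s%s%s",
	                abi->big_endian ? "eb" : "",
	                abi->call0 ? "-call0" : "",
	                abi->fdpic ? "-fdpic" : "");
}
```

In an actual reloc.h the same rule would be expressed as preprocessor
string concatenation, the way the patch already builds LDSO_ARCH from
ABI_SUFFIX; the runtime form here just makes it easy to see that the
core name has no effect on the result.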

The big problem here (not unique to xtensa) is impedance mismatch
between the concepts used by the vendor responsible for the arch and
what musl does. Figuring this out is a big part of making a port
upstreamable. Given that it's been hard to communicate this, perhaps
we should work, from the musl side, on having more rigorous
formulations of the relevant definitions.

Rich