musl - Re: [PATCH] split __libc_start_main.c into two files (Wasm)

Message-ID: <20171219210337.GU1627@brightrain.aerifal.cx>
Date: Tue, 19 Dec 2017 16:03:37 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] split __libc_start_main.c into two files (Wasm)

On Tue, Dec 19, 2017 at 05:46:22PM +0000, Nicholas Wilson wrote:
> On 19 December 2017 15:56, Rich Felker wrote:
> > This is not ELF-specific, and it's not different from the discussion
> > we had before. You seem to be under the impression that exit "gets
> > called" because __libc_start_main is called. This is not true. exit is
> > only called if main returns. (In fact, if you have LTO, the linker
> > will even optimize out exit like you wanted if main does not return.)
> 
> From our point of view, exit being called *is* a consequence of
> __libc_start_main. Everything must return; there's no way to hold up
> the JavaScript main loop in a browser (unless you spin at 100% CPU
> and freeze the entire browser page, which is totally unacceptable).
> Wasm is run in a single-threaded asynchronous environment, really
> just like normal JavaScript, where every function is non-blocking.

OK. That's why I asked if you intend for main not to be called at all.
My idea was that main would be called, but would be expected to go
into some sort of event loop that yields (implemented by recording
state and returning back to JS control), allowing execution of an
actual C application in the browser JS context. But it seems at
present you're only interested in use of library code written in C.
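
As a rough sketch of the yielding model described above (not code from
musl or from the Wasm port), the application keeps its state in a
structure and exposes a step function that does one bounded slice of
work and then returns control to the host. The names app_state,
app_init and app_step are hypothetical, and the "work" is a trivial
summation chosen only to make the control flow visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical application state that survives between host callbacks. */
struct app_state {
    int step;   /* index of the next slice of work to perform */
    int total;  /* number of slices in the whole job */
    long sum;   /* result accumulated so far */
};

/* Prepare the state; nothing blocks and no work is done yet. */
void app_init(struct app_state *s, int total)
{
    assert(total >= 0);
    s->step = 0;
    s->total = total;
    s->sum = 0;
}

/*
 * Perform one bounded slice of work, record progress in *s, and return
 * to the caller. Returns true if more work remains, in which case the
 * host is expected to call app_step again later.
 */
bool app_step(struct app_state *s)
{
    if (s->step >= s->total)
        return false;
    s->sum += s->step;
    s->step++;
    return s->step < s->total;
}
```

A host would call app_init once and then call app_step from its own
event loop until it returns false, so no single call ever blocks.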

> >> However, given that we need to call __init_libc
> >> directly anyway, we may as well save some code and just register
> >> __init_libc with the Wasm start mechanism.
> > From my perspective, doing things in gratuitously arch-dependent ways
> > to "save some code" doesn't make sense when you're trading a small
> > (trivial relative size) amount of code for a permanent interface
> > boundary and maintenance burden.
> 
> I mean, we have to call __init_libc directly anyway, so may as well
> just depend on a single internal interface, rather than using a
> second internal interface as well like __libc_start_init. By "short"
> I mean "use the absolute minimum of internal Musl dependencies".
> Because of the requirement not to call exit, I can't see a way of
> not calling *any* internal Musl functions, so getting it down to
> just __init_libc is the least intrusive we can be.

Yes. Calling just __init_libc is what you would do in that case, and I
think it works as long as there are no interface boundaries subject to
version mismatch. That is, if you're ok with the situation where, if
the internal contract for __init_libc changed or if it got refactored
differently, the corresponding wasm glue would have to be changed
accordingly, and they're always static-linked together as part of the
same module so that you couldn't end up with mismatched versions.
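
To make the two entry shapes concrete, here is a minimal sketch under
stated assumptions. fake_init_libc and fake_exit are stand-ins written
for this example; they are not musl's __init_libc or exit, and the
parameters shown are illustrative only. hosted_entry, wasm_start_hook,
trivial_main and exported_add are likewise hypothetical names. The
sketch shows only that exit is reached when main returns, and that a
library-only start hook can initialize and return without calling
either main or exit.

```c
#include <assert.h>

int libc_ready;  /* set by the stand-in initializer */
int exit_calls;  /* number of times the stand-in exit was reached */

/* Stand-in for an internal libc initializer (hypothetical signature). */
void fake_init_libc(char **envp, const char *progname)
{
    (void)envp;
    (void)progname;
    libc_ready = 1;
}

/* Stand-in for exit(): records the call instead of terminating. */
void fake_exit(int code)
{
    (void)code;
    exit_calls++;
}

/* A main-like function that simply returns. */
int trivial_main(void)
{
    return 0;
}

/* Conventional shape: initialize, run main, pass its result to exit.
 * The exit call is reached only because app_main returns. */
void hosted_entry(int (*app_main)(void))
{
    fake_init_libc(0, "app");
    fake_exit(app_main());
}

/* Library-only shape: initialize and return to the host; no main, no exit. */
void wasm_start_hook(void)
{
    fake_init_libc(0, "module");
}

/* An exported library function that relies on initialization having run. */
int exported_add(int a, int b)
{
    assert(libc_ready);
    return a + b;
}
```

The cost of the second shape is the one noted above: the glue depends
on an internal contract, so it has to be built and updated together
with the libc it is linked against.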

> Responding to Szabolcs:
> 
> On 19 December 2017 15:27, Szabolcs Nagy wrote:
> > the correctness of the runtime is only guaranteed if you
> > go via the symbol __libc_start_main otherwise future
> > changes to that function can easily break your wasm port.
> 
> Yes, we'll have to make sure each time we update our fork that it
> still works. That's acceptable to me; we understand that the
> dependency is purely internal to libc and is not a public interface
> that Musl has to support forever. I accept that as long as we have a
> fork, we risk having to do some maintenance whenever we update to
> the latest Musl release. (But, with many people at Google and
> Mozilla working on Wasm long-term, I'm not worried that Wasm support
> will stagnate.)
> 
> That's why I was hoping to eventually merge the Wasm support
> upstream, so that changes to interfaces used internally within Musl
> can take Wasm into account, rather than risk using internal
> interfaces without your knowledge.

I think merging upstream is unlikely, at least in the short term, for
at least two reasons:

1. Cost (to me as maintainer). Having it upstream implies a burden to
   ensure that changes to musl proper don't introduce regressions to
   any of the supported archs, which requires ability to test, etc.
   Need for specialized tooling to build and test, along with poking
   through some interface boundaries (__init_libc), make these issues
   rather more impactful than for an average arch. On the other hand
   if the wasm port is treated as a consumer of musl releases, it's
   easy to have any of those issues resolved after the release, when
   syncing up to a new upstream.

2. Policy/scope. The stated (e.g. on website) principle of musl arch
   support is that (apart from fenv, which is hard-float only, and
   differences in long double type and ILP32 vs LP64) they all provide
   the same C- and POSIX-conforming environment and that application
   compatibility should not differ significantly across archs. Aside
   from being a nice guarantee to users, it's a convenient limitation
   of scope of what needs to be considered for maintenance in the
   upstream project, and in particular omits things like ports to
   specific bare-metal targets, which while interesting, would become
   huge scope creep if I were involved in handling them.

While the convenience of doing this probably isn't yet at the level it
ideally should be at, it's my intent that musl can fairly easily use
"overlay" style arch glue for exotic targets that aren't upstream.
This is the model midipix (midipix.org) is using, where all the arch
files are provided by a separately-maintained package that can be
extracted into the musl source tree. (Note: we do have nt32/nt64 arch
targets in upstream configure so that patching of configure is not
needed; same for wasm would make sense I think.)

How does this sound?

Rich