Message-ID: <20170826151323.llqvlpwqkiv4lmhp@riva.ucam.org>
Date: Sat, 26 Aug 2017 16:13:23 +0100
From: Colin Watson <cjwatson@...ian.org>
To: Rich Felker <dalias@...c.org>
Cc: "A. Wilcox" <awilfox@...lielinux.org>, musl@...ts.openwall.com,
	man-db-devel@...gnu.org
Subject: Re: Re: man-db 2.7.6.1: Test failures under musl libc

On Sat, Aug 26, 2017 at 09:28:08AM -0400, Rich Felker wrote:
> On Sat, Aug 26, 2017 at 01:04:26PM +0100, Colin Watson wrote:
> > man-db can't reasonably do without //IGNORE, certainly not if you want
> > reliability.  Can you try building man-db with GNU libiconv?  The build
> > system uses AM_ICONV already, so should have enough options to let you
> > do this.
> >
> > (I'd take a patch to the build system to have it detect this situation
> > and emit an error earlier if //IGNORE isn't available.)
>
> Can you explain? This seems wrong; maybe I misunderstand //IGNORE but
> I can't come up with any plausible scenario where a conversion with
> //IGNORE would produce usable output.

No, it definitely did help in some cases.  Here's the NEWS entry from
when I added that:

   o apropos, lexgrog, man, mandb, and whatis ignore encoding conversion
     errors for the last possible encoding of the source page. This
     helps, for example, with pages including misencoded non-ASCII names
     of authors; it usually seems better to allow these pages to pass
     with small errors than to break them entirely.

That was nine years ago so I no longer have specific examples to hand,
but that's the sort of thing my past self wouldn't have bothered doing
without having run into it in practice. :-)  I seem to remember the case
of non-ASCII authors' names in otherwise-ASCII pages being quite common,
and especially back then the toolchain wasn't always happy to accept
UTF-8 at every stage in every environment.

(This is all after manconv has made its best guess as to the input
encoding using stricter checks; the choice at this point is normally
between mostly-correct output or an error.  For many programs I agree
that an error would be more appropriate, but for a program whose job is
to display documentation I prefer to make a best effort to do so.)

This is actually a bit less critical than I remembered.  I still think
it's worthwhile in general, but I'd also take a patch to use //IGNORE
only when an iconv implementation that supports it is in use.

> Also please be aware that the encoding on a system using musl is
> always UTF-8 (musl only supports UTF-8 locales), so conversion of
> man pages to another locale that can't represent their contents is
> out-of-scope.

Well, you also have the C locale which isn't really true UTF-8.  But
anyway, as noted above, the use of //IGNORE here is not intended for the
case where we are totally unable to represent any of the contents, but
rather for the case of small unrepresentable sections.

-- 
Colin Watson                                       [cjwatson@...ian.org]
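
For illustration, the idea of using //IGNORE only when the iconv
implementation accepts it could be sketched roughly as below.  This is
not man-db's actual code: open_lenient() and convert() are hypothetical
helper names, and treating a successful iconv_open() as "supported" is
an assumption, since an implementation might accept the suffix without
honouring it.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not taken from man-db): open a converter from
 * `from` to `to`, requesting //IGNORE first and falling back to a plain
 * conversion if iconv_open() rejects the suffixed name.
 * *lenient is set to 1 if the //IGNORE form was accepted, else 0.
 * Returns (iconv_t)-1 if neither form can be opened. */
static iconv_t open_lenient(const char *to, const char *from, int *lenient)
{
    char target[128];
    int n;

    *lenient = 0;
    n = snprintf(target, sizeof target, "%s//IGNORE", to);
    if (n >= 0 && (size_t)n < sizeof target) {
        iconv_t cd = iconv_open(target, from);
        if (cd != (iconv_t)-1) {
            *lenient = 1;
            return cd;
        }
    }
    /* Assumption: the suffix was the reason for failure; retry without it. */
    return iconv_open(to, from);
}

/* Hypothetical helper: convert a NUL-terminated string into `out`
 * (outsize must be at least 1).  EILSEQ is tolerated, so whatever was
 * converted before the error is kept; any other error is a failure.
 * Returns the number of bytes written, or (size_t)-1 on failure. */
static size_t convert(iconv_t cd, const char *in, char *out, size_t outsize)
{
    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;   /* reserve room for the terminator */

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1
        && errno != EILSEQ)
        return (size_t)-1;
    *outp = '\0';
    return (size_t)(outp - out);
}
```

A caller would use open_lenient("ASCII", "UTF-8", &lenient) and could
check `lenient` to decide whether to warn that unrepresentable sections
will cause an error rather than being dropped.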