Message-Id: <1366683267.18069.155@driftwood>
Date: Mon, 22 Apr 2013 21:14:27 -0500
From: Rob Landley <rob@...dley.net>
To: musl@...ts.openwall.com
Cc: musl@...ts.openwall.com
Subject: Re: Best place to discuss other lightweight libraries?

On 04/22/2013 07:26:21 PM, Luca Barbato wrote:
> On 04/23/2013 01:06 AM, Rich Felker wrote:
> > On Tue, Apr 23, 2013 at 12:42:01AM +0200, Luca Barbato wrote:
> >> On 04/22/2013 11:52 PM, Rich Felker wrote:
> >>>> For this there aren't solutions that won't cause different
> >>>> problems, I'm afraid.
> >>>
> >>> Sure there are. I get the impression you can tell I was talking
> >>> about libav/ffmpeg's log interface. :-) The obvious solution is to
> >>> bind log contexts to the object you're acting on. See this nice talk:
> >>>
> >>> http://misko.hevery.com/2008/11/21/clean-code-talks-global-state-and-singletons/
> >>>
> >>> If I remember right, part of the problem way back was that there
> >>> were deep function calls that had no context available to them, and
> >>> that didn't _need_ a context for anything but logging warnings or
> >>> whatnot.
> >>
> >> In the specific case yes. I tried to foster proper return error
> >> propagation, so you get something more meaningful than EINVAL/-1
> >> and that is usually enough in those specific cases.
> >>
> >> The general problem is that the library user wants to be the only
> >> one having a say on what goes where so single point overrides are
> >> useful.
> >
> > The problem with your comment here is the phrase "library user". Who
> > is the library user? You may be thinking from a standpoint (in our
> > example) of MPlayer, but what if instead the application you were
> > looking at were a file manager that itself had no awareness of video
> > files, and only ended up processing them as part of a library pulled
> > in from a plugin for file previews? Obviously there is no way the
> > app can be aware of where the log messages should go since it's not
> > aware the library even exists. The user is the library that depends
> > on the library doing the logging, not the app, and it's very possible
> > that there is more than one such library. In which case, you have
> > madness.
> 
> Usually (at least that's what I do in those cases) the global logger
> is overridden to use the outer library logger then you -end-user-
> override it as well and then everything goes where you want.
> 
> The other widespread solution is to eat stderr and if something
> appears show it to the user, crude but not so bad.
> 
> >> When you start using those libraries in situations in which you'd
> >> like to have per-$situation logging then you start to scream.
> >
> > Keep in mind it might not even be "per-situation logging". It might
> > be something like one plugin needing to send the message back up to
> > the application as a popup message to display, and another plugin
> > just wanting to render the message text as a file preview in place
> > of an image...
> 
> Yeah, logging messages properly is terrible.
> 
> >>> Yes, basically. Dependency on glib means your library will impose
> >>> bloat and it will preclude robustness.
> >>
> >> Yet glib gives you oh-so-many-features (I fell for it once), sadly
> >> there aren't many utility libs that provide sort of useful data
> >> structures,
> >
> > If you want the data structures, I think that means you should be
> > using C++, not C.
> 
> C++ stock data structures are too runtime-dependent, crafting your own
> means getting the worst of both worlds if you aren't extremely careful
> =\
> 
> Hopefully the new crop of system languages would try to capitalize on
> the experience...

What new crop of system languages?

C is a portable assembly language with minimal abstraction between the  
programmer and what the hardware is actually doing. It uses static  
typing, static memory allocation, and if you really care you can  
explicitly specify integer sizes (uint16_t or LP64) and handle  
endianness and alignment and so on down to memory mapped bitmasks. It  
provides simple container types based on pointer math: arrays are  
simple pointer arithmetic, and structs concatenate a group of variables  
so each member name corresponds to a fixed offset and size (static,  
determined at compile time) where the value is to be found relative to  
the pointer to the start of the struct.
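
As a rough illustration of the fixed-offset point, here is a minimal C sketch. The struct regs layout, its field names, and both helper functions are invented for this example; it relies only on what offsetof() reports for whatever compiler builds it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register block, made up for illustration only. */
struct regs {
    uint16_t status;   /* first member: always at offset 0 */
    uint16_t control;
    uint32_t data;
};

/* Array indexing is pointer arithmetic: a[2] means *(a + 2). */
int third_element(const int *a)
{
    assert(a != NULL);
    return *(a + 2);
}

/* Member access is "base pointer plus an offset fixed at compile time".
 * This does by hand what r->data does implicitly. */
uint32_t read_data(const struct regs *r)
{
    const unsigned char *base = (const unsigned char *)r;
    uint32_t value;

    assert(r != NULL);
    memcpy(&value, base + offsetof(struct regs, data), sizeof value);
    return value;
}
```

Nothing here is looked up at runtime: offsetof() and sizeof are constants by the time the compiler is done.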

Scripting languages like python/lua/ruby have opaque abstractions where  
you honestly don't need to know how it's implemented. They replace  
pointers with references, and build garbage collection and dynamic  
typing on top of that. Their built-in container types are resizeable,  
including an array variant and a dictionary variant. The dictionaries  
aggregate via keyword/value association, so you can add and remove  
members on the fly.

In C, types are a property of pointers. In scripting languages, types  
are a property of objects, meaning _references_have_no_type_. You  
dereference to find out what type it is. So when you implement  
functions, you find out what type it is when you try to use it, by  
asking for a member and performing an operation on that member. If the  
member isn't there, or doesn't support that operation, it throws an  
exception. You can catch that exception and handle it however you like,  
up to and including adjusting the object to add the member in question  
so it _can_ succeed. But if you don't catch the exception locally, no  
problem: it's all garbage collected. References that fall out of scope  
are naturally freed by the system.
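
To make "types are a property of objects" concrete, this is roughly what you have to build by hand to imitate it in C: the object carries a tag, and the check happens when you try to use it. A sketch only; struct object, the tag names, and object_length() are hypothetical, and a -1 return stands in for the exception a scripting language would throw.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The type lives inside the object, not in the pointer that refers to it. */
enum tag { TAG_INT, TAG_STR };

struct object {
    enum tag tag;
    union {
        long i;
        const char *s;
    } u;
};

/* Ask the object for its length at runtime.
 * Returns 0 and stores the length in *out if the object is a string,
 * or -1 if the operation is not supported for this object's type. */
int object_length(const struct object *o, size_t *out)
{
    assert(o != NULL && out != NULL);
    if (o->tag != TAG_STR)
        return -1;              /* the "exception": wrong type for this op */
    *out = strlen(o->u.s);
    return 0;
}
```

A scripting language does the equivalent of that tag check on every operation, and cleans up for you afterwards. In C you write both parts yourself.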

These are two fundamentally different ways of programming. Scripting  
languages are dynamic, everything interesting determined at runtime, to  
the point where they don't even have a compilation step. You set the  
executable bit on your source code. (Is there a bytecode compilation  
step at load time with an optimized interpreter doing batched code  
translation with buffering that Sun's marketing department called "just  
in time" or some such nonsense but which Apple's 68k emulator for the  
PowerPC was already doing in 1994? Maybe. Again: it doesn't matter, the  
abstractions are opaque, it all just works.)

So with C: pointers, everything statically compiled to machine  
language, no abstraction. With scripting languages: references,  
interpreted at runtime, opaque abstraction and often multiple different  
but compatible implementations (python/jython).

Then you have C++, which can't decide which it is. C is a local peak in  
language design space. Scripting languages are another. C++ is in the  
valley between them, neither fish nor fowl, with the worst  
characteristics of both. It's a static language like C, statically  
typed and based on pointers, with thick layers of _leaky_ abstractions.  
If anything goes wrong, you have to know all the implementation details  
of the standard template library in order to debug it. Your global  
constructors are called before main() and those have zeroed memory but  
when you new() an object it doesn't have zeroed memory and you must  
initialize every single member in the constructor and of course you  
can't memset(this, 0, sizeof(*this)) because there's magic data in the  
object for RTTI and virtual methods which you can't _see_ but which you  
can trivially damage if you don't know the magic invisible  
implementation details.
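
For contrast, the plain C version of the same zeroing question, as a sketch (struct counter and its helpers are made up for illustration). A C struct has no hidden members, so clearing the whole object is safe. This example sticks to integer members, where all-bits-zero is guaranteed to mean zero.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct counter {
    int hits;
    int misses;
    unsigned flags;
};

/* Static storage is zero-initialized before main() runs. */
static struct counter global_counter;

const struct counter *counter_global(void)
{
    return &global_counter;
}

/* Clear every member. Note sizeof *c (the whole struct), not
 * sizeof c (just the size of the pointer). */
void counter_reset(struct counter *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* malloc() returns memory with indeterminate contents, so zero it
 * explicitly. Returns NULL if allocation fails. */
struct counter *counter_new(void)
{
    struct counter *c = malloc(sizeof *c);

    if (c != NULL)
        counter_reset(c);
    return c;
}
```

What you see in the struct declaration is all there is, which is why the memset() is safe here.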

ALL of C++ is magic invisible implementation details. The only way to  
safely use the language is to know enough about it you could have  
written the compiler and all the libraries. Otherwise, it's going to  
break and you won't know why, although following magic "design  
patterns" from your local cargo cult leader may help shield you from  
the wrath of the compiler for another day, if you're lucky and turn  
widdershins twice every tuesday before noon but after having the  
_right_ cup of coffee while wearing lucky socks.

C++ saw scripting languages and tried to ape their features  
(Exceptions!) but doing dynamic typing at compile time is every bit as  
stupid as doing dynamic memory management at compile time, and their  
attempt (templates) is TURING COMPLETE AT COMPILE TIME meaning you can  
write 10 lines of C++ that will keep the compiler busy until your hard  
drive fills up, and detecting this is equivalent to solving the halting  
problem. Even when it does NOT do that, a couple lines of C++ template  
making your binary ten times larger is considered _normal_.

Note: Java is also in the no man's land between C and scripting  
languages, but it's in the foothills of scripting languages instead of  
the foothills of C: it did dynamic memory management but _kept_ static  
typing, then realized how dumb that was and punched holes in its type  
system with "interfaces", and then made code generators to spit out  
reams of interface code and designed new tools (Eclipse!) to handle  
multi-million line code bases for 2 year old projects made by 3 people.  
Alas, when Y2K happened and all that Cobol needed to be rewritten Java  
was the hot new fad (Pogs! Beanie Babies! Bitcoin!) and looked good in  
comparison to Cobol, so it's the new mainframe punchcard language. Oh  
well.

Steve Yegge eviscerated Java so I don't have to here:
http://steve-yegge.blogspot.com/2007/12/codes-worst-enemy.html

So back to the "new generation of system languages": C is a portable  
assembly language. It's a category killer in that niche, the best there  
is at what it does that's already killed off competitors like Pascal.  
The only real survivors are derivatives of C whose main selling point  
is that they CONTAIN THE WHOLE OF C, VERBATIM. (By that logic a mud pie  
is a good beverage, because each mud pie contains a glass of water.)

Scripting languages (even ugly ones like Javascript, Perl, and PHP)  
rely on opaque abstractions independent of what the hardware is doing.  
Java is the new Cobol.

Which direction is your new system language going in?

> > strl* considered harmful, for 3 reasons:
...
> > I'm aware some people like strl* and we have them in musl because
> > it's better to provide them than to deal with people rolling their
> > own and doing it in wrong and insecure ways. But I still would
> > recommend against using them in new code.

Toybox has xstrncpy(): If the string doesn't fit in the buffer, kill  
the program with an error message.
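
A minimal sketch of that fail-hard idea, assuming nothing about how Toybox actually implements xstrncpy(): the name xstrncpy_sketch, its argument order, the message text, and the exit code are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy src into dest, where dest has room for size bytes including the
 * terminating NUL. If src does not fit, print an error and exit rather
 * than silently truncating. Hypothetical helper, not Toybox's code. */
void xstrncpy_sketch(char *dest, const char *src, size_t size)
{
    size_t len;

    assert(dest != NULL && src != NULL);
    len = strlen(src);
    if (len >= size) {
        fprintf(stderr, "xstrncpy_sketch: '%s' does not fit in %lu bytes\n",
                src, (unsigned long)size);
        exit(1);
    }
    memcpy(dest, src, len + 1);   /* len + 1 copies the NUL as well */
}
```

The point of the design is that the caller never has to check for truncation: either the whole string arrived, or the program is gone.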

Rob
