Message-ID: <CAKbZUD3O_NeiK3WiyQDsOD5u2KrPptMynx6CK9PXRkRh_NRmhQ@mail.gmail.com>
Date: Sun, 16 Jul 2023 20:29:14 +0100
From: Pedro Falcato <pedro.falcato@...il.com>
To: musl@...ts.openwall.com
Subject: Re: strcmp() guarantees and assumptions

On Sun, Jul 16, 2023 at 8:24 PM Pedro Falcato <pedro.falcato@...il.com> wrote:
>
> On Sun, Jul 16, 2023 at 7:00 PM Robert Clausecker <fuz@....su> wrote:
> >
> > Hi NRK,
> >
> > Thank you for your response.
> >
> > Am Sun, Jul 16, 2023 at 11:49:45PM +0600 schrieb NRK:
> > > Hi Robert,
> > >
> > > > Or to phrase it differently, is the following a legal implementation of
> > > > strcmp()?
> > > >
> > > > int strcmp(char *a, char *b) {
> > > >     size_t la = strlen(a), lb = strlen(b);
> > > >
> > > >     if (la != lb)
> > > >         return ((la > lb) - (lb > la));
> > >           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > I don't see how this can ever be a valid strcmp implementation. The
> > > return value of the comparison functions must be about the first
> > > mismatching byte, not about the string lengths.
> > >
> > > | The sign of a nonzero value returned by the comparison functions is
> > > | determined by the sign of the difference between the values of the
> > > | first pair of characters that differ in the objects being compared.
> > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > Yes, sorry. The code would have to be extended to call memcmp() on the
> > common prefix in case there is a mismatch in length. E.g.
> >
> >     if (la != lb)
> >         return (memcmp(la, lb, la > lb ? lb + 1 : la + 1));
> >
> > > ref: https://port70.net/~nsz/c/c11/n1570.html#7.24.4p1
> > >
> > > > Or is it generally agreed upon that libc implementations support
> > > > strcmp() calls on unterminated strings?
> > >
> > > memchr (since C11) has the following requirement:
> > >
> > > | The implementation shall behave as if it reads the characters
> > > | sequentially and stops as soon as a matching character is found.
> > >
> > > I don't believe any such requirement exists for strcmp, so unless
> > > someone proves otherwise, I'd say it's fair game for libc to assume that
> > > the strings are nul-terminated.
> >
> > That's good to hear. Any idea on the “what do existing libc
> > implementations permit” bit?
>
> Looks like it's permissive.
> At the moment, musl does (non-SIMD, obviously) unsigned long loads *as
> long as they're aligned* (you don't want to page fault! and reads
> don't have side effects unless it's MMIO or something, and that's
> non-standard) and does standard(tm) bit tricks to find null bytes in
> that same word.

Oops, sorry, had a brainfart there and misread your strcmp as strlen.
In any case, it is AFAIK permissive as you could tell from
implementations such as bionic's ssse3-strcmp-atom.S.

-- 
Pedro
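
For reference, the length-first approach discussed in the quoted text could be sketched roughly as below. This is only an illustration, not code from musl, bionic, or any other libc. The name strcmp_length_first is hypothetical, and the sketch assumes memcmp() is handed the string pointers a and b, with the lengths used only to compute the byte count. It covers the common prefix plus one byte, so the comparison always reaches either the first differing character or both terminators.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of a length-first strcmp.
 * Both arguments MUST be nul-terminated: strlen() scans each string to
 * its terminator before any byte is compared, which is exactly the
 * assumption the thread is asking about.
 */
static int strcmp_length_first(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);

    /* Common prefix plus one byte: includes the shorter string's
     * terminator, so a length mismatch shows up as a byte mismatch. */
    size_t n = (la < lb ? la : lb) + 1;

    /* memcmp compares bytes as unsigned char, like strcmp, so the sign
     * is decided by the first pair of bytes that differ. */
    return memcmp(a, b, n);
}
```

On terminated strings this should give the same sign as strcmp(). The difference is that it may read bytes past the first mismatch, which is only acceptable if libc is allowed to assume termination.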