Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <165a4213044e5de5fe904948f6d1aec5@mail.tecnico.ulisboa.pt>
Date: Fri, 20 Jan 2017 09:56:16 +0000
From: Bernardo Pascoal Figueiredo
 <bernardopascoalfigueiredo@...nico.ulisboa.pt>
To: musl@...ts.openwall.com
Subject: Re: Re: Implementation of GLOB_TILDE

Sending a new patch without strlcpy and strlcat.
Thanks for the help on IRC.

On 2017-01-18 19:51, Bernardo Pascoal Figueiredo wrote:
> On 2017-01-17 19:39, Rich Felker wrote:
>> On Tue, Jan 17, 2017 at 03:49:03PM +0000, Bernardo Pascoal Figueiredo 
>> wrote:
>>> >I have a few questions though:
>>> > * How do I contribute to musl? Should I just send patches to this
>>> >mailing list
>> 
>> This is the preferred way, yes.
>> 
>>> > * I defined GLOB_TILDE as 0x100, but I think this won't work on
>>> >architectureshttps://webmail.tecnico.ulisboa.pt/rc/?_task=mail&_action=compose&_id=5497761105881de774e18d#
>>> >   that have sizeof(int) == 2, as the flags argument in glob is an int.
>> 
>> That's not an issue. (1) POSIX requires >=32bit int; musl is even more
>> restrictive. (2) 0xff is 8 bits, not 16. Even ISO C requires 0x100 to
>> fit in int. You do need to match whatever value glibc uses, though; I
>> haven't checked that.
> 
> They use 1 << 12 which is 0x1000. I fixed this in the patch below.
> Does this really matter? Is musl supposed to be binary compatible with 
> glibc?
> OpenBSD's libc uses 0x800 for example.
> 
>> 
>>> > * I think it's best to define GLOB_TILDE in glob.h inside a '#if
>>> >   defined(_GNU_SOURCE) || defined(_BSD_SOURCE)' what do you think?
>>> >
>>> > * I had to copy strlcat and strlcpy to glob.c so I could use
>>> >them. I had to do
>>> >   this because musl isn't compile as _GNU_SOURCE or _BSD_SOURCE
>>> >so string.h
>>> >   doesn't expose these functions. How should I fix this?
>> 
>> Just don't use them. The same thing can be achieved much better with
>> portable functions strnlen and memcpy.
>> 
>> Note that even if you added #define for _GNU_SOURCE to this file, you
>> couldn't reference these functions because then a function in the
>> standard namespace would depend on symbols in nonstandard namespace.
>> 
> 
> What I was thinking was to have private musl implementations of strlcat 
> and
> strlcpy and have the public strlcat and strlcpy call these private 
> ones.
> That way it'd be possible to use strlcat and strlcpy within musl.
> I'd prefer that to strnlen and memcpy because that's what strlcat and 
> strlcpy
> already do, so I'd be duplicating it's functionality. This is way more 
> error
> prone.
> 
>>> diff --git a/src/regex/glob.c b/src/regex/glob.c
>>> index 5b6ff124..f40da380 100644
>>> --- a/src/regex/glob.c
>>> +++ b/src/regex/glob.c
>>> @@ -8,6 +8,9 @@
>>>  #include <errno.h>
>>>  #include <stddef.h>
>>>  #include "libc.h"
>>> +#include <stdbool.h>
>> 
>> Just use int for boolean values; bool is not idiomatic in musl.
> 
> Fixed
> 
>> 
>>> +/*"~" or "~/(...)" case*/
>>> +static bool expand_tilde_cur_user(const char *pat_after_tilde, char 
>>> *new_pat, size_t new_pat_size)
>>> +{
>>> +    char *home;
>>> +    struct passwd pw_store, *pw_result;
>>> +    char pw_buf[1024];
>>> +
>>> +    /*FIXME: add check for issetugid as in libc of openbsd?*/
>>> +    home = getenv("HOME");
>>> +    if(home == NULL) {
>>> +        getpwuid_r(getuid(), &pw_store, pw_buf, sizeof(pw_buf), 
>>> &pw_result);
>>> +        if(pw_result == NULL) {
>>> +            return false;
>>> +        }
>>> +        home = pw_store.pw_dir;
>>> +    }
>>> +
>>> +    return glob_strlcpy(new_pat, home, new_pat_size) < new_pat_size
>>> +        && glob_strlcat(new_pat, pat_after_tilde, new_pat_size) < 
>>> new_pat_size;
>>> +}
>>> +
>>> +/* "~user/(...) case*/
>>> +static bool expand_tilde_named_user(const char *pat_after_tilde, 
>>> char *new_pat, size_t new_pat_size)
>>> +{
>>> +    struct passwd pw_store, *pw_result;
>>> +    char pw_buf[1024], username[1024];
>>> +    const char *slash_pos = strchr(pat_after_tilde, '/');
>>> +    if(slash_pos == NULL) {
>>> +        return false;
>>> +    }
>>> +
>>> +    ptrdiff_t pat_username_size = slash_pos - pat_after_tilde;
>>> +    if(pat_username_size <= 0 || pat_username_size >= 
>>> sizeof(username)) {
>>> +        return false;
>>> +    }
>>> +    strncpy(username, pat_after_tilde, pat_username_size);
>>> +    username[pat_username_size] = '\0';
>>> +
>>> +    getpwnam_r(username, &pw_store, pw_buf, sizeof(pw_buf), 
>>> &pw_result);
>>> +    if (pw_result == NULL)
>>> +        return false;
>>> +
>>> +    return glob_strlcpy(new_pat, pw_store.pw_dir, new_pat_size) < 
>>> new_pat_size
>>> +        && glob_strlcat(new_pat, slash_pos, new_pat_size) < 
>>> new_pat_size;
>>> +}
>> 
>> It should be possible to reduce this code considerably, and to avoid
>> some of the large stack buffers. Certainly username[] does not need to
>> be 1k; I believe there's a macro for the max supported in limits.h or
>> such and it's something like 16 or 32 bytes.
> 
> I searched and found there is a define for LOGIN_NAME_MAX which is 256.
> According to the useradd man page the maximum number of characters of 
> an
> username is 32.
> There is also in limits.h _POSIX_NAME_MAX which is 14 and 
> _POSIX_LOGIN_NAME_MAX
> which is 9.
> The openbsd pwd.h has a define for _PW_NAME_LEN which is 31 (without 
> '\0').
> 
> Which alternative is the best?
> 
> (I updated the username arrays to be LOGIN_NAME_MAX)
> 
> Relative to the pw_buf I don't think musl's code has any variable that 
> defines
> the maximum supported passwd line.
> I used 1024 because that's what I saw in the openbsd code. They have a 
> global
> char [] of size _PW_BUF_LEN which is 1024.
> 
> Reduce the code in which way? Try to refactor some code of
> expand_tilde_cur_user and expand_tilde_named_user, of make the code 
> "less
> verbose"?
> 
>> 
>>>  int glob(const char *restrict pat, int flags, int (*errfunc)(const 
>>> char *path, int err), glob_t *restrict g)
>>>  {
>>> -    const char *p=pat, *d;
>>> +    const char *p, *d;
>>> +    char new_pat[PATH_MAX + 1];
>>>      struct match head = { .next = NULL }, *tail = &head;
>>>      size_t cnt, i;
>>>      size_t offs = (flags & GLOB_DOOFFS) ? g->gl_offs : 0;
>>>      int error = 0;
>>> +
>>> +    /*even if expanding fails(e.g. expansion make pat too big)
>>> +     * we should try to match the ~ or ~user literally*/
>>> +    bool should_expand_tilde = (flags & GLOB_TILDE) && (pat[0] == 
>>> '~');
>>> +    if(should_expand_tilde && expand_tilde(pat, new_pat, 
>>> sizeof(new_pat))) {
>>> +        p = new_pat;
>>> +    } else {
>>> +        p = pat;
>>> +    }
>> 
>> Don't introduce gratuitous variables for expressions used just once.
>> The conditions going into that bool can just be part of the if.
> If I put it inside the if the line would be too big and those two 
> checks go
> together.
> I think it's alot clearer.
> 
>> 
>> I haven't checked what happens when the input pat is too long to fit
>> in new_pat. This needs to be an error condition, not silent
>> truncation, but error conditions can't be handled until further down;
>> see commit 769f53598e781ffc89191520f3f8a93cb58db91f for why. Also,
>> new_pat should probably be a VLA whose length is 1 unless tilde
>> expansion is needed, and PATH_MAX otherwise, so that glob doesn't blow
>> an extra page on the stack when tilde expansion is not in use.
> 
> I changed it to char new_pat[tilde_flag ? PATH_MAX : 1] where 
> tilde_flag =
> flags & GLOB_TILDE.
> (new_pat no longer uses an extra page.)
> 
> Alternatively I can put most of the code of glob in another function 
> and only
> create a new pat buffer when we should try to expand the tilde.
> This would be good to reduce the scope of new_pat.
> 
> When the expanded pattern is too big to fit new_pat, the expansion 
> fails and
> the unexpanded pattern is used.
> According to glibc GLOB_TILDE should never make glob return an error. 
> If
> anything related to the expansion of tilde fails it's supposed to 
> continue glob
> with the pattern unexpanded.
> At least that how I interpret the manual:
> "
> If the username is invalid, or the home directory cannot be determined, 
> then no
> substitution is performed.
> "
> What I'm trying to say is that GLOB_TILDE never produces an error, so 
> the code
> that zeros the glob_t structure always runs.
> 
> If the code that checks for leading '/' is put just before the call to
> match_dir, it'd be ok to put the code that calls expand_tilde after the 
> glob_t
> zeroing and strlen checking and before the leading '/' code.
> The "downside" of this is that the caller of glob has a ton of leading 
> '/' it
> could now fail the strlen check, but I think that's the fault of who is 
> calling
> glob with "/////////////////////(...)"
> A patch for that is in attachment 
> "move_leading_slash_check_down.patch".
> 
> 
>> 
>> Rich
> 
> I also send the patch with the things I fixed as attachment. It's
> still not okay.

View attachment "glob_tilde.patch" of type "text/x-diff" (3593 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.