|
Message-ID: <165a4213044e5de5fe904948f6d1aec5@mail.tecnico.ulisboa.pt>
Date: Fri, 20 Jan 2017 09:56:16 +0000
From: Bernardo Pascoal Figueiredo
<bernardopascoalfigueiredo@...nico.ulisboa.pt>
To: musl@...ts.openwall.com
Subject: Re: Re: Implementation of GLOB_TILDE
Sending a new patch without strlcpy and strlcat.
Thanks for the help on IRC.
On 2017-01-18 19:51, Bernardo Pascoal Figueiredo wrote:
> On 2017-01-17 19:39, Rich Felker wrote:
>> On Tue, Jan 17, 2017 at 03:49:03PM +0000, Bernardo Pascoal Figueiredo
>> wrote:
>>> >I have a few questions though:
>>> > * How do I contribute to musl? Should I just send patches to this
>>> >mailing list
>>
>> This is the preferred way, yes.
>>
>>> > * I defined GLOB_TILDE as 0x100, but I think this won't work on
>>> >architectureshttps://webmail.tecnico.ulisboa.pt/rc/?_task=mail&_action=compose&_id=5497761105881de774e18d#
>>> > that have sizeof(int) == 2, as the flags argument in glob is an int.
>>
>> That's not an issue. (1) POSIX requires >=32bit int; musl is even more
>> restrictive. (2) 0xff is 8 bits, not 16. Even ISO C requires 0x100 to
>> fit in int. You do need to match whatever value glibc uses, though; I
>> haven't checked that.
>
> They use 1 << 12 which is 0x1000. I fixed this in the patch below.
> Does this really matter? Is musl supposed to be binary compatible with
> glibc?
> OpenBSD's libc uses 0x800 for example.
>
>>
>>> > * I think it's best to define GLOB_TILDE in glob.h inside a '#if
>>> > defined(_GNU_SOURCE) || defined(_BSD_SOURCE)' what do you think?
>>> >
>>> > * I had to copy strlcat and strlcpy to glob.c so I could use
>>> >them. I had to do
>>> > this because musl isn't compile as _GNU_SOURCE or _BSD_SOURCE
>>> >so string.h
>>> > doesn't expose these functions. How should I fix this?
>>
>> Just don't use them. The same thing can be achieved much better with
>> portable functions strnlen and memcpy.
>>
>> Note that even if you added #define for _GNU_SOURCE to this file, you
>> couldn't reference these functions because then a function in the
>> standard namespace would depend on symbols in nonstandard namespace.
>>
>
> What I was thinking was to have private musl implementations of strlcat
> and
> strlcpy and have the public strlcat and strlcpy call these private
> ones.
> That way it'd be possible to use strlcat and strlcpy within musl.
> I'd prefer that to strnlen and memcpy because that's what strlcat and
> strlcpy
> already do, so I'd be duplicating it's functionality. This is way more
> error
> prone.
>
>>> diff --git a/src/regex/glob.c b/src/regex/glob.c
>>> index 5b6ff124..f40da380 100644
>>> --- a/src/regex/glob.c
>>> +++ b/src/regex/glob.c
>>> @@ -8,6 +8,9 @@
>>> #include <errno.h>
>>> #include <stddef.h>
>>> #include "libc.h"
>>> +#include <stdbool.h>
>>
>> Just use int for boolean values; bool is not idiomatic in musl.
>
> Fixed
>
>>
>>> +/*"~" or "~/(...)" case*/
>>> +static bool expand_tilde_cur_user(const char *pat_after_tilde, char
>>> *new_pat, size_t new_pat_size)
>>> +{
>>> + char *home;
>>> + struct passwd pw_store, *pw_result;
>>> + char pw_buf[1024];
>>> +
>>> + /*FIXME: add check for issetugid as in libc of openbsd?*/
>>> + home = getenv("HOME");
>>> + if(home == NULL) {
>>> + getpwuid_r(getuid(), &pw_store, pw_buf, sizeof(pw_buf),
>>> &pw_result);
>>> + if(pw_result == NULL) {
>>> + return false;
>>> + }
>>> + home = pw_store.pw_dir;
>>> + }
>>> +
>>> + return glob_strlcpy(new_pat, home, new_pat_size) < new_pat_size
>>> + && glob_strlcat(new_pat, pat_after_tilde, new_pat_size) <
>>> new_pat_size;
>>> +}
>>> +
>>> +/* "~user/(...) case*/
>>> +static bool expand_tilde_named_user(const char *pat_after_tilde,
>>> char *new_pat, size_t new_pat_size)
>>> +{
>>> + struct passwd pw_store, *pw_result;
>>> + char pw_buf[1024], username[1024];
>>> + const char *slash_pos = strchr(pat_after_tilde, '/');
>>> + if(slash_pos == NULL) {
>>> + return false;
>>> + }
>>> +
>>> + ptrdiff_t pat_username_size = slash_pos - pat_after_tilde;
>>> + if(pat_username_size <= 0 || pat_username_size >=
>>> sizeof(username)) {
>>> + return false;
>>> + }
>>> + strncpy(username, pat_after_tilde, pat_username_size);
>>> + username[pat_username_size] = '\0';
>>> +
>>> + getpwnam_r(username, &pw_store, pw_buf, sizeof(pw_buf),
>>> &pw_result);
>>> + if (pw_result == NULL)
>>> + return false;
>>> +
>>> + return glob_strlcpy(new_pat, pw_store.pw_dir, new_pat_size) <
>>> new_pat_size
>>> + && glob_strlcat(new_pat, slash_pos, new_pat_size) <
>>> new_pat_size;
>>> +}
>>
>> It should be possible to reduce this code considerably, and to avoid
>> some of the large stack buffers. Certainly username[] does not need to
>> be 1k; I believe there's a macro for the max supported in limits.h or
>> such and it's something like 16 or 32 bytes.
>
> I searched and found there is a define for LOGIN_NAME_MAX which is 256.
> According to the useradd man page the maximum number of characters of
> an
> username is 32.
> There is also in limits.h _POSIX_NAME_MAX which is 14 and
> _POSIX_LOGIN_NAME_MAX
> which is 9.
> The openbsd pwd.h has a define for _PW_NAME_LEN which is 31 (without
> '\0').
>
> Which alternative is the best?
>
> (I updated the username arrays to be LOGIN_NAME_MAX)
>
> Relative to the pw_buf I don't think musl's code has any variable that
> defines
> the maximum supported passwd line.
> I used 1024 because that's what I saw in the openbsd code. They have a
> global
> char [] of size _PW_BUF_LEN which is 1024.
>
> Reduce the code in which way? Try to refactor some code of
> expand_tilde_cur_user and expand_tilde_named_user, of make the code
> "less
> verbose"?
>
>>
>>> int glob(const char *restrict pat, int flags, int (*errfunc)(const
>>> char *path, int err), glob_t *restrict g)
>>> {
>>> - const char *p=pat, *d;
>>> + const char *p, *d;
>>> + char new_pat[PATH_MAX + 1];
>>> struct match head = { .next = NULL }, *tail = &head;
>>> size_t cnt, i;
>>> size_t offs = (flags & GLOB_DOOFFS) ? g->gl_offs : 0;
>>> int error = 0;
>>> +
>>> + /*even if expanding fails(e.g. expansion make pat too big)
>>> + * we should try to match the ~ or ~user literally*/
>>> + bool should_expand_tilde = (flags & GLOB_TILDE) && (pat[0] ==
>>> '~');
>>> + if(should_expand_tilde && expand_tilde(pat, new_pat,
>>> sizeof(new_pat))) {
>>> + p = new_pat;
>>> + } else {
>>> + p = pat;
>>> + }
>>
>> Don't introduce gratuitous variables for expressions used just once.
>> The conditions going into that bool can just be part of the if.
> If I put it inside the if the line would be too big and those two
> checks go
> together.
> I think it's alot clearer.
>
>>
>> I haven't checked what happens when the input pat is too long to fit
>> in new_pat. This needs to be an error condition, not silent
>> truncation, but error conditions can't be handled until further down;
>> see commit 769f53598e781ffc89191520f3f8a93cb58db91f for why. Also,
>> new_pat should probably be a VLA whose length is 1 unless tilde
>> expansion is needed, and PATH_MAX otherwise, so that glob doesn't blow
>> an extra page on the stack when tilde expansion is not in use.
>
> I changed it to char new_pat[tilde_flag ? PATH_MAX : 1] where
> tilde_flag =
> flags & GLOB_TILDE.
> (new_pat no longer uses an extra page.)
>
> Alternatively I can put most of the code of glob in another function
> and only
> create a new pat buffer when we should try to expand the tilde.
> This would be good to reduce the scope of new_pat.
>
> When the expanded pattern is too big to fit new_pat, the expansion
> fails and
> the unexpanded pattern is used.
> According to glibc GLOB_TILDE should never make glob return an error.
> If
> anything related to the expansion of tilde fails it's supposed to
> continue glob
> with the pattern unexpanded.
> At least that how I interpret the manual:
> "
> If the username is invalid, or the home directory cannot be determined,
> then no
> substitution is performed.
> "
> What I'm trying to say is that GLOB_TILDE never produces an error, so
> the code
> that zeros the glob_t structure always runs.
>
> If the code that checks for leading '/' is put just before the call to
> match_dir, it'd be ok to put the code that calls expand_tilde after the
> glob_t
> zeroing and strlen checking and before the leading '/' code.
> The "downside" of this is that the caller of glob has a ton of leading
> '/' it
> could now fail the strlen check, but I think that's the fault of who is
> calling
> glob with "/////////////////////(...)"
> A patch for that is in attachment
> "move_leading_slash_check_down.patch".
>
>
>>
>> Rich
>
> I also send the patch with the things I fixed as attachment. It's
> still not okay.
View attachment "glob_tilde.patch" of type "text/x-diff" (3593 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.