musl - Re: ldso : dladdr support

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120820020626.GD27715@brightrain.aerifal.cx>
Date: Sun, 19 Aug 2012 22:06:26 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: ldso : dladdr support

On Sun, Aug 19, 2012 at 06:42:30PM +0200, musl wrote:
> Hi,
> 
> This patch fixes a bug in dladdr: sym var was not incremented across gnu hash chain iteration).
> I also reworked the dladdr implem to share more code between sysv and gnu hash.
> I still haven't found a better way to get the symbol table size. Do you?
> 
> This patch uses the new decode_vec function, but as I told you in my
> previous mail, I'm not sure this the way to go.
> Could you tell me what you think?

Yeah, I'm not really happy with it either. Trying to think of
something better...

> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
> index f55c6f1..bf1ec6b 100644
> --- a/src/ldso/dynlink.c
> +++ b/src/ldso/dynlink.c
> @@ -1,3 +1,4 @@
> +#define _GNU_SOURCE
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -28,12 +29,14 @@ typedef Elf32_Phdr Phdr;
>  typedef Elf32_Sym Sym;
>  #define R_TYPE(x) ((x)&255)
>  #define R_SYM(x) ((x)>>8)
> +#define ELF_ST_TYPE ELF32_ST_TYPE
>  #else
>  typedef Elf64_Ehdr Ehdr;
>  typedef Elf64_Phdr Phdr;
>  typedef Elf64_Sym Sym;
>  #define R_TYPE(x) ((x)&0xffffffff)
>  #define R_SYM(x) ((x)>>32)
> +#define ELF_ST_TYPE ELF64_ST_TYPE
>  #endif

These definitions are actually the same. I would just
#define ST_TYPE(x) ((x)&15)

> -static uint32_t hash(const char *s0)
> +static uint32_t sysv_hash(const char *s0)
>  {
>  	const unsigned char *s = (void *)s0;
>  	uint_fast32_t h = 0;
> @@ -105,7 +117,16 @@ static uint32_t hash(const char *s0)
>  	return h & 0xfffffff;
>  }
>  
> -static Sym *lookup(const char *s, uint32_t h, struct dso *dso)
> +static uint32_t gnu_hash (const char *s0)
> +{
> +	const unsigned char *s = (void *)s0;
> +	uint_fast32_t h = 5381;
> +	for (; *s; s++)
> +		h = h*33 + *s;
> +	return h & 0xffffffff;
> +}

The final &0xffffffff is a no-op. Note that the one in sysv_hash is
not a no-op; sysv_hash's result is 28 bits, not 32.

Re-reading this code also raised another issue: I'm not entirely
convinced that 0 is not a possible hash value, which may invalidate
what I said before about using h==0 to indicate "not yet computed". Of
course, it may not matter; if one in 4 billion symbol names get their
hashes repeatedly recomputed rather than being reused, it's not going
to make any difference to overall performance...

>  	/* Only trust user/env if kernel says we're not suid/sgid */
> -	if ((aux[0]&0x7800)!=0x7800 || aux[AT_UID]!=aux[AT_EUID]
> -	  || aux[AT_GID]!=aux[AT_EGID] || aux[AT_SECURE]) {
> +	if ((found&0x1e0)!=0x1e0 || aux[5]!=aux[6]
> +	  || aux[7]!=aux[8] || aux[9]) {
>  		env_path = 0;
>  		env_preload = 0;

Looking at this, I agree that the new decode_vec idea is not a good
direction. It's obfuscating the code badly.

For now, how about leaving the old decode_vec alone and just adding a
new one with a different name for getting to "high" entries. I wonder
if it would be possible, rather than using a list of wanted entries,
to use a base/count rather than always working zero-based like
decode_vec does. This would allow the resulting indices to still
actually mean something so we don't wind up with magic numbers all
over the code..

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.