Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120820020626.GD27715@brightrain.aerifal.cx>
Date: Sun, 19 Aug 2012 22:06:26 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: ldso : dladdr support

On Sun, Aug 19, 2012 at 06:42:30PM +0200, musl wrote:
> Hi,
> 
> This patch fixes a bug in dladdr: sym var was not incremented across gnu hash chain iteration).
> I also reworked the dladdr implem to share more code between sysv and gnu hash.
> I still haven't found a better way to get the symbol table size. Do you?
> 
> This patch uses the new decode_vec function, but as I told you in my
> previous mail, I'm not sure this the way to go.
> Could you tell me what you think?

Yeah, I'm not really happy with it either. Trying to think of
something better...

> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
> index f55c6f1..bf1ec6b 100644
> --- a/src/ldso/dynlink.c
> +++ b/src/ldso/dynlink.c
> @@ -1,3 +1,4 @@
> +#define _GNU_SOURCE
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -28,12 +29,14 @@ typedef Elf32_Phdr Phdr;
>  typedef Elf32_Sym Sym;
>  #define R_TYPE(x) ((x)&255)
>  #define R_SYM(x) ((x)>>8)
> +#define ELF_ST_TYPE ELF32_ST_TYPE
>  #else
>  typedef Elf64_Ehdr Ehdr;
>  typedef Elf64_Phdr Phdr;
>  typedef Elf64_Sym Sym;
>  #define R_TYPE(x) ((x)&0xffffffff)
>  #define R_SYM(x) ((x)>>32)
> +#define ELF_ST_TYPE ELF64_ST_TYPE
>  #endif

These definitions are actually the same. I would just
#define ST_TYPE(x) ((x)&15)

> -static uint32_t hash(const char *s0)
> +static uint32_t sysv_hash(const char *s0)
>  {
>  	const unsigned char *s = (void *)s0;
>  	uint_fast32_t h = 0;
> @@ -105,7 +117,16 @@ static uint32_t hash(const char *s0)
>  	return h & 0xfffffff;
>  }
>  
> -static Sym *lookup(const char *s, uint32_t h, struct dso *dso)
> +static uint32_t gnu_hash (const char *s0)
> +{
> +	const unsigned char *s = (void *)s0;
> +	uint_fast32_t h = 5381;
> +	for (; *s; s++)
> +		h = h*33 + *s;
> +	return h & 0xffffffff;
> +}

The final &0xffffffff is a no-op. Note that the one in sysv_hash is
not a no-op; sysv_hash's result is 28 bits, not 32.

Re-reading this code also raised another issue: I'm not entirely
convinced that 0 is not a possible hash value, which may invalidate
what I said before about using h==0 to indicate "not yet computed". Of
course, it may not matter; if one in 4 billion symbol names get their
hashes repeatedly recomputed rather than being reused, it's not going
to make any difference to overall performance...

>  	/* Only trust user/env if kernel says we're not suid/sgid */
> -	if ((aux[0]&0x7800)!=0x7800 || aux[AT_UID]!=aux[AT_EUID]
> -	  || aux[AT_GID]!=aux[AT_EGID] || aux[AT_SECURE]) {
> +	if ((found&0x1e0)!=0x1e0 || aux[5]!=aux[6]
> +	  || aux[7]!=aux[8] || aux[9]) {
>  		env_path = 0;
>  		env_preload = 0;

Looking at this, I agree that the new decode_vec idea is not a good
direction. It's obfuscating the code badly.

For now, how about leaving the old decode_vec alone and just adding a
new one with a different name for getting to "high" entries. I wonder
if it would be possible, rather than using a list of wanted entries,
to use a base/count rather than always working zero-based like
decode_vec does. This would allow the resulting indices to still
actually mean something so we don't wind up with magic numbers all
over the code..

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.