Message-ID: <55188AEA.9010803@openwall.com>
Date: Mon, 30 Mar 2015 02:29:46 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Generic parsing functions -- prototype
Hi!
I've tried to create a prototype of generic parsing functions. Not
much is implemented, but it's enough for the 7z format (more or less). It
looks like this:
----------------------------------------------------------------------
#define HASH_FORMAT "$7z$ %0-0d $ %1-24d $ %0-16d $ %16h $ " \
                    "%0-16d $ %16h $ %d $ %l $ %d $ %*h"
...
static int valid(char *ciphertext, struct fmt_main *self)
{
    return proc_valid(ciphertext, HASH_FORMAT, BIG_ENOUGH);
}

static void *get_salt(char *ciphertext)
{
    static union {
        struct custom_salt _cs;
        ARCH_WORD_32 dummy;
    } un;
    struct custom_salt *cs = &(un._cs);
    size_t SaltSize, ivSize, length;

    proc_extract(ciphertext, HASH_FORMAT,
                 &cs->type, &cs->NumCyclesPower,
                 IGNORE_NUM, &SaltSize, cs->salt,
                 IGNORE_NUM, &ivSize, cs->iv,
                 &cs->crc, &cs->unpacksize,
                 &length, cs->data);
    cs->SaltSize = SaltSize;
    cs->ivSize = ivSize;
    cs->length = length;
    return (void *)cs;
}
----------------------------------------------------------------------
After some tuning it should become even shorter. IMHO it's much better
than the current approach of manual parsing.
The attached patch contains new files parsing_plug.c/parsing.h and
changes to 7z_fmt_plug.c. I've only checked that the self-tests pass.
I don't think it's worth committing yet. But it should be enough to
start a discussion and to take into account when making the GSoC plans
more precise.
Some notes.
I hope, for each john format, to have one format string describing the
hash structure so that it's enough to validate a hash and to extract
info from it a-la scanf (and to create a hash a-la printf if the need
arises). Probably not for every john format, but for most of them.
It's also possible to expose intermediate functions (to parse a number,
etc.) but I'm not yet sure how useful that would be. IMHO the fewer
functions we expose the better.
Which elements of format string are implemented:
- spaces are ignored;
- everything special starts with %, everything else is treated as literals;
- %d for unsigned decimal numbers (uint32_t), can have a range for
accepted values like %1-24d. Returns the result via uint32_t *;
- %h for binary data of variable length, encoded in hex. The max length has
to be indicated. Returns two(!) things -- actual length via size_t * and
data via unsigned char *;
- %l for a length of the next field of variable length. Returns nothing.
All lengths are for decoded data. '*' can be used in place of any number,
then the number is taken from the arguments a-la printf.
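To make the rules above concrete, here is a minimal sketch of a validator
for a small subset of them: literals, ignored spaces, and %d with an
optional range. It is not the code from the attached patch. The name
sketch_valid() and its behaviour on malformed format strings are
assumptions made for illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not the code from the attached patch: checks a hash
 * against a format string that supports only literals, ignored spaces and
 * %d with an optional range (e.g. %1-24d). Returns 1 on match, 0 otherwise.
 */
static int sketch_valid(const char *hash, const char *fmt)
{
    assert(hash != NULL && fmt != NULL);

    while (*fmt) {
        if (*fmt == ' ') {              /* spaces in the format are ignored */
            fmt++;
            continue;
        }
        if (*fmt != '%') {              /* literal: must match exactly */
            if (*hash++ != *fmt++)
                return 0;
            continue;
        }
        fmt++;                          /* skip '%' */

        /* optional range "lo-hi" in front of the 'd' */
        uint64_t lo = 0, hi = UINT32_MAX;
        if (isdigit((unsigned char)*fmt)) {
            char *end;
            lo = strtoull(fmt, &end, 10);
            if (*end != '-' || !isdigit((unsigned char)end[1]))
                return 0;               /* malformed format string */
            hi = strtoull(end + 1, &end, 10);
            fmt = end;
        }
        if (*fmt++ != 'd')
            return 0;                   /* specifier not covered by this sketch */

        /* parse an unsigned decimal number from the hash */
        if (!isdigit((unsigned char)*hash))
            return 0;
        uint64_t val = 0;
        while (isdigit((unsigned char)*hash)) {
            val = val * 10 + (uint64_t)(*hash++ - '0');
            if (val > UINT32_MAX)
                return 0;               /* would not fit into uint32_t */
        }
        if (val < lo || val > hi)
            return 0;
    }
    return *hash == '\0';               /* no trailing garbage allowed */
}
```

With the format "$7z$ %0-0d $ %1-24d $" this accepts "$7z$0$19$" and
rejects "$7z$0$25$", because 25 is outside the 1-24 range.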
Future elements:
- %% -- literal %;
- %m -- base64/mime-encoded string without padding;
- %M -- base64/mime-encoded string with padding;
- %b -- base64/crypt-encoded string without padding;
- %B -- base64/crypt-encoded string with padding;
- %s -- arbitrary string (like usernames).
There are naturally many questions:
- spaces. Do we have hashes with spaces in them?
- numbers. Should we require that the range always be indicated? Do we need
negative numbers (they are used only in pdf hashes)?
- types. Types are probably not very convenient. The idea was that for
numbers extracted from a hash a type of fixed size should be used. And for
numbers like sizes size_t should be used. But in the example above this
leads to 3 intermediate variables, which is not very nice;
- hex. Do we need variants for lower- and upper-case?
- fixed-length data. Do we have cases of fixed-length data without a
separator after it? Or cases when there is no separator and the length
is extracted from the hash, like this:
$<length-of-data1>$<length-of-data2>$<data1-in-hex><data2-in-hex>? LDAP
formats have salt+binary base64-encoded together; they should probably be
split by hand;
- variable-length data. Do we need ranges for lengths?
- is a scanf-like approach good at all? It seems to be quite compact
but types of arguments are not checked and mistakes there are fatal and
hard to debug. The mismatch between the number of specifiers and the number
of arguments (2 for %h and 0 for %l) doesn't help either;
- are the chosen letters for specifiers good (e.g. %b vs. %m)?
- which other types of fields do we need?
Comments?
--
Alexander Cherepanov
View attachment "0001-Test-generic-parsing-function-on-7z-format.patch" of type "text/x-patch" (11782 bytes)