Date: Tue, 11 Sep 2012 15:12:13 -0400
From: Rich Rumble <richrumble@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Re: Passphrase Creation

On Tue, Sep 11, 2012 at 2:45 PM, Matt Weir <cweir@...edu> wrote:
>> I've found the FreeCDDB very useful in cracking.
>
> Cool! I didn't even know about that site until you mentioned it.
>
>> I'm not at all happy with the parsing I've done on the cddb
>
> Scraping web content is pretty much a universal problem from everyone
> I've talked to, (though I'll admit my code tends to achieve a certain
> level of bugginess above and beyond the norm ;). I have to imagine
> there's been a lot of work/research/tools developed to do this for
> other problems besides password cracking though. It might be useful
> for one of us to look into existing solutions rather than re-invent
> the wheel of developing our own scrapers.
You can get it all in a single tarball archive full of flat text
files. There are several categories and thousands of files; so many,
in fact, that I had to batch my grep'ing and cat'ing by the first
character of the file names (a-f and 0-9, if I recall the naming scheme).
http://ftp.freedb.org/pub/freedb/
freedb-complete-20120901.tar.bz2        01-Sep-2012 04:39  765M
  1,782,794 0-blues.txt
  5,309,193 0-classical.txt
  1,345,775 0-country.txt
  1,448,932 0-data.txt
  3,896,135 0-folk.txt
  1,486,088 0-jazz.txt
 15,320,828 0-misc.txt
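
For anyone who wants to reproduce that kind of batching, here is a
rough Python sketch. It assumes the tarball has been extracted into one
directory per category (blues, jazz, misc, ...) with one flat file per
disc, and it writes one "<first-char>-<category>.txt" file per batch,
matching the names in the listing above. The function name and the
directory arguments are made up for illustration.

```python
import os
import shutil
from collections import defaultdict


def batch_by_first_char(src_root, out_dir):
    """Concatenate per-disc files into <first-char>-<category>.txt batches.

    Assumes src_root contains one sub-directory per category, each
    holding one flat file per disc (an assumption about the layout).
    """
    batches = defaultdict(list)
    for category in sorted(os.listdir(src_root)):
        cat_dir = os.path.join(src_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for name in sorted(os.listdir(cat_dir)):
            path = os.path.join(cat_dir, name)
            if os.path.isfile(path):
                # Group by the first character of the file name.
                batches[(name[0].lower(), category)].append(path)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for (first, category), paths in sorted(batches.items()):
        out_path = os.path.join(out_dir, f"{first}-{category}.txt")
        with open(out_path, "wb") as out:
            for path in paths:
                # Copy bytes as-is; entries are not guaranteed to share an encoding.
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        written.append(out_path)
    return written
```

Working one batch at a time keeps each grep or cat invocation short
enough to avoid "argument list too long" errors.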
The files are delimited uniformly: "Title01:", "Title02:", "Genre:",
"TrackName:", or something like that...
There is code on the site, and a Perl script or two to keep things
updated as well. It was easy to get info out of this archive, but due
to its "crowd sourced" nature it's not as accurate, complete or
"proper" as the (proprietary) CDDB one, so I needed to sanitize my
exports a bit.
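
As a rough illustration of pulling titles out and sanitizing them, here
is a minimal Python sketch. The field names are an assumption: it
matches xmcd-style "DTITLE=Artist / Album" and "TTITLE<n>=Track" lines,
which may not be the exact delimiters in your copy, so adjust the
regular expression to whatever the files actually use. The function
name is hypothetical.

```python
import re

# Assumed xmcd-style keys; change this if the files use other field names.
FIELD_RE = re.compile(r"^(DTITLE|TTITLE\d+)=(.*)$")


def extract_candidates(lines):
    """Yield unique, cleaned artist/album/track strings from entry lines."""
    seen = set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            continue  # comment lines carry no titles
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        # Assumed convention: disc titles look like "Artist / Album".
        parts = value.split(" / ") if key == "DTITLE" else [value]
        for part in parts:
            # Replace control characters with spaces, then collapse whitespace.
            printable = "".join(ch if ch.isprintable() else " " for ch in part)
            cleaned = " ".join(printable.split())
            if cleaned and cleaned not in seen:
                seen.add(cleaned)
                yield cleaned
```

Feeding it an open file, for example
open(path, encoding="utf-8", errors="replace"), gives a de-duplicated
stream of candidates that can be written out one per line as a
wordlist. Values that continue on a repeated key are not rejoined here;
each line is treated as its own candidate.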
-rich
