Message-ID: <CANWtx00HS00+N+uUjvbH_m=se-V65Q1ctkjHANXKY7nU+jj+FA@mail.gmail.com>
Date: Tue, 11 Sep 2012 15:12:13 -0400
From: Rich Rumble <richrumble@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Re: Passphrase Creation

On Tue, Sep 11, 2012 at 2:45 PM, Matt Weir <cweir@...edu> wrote:
>> I've found the FreeCDDB very useful in cracking.
>
> Cool! I didn't even know about that site until you mentioned it.
>
>> I'm not at all happy with the parsing I've done on the cddb
>
> Scraping web content is pretty much a universal problem from everyone
> I've talked to, (though I'll admit my code tends to achieve a certain
> level of bugginess above and beyond the norm ;). I have to imagine
> there's been a lot of work/research/tools developed to do this for
> other problems besides password cracking though. It might be useful
> for one of us to look into existing solutions rather than re-invent
> the wheel of developing our own scrapers.

You can get it all in one (tarball) archive; it's full of flat text
files. There are several categories and thousands of files, so many in
fact that I had to batch my grep'ing and cat'ing by the first letter of
the file names (a-f and 0-9, if I recall the naming scheme).

http://ftp.freedb.org/pub/freedb/
freedb-complete-20120901.tar.bz2    01-Sep-2012 04:39    765M

 1,782,794 0-blues.txt
 5,309,193 0-classical.txt
 1,345,775 0-country.txt
 1,448,932 0-data.txt
 3,896,135 0-folk.txt
 1,486,088 0-jazz.txt
15,320,828 0-misc.txt

The files are delimited uniformly: "Title01:" "Title02:" "Genre:"
"TrackName:", something like that... There is code on the site, and a
perl script or two to keep things updated as well. It was easy to get
info out of this archive, but due to its "crowd sourced" nature it's
not as accurate, complete or "proper" as the CDDB (proprietary) one,
and I needed to sanitize my exports a bit.
-rich
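
For anyone wanting to try the extraction described above, here is a rough sketch of pulling titles out of the flat files and sanitizing them a little. It is not the code used for the exports mentioned in this message. The field names in the message were given from memory; the sketch assumes the xmcd-style key=value lines (DTITLE=, TTITLE0=, TTITLE1=, ...) and should be adjusted if your dump differs. The function name extract_candidates is made up for illustration.

```python
import re

# Assumption: disc and track titles sit on lines like "DTITLE=..." and
# "TTITLE0=...". Change this pattern if your files use other field names.
FIELD_RE = re.compile(r"^(DTITLE|TTITLE\d+)=(.*)$")


def extract_candidates(lines):
    """Yield unique, lightly sanitized titles from an iterable of text lines."""
    seen = set()
    for line in lines:
        match = FIELD_RE.match(line.rstrip("\r\n"))
        if not match:
            continue  # comments, DISCID, DGENRE and so on are skipped
        value = match.group(2).strip()
        # Crowd-sourced entries can carry stray control characters; drop them.
        value = "".join(ch for ch in value if ch.isprintable())
        if value and value not in seen:
            seen.add(value)
            yield value


if __name__ == "__main__":
    sample = [
        "# xmcd",
        "DTITLE=Some Artist / Some Album",
        "TTITLE0=First Song",
        "TTITLE1=First Song",
    ]
    for title in extract_candidates(sample):
        print(title)
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines. The in-memory set used for de-duplication may get large on the full archive, so for a first pass it can be simpler to write everything out and de-duplicate afterwards with sort -u.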