john-dev - Re: [RFC] Johnny further development proposal

Message-Id: <C02269C5-40D1-436D-97BB-1F0F6CACED8E@shinnok.com>
Date: Tue, 21 Apr 2015 10:24:46 +0300
From: Shinnok <admin@...nnok.com>
To: john-dev@...ts.openwall.com
Subject: Re: [RFC] Johnny further development proposal


> On Apr 20, 2015, at 11:20 PM, Aleksey Cherepanov <lyosha@...nwall.com> wrote:
> 
> Shinnok, Mathieu,
> 
> On Mon, Apr 20, 2015 at 10:47:06AM -0400, Mathieu Laprise wrote:
>> Hey guys, here is my first draft timeline. I put it on gsoc website and in
>> the list too. I'll appreciate any comments. Like Aleksey says, some tasks
>> like jumbo support and 2*john conversion are not clear to me at the moment
>> so I put more time into these activities.
>> 
>> This timeline start on the* first week of may(during the community bounding
>> period) and at version 1.4 since 1.2-1.3 have Shinnok name on the roadmap.*
>> I'm not really experimented at planning so if something takes less time
>> than planned, I'll start working on the next sprint. And if something takes
>> WAY more time than planned, I'll take a moment to brainstorm about is this
>> feature worth it or should we priorize another feature for the best of
>> Johnny ?
>> 
>> Sprint 1(week 1 and 2) :  Get familiar with john the ripper doc and
>> codebase. Code, integrate, test version 1.4 requirements . Translation is
>> already advanced but proper threading will introduce new changes and bugs
>> that I'll fix.
> 
> Threading is a pain and unneeded complexity. So I propose to not
> implement it as long as we can. Is there an example of slowness
> solvable by threading?

Threading is not so much of a pain nowadays and is quite a necessity. We have plenty of cross-platform support for it in Qt, in both QThread and QtConcurrent.
This is not just a threading task. What I ultimately want is a proper separation between UI-specific logic and all the backend tasks (CLI invocation and monitoring, CLI output parsing, pre- and post-processing of file input/output). Reasons for having this separation include:

* Better code, where we do not mix parsing of JtR process status and output right into the UI-related slots and activities. The UI code (mainwindow.cpp) is a particularly bad place for handling other processes or doing any kind of compute-intensive task. It can lead to crashes, slow responses, slow parsing and, in the end, a bad user experience.
* Loose coupling between UI code and backend code (as defined above) leads to better code clarity, the possibility of defining a communication interface between JtR and Johnny, fewer crashes, easier debugging and a better overall architecture.
* A loosely coupled relation between the UI and the backend tasks leaves room for easier extensibility (at the cost of a bit more management code, yes). We'll need to call the 2john scripts and recover gracefully if some fail, aren't available or hang; we can't really do that from MainWindow. Apart from those obvious extensions, I'm thinking of the big picture here: when we need to transition Johnny to managing multiple instances of JtR running on different machines (which I seriously believe is the only mature conclusion for Johnny, even if nobody agrees with me :) ), it will not warrant a complete rewrite, just an extension of that mediating intermediary step (the language).
* Now about the threading itself: that's just one piece of the decoupling procedure. Once we properly decouple, we can run any compute-intensive work or busy wait in different threads. Another aspect is that, in the current implementation, the QProcess event loop polling for stdin/stdout happens on the UI thread. That's a problem, since each time the event loop is woken up for the main UI thread it also does quite a lot of other UI work instead of just polling the QProcess. A separate thread for that would be much better, and it might even fix one CPU usage problem I noticed.
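To make the last point a bit more concrete, here is a minimal sketch in plain standard C++ (no Qt, so it can be compiled anywhere). It assumes a hypothetical LineQueue class and a fake backend: one thread produces lines of output, and the "UI" side only drains a queue. In Johnny the same hand-off would presumably be done with queued Qt signals and slots; all names below are made up for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue, standing in for Qt's queued signal delivery.
class LineQueue {
public:
    void push(std::string line) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            lines_.push(std::move(line));
        }
        ready_.notify_one();
    }

    // Marks the end of the stream so that a waiting consumer can stop.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a line is available; returns std::nullopt once the
    // queue has been closed and fully drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !lines_.empty() || closed_; });
        if (lines_.empty()) return std::nullopt;
        std::string line = std::move(lines_.front());
        lines_.pop();
        return line;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> lines_;
    bool closed_ = false;
};

// Runs a fake backend on its own thread and returns what the consumer
// ("UI") side received, in order.
std::vector<std::string> runBackendDemo(const std::vector<std::string>& fakeOutput) {
    LineQueue queue;
    std::thread backend([&queue, &fakeOutput] {
        for (const auto& line : fakeOutput) queue.push(line);
        queue.close();
    });

    std::vector<std::string> received;
    while (auto line = queue.pop()) received.push_back(*line);

    backend.join();
    return received;
}
```

The consumer never touches the producer's state directly; it only sees complete lines, which is the kind of separation argued for above.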

I know this all sounds very complicated, but it's not. It's not like we're going to do this in C with POSIX interfaces. We use C++ and Qt, and that eases things quite a lot. If you look into the ancient history of Johnny, I intended to do this from the ground up (look for johnParser and johnThread): https://github.com/shinnok/johnny/tree/9357c12b585fd5270d7c8fca29bf343a1bdbea4f
Though I think you shunned it for convenience reasons and in the absence of guidance, which was mostly my fault since I wasn't around.

I propose a simple design with these three classes for a start:
JohnClient  - maintains the QProcess handle and handles the standard I/O
JohnParser  - convenience class for interpreting john output into whatever intermediary data structures we need
TaskManager - starts JohnClients (just one for now) based on what the UI tells it to do, and relays client results, filtered via JohnParser, back to the UI

TaskManager runs on a separate thread from the UI. Neither is supposed to consume enough CPU to have any realistic impact on the actual JtR crunching happening on the machine (we can't reasonably expect to have exclusive use of a thread/core on a modern desktop anyhow). The UI, the TaskManager and the individual JohnClients are all connected via a layer of Qt signals and slots (which are loosely coupled, respect thread affinity if coded properly, and don't work via sockets or shared memory).
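A rough sketch of how the three classes could fit together, again in plain standard C++, with std::function callbacks standing in for Qt signals and slots. It makes several assumptions that are not settled anywhere in this thread: the CrackedPassword struct and the callback names are hypothetical, output is fed in by hand instead of coming from a QProcess, and the parser assumes that only stdout lines of the form "plaintext   (login)" are of interest.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>

// Hypothetical intermediary data structure produced by the parser.
struct CrackedPassword {
    std::string login;
    std::string plaintext;
};

// Interprets john output. Assumption: a guess is printed as the plaintext,
// some padding spaces, and the login in parentheses. Plaintexts that end in
// spaces are not handled by this sketch.
class JohnParser {
public:
    static std::optional<CrackedPassword> parseLine(const std::string& line) {
        if (line.empty() || line.back() != ')') return std::nullopt;
        const std::size_t open = line.rfind(" (");
        if (open == std::string::npos) return std::nullopt;

        std::string plaintext = line.substr(0, open);
        const std::size_t last = plaintext.find_last_not_of(' ');
        if (last == std::string::npos) return std::nullopt;
        plaintext.erase(last + 1);  // strip the padding

        std::string login = line.substr(open + 2, line.size() - open - 3);
        return CrackedPassword{login, plaintext};
    }
};

// Owns the process I/O. In Johnny, feed() would be driven by
// QProcess::readyReadStandardOutput; here the data is fed in by hand.
class JohnClient {
public:
    std::function<void(const std::string&)> onLine;

    void feed(const std::string& chunk) {
        buffer_ += chunk;
        std::size_t newline;
        // Only complete lines are passed on; a partial line stays buffered.
        while ((newline = buffer_.find('\n')) != std::string::npos) {
            const std::string line = buffer_.substr(0, newline);
            buffer_.erase(0, newline + 1);
            if (onLine) onLine(line);
        }
    }

private:
    std::string buffer_;
};

// Sits between the UI and the client and relays parsed results.
class TaskManager {
public:
    std::function<void(const CrackedPassword&)> onCracked;

    TaskManager() {
        client_.onLine = [this](const std::string& line) {
            if (auto result = JohnParser::parseLine(line)) {
                if (onCracked) onCracked(*result);
            }
        };
    }

    // The callback above captures `this`, so copying is disabled.
    TaskManager(const TaskManager&) = delete;
    TaskManager& operator=(const TaskManager&) = delete;

    JohnClient& client() { return client_; }

private:
    JohnClient client_;
};
```

The UI would only ever set onCracked and never see raw process output; moving TaskManager to its own thread then becomes a change in how the callbacks are delivered, not in how the classes are split.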

> 
> Version 1.4
> 1. Make sure all strings are translatable and add language switching
>    support [Mathieu]
> 2. Add tooltips to all UI actions that are not very self explanatory
>    to a new comer
> 3. Properly separate the CLI wrapper from the UI and proper
>    threading. Any delays or crashes at the CLI side shouldn't be
>    mirrored by Johnny
> 4. Rename Output tab to CLI journal and also print JtR cmds (allows
>    the user to inspect whatever commands Johnny issued to JtR as
>    well as the output)
> Version 1.5
> 1. Add the –fork option to the UI so that we can use multi core
> 2. Manual plain-text guessing for individual ciphers (directly in
>    the table view)
> 3. Hash type suggestion/guessing for individual hashes (which is the
>    best way? do we have any support from JtR jumbo with that)
> 
> I'd mix sprint 1 and sprint 2 to get -fork earlier. 1.4.1 should not
> be hard to finish. 1.4.2 may be interesting at the beginning to learn
> Johnny. 1.4.3 may be hard to implement right at the beginning, also
> examples of bad behaviour are needed.

I swapped the threading task with the --fork and OpenMP one. Thus we now have:
1.4.3 Add --fork and OpenMP support so that we can use multiple cores (an option should be available for selecting how many cores to use)
1.5.3 Properly separate the CLI wrapper from the UI and proper threading. Any delays or crashes at the CLI side shouldn't be mirrored by Johnny
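As a sketch of what the core-count option could translate to: john's --fork=N option and the standard OpenMP variable OMP_NUM_THREADS are real, everything else here (the MultiCoreSettings struct, the function names, how the UI stores the values) is hypothetical. Whether both should ever be set at the same time is left open.

```cpp
#include <string>
#include <vector>

// Hypothetical settings as chosen in the UI.
struct MultiCoreSettings {
    unsigned forkProcesses = 1;  // value for john's --fork=N; 1 means no --fork
    unsigned ompThreads = 0;     // 0 means "leave the OpenMP default alone"
};

// Builds the argument list that would be handed to the john process.
std::vector<std::string> buildJohnArguments(const MultiCoreSettings& settings,
                                            const std::string& passwordFile) {
    std::vector<std::string> args;
    if (settings.forkProcesses > 1) {
        args.push_back("--fork=" + std::to_string(settings.forkProcesses));
    }
    args.push_back(passwordFile);
    return args;
}

// Builds extra environment entries ("NAME=value") for the john process.
std::vector<std::string> buildJohnEnvironment(const MultiCoreSettings& settings) {
    std::vector<std::string> env;
    if (settings.ompThreads > 0) {
        env.push_back("OMP_NUM_THREADS=" + std::to_string(settings.ompThreads));
    }
    return env;
}
```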

> 
> 1.4.4: I'd print commands that can be copy-pasted to term. It may be
> tricky to implement because there should be proper quoting that
> depends onto platform. Though "Ability to select/deselect individual
> hashes" makes it less important.

It's more for debugging and learning, I think. I'd like to have that for all CLI wrappers if possible, in a perfect world.
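On the quoting concern, a minimal sketch for the POSIX shell case only (Windows cmd quoting is different and not covered). The function names are hypothetical. The approach is the usual one: leave obviously safe arguments alone, otherwise wrap the argument in single quotes and replace each embedded single quote with '\''.

```cpp
#include <string>
#include <vector>

// Quotes a single argument so that a POSIX shell reads it back unchanged.
std::string quoteForPosixShell(const std::string& arg) {
    if (arg.empty()) return "''";

    // Characters that never need quoting in a POSIX shell word.
    const std::string safe =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_-./=:,+@%";
    if (arg.find_first_not_of(safe) == std::string::npos) return arg;

    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'') {
            quoted += "'\\''";  // close quote, escaped quote, reopen quote
        } else {
            quoted += c;
        }
    }
    quoted += "'";
    return quoted;
}

// Formats a program and its arguments as one line for the CLI journal.
std::string formatCommandLine(const std::string& program,
                              const std::vector<std::string>& args) {
    std::string line = quoteForPosixShell(program);
    for (const auto& arg : args) {
        line += ' ';
        line += quoteForPosixShell(arg);
    }
    return line;
}
```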

> 
> 1.5.1 may be started earlier. 1.5.2 is either small (and ok to be done
> at that position) or overlaps with 1.6.2 (though in such form it
> should be small too).
> 
> 1.5.3 may need support from john. We need it anyway so I may implement
> john's part. Though there is a pitfall: there are hashes with 1 format
> in john, some hashes have several identical formats in john (several
> implementations), while some hashes may be of different formats with
> different passwords (md5u vs md5).

This is an important aspect. What are the actual chances of getting support code for Johnny into at least Jumbo when warranted? I'm thinking of more explicit return codes, error messages and possibly new key interrupts that give more in-depth or new data on what JtR is actually doing at any point in time.

> 
> So I'd move 1.5.1 earlier and 1.4.3 later.
> 
>> Sprint 2(week 3 and 4) :  Code, integrate, test version 1.5
>> requirements . *Make
>> builds for major platforms and send it to the list and johnny website.*
> 
> We don't really send releases itself to the list, only announces.
> Johnny does not have a separate website. It is hosted on Openwall's
> wiki:
> http://openwall.info/wiki/john/johnny

Mathieu, can you please detail the builds and release step in your timeline a bit more? We need Linux and OS X for now. It would help us better gauge how much you could help with this and whether you need any help.

Aleksey, Solar, is it enough if we test and provide debs for the latest Debian and the latest Ubuntu in-house? I personally don't want to test on rpm-based distros, mainly since I don't use any anymore. For OS X I'm thinking of just a DMG that will include JtR, the Qt libs and the Johnny build (I'm thinking of testing only on Yosemite). If there's any trouble with bundling JtR in that, I need to know. Johnny should and will be able to look for an existing JtR in PATH, and allow overriding it in settings, either way.

PS: I'm not sure where we stand with Windows support. Maybe we can clarify that after 2.0?

> 
>> Sprint 3(week 5 and 6) : Prepare for sprint 4 and understand *2john
>> conversion support(from 1.7). Think about a plan and show it to the list
>> for approval. Code, integrate, test version 1.6 requirements.
> 
> I guess multiple pwd files session management is quite big task.

Are you sure? I'm thinking of just keeping track of multiple .pot files, .rec files and input sources: just the ability to switch between .rec files, plus a menu with the history of recently used sessions and an option to clear it on demand. We can use QSettings to keep track of this along with the current app settings.
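What I have in mind is not much more than a most-recently-used list. A small sketch, assuming a hypothetical Session struct and SessionHistory class; persistence is left out, but the entries could be written to and read from QSettings when the app closes and starts.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one session: the files that belong together.
struct Session {
    std::string recFile;
    std::string potFile;
    std::string inputFile;
};

// Most-recently-used list of sessions, identified by their .rec file.
class SessionHistory {
public:
    explicit SessionHistory(std::size_t maxEntries = 10) : maxEntries_(maxEntries) {}

    // Moves the session to the front, adding it if it is not known yet.
    void touch(const Session& session) {
        sessions_.erase(
            std::remove_if(sessions_.begin(), sessions_.end(),
                           [&session](const Session& other) {
                               return other.recFile == session.recFile;
                           }),
            sessions_.end());
        sessions_.insert(sessions_.begin(), session);
        // Drop the oldest entries beyond the limit.
        if (sessions_.size() > maxEntries_) sessions_.resize(maxEntries_);
    }

    // The "clear history on demand" option.
    void clear() { sessions_.clear(); }

    const std::vector<Session>& entries() const { return sessions_; }

private:
    std::size_t maxEntries_;
    std::vector<Session> sessions_;
};
```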

> 
>> Sprint 4 (week 7 and 8) : Code, integrate, test version 1.7 requirements .
>> 
>> Sprint 5(week 9 and 10) : If 1.7 isn't finished, continue here. Also, make
>> some refactoring(if needed) to Johnny, retest everything we have so far and
>> prepare for next sprint and think about jumbo support. Which thing do we
>> want to implement? is it possible ?what are the risks ? how will we do it ?
>> Maybe make a prototype ? Start implementing version 1.8. *Make builds for
>> major platforms and send it to the list and johnny website.*
>> 
>> Sprint 6 (week 11 and 12) : Work really hard on version 1.8
> 
> I hope to split "jumbo support" and prioritize parts.
> 
>> Sprint 7 (week 13 and 14) : Work really hard on version 1.8. Start
>> brainstorming with the list about distribution channels and supported
>> platform to be ready for next sprint. *Make builds for major platforms and
>> send it to the list and johnny website.*
>> 
>> Sprint 8(week 14 and 15) : (Version 1.9) verify that everything build on
>> all supported platforms, do some tests using VM from different platforms,
>> fix platform specific bugs. Figure out distribution channel with the list.
>> 
>> Sprint 9 (week 16 and 17) (Version 2.0) is released, I polish stuff and fix
>> left bugs. A lot of testing.
>> 
>> After GSOC : Gather feedback from people regarding version 2.0 and work
>> part-time when school allows me on fixing them.
> 
> It is good that you're planning to maintain Johnny after GSoC too. We
> appreciate it.
> 
> So far, I think the only real feedback is here:
> http://openwall.com/lists/john-users/2013/03/14/1
> 1) No build for Mac OS X
> 2) No support for rar (rar2john should be called to get the hashes
> from .rar file, so it is part of *2john task).
> 
> I think you'll need to include the text of points into your
> application. Shinnok - right?

Not sure what you mean right there.