Message-ID: <20200806132149.GB14882@openwall.com>
Date: Thu, 6 Aug 2020 15:21:49 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: sha512crypt-opencl / Self test failed (cmp_all(1))

> On Wed, Aug 5, 2020 at 9:29 PM Albert Veli <albert.veli@...il.com> wrote:
> > I did not see the self test error on ZTEX. But I saw some other errors
> > on my setup, Aleksey saw them too on his setup. Something like this:
> >
> > SN 04A36E226F FPGA #2 error: pkt_comm_status=0x01, debug=0x0000
> > SN 04A36E226F error -1 doing r/w of FPGAs (LIBUSB_ERROR_IO)
> > SN 04A36E226F: Timeout.
> >
> > It happens after a while. Not every time but sometimes.

This is kind of normal. We're using USB, which involves many
not-exactly-reliable hardware and software components. Further, the FPGAs
themselves do misbehave sometimes. We're running them at combinations of
utilization and clock rate close to the limits we saw in practice during
testing. (Per Xilinx's design tools, they're supposed to run most of our
designs at higher clock rates, but in practice they don't, so we adjusted
them to be near the maximums that actually work.)

> > It is usually enough to power off the boards and power them on again

That's a bit puzzling. In my experience, when errors like the above happen,
everything recovers on its own. John includes logic to recover from such
errors without needing to be restarted; it's just that the average c/s rate
becomes lower (because some FPGAs sit idle for a while after an error, then
are put back into use).

> > (I have connected
> > the PSU to a Silver Shield power manager to do so remotely, a modbus I/O
> > could also be used for this).

That's good. I use something like this too, but it's mostly just to power
the boards off when not in use, not to recover from errors.
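To make the "average c/s rate becomes lower" point concrete, here is a minimal Python sketch of how much throughput recovered errors cost. It is a simplified, hypothetical model, not John's actual recovery logic: all FPGAs are assumed to contribute equally, and each error is assumed to take exactly one FPGA out of use for a fixed number of seconds. The function name and all figures are made up for illustration.

```python
def average_rate(peak_cps, num_fpgas, errors_per_hour, idle_s_per_error):
    """Estimate average c/s over one hour when errors idle single FPGAs.

    Hypothetical model (not taken from John the Ripper's code): every FPGA
    contributes equally, and each error takes one FPGA out of use for
    idle_s_per_error seconds before it is put back into use.
    """
    fpga_seconds = num_fpgas * 3600.0          # total FPGA time in one hour
    lost = errors_per_hour * idle_s_per_error  # FPGA time spent idle
    busy_fraction = max(0.0, 1.0 - lost / fpga_seconds)
    return peak_cps * busy_fraction


# Hypothetical figures: 16 FPGAs, 2 errors per hour, 60 s idle per error.
print(round(average_rate(1_000_000, 16, 2, 60)))  # prints 997917
```

Under these assumptions, a couple of recovered errors per hour cost only about 0.2% of the peak rate, which is why restarting or power-cycling is usually unnecessary.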
On Wed, Aug 05, 2020 at 09:45:26PM -0800, Royce Williams wrote:
> When this happened to me, I dropped the speed on the specific boards by
> 10MHz or so until it stopped,

When errors are infrequent, it's generally more efficient to just let them
happen once in a while, giving a higher average c/s rate than you'd have at
a lower clock rate.

> using the "Frequency_[serial] = 999" syntax
> for that particular algorithm's section.
>
> If enough boards are lower than the default, it's easier to just change the
> default and create exceptions for the remainder.

Please remember that there's generally no point in adjusting frequencies per
board (except for testing) if you use all of your boards as one big cluster.
John can currently only use the boards synchronously, so the slowest board
determines the cluster's overall performance. This changes when you use
"--fork" or "--devices", but with "--fork" in particular it'd probably be
inconvenient for you to have some forked processes terminate much sooner
than others. So per-board frequency adjustment is generally only useful when
you run per-board-set attacks, explicitly targeting each attack at a
same-frequency list of boards via "--devices". Of course, you'd also use
"--session" to launch multiple attacks from the same "run" directory.

> If that doesn't work, you have other issues (flaky USB connector, flaky USB
> cable, unstable power, etc.)

Right.

Alexander
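The two trade-offs discussed above (tolerating occasional errors versus lowering the clock, and the slowest board pacing a synchronous cluster) can be put into rough numbers with a minimal Python sketch. It assumes, as a simplification, that a board's c/s scales linearly with its clock rate, and that a synchronous cluster runs at board count times the slowest board's rate. The function names, the 1000 c/s per MHz figure, and the 0.2% error loss are all hypothetical.

```python
def board_rate(freq_mhz, cps_per_mhz, busy_fraction=1.0):
    """Per-board c/s, assuming throughput scales linearly with clock rate
    and is reduced by the fraction of time lost to error recovery."""
    return freq_mhz * cps_per_mhz * busy_fraction


def synchronous_cluster_rate(board_rates):
    """When boards are used synchronously as one cluster, every board waits
    for the slowest, so the cluster runs at len(boards) x min(rate)."""
    return len(board_rates) * min(board_rates)


# Trade-off 1: 200 MHz with rare errors (0.2% of time lost) vs. 190 MHz error-free.
fast_with_errors = board_rate(200, 1000, busy_fraction=0.998)
slow_error_free = board_rate(190, 1000)
print(fast_with_errors > slow_error_free)  # True

# Trade-off 2: lowering one of three boards to 190 MHz paces the whole cluster.
rates = [board_rate(200, 1000), board_rate(200, 1000), board_rate(190, 1000)]
print(synchronous_cluster_rate(rates))  # 570000.0, not the 590000.0 sum
```

Under these assumptions, a 10 MHz reduction costs about 5% per board, far more than infrequent errors do, and in a synchronous cluster that reduction on one board is effectively applied to every board.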