Message-ID: <000401d363ee$d8bfa180$8a3ee480$@dexlab.nl>
Date: Thu, 23 Nov 2017 01:05:58 +0100
From: "Jeroen" <spam@...lab.nl>
To: <john-users@...ts.openwall.com>
Subject: Re: OpenMPI and .rec files?

magnum wrote:

<SNAP>

> This sounds like either a bug or PEBCAK but it may well be a bug - I'm pretty
> sure I have never tested that many nodes at once.
>
>> Same result for OpenMPI tasks with (more OR less than 640) AND more
>> than 100 subtasks.
>>
>> Is all the resume data in 100 recovery files, don't matter the number
>> of tasks or is there something going wrong?
>
> You should get one session file per node. What "exact" command line did you
> use to start the job?

For example, when submitted with prun (a control framework for cluster job
management):

    prun -np 18 -32 -t 5:00 -script openmpi-config /home/john/run/test hashes

where -np 18 is the number of hosts, -32 is 32 processes per host, and
openmpi-config is a basic bash script that loads OpenMPI (gcc, 64-bit) on the
workers. The number of jobs started is - as mentioned before - OK, and the
benchmark (--test) also works fine (... (640xMPI) DONE). The number of .rec
files never exceeds 100.

> Are all nodes running in the same $JOHN directory, eg.
> using NFS?

Yes.

> What happens if you try to resume such a session? It should fail and complain
> about missing files unless the bug is deeper than I can imagine.

It resumes like any other normal job, with no complaints as far as I can see.

Please let me know if you need specific debug info.

Thanks,
Jeroen