===== Some Condor hints (to be extended) =====

==== Sample submit file ====
<code>
universe = vanilla
executable = /home/steffeng/condor/test.sh
initialdir = /home/steffeng/condor
arguments = $(Process)
# log = $(Process).log
output = $(Process).out
notification = Never
# request_cpus = 1
queue 400
</code>

  * The //executable// must be executable (''chmod +x''). In general, it's a good idea to use a //wrapper script// around your executable proper, to create a working directory (see "Using temporary storage" below), copy/symlink input and output data, and clean up when your work is done (this is one of the most famous use cases of the ''trap'' statement).
  * It is recommended to define //initialdir// explicitly, as a full path.
  * If you //queue// n jobs, they will get //Process// numbers from 0 to (n-1).
  * You are expected to announce not only your CPU requirements (//request_cpus=n//), but also those for memory (//request_memory//) and GPUs (//request_gpus//).
  * As Condor is a High-Throughput Computing scheduler, a larger number of shorter jobs is usually preferable, as long as the runtimes stay above a few minutes (Condor tends to get confused by too-short runs).

==== Universes ====

  * The **vanilla universe** is what you want to use most of the time.
  * It is not planned to use the **standard universe**, as that would imply "stupid checkpointing".
    * With memory images several GB in size, and a few hundred nodes, checkpoint servers are easily overloaded - which makes little sense anyway if your parameter set is rather small, allowing for "smart checkpointing" instead. **Hypatia does not implement (standard universe-type) checkpointing.**
  * We have a **parallel universe** as well.
    * The **DedicatedScheduler** runs on **hypatia2**, which means parallel jobs have to be submitted **from there (only)**.
    * Use **machine_count** instead of **request_cpus**.
    * The **queue** count (again) is the number of separate jobs, **not** threads!
    * See the [[clusters:hypatia:openmpi|OpenMPI]] article for details.

==== Dynamic slots ====

  * All machines are configured with a single dynamic ("parent", "partitionable") slot, containing **all** resources of the node.
  * Requesting a number of CPU cores, a certain amount of memory, etc. will "partition" (split some resources off) that slot, resulting in a sub-slot running the job, and a (free) slot keeping track of the remaining resources.
  * This process is repeated until the parent slot runs out of resources.
  * This has a drawback: ''condor_status -t'' will always show unclaimed slots, even if these are depleted (no CPU available).
  * To get a better picture of available resources, you may check the [[https://hypatia.aei.mpg.de/s/h|Hypatia Status page]] (or query ''condor_status'' directly, as sketched below).
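If you prefer the command line over the status page, a query along the following lines should give a similar picture. This is only a sketch (assuming a reasonably recent HTCondor); the printed attributes - the remaining ''Cpus'' and ''Memory'' (in MB) of each parent slot - are just examples:

<code bash>
# remaining resources of all partitionable (parent) slots
condor_status -constraint 'PartitionableSlot =?= True' -af Machine Cpus Memory

# only machines that could still fit, say, a 4-core / 16 GB job
condor_status -constraint 'PartitionableSlot && Cpus >= 4 && Memory >= 16384' -af Machine Cpus Memory
</code>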
==== Using (and addressing remotely) temporary storage ====

While your job may behave nicely when it runs on your personal machine, multiplying it by 100 may result in problems. Please keep in mind that your home directory resides on a fileserver and is accessed via NFS - which is basically synchronous, adding considerable delays. Also, several dozen write accesses from different machines will keep the disk heads busy, as metadata and data have to be written to various locations on the disk. Not to forget that appending a few bytes to the end of a file requires a full block read/update/writeback cycle. Even a RAID controller has limited opportunities to balance this kind of load.

The better approach is to write to a local disk (where only a handful of jobs compete for the filesystem), make implicit use of the OS' file buffering mechanisms, and **at the end** of your job send all the data back to the central storage in one big chunk. The local scratch space is named ''/local''. Although this request sounds a bit like "belt and braces": please use **/local/tmp/$user** for your temporary data! Try to avoid unnecessary copies of files when a **symbolic link** will do as well. Scratch space can be addressed from any other node via ''/hypatia/${nodename}'', which is mapped to ''${nodename}:/local'' by the automounter.

==== Wrapper scripts ====

The general structure of your wrapper script would be (a sketch putting the steps together follows below the list):

  * invent a unique directory name ''$worktree'' (perhaps derived from the command line, and possibly similar to the one you use below your home), or use ''$$'' (the unique process id) to build something that doesn't get harmed by jobs running in parallel //on the same local machine//
  * ''rm -rf /local/tmp/$user/$worktree'' (to make sure you don't pick up stuff from a previous run you don't want)
  * ''mkdir -p /local/tmp/$user/$worktree'' (''-p'' will create the whole directory hierarchy if it doesn't exist yet)
  * set up a "trap" to remove the working tree if the script gets murdered, e.g. ''trap "/bin/rm -rf /local/tmp/$user/$worktree; exit 0" 0 1 2 6 9 15'', where 0 1 2 ... 15 are the signals you'd like to catch ("9" can't be trapped :)
  * ''cd .../$worktree''
  * run your code
  * ''rsync -a .../$worktree /home/.../target/'' (this will create a single-level subdirectory below target/, **not** the whole tree!)
  * clean up (or get murdered... it's possible to catch exit code 0 in the trap as well)

In the worst case, some stuff remains on the ''/local'' disk - we'll discuss cleanups later.
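Putting these steps together, a minimal wrapper might look like the sketch below. The executable path, the ''$worktree'' naming scheme and the ''$target'' directory are placeholders (assumptions, not site defaults) - adapt them to your own layout:

<code bash>
#!/bin/bash
# Sketch of a Condor wrapper script - all paths below are assumptions.

user=$(whoami)
worktree="myrun_$$"                  # unique per process; could also be derived from "$@"
workdir=/local/tmp/$user/$worktree
target=/home/$user/condor/results    # assumed destination below your home

rm -rf "$workdir"                    # remove leftovers from a previous run
mkdir -p "$workdir"

# remove the working tree when the script exits or gets killed (SIGKILL cannot be trapped)
trap "rm -rf '$workdir'; exit 0" 0 1 2 6 15

cd "$workdir" || exit 1

/home/$user/condor/mycode "$@"       # assumed: your real executable, fed the Condor arguments

mkdir -p "$target"
rsync -a "$workdir" "$target"/       # copies $worktree as a single subdirectory of $target
</code>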
"The safest way of shutting down a running DAG is using a halt file. If your DAG is foo.dag, create the file foo.dag.halt. This will allow the DAG jobs to drain from the system, and create a rescue DAG at the end. The down side of this is that if you have long-running node jobs the draining can take a long time. You can always condor_rm any running DAGs, but you'll lose whatever work the already-running node jobs have completed."==== Working your way through a text file with many parameter lines, one by one ==== * Remember that the number of the job within a job-cluster, //$(Process)//, can be submitted to your //executable// script by defining ''arguments="...$(Process) ..."'' . * Your jobs will run with the numbers 0 through (n-1), where n is the count given in the ''queue'' line. * Note that it's possible to create "homogeneous" submit files even if your jobs will perform inhomogeneous tasks - if you can hide these inhomogeneities in the individual command lines. * To select a single line from a file, I suggest something like line=$(( $1+1 )) commandline=`sed -ne "${line}p" $listfile` exec $executable $commandline (You get the idea? You might even put the executable part into the commandline itself.) * Don't forget the ''queue'' line at the end of your submit lines. ==== Removing jobs from the queue ==== You may have accidentally submitted a job-cluster, and want to get rid of it? It's generally a bad idea to kill running jobs (think of the cleanup part!), so it's better done in steps: * put idle jobs (job status 1) on hold: ''condor_q `whoami` -format '%d\n' ProcId -constraint "JobStatus==1" | xargs -r condor_hold'' * remove held (job status 5) jobs: ''condor_q `whoami` -format '%d\n' ProcId -constraint "JobStatus==5" | xargs -r condor_rm'' * check the result: ''condor_q `whoami`'' (Of course, you may replace `whoami` with your username.) While there is a ''-global'' option, you better stick to the submit machine (scheduler) you used to submit the jobs. Some additional info, for the daring: If you don't care too much, the command to kill everything that's not running (job status 2) would be ''condor_rm -constraint "JobStatus =!= 2"'' Since the signal will be sent to the executable named in the submit file, you **might** think about establishing a "trap" to run some cleanups - **this won't work if** the signal sent is a SIGKILL (9) which cannot be trapped. (There is contradicting information on the 'net, whether SIGKILL or SIGTERM will be used. You'll have to find out.) If you don't like that, you may add a SIGTERM (15) handler, and use the ''condor_vacate_job'' command, which will send a SIGTERM to the job before putting it on Idle. ==== Wrong number of jobs? ==== If you submitted too few jobs and want to add more, but are bound to keep the $(Process) number as an index: While you cannot extend an existing job cluster, you can create one that consists of two parts, one held (possibly forever) and a second that's going to run. Instead of your single ''queue $number'' line, add:
==== Wrong number of jobs? ====

If you submitted too few jobs and want to add more, but are bound to keep the $(Process) number as an index: While you cannot extend an existing job cluster, you can create a new one that consists of two parts, one held (possibly forever) and a second one that's going to run. Instead of your single ''queue $number'' line, add:

<code>
hold = True
queue ${number_of_jobs_to_be_skipped}
#
hold = False
queue ${number_of_additional_jobs}
</code>

After submitting, the first part will be on **hold** (and can be removed as described above) while the second part gets run with the right $(Process) arguments! Of course, this mechanism can also be used if you want to run a few processes as "test balloons" with the option of later adding more - you may do the splitting the other way round (set ''hold'' to True for the second part). And, of course, you can also submit a whole job cluster into hold state, and then start to release parts of it (if you come up with a cron-based solution, please get it included here! - a possible starting point is sketched below).
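As a starting point for such a staged release (only a sketch, not a tested recipe - the cluster id, the batch size, and driving it from cron are assumptions), one could release a handful of held jobs per invocation:

<code bash>
#!/bin/bash
# Sketch: release up to $n held jobs of cluster $cluster per invocation (e.g. from cron).
cluster=1234   # placeholder cluster id
n=10           # placeholder batch size
condor_q $(whoami) -constraint "ClusterId==$cluster && JobStatus==5" \
         -format '%d.' ClusterId -format '%d\n' ProcId \
  | head -n "$n" | xargs -r condor_release
</code>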
Update: a method that works without **hold**ing (presented on the condor-users mailing list):

<code>
noop_job = True
queue ${number_of_jobs_to_be_skipped}
#
noop_job = False
queue ${number_of_jobs_to_be_run}
</code>

==== To be extended ====

Anything you'd like to share which might be important for others?

(last modified by Steffen Grunewald on 16 Oct 2018)