HTCondor Quick Guide
This article will show you how to submit a computational job to the HTCondor system.
First of all, you should have your program prepared. Suppose it is called exampleA and accepts one number as its command-line argument: the number of events to simulate. To run the program directly on a node, you would type:
$ exampleA 10
to generate 10 simulation events.
To run a larger simulation on the cluster, you need a job file. It looks like:
Universe = vanilla
Executable = exampleA
Arguments = 1000
Log = exampleA.log
Output = exampleA.out
Error = exampleA.error
Rank = (OpSysName == "CentOS")
Queue
Save it under a name of your choice, for example job_file_1.
Let’s examine each of these lines:
Universe
: The vanilla universe means a plain old job. Later on, we’ll
encounter some special universes.
Executable
: The name of your program.
Arguments
: The command-line arguments passed to your program, written the same
way as we typed them above. Here the job simulates 1000 events instead of 10.
Log
: This is the name of a file where Condor will record information
about your job’s execution. While it’s not required, it is a really
good idea to have a log.
Output
: Where Condor should put the standard output from your job.
Error
: Where Condor should put the standard error from your job. Our job
isn’t likely to have any, but we’ll put it there to be safe.
Rank
: Used to choose a match from among all machines that satisfy the job’s requirements; machines with a higher Rank value are preferred. In this example, we prefer nodes running CentOS. Adding this line is highly recommended, so that your CPU-only job is not sent to the GPU node, which runs Ubuntu.
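Rank is only a preference, so a job can still land on another node when no CentOS node is free. As a minimal sketch, assuming the nodes advertise OpSysName as in the example above, a Requirements line makes the restriction strict. The file name job_file_strict is a hypothetical choice; the snippet only writes the file.

```shell
# Write a variant of the job file that only matches CentOS nodes.
# job_file_strict is a hypothetical name; the other lines are unchanged.
cat > job_file_strict <<'EOF'
Universe = vanilla
Executable = exampleA
Arguments = 1000
Log = exampleA.log
Output = exampleA.out
Error = exampleA.error
Requirements = (OpSysName == "CentOS")
Queue
EOF

# Submit it the same way as any other job file:
#   condor_submit job_file_strict
```

With Requirements, the job stays idle until a CentOS node becomes available instead of falling back to another node.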
Next, tell Condor to run your job:
$ condor_submit job_file_1
That’s it. Use the condor_q command to check the status of your job.
When the job is finished, you will find several files: exampleA.out and exampleA.error contain the standard output and standard error of the program, and exampleA.log contains Condor’s record of the job’s execution.
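To run several simulations from one job file, the Queue line can take a count, and the predefined $(Process) macro (0, 1, 2, and so on) keeps the output files apart. A minimal sketch, assuming five runs of 1000 events each; job_file_multi and the file-name pattern are hypothetical choices:

```shell
# Write a job file that queues five copies of the same job.
# $(Process) is replaced by HTCondor with 0..4, one value per job.
cat > job_file_multi <<'EOF'
Universe = vanilla
Executable = exampleA
Arguments = 1000
Log = exampleA.log
Output = exampleA.$(Process).out
Error = exampleA.$(Process).error
Rank = (OpSysName == "CentOS")
Queue 5
EOF

# Submit all five jobs at once:
#   condor_submit job_file_multi
```

This would produce exampleA.0.out through exampleA.4.out, while all five jobs share one log file.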
You can also write a script to invoke your program, then set the script as the Executable parameter in the job file.
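A minimal sketch of such a wrapper script, assuming exampleA is reachable from the job’s working directory (for instance on a shared file system). The script name run_exampleA.sh and the echo line are hypothetical; the snippet creates the script and makes it executable:

```shell
# Create the wrapper script (run_exampleA.sh is a hypothetical name).
cat > run_exampleA.sh <<'EOF'
#!/bin/bash
# Stop at the first failing command so errors show up in the job's exit code.
set -e
# Record where the job ran; this goes to the file named by Output.
echo "Running on $(hostname)"
# Forward all arguments from the job file's Arguments line to the program.
exec ./exampleA "$@"
EOF
chmod +x run_exampleA.sh
```

In the job file, you would then set Executable = run_exampleA.sh and keep Arguments = 1000.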
For more information, please read the HTCondor documentation.
Multi-threading Job
If the job’s executable is multi-threaded, request_cpus must be set in the job file; otherwise the job will be limited to one CPU core, no matter how many threads it spawns. The job file may look like:
executable = my_multi_threading_program
arguments = arg1 arg2 arg3
log = job.log
output = job.out
error = job.err
# request 8 CPU cores for the job (comments must be on their own line)
request_cpus = 8
queue 1
Each job can request at most 18 cores. If a job stays idle as reported by
condor_q, the cluster may not be able to match it to a slot that has as many
cores as requested. Running condor_q -analyze may reveal some clues.
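Many multi-threaded programs also need to be told how many threads to start. As a sketch, a user-defined macro in the job file can keep that number in step with request_cpus. Here NCPUS is a macro name chosen for illustration and --threads is a hypothetical option of my_multi_threading_program; substitute whatever your program actually accepts:

```shell
# Write a job file in which one macro drives both the core request
# and the thread count passed to the program.
cat > job_file_mt <<'EOF'
# NCPUS is a user-defined macro, referenced below as $(NCPUS)
NCPUS = 8
executable = my_multi_threading_program
# --threads is a hypothetical option; use your program's own
arguments = --threads $(NCPUS)
log = job.log
output = job.out
error = job.err
request_cpus = $(NCPUS)
queue 1
EOF
```

Changing NCPUS in one place then changes both the requested cores and the number of threads the program is asked to start.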
Singularity
An example Condor job file that mounts /lustre/pandax/csmc into a Singularity container and lists the directory:
Executable = /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity
Universe = vanilla
getenv = True
Output = log/sgrun.out
Error = log/sgrun.err
Log = log/sgrun.log
# --bind binds the cluster filesystem into singularity
# centos6.sif is a singularity image pulled from sylabs.io using command:
# singularity build centos6.sif library://library/default/centos:6
Arguments = "exec --bind /lustre/pandax/csmc:/mnt \
/home/csmc/singularity/centos6.sif \
sh -c 'cat /etc/redhat-release && ls -F /mnt'"
# or run an unpacked centos 7 container from ATLAS. Bind mount both cvmfs
# and user directory on lustre
# Arguments = "exec --bind /cvmfs:/cvmfs \
# --bind /lustre/pandax/csmc:/mnt \
# /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7-base \
# sh -c 'cat /etc/redhat-release \
# && ls -F /mnt \
# && ls -F /cvmfs/sft.cern.ch/lcg'"
Queue
Working Example for People without Patience
You can look at one of Prof. Yang Haijun’s examples of running Condor jobs, in the following folder on bl-1-1.physics.sjtu.edu.cn:
$ cd /home/yhj/condor-example/
$ ./makejob # script to make powhegbox MC generator jobs with different parameters, one job per folder
$ ./makejob-condor # script to make condor jobs and combine them into one file 'Mytest-data.condor', then submit all condor jobs
The powhegbox MC generator job is pwhg_main, and the Condor job file is powhegZZ.sh.