HTCondor Quick Guide

Quick Guide

This article will show you how to submit a computational job to the HTCondor system.

First, have your program ready. Suppose it is called exampleA and takes one command-line argument: the number of events to simulate. To run it directly on a node, you would type:

$ exampleA 10

to generate 10 simulation events.

To run a larger simulation through Condor, you need a job file. It looks like this:

Universe   = vanilla
Executable = exampleA
Arguments  = 1000
Log        = exampleA.log
Output     = exampleA.out
Error      = exampleA.error
Rank       = (OpSysName == "CentOS")
Queue

Save it under a name of your choice, for example job_file_1.

Let’s examine each of these lines:

Universe: The vanilla universe means a plain old job. Later on, we’ll encounter some special universes.

Executable: The name of your program.

Arguments: The arguments passed to your program, in the same form as you would type them on the command line. Here the job simulates 1000 events instead of the 10 used above.

Log: This is the name of a file where Condor will record information about your job’s execution. While it’s not required, it is a really good idea to have a log.

Output: Where Condor should put the standard output from your job.

Error: Where Condor should put the standard error from your job. Our job isn’t likely to have any, but we’ll put it there to be safe.

Rank: Used to choose a match from among all machines that satisfy the job’s requirements; machines for which the expression evaluates higher are preferred. In this example, the job prefers nodes running CentOS. Adding this line is highly recommended, so that a CPU-only job is not sent to the GPU node, which runs Ubuntu. Note that Rank expresses a preference, not a hard constraint.
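
If you need several job files that differ only in the number of events, the fields described above can be filled in by a small script. The following is a minimal sketch in Python, not part of the cluster setup: the function name render_submit_file and its parameters are hypothetical, and the file names simply follow the exampleA pattern used in this guide.

```python
def render_submit_file(executable, n_events, preferred_os="CentOS"):
    """Return the text of a job file like the one shown in this guide.

    All parameter names are illustrative; adjust them to your own program.
    """
    lines = [
        "Universe   = vanilla",
        f"Executable = {executable}",
        f"Arguments  = {n_events}",
        f"Log        = {executable}.log",
        f"Output     = {executable}.out",
        f"Error      = {executable}.error",
        f'Rank       = (OpSysName == "{preferred_os}")',
        "Queue",
    ]
    return "\n".join(lines) + "\n"


# Example: reproduce the job file for 1000 events and print it.
print(render_submit_file("exampleA", 1000), end="")
```

Redirecting the printed text into a file such as job_file_1 gives a job file equivalent to the one above.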

Next, tell Condor to run your job:

$ condor_submit job_file_1

That is all. Use the condor_q command to check the status of your job.

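When the job is finished, you will find several files: exampleA.log, exampleA.error and exampleA.out. exampleA.out and exampleA.error contain the standard output and standard error of the program, and exampleA.log contains Condor’s record of the job’s execution.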

You can also write a script that invokes your program, and set the script as the Executable parameter in the job file.
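
As an illustration, such a wrapper might look like the sketch below. It is only an example under stated assumptions: the script name run_exampleA.py is hypothetical, and it assumes exampleA is located in the directory where the job runs. The wrapper checks the argument, starts the program, and passes the program's exit status back.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper script, e.g. saved as run_exampleA.py."""
import subprocess
import sys

# Assumption: exampleA sits in the directory the job runs in.
PROGRAM = "./exampleA"


def build_command(argv):
    """Translate the wrapper's arguments into the exampleA command line."""
    if len(argv) != 1:
        raise ValueError("expected exactly one argument: the number of events")
    n_events = int(argv[0])  # raises ValueError for non-numeric input
    if n_events <= 0:
        raise ValueError("the number of events must be positive")
    return [PROGRAM, str(n_events)]


def main(argv):
    command = build_command(argv)
    print("running:", " ".join(command), flush=True)
    # Return the program's exit status so that it becomes the job's exit status.
    return subprocess.call(command)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv[1:]))
    print("usage: run_exampleA.py N_EVENTS")
```

To use it, make the script executable (chmod +x run_exampleA.py), then set Executable = run_exampleA.py and keep Arguments = 1000 in the job file.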

For more information, please read the HTCondor documentation.

Multi-threading Job

If the job’s executable is multi-threaded, request_cpus must be set in the job file; otherwise the job will be limited to one CPU core, no matter how many threads it spawns. The job file may look like:

executable = my_multi_threading_program
arguments = arg1 arg2 arg3
log    = job.log
output = job.out
error  = job.err

# request 8 CPU cores for the job (comments must be on their own line)
request_cpus = 8
queue 1

Each job can request at most 18 cores. If jobs stay in the idle state as reported by condor_q, the cluster may not be able to match the job to a slot with as many cores as requested. Running condor_q -analyze may reveal some clues.
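
A request above the limit can be caught before submission. The sketch below is one possible way to do that in Python; the function name multithreaded_job_file is hypothetical, and the value 18 is simply the per-job limit quoted above, so adjust it if the cluster configuration changes.

```python
MAX_CORES_PER_JOB = 18  # per-job limit stated in this guide


def multithreaded_job_file(executable, arguments, cpus):
    """Return job file text that requests `cpus` cores for a single job."""
    if not 1 <= cpus <= MAX_CORES_PER_JOB:
        raise ValueError(
            f"request_cpus must be between 1 and {MAX_CORES_PER_JOB}, got {cpus}"
        )
    lines = [
        f"executable = {executable}",
        f"arguments = {' '.join(arguments)}",
        "log    = job.log",
        "output = job.out",
        "error  = job.err",
        "",
        f"request_cpus = {cpus}",
        "queue 1",
    ]
    return "\n".join(lines) + "\n"


# Example: the 8-core job file from this section.
print(
    multithreaded_job_file(
        "my_multi_threading_program", ["arg1", "arg2", "arg3"], 8
    ),
    end="",
)
```

Calling the function with more than 18 cores raises an error instead of producing a job file that could never be matched to a slot.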

Singularity

An example Condor job file that mounts /lustre/pandax/csmc into a Singularity container and lists the directory:

Executable = /cvmfs/atlas.cern.ch/repo/containers/sw/singularity/x86_64-el7/current/bin/singularity
Universe = vanilla
getenv = True

Output = log/sgrun.out
Error  = log/sgrun.err
Log    = log/sgrun.log

# --bind binds the cluster filesystem into singularity
# centos6.sif is a singularity image pulled from sylabs.io using command:
#  singularity build centos6.sif library://library/default/centos:6
Arguments = "exec --bind /lustre/pandax/csmc:/mnt \
    /home/csmc/singularity/centos6.sif  \
    sh -c 'cat /etc/redhat-release && ls -F /mnt'"

# or run an unpacked centos 7 container from ATLAS. Bind mount both cvmfs
# and user directory on lustre
# Arguments = "exec --bind /cvmfs:/cvmfs \
#   --bind /lustre/pandax/csmc:/mnt \
#   /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7-base \
#   sh -c 'cat /etc/redhat-release \
#          && ls -F /mnt \
#          && ls -F /cvmfs/sft.cern.ch/lcg'"

Queue

Working Example for People without Patience

You can look at one of Prof. Yang Haijun’s examples of running Condor jobs, in the following folder on bl-1-1.physics.sjtu.edu.cn:

$ cd /home/yhj/condor-example/

$ ./makejob  # script to make powhegbox MC generator jobs with different parameters, one job per folder

$ ./makejob-condor  # script to make condor jobs and combine them into one file 'Mytest-data.condor', then submit all condor jobs

The powhegbox MC generator job is pwhg_main, and the Condor job file is powhegZZ.sh.