Using Condor at Notre Dame

These instructions will get you started using Condor at Notre Dame. You can learn a lot more about Condor in general at the Condor web site.

To start, add the Condor tools to your path. These commands are for the csh and tcsh shells:

setenv PATH /afs/nd.edu/user37/condor/software/bin:$PATH
setenv PATH /afs/nd.edu/user37/condor/software/sbin:$PATH
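The setenv lines above use csh/tcsh syntax. If your login shell is sh or bash instead, the following is a sketch of the equivalent setting. The variable condor_sw is only a local helper name introduced for this example.

```shell
# Bourne-shell (sh/bash) equivalent of the two csh setenv lines above.
# condor_sw is a hypothetical helper variable used only to shorten the paths.
condor_sw=/afs/nd.edu/user37/condor/software
# The csh lines prepend bin first and then sbin, so sbin ends up first in PATH.
export PATH="$condor_sw/sbin:$condor_sw/bin:$PATH"
```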
Next, log into a machine that has Condor installed. Many machines on campus are already set up with Condor. If you are a student using Condor for coursework, you can access it from the student[00-03].cse.nd.edu machines. If you have a CRC account, you can use the CRC front-end machines newcell.crc.nd.edu and opteron.crc.nd.edu. You can also use the CCL submit nodes cclsubmit[00-03].cse.nd.edu (email dthain@nd.edu for access to those machines). If you would like your own machine set up as a Condor submitter, ask your system administrator to install Condor following the Condor installation instructions.

To see the machines available in the ND Condor pool, you can view the Condor status web page, or you can run the condor_status command:

condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
 
vm1@hedwig.cs LINUX       INTEL  Owner      Idle       0.220   501  0+00:00:10
vm2@hedwig.cs LINUX       INTEL  Owner      Idle       0.000   501  0+00:00:11
wombat00.csel LINUX       INTEL  Owner      Idle       0.010   121  0+00:00:14
...
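The State column is the one to watch: Owner means the machine is not currently available to run Condor jobs (typically because its owner is using it), while Unclaimed means it is idle and available. As a rough sketch, assuming the column layout shown above (State in the fourth field), the listing can be tallied with awk. The function name summarize_states is hypothetical; Condor's own condor_status -total prints a similar summary.

```shell
# Tally machines by State (4th column of condor_status output).
# Only rows whose 4th field is a known machine state are counted, which skips
# the header line and any summary lines at the bottom of the listing.
summarize_states() {
  awk '$4 ~ /^(Owner|Unclaimed|Claimed|Matched|Preempting|Backfill|Drained)$/ { n[$4]++ }
       END { for (s in n) print s, n[s] }'
}
# Usage: condor_status | summarize_states
```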

To submit a batch job to Condor, you must create a submit file and then run the condor_submit command. Try creating this sample submit file as /tmp/YOURNAME/test.submit, where YOURNAME is your user name. (Make sure you really do put it in /tmp/YOURNAME and not in your AFS home directory; see the note about AFS below.)

universe = vanilla
executable = /bin/echo
arguments = hello condor
output = test.output
should_transfer_files = yes
when_to_transfer_output = on_exit
log = test.logfile
queue 
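If you prefer to create the file from the command line, the following sketch writes the same submit file with a here-document in sh or bash. It assumes $USER holds your user name (falling back to id -un) and uses that in place of YOURNAME.

```shell
# Write the sample submit file into /tmp/YOURNAME (sh/bash syntax).
# Assumes $USER is your user name; falls back to `id -un` if it is unset.
me="${USER:-$(id -un)}"
mkdir -p "/tmp/$me"
# Quoting 'EOF' stops the shell from expanding anything inside the file.
cat > "/tmp/$me/test.submit" <<'EOF'
universe = vanilla
executable = /bin/echo
arguments = hello condor
output = test.output
should_transfer_files = yes
when_to_transfer_output = on_exit
log = test.logfile
queue
EOF
```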
Now, to submit the job to Condor, execute:
cd /tmp/YOURNAME
condor_submit test.submit
Submitting job(s)...
1 job(s) submitted to cluster 2.
Once the job is submitted, you can use condor_q to look at the status of the jobs in your queue. If you run condor_q quickly enough, you will see your job idle (I in the ST column):
condor_q

-- Submitter: hedwig.cse.nd.edu : <129.74.154.241:33593> : hedwig.cse.nd.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   dthain          8/26 17:21   0+00:00:00 I  0   0.0  echo hello condor
If you decide to cancel a job, use condor_rm and the job id:
condor_rm 2.0
Job 2.0 marked for removal.
Note about email: Despite what the Condor manual says, you will not receive email when a job is complete. This feature has been disabled at Notre Dame due to our email security configuration.

Since you will usually want to run many jobs at once, you can easily modify your submit file to run a program with tens or hundreds of variations. Change the queue command to queue several jobs at once, and use the $(PROCESS) macro, which expands to each job's number (0, 1, 2, and so on), to vary the parameters of each job.

universe = vanilla
executable = /bin/echo
arguments = hello $(PROCESS)
output = test.output.$(PROCESS)
error = test.error.$(PROCESS)
should_transfer_files = yes
when_to_transfer_output = on_exit
log = test.logfile
queue 10
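The same macro can also pick a different input file for each job. The sh/bash sketch below creates ten small files, input.0 through input.9, and a submit file that feeds input.$(PROCESS) to /bin/cat on standard input. The file names (input.N, perjob.submit, perjob.output.N) are made up for illustration. Quoting the here-document delimiter ('EOF') keeps the shell from treating $(PROCESS) as a command substitution.

```shell
# Hypothetical example: ten jobs, each reading its own input file via $(PROCESS).
me="${USER:-$(id -un)}"
dir="/tmp/$me"
mkdir -p "$dir"

# Create input.0 ... input.9, one small file per job.
p=0
while [ "$p" -lt 10 ]; do
  echo "this is the input for job $p" > "$dir/input.$p"
  p=$((p + 1))
done

# 'EOF' is quoted so the shell writes $(PROCESS) literally for Condor to expand.
# The file named by "input" is transferred to the execute machine and becomes
# the job's standard input.
cat > "$dir/perjob.submit" <<'EOF'
universe = vanilla
executable = /bin/cat
input = input.$(PROCESS)
output = perjob.output.$(PROCESS)
error = perjob.error.$(PROCESS)
should_transfer_files = yes
when_to_transfer_output = on_exit
log = perjob.logfile
queue 10
EOF
```

You would then cd to /tmp/YOURNAME and run condor_submit perjob.submit; the job with process number 3, for example, would copy input.3 into perjob.output.3.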
Now, when you run condor_submit, you should see something like this:
condor_submit test.submit
Submitting job(s)..........
10 job(s) submitted to cluster 9.
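Here a "cluster" is simply a group of jobs submitted together; each job in it is named CLUSTER.PROCESS, so the jobs are 9.0, 9.1, 9.2, and so on. In the next example, condor_q shows that cluster 9 is halfway complete: jobs 9.0 through 9.4 have finished and left the queue, job 9.5 is running (R), and jobs 9.6 through 9.9 are idle (I).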
condor_q

-- Submitter: hedwig.cse.nd.edu : <129.74.154.241:33593> : hedwig.cse.nd.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   9.5   dthain          8/26 17:46   0+00:00:01 R  0   0.0  echo hello 5
   9.6   dthain          8/26 17:46   0+00:00:00 I  0   0.0  echo hello 6
   9.7   dthain          8/26 17:46   0+00:00:00 I  0   0.0  echo hello 7
   9.8   dthain          8/26 17:46   0+00:00:00 I  0   0.0  echo hello 8
   9.9   dthain          8/26 17:46   0+00:00:00 I  0   0.0  echo hello 9
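Because every job ID has the form CLUSTER.PROCESS, scripts that post-process results often need the two parts separately. A small sh/bash sketch using parameter expansion, with the job ID 9.5 from the listing above as sample input:

```shell
# Split a Condor job ID of the form CLUSTER.PROCESS into its two parts.
id=9.5
cluster=${id%%.*}   # everything before the first dot: 9
proc=${id#*.}       # everything after the first dot: 5
echo "job $id is process $proc of cluster $cluster"
```

condor_q and condor_rm also accept a bare cluster number, so condor_q 9 lists only the jobs in cluster 9 and condor_rm 9 removes all of them at once.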

Important note about AFS:

In the example above, the submit file and all of the job's details were stored in /tmp/YOURNAME on your local disk, and Condor simply moved the necessary files back and forth in order to run your jobs. If instead you store your data files in AFS (for example, in your home directory), Condor cannot access them, because your jobs do not run with your AFS Kerberos ticket.

If you want Condor to be able to read any data out of AFS, you must change the ACLs on the necessary directories to allow any machine on campus to read the data. This is fine for non-sensitive data. Here's how:

fs setacl ~/my/data/directory nd_campus rl
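Note that an AFS ACL belongs to a single directory: fs setacl changes only the directory you name, and subdirectories that already exist keep their own ACLs. Jobs may also need at least l (lookup) rights on each parent directory in the path in order to reach the data. The following sh/bash sketch applies the same rl grant to every directory in a tree; grant_campus_read is a hypothetical helper name, and you should check the result with fs listacl afterwards.

```shell
# Apply the nd_campus "rl" (read + lookup) grant to a directory and every
# subdirectory beneath it, since fs setacl acts on one directory at a time.
grant_campus_read() {
  find "$1" -type d | while IFS= read -r d; do
    fs setacl "$d" nd_campus rl
  done
}
# Usage:
#   grant_campus_read ~/my/data/directory
#   fs listacl ~/my/data/directory
```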
If you want Condor to be able to write to AFS, you must change the ACLs to allow any machine on campus to write to that directory. Of course, this is a security risk, and should probably not be done without some careful thought.

There is much more to Condor. Please read the Condor manual for details.

Users and administrators of Condor at Notre Dame are encouraged to subscribe to the condor-discuss mailing list.
