Using Condor at Notre Dame

These instructions will get you started using Condor at Notre Dame. You can learn a lot more about Condor in general at the Condor web site.
To start, add the Condor tools to your path:
    export PATH=/afs/crc.nd.edu/user/c/condor/software/bin:$PATH
    export PATH=/afs/crc.nd.edu/user/c/condor/software/sbin:$PATH

Or using tcsh:
    setenv PATH /afs/crc.nd.edu/user/c/condor/software/bin:${PATH}
    setenv PATH /afs/crc.nd.edu/user/c/condor/software/sbin:${PATH}

Next, log into a machine that has Condor installed. A large number of machines on campus are already set up with Condor. If you are a student doing coursework on Condor, you can access Condor from the CSE student[00-03].cse.nd.edu machines. If you have a CRC account, you can use the front-end CRC machine condorfe.crc.nd.edu to access Condor. If you are submitting a large number of jobs, you can also use the CCL submit machine cclsubmit.cse.nd.edu. (Email firstname.lastname@example.org for access.) If you would like your own machine set up as a Condor submitter, ask your system administrator to install Condor with these instructions.
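Once you are logged in with the PATH set as described above, it can help to confirm that the shell actually finds the Condor commands used in the rest of this page. The following is a minimal sketch in POSIX sh. The helper name check_condor_tools is hypothetical and is not part of Condor; it only looks the commands up on the PATH and does not contact the pool.

```sh
#!/bin/sh
# Report which of the Condor commands used in this tutorial are on the PATH.
# check_condor_tools is a hypothetical helper name, not a Condor command.
check_condor_tools() {
    missing=0
    for tool in condor_status condor_submit condor_q condor_rm; do
        if location=$(command -v "$tool" 2>/dev/null); then
            echo "found:   $tool -> $location"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Exit status is 0 only when every tool was found.
    [ "$missing" -eq 0 ]
}

check_condor_tools || echo "Some Condor tools are not on your PATH; re-check the PATH lines above."
```

If any tool is reported missing, the most likely cause is that the PATH lines were not run in the current shell, or that the machine does not have access to the software directory.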
To see the machines available in the ND Condor pool, you can view the Condor status web page, or you can run the condor_status command:
    condor_status

    Name                            OpSys  Arch   State  Activity  LoadAv  Mem  ActvtyTime
    email@example.com               LINUX  INTEL  Owner  Idle      0.220   501  0+00:00:10
    firstname.lastname@example.org  LINUX  INTEL  Owner  Idle      0.000   501  0+00:00:11
    wombat00.csel                   LINUX  INTEL  Owner  Idle      0.010   121  0+00:00:14
    ...
To submit a batch job to Condor, you must create a submit file and then run the condor_submit command. Try creating this sample submit file in /tmp/YOURNAME/test.submit. (Make sure that you really do put it in /tmp/YOURNAME/test.submit.)
    universe = vanilla
    executable = /bin/echo
    arguments = hello condor
    output = test.output
    should_transfer_files = yes
    when_to_transfer_output = on_exit
    log = test.logfile
    queue

Now, to submit the job to Condor, execute:
    cd /tmp/YOURNAME
    condor_submit test.submit
    Submitting job(s)...
    1 job(s) submitted to cluster 49603.

Once the job is submitted, you can use condor_q to look at the status of the jobs in your queue. By default condor_q shows the total number of jobs in various states:
    -- Schedd: disc01.crc.nd.edu : <10.32.74.140:9618?... @ 02/10/21 10:12:52
    OWNER   BATCH_NAME  SUBMITTED   DONE  RUN  IDLE  HOLD  TOTAL  JOB_IDS
    dthain  ID: 49603   8/26 17:21     0    1     0     0      1  49603.0

If you want to look at the details of individual jobs, use condor_q -nobatch.
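The steps so far (create a scratch directory, write the submit file, submit, check the queue) can be collected into one small script. This is a sketch under stated assumptions rather than a recommended workflow: WORKDIR is a hypothetical variable standing in for /tmp/YOURNAME, and the script only calls condor_submit and condor_q if they are found on the PATH.

```sh
#!/bin/sh
# Sketch: recreate the sample job from this page in a scratch directory.
# WORKDIR is an assumption standing in for /tmp/YOURNAME.
WORKDIR="${WORKDIR:-/tmp/${USER:-$(id -un)}}"
mkdir -p "$WORKDIR"
cd "$WORKDIR" || exit 1

# Write the same sample submit file shown above.
cat > test.submit <<'EOF'
universe = vanilla
executable = /bin/echo
arguments = hello condor
output = test.output
should_transfer_files = yes
when_to_transfer_output = on_exit
log = test.logfile
queue
EOF

# Submit only if Condor is actually available on this machine.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit test.submit
    condor_q
else
    echo "condor_submit not found; wrote $WORKDIR/test.submit but did not submit it."
fi
```

Keeping the submit file and its outputs together under /tmp, as the script does, matches the layout this page uses for its examples.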
If you run condor_q quickly enough, you will see your job idle:
    -- Schedd: disc01.crc.nd.edu : <10.32.74.140:9618?... @ 02/10/21 10:12:52
    ID     OWNER   SUBMITTED   RUN_TIME    ST  PRI  SIZE  CMD
    49603  dthain  8/26 17:21  0+00:00:00  I   0    0.0   echo hello condor

If you decide to cancel a job, use condor_rm and the job id:
    condor_rm 49603.0
    Job 49603.0 marked for removal.

Note about email: Despite what the Condor manual says, you will not receive email when a job is complete. This feature has been disabled at Notre Dame due to our email security configuration.
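Since no completion email arrives, one way to tell whether jobs have finished is to look at the file named by the log line in the submit file (test.logfile in the example). The sketch below assumes Condor's usual user-log layout, in which each event begins with a three-digit code at the start of a line and a job finishing is recorded with code 005. Check this against your own test.logfile before relying on it. The helper name count_terminated is hypothetical, and the sample log is hand-written for illustration, not real Condor output.

```sh
#!/bin/sh
# count_terminated LOGFILE: print how many "job terminated" events the log holds.
# Assumption: termination events begin with the code 005 at the start of a line.
count_terminated() {
    # grep -c prints the count but exits non-zero on zero matches; ignore that.
    grep -c '^005 ' "$1" || true
}

# Demonstrate on a small hand-written sample (illustrative, not real output).
sample=$(mktemp)
cat > "$sample" <<'EOF'
000 (49603.000.000) 08/26 17:21:05 Job submitted
...
001 (49603.000.000) 08/26 17:21:09 Job executing
...
005 (49603.000.000) 08/26 17:21:10 Job terminated.
...
EOF

echo "terminated jobs: $(count_terminated "$sample")"
rm -f "$sample"
```

For real use, run count_terminated test.logfile from the directory you submitted from. Condor also provides a condor_wait command that blocks until the jobs recorded in a log file have finished; see the manual for its options.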
Because you will almost certainly want to run many jobs at once, you can modify your submit file to run a program with tens or hundreds of variations. Change the queue command to queue several jobs at once, and use the $(PROCESS) macro to vary the parameters with the job number.
    universe = vanilla
    executable = /bin/echo
    arguments = hello $(PROCESS)
    output = test.output.$(PROCESS)
    error = test.error.$(PROCESS)
    should_transfer_files = yes
    when_to_transfer_output = on_exit
    log = test.logfile
    queue 10

Now, when you run condor_submit, you should see something like this:
    condor_submit test.submit
    Submitting job(s)..........
    10 job(s) submitted to cluster 9.

Note in this case that "cluster" means "a bunch of jobs", where each job is named 9.0, 9.1, 9.2, and so forth. In this next example, condor_q shows that cluster 9 is halfway complete, with job 9.5 currently running.
    condor_q -nobatch

    -- Schedd: disc01.crc.nd.edu : <10.32.74.140:9618?... @ 02/10/21 10:12:52
    ID   OWNER   SUBMITTED   RUN_TIME    ST  PRI  SIZE  CMD
    9.5  dthain  8/26 17:46  0+00:00:01  R   0    0.0   echo hello 5
    9.6  dthain  8/26 17:46  0+00:00:00  I   0    0.0   echo hello 6
    9.7  dthain  8/26 17:46  0+00:00:00  I   0    0.0   echo hello 7
    9.8  dthain  8/26 17:46  0+00:00:00  I   0    0.0   echo hello 8
    9.9  dthain  8/26 17:46  0+00:00:00  I   0    0.0   echo hello 9
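With ten jobs writing to test.output.0 through test.output.9, it is convenient to review all of the results in one pass. The following sketch uses a hypothetical helper, show_outputs, that prints each job's output file from a given directory and notes which jobs have not produced output yet. The file names follow the output line of the submit file above.

```sh
#!/bin/sh
# show_outputs DIR COUNT: print test.output.0 .. test.output.(COUNT-1) from DIR.
# show_outputs is a hypothetical helper, not a Condor command.
show_outputs() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        file="$dir/test.output.$i"
        if [ -f "$file" ]; then
            echo "job $i: $(cat "$file")"
        else
            echo "job $i: no output yet"
        fi
        i=$((i + 1))
    done
}

# Example: inspect the ten jobs queued by "queue 10" in the current directory.
show_outputs . 10
```

Run it from the directory you submitted from (for example /tmp/YOURNAME). The same loop works for the test.error.* files if you change the file name.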
Important note about AFS: In the example above, the submit file and all of the job's details were stored in /tmp/YOURNAME on your local disk. Condor simply moved the necessary files back and forth in order to run your jobs. If you instead store your data files in AFS (e.g. your home directory), Condor cannot access them because it will not have your AFS Kerberos ticket.
If you want Condor to be able to read any data out of AFS, you must change the ACLs on the necessary directories to allow any machine on campus to read the data. This is fine for non-sensitive data. Here's how:
    fs setacl ~/my/data/directory nd_campus rl

If you want Condor to be able to write to AFS, you must change the ACLs to allow any machine on campus to write to that directory. Of course, this is a security risk, and should not be done without some careful thought.
There is much more to Condor. Please read the manual to learn more.
Users and administrators of Condor at Notre Dame are encouraged to subscribe to the condor-discuss mailing list to learn more.