Makeflow Tutorial

Download and Installation

First, log into the ND CRC head node crcfe01.crc.nd.edu or crcfe02.crc.nd.edu using ssh, PuTTY, or a similar tool. Once you have a shell, download and install the cctools software in your home directory as follows:

    cd $HOME
    wget http://ccl.cse.nd.edu/software/files/cctools-5.4.13-source.tar.gz
    tar xvzf cctools-5.4.13-source.tar.gz
    cd cctools-5.4.13-source
    ./configure --prefix $HOME/cctools
    make
    make install
    cd $HOME

If you use bash, do this to set your path:

    export PATH=${HOME}/cctools/bin:/opt/sge/bin/lx-amd64:${PATH}

If you use tcsh instead, do this:

    setenv PATH ${HOME}/cctools/bin:/opt/sge/bin/lx-amd64:${PATH}

Now double-check that you can run the various commands, like this:

    makeflow -v
    work_queue_status
    qstat

Makeflow Example

Let's begin by using Makeflow to run a handful of simulation codes. First, make and enter a clean directory to work in:

    cd $HOME
    mkdir tutorial
    cd tutorial

Now download this program, which performs a highly sophisticated simulation of black holes colliding:

    wget http://ccl.cse.nd.edu/software/tutorials/ndreu16/simulation.py

Try running it once, just to see what it does:

    chmod 755 simulation.py
    ./simulation.py 5

Now, let's use Makeflow to run several simulations.
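Before writing the workflow, it helps to see the same work as a plain sequential loop, where each run must finish before the next begins; removing that serialization is exactly what Makeflow is for. This is only a sketch: the echo below is a stand-in for ./simulation.py, which may not be present on the machine at hand.

```shell
# Run four "simulations" one after another. Each iteration must finish
# before the next begins; Makeflow instead runs independent jobs
# concurrently. The echo is a stand-in for: ./simulation.py $i > output.$i
for i in 1 2 3 4; do
    echo "simulated run $i" > output.$i
done
cat output.1 output.2 output.3 output.4
```

With four independent runs like these, a workflow system can execute all of them at once, which is what the Makeflow file in the next step expresses.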
Create a file called example.makeflow and paste the following text into it:

    input.txt:
        LOCAL /bin/echo "Simulate Black Holes" > input.txt

    output.1: simulation.py input.txt
        ./simulation.py 1 < input.txt > output.1

    output.2: simulation.py input.txt
        ./simulation.py 2 < input.txt > output.2

    output.3: simulation.py input.txt
        ./simulation.py 3 < input.txt > output.3

    output.4: simulation.py input.txt
        ./simulation.py 4 < input.txt > output.4

To run it on your local machine, one job at a time, do this:

    makeflow -j 1 example.makeflow

Note that if you run it a second time, nothing will happen, because all of the files have already been built:

    makeflow example.makeflow
    makeflow: nothing left to do

Use the -c option to clean everything up before trying it again:

    makeflow -c example.makeflow

Of course, you are running on a machine with multiple cores. If you leave out the -j option, Makeflow will run as many jobs at once as you have cores:

    makeflow example.makeflow

If the jobs are expected to be long-running, you can tell Makeflow to submit each job to SGE instead:

    makeflow -T sge example.makeflow

After that completes, examine the output files (output.1 and so on), and you will notice that each job ran on a different machine in the cluster.

Running Makeflow with Work Queue

Sometimes, submitting jobs individually to a batch system is not convenient: each job can wait a long time in the queue before receiving service, or you may not have a batch system set up at all. Instead, you can use the Work Queue system to run the jobs. To do this, first start makeflow in Work Queue (wq) mode:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -p 0
    listening for workers on port XXXX.
    ...

Now open up another shell and run a single worker process:

    work_queue_worker localhost XXXX

Go back to your first shell and observe that the makeflow has finished.
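Conceptually, makeflow in wq mode acts as a manager holding a queue of ready tasks, and each work_queue_worker pulls tasks from it until none remain. The following is only a rough local sketch of that model, not the real protocol: directories stand in for the network connection between makeflow and its workers, and an atomic mv stands in for task dispatch.

```shell
# Sketch of the Work Queue model: a "manager" leaves tasks in a queue
# and independent "workers" pull and execute them concurrently.
mkdir -p queue done
for i in 1 2 3 4; do
    echo "task $i" > queue/task.$i      # manager enqueues four tasks
done
for w in 1 2; do                        # two workers drain the queue
    (
        for t in queue/task.*; do
            # mv is atomic, so exactly one worker claims each task
            mv "$t" done/ 2>/dev/null && echo "worker $w ran ${t##*/}"
        done
    ) &
done
wait
ls done | wc -l
```

Note that adding a second worker here needs no change to the queue, just as pointing more work_queue_worker processes at the same makeflow speeds it up without touching the workflow.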
Of course, remembering port numbers all the time gets old fast, so try the same thing again, but using a project name:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT
    listening for workers on port XXXX
    ...

Now open up another shell and run your worker with a project name:

    work_queue_worker -N MYPROJECT

When using a project name, your workflow is advertised to the catalog server and can be viewed using work_queue_status:

    work_queue_status

Running Workers in SGE

Of course, we don't really want to run workers on the head node, so let's instead start five workers using SGE:

    sge_submit_workers -N MYPROJECT 5
    Creating worker submit scripts in dthain-workers...
    Your job 18728 ("worker.sh") has been submitted
    Your job 18729 ("worker.sh") has been submitted
    Your job 18730 ("worker.sh") has been submitted
    Your job 18731 ("worker.sh") has been submitted
    Your job 18732 ("worker.sh") has been submitted

Use the qstat command to observe that they have been submitted (and are possibly running):

    qstat -u $USER
    job-ID  prior      name       user    state  submit/start at      queue
    -----------------------------------------------------------------------------------
    18728   100.49976  worker.sh  dthain  r      06/02/2016 12:04:45  long@d6copt172.crc.nd.edu
    18729   100.49976  worker.sh  dthain  r      06/02/2016 12:04:47  long@d6copt184.crc.nd.edu
    18730   100.49976  worker.sh  dthain  r      06/02/2016 12:04:47  long@d6copt025.crc.nd.edu
    18731   100.49976  worker.sh  dthain  r      06/02/2016 12:04:48  long@d6copt025.crc.nd.edu
    18732   100.49976  worker.sh  dthain  r      06/02/2016 12:04:48  long@dqcneh084.crc.nd.edu

Now restart your Makeflow, and it will use the workers already running in SGE:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT
    listening for workers on port XXXX.
    ...

You can leave the workers running there if you want to start another Makeflow. They will remain until they have been idle for fifteen minutes, and then stop automatically.
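That fifteen-minute shutdown is simply an idle timeout in the worker's main loop. Here is a small illustrative sketch of the behavior, with the timeout shortened to one second so it finishes quickly; the variable names are made up for this example and are not part of work_queue_worker.

```shell
# A worker-style loop that gives up after IDLE_TIMEOUT seconds with no
# work to do, then records why it exited. The real worker does the
# same, but with a much longer timeout.
IDLE_TIMEOUT=1
start=$(date +%s)
status=running
while [ "$status" = running ]; do
    # (a real worker would poll its manager for a task here)
    sleep 0.1
    now=$(date +%s)
    if [ $((now - start)) -ge "$IDLE_TIMEOUT" ]; then
        status=idle-timeout
    fi
done
echo "worker exiting: $status" > worker.log
cat worker.log
```

This is why idle workers left in the cluster cost little: they clean themselves up without any action from you.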
If you add the -d all option to Makeflow, it will display debugging information that shows where each task was sent, when it was returned, and so forth:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT -d all
    listening for workers on port XXXX.

(Alternate) Running Workers in Condor

Of course, we don't really want to run workers on the head node, so let's instead start five workers using Condor:

    condor_submit_workers -N MYPROJECT 5
    Creating worker submit scripts in dthain-workers...
    Submitting job(s).....
    5 job(s) submitted to cluster 258192.

Use the condor_q command to observe that they have been submitted to Condor:

    condor_q
    ID        OWNER   SUBMITTED   RUN_TIME    ST  PRI  SIZE  CMD
    258192.0  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.1  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.2  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.3  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.4  dthain  5/31 16:03  0+00:00:11  R   0    0.7   work_queue_worker

Now restart your Makeflow, and it will use the workers already running in Condor:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT
    listening for workers on port XXXX.
    ...

You can leave the workers running there if you want to start another Makeflow. They will remain until they have been idle for fifteen minutes, and then stop automatically. If you add the -d all option to Makeflow, it will display debugging information that shows where each task was sent, when it was returned, and so forth:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT -d all
    listening for workers on port XXXX.

If you finish all that, go on to the Homework Assignment.