Makeflow Tutorial

This tutorial walks through installing CCTools, creating and running a makeflow, and using Makeflow together with Work Queue to run your workflow on different execution resources. More information can be found at http://ccl.cse.nd.edu/. For details on Makeflow execution, see http://ccl.cse.nd.edu/software/manuals/makeflow.html; for Work Queue, see http://ccl.cse.nd.edu/software/manuals/workqueue.html.

Download and Installation

If you have access to the Notre Dame Center for Research Computing, first log into the CRC head node crcfe01.crc.nd.edu by using ssh, PuTTY, or a similar tool. If you do not have access, please build the code on your own machine. Once you have a shell, download and install the CCTools software in your home directory as follows.

To build our latest release:

wget http://ccl.cse.nd.edu/software/files/cctools-6.0.7-source.tar.gz
tar zxpvf cctools-6.0.7-source.tar.gz
cd cctools-6.0.7-source
./configure --prefix $HOME/cctools --tcp-low-port 9000 --tcp-high-port 9500
make
make install
cd $HOME

If you use bash then do this to set your path:

export PATH=$HOME/cctools/bin:$PATH

If you use tcsh instead, then do this:

setenv PATH $HOME/cctools/bin:$PATH

Now double check that you can run the various commands, like this:

makeflow -v
work_queue_worker -v
work_queue_status
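
If one of these commands is not found, the PATH setting is the usual culprit. As a minimal sketch (the helper name check_tools is made up for this example, and it assumes a POSIX shell such as bash rather than tcsh), the following reports which of the tools the shell can locate:

```shell
# Report which of the given commands can be found on PATH.
# check_tools is a hypothetical helper, not part of CCTools.
check_tools() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    # Non-zero status if at least one command was not found.
    return "$missing"
}

check_tools makeflow work_queue_worker work_queue_status \
    || echo "Check that \$HOME/cctools/bin is on your PATH."
```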

Makeflow Example

Let's begin by using Makeflow to run a handful of simulation codes. First, make and enter a clean directory to work in:

cd $HOME
mkdir tutorial
cd tutorial

Download this program, which performs a highly sophisticated simulation of black holes colliding together:

wget http://ccl.cse.nd.edu/software/tutorials/ndtut16/simulation.py

Try running it once, just to see what it does:

chmod 755 simulation.py
./simulation.py 5

Now, let's use Makeflow to run several simulations. Create a file called example.makeflow and paste the following text into it:

input.txt:
	LOCAL /bin/echo "Hello Makeflow!" > input.txt

output.1: simulation.py input.txt
	./simulation.py 1 < input.txt > output.1

output.2: simulation.py input.txt
	./simulation.py 2 < input.txt > output.2

output.3: simulation.py input.txt
	./simulation.py 3 < input.txt > output.3

output.4: simulation.py input.txt
	./simulation.py 4 < input.txt > output.4
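
The four simulation rules differ only in their number, so for larger runs one option is to generate the file with a small shell loop instead of typing each rule. The sketch below is illustrative only: the function name make_rules and the output file generated.makeflow are invented for this example, and it assumes a POSIX shell.

```shell
# Print a Makeflow file with one simulation rule per number from 1 to N.
# make_rules is a hypothetical helper, not part of CCTools.
make_rules() {
    n=$1
    # Rule that creates the shared input file (run locally, as in the example).
    printf 'input.txt:\n'
    printf '\tLOCAL /bin/echo "Hello Makeflow!" > input.txt\n'
    i=1
    while [ "$i" -le "$n" ]; do
        printf '\n'
        printf 'output.%d: simulation.py input.txt\n' "$i"
        printf '\t./simulation.py %d < input.txt > output.%d\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Write ten rules to a separate file so example.makeflow is left untouched.
make_rules 10 > generated.makeflow
```

The generated file can be passed to makeflow in place of example.makeflow. Note that each command line must begin with a tab, which the \t in the printf format provides.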

To run it on your local machine, one job at a time:

makeflow example.makeflow -j 1

Note that if you run it a second time, nothing will happen, because all of the files are built:

makeflow example.makeflow
makeflow: nothing left to do
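
As with make, a rule only needs to run while its target files are still to be produced. As a rough illustration of that idea (not Makeflow's actual bookkeeping, which this sketch does not attempt to reproduce), the hypothetical helper below lists which of the tutorial's target files are missing from the current directory. After a successful run it prints nothing.

```shell
# Print each named file that does not exist in the current directory.
# missing_targets is a made-up helper for illustration.
missing_targets() {
    for f in "$@"; do
        [ -e "$f" ] || echo "$f"
    done
}

missing_targets input.txt output.1 output.2 output.3 output.4
```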

Use the -c option to clean everything up before trying it again:

makeflow -c example.makeflow

The Notre Dame CRC supports (among other systems) the Condor batch system, so to run your jobs through Condor, do this:

makeflow -T condor example.makeflow
2016/10/19 09:30:30.60 makeflow[57454] notice: The working directory is '/afs/crc.nd.edu/user/n/nkremerh/tutorial':
2016/10/19 09:30:30.60 makeflow[57454] notice: This won't work because Condor is not able to write to files in AFS.
2016/10/19 09:30:30.60 makeflow[57454] notice: Instead, run makeflow from a local disk like /tmp.
2016/10/19 09:30:30.60 makeflow[57454] notice: Or, use the Work Queue with -T wq and condor_submit_workers.

As you can see, each batch system has its peculiarities. To use Condor, move to a local disk such as /tmp:

mkdir -p /tmp/$USER/tutorial
cd /tmp/$USER/tutorial
cp $HOME/tutorial/* ./
makeflow -T condor example.makeflow
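
Files under /tmp are typically local to one machine and may be cleaned up, so you may want to copy the results back once the run finishes. A minimal sketch, assuming a POSIX shell and that the outputs follow the output.N naming from the example (collect_outputs is a hypothetical helper name):

```shell
# Copy every output.* file from a scratch directory into a destination directory.
# collect_outputs is a hypothetical helper, not part of CCTools.
collect_outputs() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/output.*; do
        # Skip the unexpanded pattern when no outputs exist yet.
        [ -e "$f" ] || continue
        cp "$f" "$dst"/ || return 1
    done
}

# Example: collect_outputs "/tmp/$USER/tutorial" "$HOME/tutorial"
```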

If you are working at another site that uses SLURM, Torque, or SGE, invoke Makeflow with the matching batch type instead:

makeflow -T slurm example.makeflow
makeflow -T torque example.makeflow
makeflow -T sge example.makeflow

Note: The last three systems assume a shared file system. If you try SGE, first move to either /scratch or $HOME.

We support many other batch types, so please check out our documentation if your preferred batch system wasn't listed.

Running Makeflow with Work Queue

You will notice that a workflow can run very slowly if you submit each job individually. To get around this limitation, we provide the Work Queue system, which allows Makeflow to act as a master process that quickly dispatches work to remote worker processes. To try it, clean up and start Makeflow in Work Queue mode. With -p 0, Makeflow picks an available port and reports it:

makeflow -c example.makeflow
makeflow -T wq example.makeflow -p 0
listening for workers on port XXXX.
...

Now open up another shell and run a single worker process:

work_queue_worker crcfe01.crc.nd.edu XXXX

Go back to your first shell and observe that the makeflow has finished. Of course, remembering port numbers all the time gets old fast, so try the same thing again, but using a project name:

makeflow -c example.makeflow
makeflow -T wq example.makeflow -N project-$USER
listening for workers on port XXXX
...

Now open up another shell and run your worker with a project name:

work_queue_worker -N project-$USER

Running Makeflow with Work Queue Factory

Of course, we don't really want to manually start workers, so let's instead use work_queue_factory to set up workers in Condor for us:

work_queue_factory -T condor -N project-$USER

2016/10/19 09:30:00: |submitted: 0 |needed: 5 |waiting connection: 0 |requested: 5
PROJECT            HOST                   PORT WAITING RUNNING COMPLETE WORKERS
masters:
nkremerh           crcfe01.crc.nd.edu     9000       4       0        0       0

Use the condor_q command to observe that the workers have been submitted to Condor:

condor_q -submitter $USER
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
85177.0   nkremerh       10/19 09:30   0+00:00:00 I  0   0.0  condor.sh ./work_q
85178.0   nkremerh       10/19 09:30   0+00:00:00 I  0   0.0  condor.sh ./work_q
85179.0   nkremerh       10/19 09:30   0+00:00:00 I  0   0.0  condor.sh ./work_q
85180.0   nkremerh       10/19 09:30   0+00:00:00 I  0   0.0  condor.sh ./work_q
85181.0   nkremerh       10/19 09:30   0+00:00:00 I  0   0.0  condor.sh ./work_q

Now, restart your makeflow and it will use the workers already running in Condor:

makeflow -c example.makeflow
makeflow -T wq example.makeflow -N project-$USER
listening for workers on port XXXX.
...

You can leave the workers running if you want to start another makeflow. They remain until they have been idle for fifteen minutes, then stop automatically.

If you add the -d all option to Makeflow, it will display debugging information that shows where each task was sent, when it was returned, and so forth:

makeflow -c example.makeflow
makeflow -T wq example.makeflow -N project-$USER -d all
listening for workers on port XXXX.

For information on using the Work Queue API, check out the Work Queue tutorial.