Cooperative Computing Lab

Makeflow Tutorial

Getting Started
Makeflow Example
Running Makeflow with Work Queue
Exercise
1. Running with Work Queue Workers on Torque

This tutorial will have you install CCTools into your FutureGrid home directory and will take you through some distributed computation examples using Makeflow.

Getting Started

For this tutorial, we assume you have an open SSH connection to a FutureGrid login node. If you do not have a FutureGrid account, you will need to register for one first.

In this tutorial, we will use the alamo login node:

ssh alamo.futuregrid.org

Download, Build, and Install CCTools

Navigate to the download page in your browser to review the most recent versions: http://www.nd.edu/~ccl/software/download.shtml

Set Up a Sandbox for this Tutorial and Download a Copy of CCTools 3.5.3

mkdir ~/cctools-tutorial
cd ~/cctools-tutorial
wget http://www.nd.edu/~ccl/software/files/cctools-3.5.3-RC1-source.tar.gz
tar xzf cctools-3.5.3-RC1-source.tar.gz

Build and Install CCTools

cd ~/cctools-tutorial/cctools-3.5.3-RC1-source
./configure --prefix ~/cctools && make install
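To confirm the install worked before moving on, you can check that the commands used later in this tutorial landed under the install prefix. This is a minimal sketch: the function name check_cctools_install is hypothetical, and the prefix defaults to ~/cctools to match the --prefix given to ./configure above.

```shell
#!/bin/sh
# Sketch: check that "make install" placed the tools this tutorial uses
# under the install prefix (default ~/cctools, matching --prefix above).
# check_cctools_install is a hypothetical helper, not part of CCTools.
check_cctools_install() {
    prefix="${1:-$HOME/cctools}"
    status=0
    for tool in makeflow work_queue_worker torque_submit_workers; do
        if [ -x "$prefix/bin/$tool" ]; then
            echo "found   $prefix/bin/$tool"
        else
            echo "MISSING $prefix/bin/$tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_cctools_install ~/cctools
```

A nonzero return means at least one tool is missing, which usually points to a failed build step earlier in the make output.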

Set Environment Variables

You will need to add the CCTools bin directory to your $PATH:

export PATH=~/cctools/bin:${PATH}
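The export above only affects the current shell session. One way to make it stick for future logins is to append the same line to a shell startup file if it is not already there. This is a sketch: the helper name persist_cctools_path is hypothetical, and ~/.bashrc is an assumed default; use whichever file your login shell actually reads.

```shell
#!/bin/sh
# Sketch: append the PATH line to a shell startup file (default ~/.bashrc,
# an assumption) unless an identical line is already present.
# persist_cctools_path is a hypothetical helper name.
persist_cctools_path() {
    rc="${1:-$HOME/.bashrc}"
    # Single quotes keep ~ and ${PATH} unexpanded until the file is sourced.
    line='export PATH=~/cctools/bin:${PATH}'
    if ! grep -qxF "$line" "$rc" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc"
    fi
}

# Usage:
#   persist_cctools_path ~/.bashrc
#   command -v makeflow    # should print a path under ~/cctools/bin
```

Checking for an exact match first keeps the startup file from accumulating duplicate lines if the step is repeated.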

Makeflow Example

Setup

mkdir ~/cctools-tutorial/makeflow-simple
cd ~/cctools-tutorial/makeflow-simple

Download simulation.py, which is the application executable for this exercise, and the Makeflow script, which defines the workflow:

wget http://www.nd.edu/~ccl/software/tutorials/acic12/simple/simulation.py
wget http://www.nd.edu/~ccl/software/tutorials/acic12/simple/Makeflow

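The Makeflow script should look like this. Each rule lists its target, a colon, and the files it depends on, followed by the command that produces the target; the LOCAL prefix makes the input.txt rule run on the local machine no matter which batch system is selected: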

$ cat Makeflow
input.txt:
	LOCAL /bin/echo Hello World > input.txt

A: simulation.py input.txt
	python simulation.py 1 < input.txt > A
B: simulation.py input.txt
	python simulation.py 2 < input.txt > B
C: simulation.py input.txt
	python simulation.py 3 < input.txt > C
D: simulation.py input.txt
	python simulation.py 4 < input.txt > D
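If the download is unavailable, the same workflow file can be recreated locally. This sketch uses printf so the leading tab on each command line is written explicitly, as in the listing above; the function name write_tutorial_makeflow is hypothetical, and simulation.py itself still has to be downloaded.

```shell
#!/bin/sh
# Sketch: recreate the tutorial's Makeflow file. Command lines get an
# explicit leading tab, as in the listing above. simulation.py is NOT
# recreated here. write_tutorial_makeflow is a hypothetical helper name.
write_tutorial_makeflow() {
    out="${1:-Makeflow}"
    {
        printf 'input.txt:\n'
        printf '\tLOCAL /bin/echo Hello World > input.txt\n'
        printf '\n'
        # One rule per simulation run: target A runs "simulation.py 1", etc.
        for pair in A:1 B:2 C:3 D:4; do
            target=${pair%%:*}
            arg=${pair#*:}
            printf '%s: simulation.py input.txt\n' "$target"
            printf '\tpython simulation.py %s < input.txt > %s\n' "$arg" "$target"
        done
    } > "$out"
}

# Usage, from ~/cctools-tutorial/makeflow-simple:
#   write_tutorial_makeflow Makeflow
#   cat Makeflow
```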

Running with Local (Multiprocess) Execution

Here we're going to tell Makeflow to dispatch the jobs using regular local processes (no distributed computing!). This is basically the same as regular Unix Make using the -j flag.

makeflow -T local

If everything worked out correctly, you should see:

$ makeflow -T local
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.
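Makeflow only echoes the commands it runs, so a quick sanity check after any run (local, Torque, or Work Queue) is to confirm that every target exists and is non-empty. A minimal sketch, with a hypothetical helper name; the file names come from the Makeflow above.

```shell
#!/bin/sh
# Sketch: report any expected output that is missing or empty.
# check_outputs is a hypothetical helper name.
check_outputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from ~/cctools-tutorial/makeflow-simple:
#   check_outputs input.txt A B C D && echo "all outputs present"
```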

Running with FutureGrid's Torque

The following command tells Makeflow to dispatch jobs using the Torque batch submission system (qsub, qdel, qstat, etc.):

makeflow -T torque

The output is:

$ makeflow -T torque
nothing left to do.

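Well... that's not right. Nothing was run! The outputs of the local run are still present and recorded in Makeflow's log file (Makeflow.makeflowlog), so Makeflow considers every rule already complete. We need to clean out the generated output files and logs so Makeflow starts from a clean slate: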

makeflow -c

Makeflow reports each file from the last run that it deleted, including its log file:

$ makeflow -c
deleted file D
deleted file C
deleted file B
deleted file A
deleted file input.txt
deleted file ./Makeflow.makeflowlog

Now let's try again:

makeflow -T torque

This time we get the expected output:

$ makeflow -T torque
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.

Notice that the output is no different from that of local execution. Makeflow is built to be agnostic to the execution engine: the same Makeflow script runs unchanged whether its tasks execute locally or remotely.

In this case, we can confirm that the job was run on another host by looking at the output produced by the simulation:

$ cat D
Running on host c056.cm.cluster
Starting 2012 9 Sep 15:48:22
x = 2.0
x = 1.41421356237
x = 1.189207115
x = 1.09050773267
x = 1.04427378243
Finished 2012 9 Sep 15:48:27

Here we see that the job ran on node c056.cm.cluster and took about 5 seconds to complete (15:48:22 to 15:48:27).
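To check all four outputs at once, a small awk helper can pull out the host and the elapsed time from each file. This sketch assumes the output format shown above ("Running on host ...", "Starting YYYY D Mon HH:MM:SS", "Finished ...") and that a task starts and finishes on the same day; summarize_run is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print "<file> <host> <elapsed seconds>" for each simulation output.
# Assumes the line format shown above and a same-day start and finish.
# summarize_run is a hypothetical helper name.
summarize_run() {
    for f in "$@"; do
        awk -v name="$f" '
            /^Running on host / { host = $4 }
            /^Starting / { split($5, t, ":"); start  = t[1]*3600 + t[2]*60 + t[3] }
            /^Finished / { split($5, t, ":"); finish = t[1]*3600 + t[2]*60 + t[3] }
            END { printf "%s %s %d\n", name, host, finish - start }
        ' "$f"
    done
}

# Usage, from ~/cctools-tutorial/makeflow-simple:
#   summarize_run A B C D
#   hostname    # a different host name above means the task ran remotely
```

For the sample file D above, the helper reports host c056.cm.cluster and 5 seconds.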

Running Makeflow with Work Queue

The submission and wait times for the Makeflow tasks above will vary because of the latencies of the underlying batch submission system (Torque), which schedules every task as a separate batch job. To avoid these long submission and wait times, Makeflow can instead use Work Queue: workers are started once and then run many tasks as the master sends them, which makes Work Queue well suited to short tasks that need low latency and quick turnaround.

Here we start Makeflow, which sets up a Work Queue master; the -p 0 option lets it listen on an arbitrary available port. We also turn on all debugging output (-d all) to see what happens as it runs.

makeflow -c
makeflow -T wq -p 0 -d all

You should see output like this:


2012/07/30 12:18:54.30 [10131] makeflow: debug: checking for duplicate targets...
2012/07/30 12:18:54.30 [10131] makeflow: debug: checking rules for consistency...
2012/07/30 12:18:54.30 [10131] makeflow: tcp: listening on port XXXX
2012/07/30 12:18:54.30 [10131] makeflow: wq: Work Queue is listening on port XXXX.
2012/07/30 12:18:54.30 [10131] makeflow: batch: started process 10132: /bin/echo Hello World > input.txt
2012/07/30 12:18:54.30 [10131] makeflow: debug: node 0 waiting -> running
...

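Now, from a second terminal logged in to the same host, run work_queue_worker with the port the master is listening on (replace XXXX with the port number reported in the debug output):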

work_queue_worker -t 10 localhost XXXX

When all tasks are finished, the worker exits after 10 seconds of idle time, as set by the -t 10 option.
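Instead of copying the port number by hand, you can read it from Makeflow's debug output. This sketch assumes the debug line format shown above ("... Work Queue is listening on port N.") and writes the debug output to a file with -o, as the exercise below does; the helper names wq_port_from_log and wait_for_wq_port are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the port the Work Queue master chose (-p 0) from the
# debug log. Assumes the line format shown above, for example:
#   "... makeflow: wq: Work Queue is listening on port 9123."
# Helper names are hypothetical.

# Print the first announced port (empty if the log or line is missing).
wq_port_from_log() {
    sed -n 's/.*Work Queue is listening on port \([0-9][0-9]*\)\..*/\1/p' "$1" 2>/dev/null | head -n 1
}

# Poll once per second for up to $2 seconds (default 30) until a port
# appears; print it and succeed, or fail if it never shows up.
wait_for_wq_port() {
    tries=0
    while [ "$tries" -lt "${2:-30}" ]; do
        port=$(wq_port_from_log "$1")
        if [ -n "$port" ]; then
            echo "$port"
            return 0
        fi
        sleep 1
        tries=$((tries + 1))
    done
    return 1
}

# Usage (needs CCTools on $PATH):
#   makeflow -c
#   rm -f makeflow.debug    # avoid reading a port from an earlier run
#   makeflow -T wq -p 0 -d all -o makeflow.debug &
#   port=$(wait_for_wq_port makeflow.debug) && work_queue_worker -t 10 localhost "$port"
```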

Exercise

Running Makeflow with Work Queue Workers on Torque

The goal of this exercise is to set up Work Queue workers on Torque compute nodes. We submit the workers as Torque jobs using the torque_submit_workers command. To learn more about its options and arguments, run:

torque_submit_workers -h

In this exercise, we use the catalog server and a project name so that workers can find the master without being told its hostname and port. The -a option makes Makeflow and the Work Queue workers use the catalog server. The -N option, which takes a string argument, sets the project name that Makeflow's master advertises, and the workers are given the same name through their own -N option so they know which master to connect to.

NOTE: Replace MYPROJECT with a distinct project name of your own, so your workers do not connect to another user's master.

makeflow -c
torque_submit_workers -t 300 -a -N MYPROJECT 5
makeflow -T wq -a -N MYPROJECT -d all -o makeflow.debug > /dev/null 2>&1 &
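The exercise can also be wrapped in a small script that picks a project name for you and waits for Makeflow to finish. This is a sketch under several assumptions: the naming scheme (user name plus a Unix timestamp) is only a suggestion, torque_submit_workers -t is assumed to be the workers' idle timeout as with work_queue_worker -t above, and Makeflow's exit status is assumed to be zero only on success. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the exercise as a script. Function names are hypothetical.

# Suggested project name: user name plus a timestamp, unlikely to collide.
tutorial_project_name() {
    printf '%s-makeflow-%s\n' "${USER:-$(id -un)}" "$(date +%s)"
}

run_exercise() {
    project=$(tutorial_project_name)
    echo "Using project name: $project"
    makeflow -c
    # Submit 5 workers; -t 300 is assumed to be their idle timeout.
    torque_submit_workers -t 300 -a -N "$project" 5
    makeflow -T wq -a -N "$project" -d all -o makeflow.debug > /dev/null 2>&1 &
    makeflow_pid=$!
    # The worker jobs should appear in Torque's queue.
    qstat -u "${USER:-$(id -un)}"
    # Block until Makeflow exits, then report based on its exit status.
    if wait "$makeflow_pid"; then
        echo "Makeflow finished; see makeflow.debug for details"
    else
        echo "Makeflow failed; check makeflow.debug" >&2
        return 1
    fi
}

# Usage, from ~/cctools-tutorial/makeflow-simple:
#   run_exercise
# While it runs, "tail -f makeflow.debug" in another terminal shows progress.
```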