Makeflow Tutorial

  1. Getting Started
    1. Log in to the FutureGrid Head Node
    2. Download, Build, and Install CCTools
    3. Set Environment Variables
  2. Makeflow Example
    1. Setup
    2. Running with Local (Multiprocess) Execution
    3. Running with FutureGrid's Torque
  3. Running Makeflow with Work Queue
  4. Exercise
    1. Running Makeflow with Work Queue Workers on Torque

This tutorial will have you install CCTools into your FutureGrid home directory and take you through some distributed computation examples using Makeflow.

Getting Started

Log in to the FutureGrid Head Node

For this tutorial, we assume you have an open SSH connection to the FutureGrid login nodes. If you do not have a FutureGrid account, you can register through the FutureGrid web portal.

In this tutorial, we will use the alamo login node:

ssh alamo.futuregrid.org
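
If your local username differs from your FutureGrid account name, give it explicitly (USERNAME here is a placeholder for your own account):

ssh USERNAME@alamo.futuregrid.org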

Download, Build, and Install CCTools

Navigate to the download page in your browser to review the most recent versions: http://www.nd.edu/~ccl/software/download.shtml

Set Up a Sandbox for this Tutorial and Download a Copy of CCTools 3.5.3

mkdir ~/cctools-tutorial
wget http://www.nd.edu/~ccl/software/files/cctools-3.5.3-RC1-source.tar.gz
tar xzf cctools-3.5.3-RC1-source.tar.gz

Build and Install CCTools

cd ~/cctools-3.5.3-RC1-source
./configure --prefix ~/cctools && make install

Set Environment Variables

You will need to add your CCTools directory to your $PATH:

export PATH=~/cctools/bin:${PATH}
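
To confirm that the installation and the PATH change took effect, you can check that the shell now finds the makeflow binary; these should print ~/cctools/bin/makeflow and a version string, respectively:

which makeflow
makeflow -v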

Makeflow Example

Setup

mkdir ~/cctools-tutorial/makeflow-simple
cd ~/cctools-tutorial/makeflow-simple

Download simulation.py, the application executable for this exercise, along with the Makeflow script that defines the workflow:

wget http://www.nd.edu/~ccl/software/tutorials/acic12/simple/simulation.py
wget http://www.nd.edu/~ccl/software/tutorials/acic12/simple/Makeflow

The Makeflow script should look like:

$ cat Makeflow
input.txt:
    LOCAL /bin/echo Hello World > input.txt

A: simulation.py input.txt
    python simulation.py 1 < input.txt > A

B: simulation.py input.txt
    python simulation.py 2 < input.txt > B

C: simulation.py input.txt
    python simulation.py 3 < input.txt > C

D: simulation.py input.txt
    python simulation.py 4 < input.txt > D
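
Each rule follows the same pattern as Unix Make: a list of target files, a colon, the source files the command needs, and then an indented command that produces the targets. The LOCAL keyword on the first rule forces that command to run on the local machine regardless of which batch system is selected. As a generic illustration (these file names are hypothetical, not part of this tutorial's workflow):

# target(s) : source file(s) the command reads
#     indented command that produces the target(s)
output.dat: analyze.py raw.dat
    python analyze.py raw.dat > output.dat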

Running with Local (Multiprocess) Execution

Here we're going to tell Makeflow to dispatch the jobs using regular local processes (no distributed computing!). This is basically the same as regular Unix Make using the -j flag.

makeflow -T local

If everything worked out correctly, you should see:

$ makeflow -T local
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.
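
If you want to limit how many processes run at once on the local machine, Makeflow accepts a maximum local job count via -j (here, at most two simultaneous jobs):

makeflow -T local -j 2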

Running with FutureGrid's Torque

The following code tells Makeflow to dispatch jobs using the Torque batch submission system (qsub, qdel, qstat, etc.).

makeflow -T torque

You will get as output:

$ makeflow -T torque
nothing left to do.

Well... that's not right. Nothing was run! The output files and log from the local run are still present, so Makeflow considers every rule up to date. We need to clean out the generated output files and logs so Makeflow starts from a clean slate:

makeflow -c

We see it deleted the files we generated in the last run:

$ makeflow -c
deleted file D
deleted file C
deleted file B
deleted file A
deleted file input.txt
deleted file ./Makeflow.makeflowlog
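
That last file, ./Makeflow.makeflowlog, is Makeflow's transaction log: it records the state of each rule, which is why the second torque run immediately reported nothing left to do. After any run you can inspect it directly:

cat Makeflow.makeflowlog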

Now let's try again:

makeflow -T torque

We get the output we expect:

$ makeflow -T torque
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.

Notice that the output is no different from local execution. Makeflow is built to be execution-engine agnostic: from the workflow's point of view, it does not matter whether a task runs locally or remotely.

In this case, we can confirm that the job was run on another host by looking at the output produced by the simulation:

$ cat D
Running on host c056.cm.cluster
Starting 2012 9 Sep 15:48:22
x = 2.0
x = 1.41421356237
x = 1.189207115
x = 1.09050773267
x = 1.04427378243
Finished 2012 9 Sep 15:48:27

Here we see that the task ran on node c056.cm.cluster and took 5 seconds to complete.
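
For reference, the output above is consistent with a small program that prints its host name, starts from 2.0, and repeatedly takes square roots with a pause between steps. A purely hypothetical sketch of simulation.py, inferred from that output (the script you actually downloaded may differ):

#!/usr/bin/env python
# Hypothetical reconstruction of simulation.py, inferred from the sample
# output above; the real script from the tutorial may differ.
import math
import socket
import sys
import time

n = int(sys.argv[1])   # iteration count: 1 through 4 in the Makeflow rules
sys.stdin.read()       # consume input.txt ("Hello World"); contents unused here

print("Running on host %s" % socket.gethostname())
print("Starting %s" % time.strftime("%Y %d %b %H:%M:%S"))
x = 2.0
print("x = %s" % x)
for _ in range(n):
    time.sleep(1)      # pacing, consistent with the 5-second runtime above
    x = math.sqrt(x)   # repeated square roots match the sample values
    print("x = %s" % x)
print("Finished %s" % time.strftime("%Y %d %b %H:%M:%S"))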

Running Makeflow with Work Queue

The submission and wait times for the Makeflow tasks in the run above will vary with the latencies of the underlying batch submission platform (Torque). To avoid long submission and wait times, Makeflow can instead be run with Work Queue, which excels at handling low-latency, short-turnaround jobs.

Here we will start Makeflow, which will set up a Work Queue master on an arbitrary port using -p 0. We will also turn on debugging output to see what happens as it runs.

makeflow -c
makeflow -T wq -p 0 -d all

You should see output like this:

2012/07/30 12:18:54.30 [10131] makeflow: debug: checking for duplicate targets...
2012/07/30 12:18:54.30 [10131] makeflow: debug: checking rules for consistency...
2012/07/30 12:18:54.30 [10131] makeflow: tcp: listening on port XXXX
2012/07/30 12:18:54.30 [10131] makeflow: wq: Work Queue is listening on port XXXX.
2012/07/30 12:18:54.30 [10131] makeflow: batch: started process 10132: /bin/echo Hello World > input.txt
2012/07/30 12:18:54.30 [10131] makeflow: debug: node 0 waiting -> running
...

Now, in a second terminal, run work_queue_worker with the host and port the master is listening on (replace XXXX with the port number reported in the debug output):

work_queue_worker -t 10 localhost XXXX

When the tasks are finished, the worker should exit after the 10-second idle timeout set by -t 10.
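
If one worker is not enough, you can start several against the same master; a simple shell loop works (again, XXXX stands for the master's actual port):

for i in 1 2 3; do
    work_queue_worker -t 10 localhost XXXX &
done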

Exercise

Running Makeflow with Work Queue Workers on Torque

The goal of this exercise is to set up Work Queue workers on Torque compute nodes. Here we submit the workers as Torque jobs using the torque_submit_workers executable. To learn more about its options and arguments, run:

torque_submit_workers -h

In this exercise, we will use the catalog server and a project name so the workers can find the master without being given its hostname and port. The -a option tells both Makeflow and the Work Queue workers to use the catalog server, and the -N option assigns the master a project name; workers started with the same -N value will connect to that master.

NOTE: Pick your own distinct project name for MYPROJECT.

makeflow -c
torque_submit_workers -t 300 -a -N MYPROJECT 5
makeflow -T wq -a -N MYPROJECT -d all -o makeflow.debug > /dev/null 2> /dev/null &
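
While the workflow runs, you can watch its progress in the debug log, and, since the master advertises itself to the catalog with -a, look for your project with the work_queue_status tool:

tail -f makeflow.debug
work_queue_status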