Makeflow Tutorial

  1. Getting Started
    1. Log in to the FutureGrid Head Node
    2. Download, Build, and Install CCTools
    3. Set Environment Variables
  2. Makeflow Example
    1. Setup
    2. Running with Local (Multiprocess) Execution
    3. Running with FutureGrid's Torque
  3. Running Makeflow with Work Queue
  4. Exercise
    1. Running Makeflow with Work Queue Workers on Torque

This tutorial walks you through installing CCTools in your FutureGrid home directory and running some distributed computation examples with Makeflow.

Getting Started

Log in to the FutureGrid Head Node

For this tutorial, we assume you have SSH access to the FutureGrid login nodes. If you do not have a FutureGrid account, you can register through the FutureGrid portal.

In this tutorial, we will use the lima login node on FutureGrid:

ssh lima.futuregrid.org

Download, Build, and Install CCTools

Navigate to the download page in your browser to review the most recent versions: http://www.nd.edu/~ccl/software/download.shtml

Set Up a Sandbox for This Tutorial and Download a Copy of CCTools 4.0.2

mkdir ~/cctools-tutorial
wget http://www.nd.edu/~ccl/software/files/cctools-4.0.2-source.tar.gz
tar xzf cctools-4.0.2-source.tar.gz

Build and Install CCTools

cd ~/cctools-4.0.2-source
./configure --prefix ~/cctools && make install

Set Environment Variables

You will need to add your CCTools directory to your $PATH:

export PATH=~/cctools/bin:${PATH}
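
To verify that the tools are now on your $PATH, you can check where makeflow resolves and print its version (a quick sanity check; the exact version string will vary):

which makeflow
makeflow -v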

Makeflow Example

Setup

mkdir ~/cctools-tutorial/makeflow-simple
cd ~/cctools-tutorial/makeflow-simple

Download simulation.py, which is our application executable for this exercise, and the Makeflow script, which defines the workflow:

wget http://www.nd.edu/~ccl/software/tutorials/acic13/simple/simulation.py
wget http://www.nd.edu/~ccl/software/tutorials/acic13/simple/Makeflow

The Makeflow script should look like this:

$ cat Makeflow
input.txt:
    LOCAL /bin/echo Hello World > input.txt

A: simulation.py input.txt
    python simulation.py 1 < input.txt > A

B: simulation.py input.txt
    python simulation.py 2 < input.txt > B

C: simulation.py input.txt
    python simulation.py 3 < input.txt > C

D: simulation.py input.txt
    python simulation.py 4 < input.txt > D
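
Each rule follows the same Make-like pattern: the files a command produces, a colon, the files it needs, and then the command indented on the next line. The LOCAL keyword on the first rule forces that command to run on the machine where Makeflow itself runs, regardless of which batch system is selected. As a generic sketch (the file and program names here are hypothetical):

# outputs : inputs
#     command that produces the outputs from the inputs
out.dat: in.dat my_program
    ./my_program < in.dat > out.dat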

Running with Local (Multiprocess) Execution

Here we're going to tell Makeflow to dispatch the jobs using regular local processes (no distributed computing!). This is essentially the same as running ordinary Unix Make with the -j flag.

makeflow -T local

If everything worked out correctly, you should see:

$ makeflow -T local
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.
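
By default, local execution runs tasks concurrently. If you want to cap the number of simultaneous local jobs, Makeflow accepts a -j flag much like Make does (a sketch; run makeflow -h to confirm the option in your version). Clean first so there is work to repeat:

makeflow -c
makeflow -T local -j 2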

Running with FutureGrid's Torque

The following command tells Makeflow to dispatch jobs using the Torque batch submission system (qsub, qdel, qstat, etc.):

makeflow -T torque

You will see the following output:

$ makeflow -T torque
nothing left to do.

Well... that's not right. Nothing was run! Makeflow consulted the log and output files left over from the previous run, saw that every target already exists, and concluded there was nothing to do. We need to clean out the generated output files and logs so Makeflow starts from a clean slate:

makeflow -c

We see it deleted the files we generated in the last run:

$ makeflow -c
deleted file D
deleted file C
deleted file B
deleted file A
deleted file input.txt
deleted file ./Makeflow.makeflowlog

Now let's try again:

makeflow -T torque

We get the output we expect:

$ makeflow -T torque
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.

Notice that the output is identical to the local run. Makeflow is designed to be execution-engine agnostic: from the workflow's point of view, there is no difference between executing a task locally or remotely.

In this case, we can confirm that the job was run on another host by looking at the output produced by the simulation:

$ cat D
Running on host i62
Starting 2013 30 Sep 18:01:14
x = 2.0
x = 1.41421356237
x = 1.189207115
x = 1.09050773267
x = 1.04427378243
Finished 2013 30 Sep 18:01:19

Here we see that the task ran on node i62. It took 5 seconds to complete.
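
While a Torque-backed run is in flight, you can watch the underlying batch jobs from a second terminal using the standard Torque tools (the output format varies by site):

qstat -u $USER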

Running Makeflow with Work Queue

The submission and wait times for the Makeflow tasks in the case above will vary with the latencies of the underlying batch system (Torque). To avoid long submission and wait times, Makeflow can instead be run with Work Queue, which excels at low-latency, short-turnaround jobs.

Here, we will start Makeflow so that it sets up a Work Queue master on an arbitrary port, using -p 0.

makeflow -c
makeflow -T wq -p 0

You should see output like this:

listening for workers on port 1024.
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A

Makeflow will now wait for workers to connect before the tasks actually execute. To start a work_queue_worker for this master, open another terminal window and log in to the lima login node:

ssh lima.futuregrid.org

Add the CCTools directory to your $PATH as before:

export PATH=~/cctools/bin:${PATH}

Now, start a work_queue_worker for this Makeflow by giving it the port the master is listening on. Let us also enable debugging output using -d all.

work_queue_worker -t 10 -d all localhost XXXX
...

replacing XXXX with the port the Makeflow master is listening on. When the tasks are finished, the worker will quit after its 10-second idle timeout (set with -t 10).
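
For example, with the master from the sample output above listening on port 1024, the worker invocation would be:

work_queue_worker -t 10 -d all localhost 1024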

Note that the Makeflow tasks still executed locally, since the work_queue_worker ran on the same node (lima) as the Makeflow master. The exercise below shows how to start several work_queue_worker processes on the Torque cluster.

Exercise

Running Makeflow with Work Queue Workers on Torque

The goal of this exercise is to set up Work Queue workers on Torque compute nodes. Here we submit the workers using the torque_submit_workers command. To learn more about its options and arguments, run:

torque_submit_workers -h

In this exercise, we will also use the catalog server and a project name so that workers can find the master without being given its hostname and port. The -a option tells both Makeflow and the workers to use the catalog server, and the -N option assigns a project name: Makeflow registers under that name, and any worker started with a matching -N connects to it.

NOTE: Pick your own distinct project name for MYPROJECT.

makeflow -c
torque_submit_workers -t 300 -a -N MYPROJECT 5
makeflow -T wq -a -N MYPROJECT
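
Once the workers have been submitted, you can confirm from another terminal that they are queued or running on Torque and, since -a was given, that your project is advertised in the catalog (a sketch; work_queue_status output varies by version):

qstat -u $USER
work_queue_status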