Cooperative Computing Lab
CCL | Software | Install | Manuals | Forum | Papers
CCL Home

Research

Software Community Operations

Makeflow Tutorial

This tutorial will have you install CCTools and will take you through some distributed computation examples using Makeflow.

  1. Getting Started
    1. Download, Build, and Install CCTools
    2. Set Environment Variables
  2. Makeflow Example
    1. Setup
    2. Running with Local (Multiprocess) Execution
  3. Running Makeflow with Work Queue
    1. Running with a single Work Queue Worker
    2. Running with multiple Work Queue Workers on EC2
  4. Exercise: Running Makeflow with Work Queue using Catalog Server
  5. Homework: Running Makeflow at your institution

Getting Started

First, login in to the below head node where you will install CCTools and run this tutorial.

ssh USERNAME@ec2-23-23-23-169.compute-1.amazonaws.com

Please replace USERNAME with the username in the sheet of paper handed out to you. Use the password given there as well.

Download, Build, and Install CCTools

wget http://www.nd.edu/~ccl/software/files/cctools-3.7.2-source.tar.gz tar xzf cctools-3.7.2-source.tar.gz cd ~/cctools-3.7.2-source ./configure --prefix ~/cctools && make install

Set Environment Variables

You will need to add your CCTools directory to your $PATH:

export PATH=~/cctools/bin:${PATH}

If these commands executed successfully, the follwing command should print out the version string (3.7.2 RELEASE) of Makeflow and information from its build step.

$ makeflow -v makeflow version 3.7.2 RELEASE (released 04/23/2013) Built by ubuntu@ip-10-147-219-157 on May 10 2013 at 15:26:58 System: Linux ip-10-147-219-157 2.6.32-312-ec2 #24-Ubuntu SMP Fri Jan 7 18:30:50 UTC 2011 x86_64 GNU/Linux Configuration: --prefix /home/user1135/cctools/afs/nd.edu/user37/ccl/software/cctools/bin/makeflow

Makeflow Example

Setup

mkdir ~/cctools-tutorial mkdir ~/cctools-tutorial/makeflow-simple cd ~/cctools-tutorial/makeflow-simple

Download simulation.py which is our application executable for this exercise. Download this Makeflow script which defines the workflow.

wget http://www.nd.edu/~ccl/software/tutorials/ccgrid13/makeflow/simulation.py wget http://www.nd.edu/~ccl/software/tutorials/ccgrid13/makeflow/Makeflow

The Makeflow script should look like:

$ cat Makeflow input.txt: LOCAL /bin/echo Start Simulation > input.txt A: simulation.py input.txt python simulation.py 1 < input.txt > A B: simulation.py input.txt python simulation.py 2 < input.txt > B C: simulation.py input.txt python simulation.py 3 < input.txt > C D: simulation.py input.txt python simulation.py 4 < input.txt > D

Running with Local (Multiprocess) Execution

Here we're going to tell Makeflow to dispatch the jobs using regular local processes (no distributed computing!). This is basically the same as regular Unix Make using the -j flag.

makeflow -T local

If everything worked out correctly, you should see:

$ makeflow -T local /bin/echo Start Simulation > input.txt python simulation.py 4 < input.txt > D python simulation.py 3 < input.txt > C python simulation.py 2 < input.txt > B python simulation.py 1 < input.txt > A nothing left to do.

Running Makeflow with Work Queue

The above example illustrated how Makeflows are easy to write and execute. However, running Makeflows locally can incur performance penalties if the number of tasks are signficantly larger than the cores available for local execution.

In this section, we will illustrate the steps to run the above Makeflow using EC2 instances for parallel execution of its tasks. We will use Work Queue to distribute its tasks to EC2 instances for parallel execution. Work Queue excels at handling low latency and short turn-around time jobs.

Running with a single Work Queue Worker

Here, we will invoke Makeflow to setup a Work Queue master listening on an arbitrary port using -p 0.

makeflow -c makeflow -T wq -p 0

You should see output like this:

$ makeflow -T wq -p 0 listening for workers on port 1024. /bin/echo Start Simulation > input.txt python simulation.py 4 < input.txt > D python simulation.py 3 < input.txt > C python simulation.py 2 < input.txt > B python simulation.py 1 < input.txt > A

To start a work_queue_worker for this master, open another terminal window and login into the EC2 head node with your given username:

ssh USERNAME@ec2-23-23-23-169.compute-1.amazonaws.com

As before, replace USERNAME with the username provided to you (and its password).

Add the CCTools directory to our $PATH as before:

export PATH=~/cctools/bin:${PATH}

Now, start a work_queue_worker for this Makeflow by giving it the port the master is listening on. Let us also enable the debugging output using -d all

work_queue_worker -t 10 -d all localhost XXXX ...

replacing XXXX with the port the Makeflow master is listening on. When the tasks are finished, the worker should quit due to the 10 second timeout.

Note that the Makeflow still executed locally since the work_queue_worker was run on the same EC2 instance as the Makeflow master. The section below describes how to start several work_queue_worker processes on multiple EC2 instances.

Running with Multiple Work Queue Workers on EC2

Here we're going to show how to start work_queue_worker processes on multiple EC2 instances.

Logging into each EC2 instance to start a worker process can get quickly cumbersome. So we have a ec2_submit_workers script (provided in cctools) that creates EC2 instances and starts workers on them.

First, restart the Makeflow program again. This time we enable the printing of debug messages in Makeflow to get a sense of what happens during execution.

makeflow -c makeflow -T wq -p 0 -d all

Then, from the other open terminal, run the ec2_submit_workers script to start 4 workers on as many EC2 instances:

bash ec2_submit_workers -t 10 ec2-23-23-23-169.compute-1.amazonaws.com XXXX ccgrid-tutorial ~/.ssh/ccgrid-tutorial.pem 4

where XXXX is the port of Work Queue master program.

Exercise: Running Makeflow with Work Queue using Catalog Server

Keeping track of port numbers and specifying them correctly when starting workers for each master can get painful quickly. To overcome this, we have a catalog server that can help workers find Work Queue masters without using port numbers.

To use the catalog server, a project name for the Work Queue master needs to be specified and advertised to the catalog server. In Makeflow, the project name can be set using the -N command line option. The -a option will force Makeflow to advertise its project name to the catalog server.

Let us rerun the Makeflow to use the catalog server:

makeflow -c makeflow -T wq -p 0 -d all -a -N $USER.makeflow

This assigns a project name containing your username with which you logged in.

Similarly, we will need to use -a and -N options to specify the project name to connect for the workers:

work_queue_worker -t 10 -d all -a -N $USER.makeflow

Note that we no longer have to specify the hostname and port of the Work Queue master created by the Makeflow.

The same options apply when you want to submit multiple workers to run on EC2 instances.

bash ec2_submit_workers -t 10 -a -N $USER.makeflow ccgrid-tutorial ~/.ssh/ccgrid-tutorial.pem 4

Homework: Running Makeflow at your institution

Makeflow can also be setup to dispatch its jobs to a cluster or grid (Condor, SGE, PBS, Torque). Makeflow will then invoke the corresponding batch submission interface (qsub, msub, qdel, etc.,) to submit and monitor its jobs.

Makeflow supports the following clusters and grids for job dispatch and execution: condor, sge, moab, torque, hadoop.

Find out which one of these clusters/grids is run in your institution. Run Makeflow on it by invoking the "-T" option. For example, if you have access to a Condor pool and want to run your Makeflow on this pool, you will run:

makeflow -T condor

More information about the support for different batch systems and options in Makeflow can be found in its help message:

makeflow -h