Makeflow Tutorial
- Getting Started
- Login to a CRC Head Node
- Download, Build, and Install CCTools
- Set Environment Variables
- Makeflow Example
- Setup
- Running with Local (Multiprocess) Execution
- Running with CRC's SGE
This tutorial will have you install CCTools into your CRC home directory and will take you through some distributed computation examples using Makeflow.
Getting Started
Login to a CRC Head Node
For this tutorial, we assume you have an open SSH connection to the CRC login nodes. If you do not have an account with CRC, then you may register here.
NOTE: If you do not have a CRC account, you can still do most of the tutorial on a local Linux machine; only the section where tasks are submitted to SGE requires access to CRC.
In this tutorial, we will use the newcell login node:
> ssh newcell.crc.nd.edu
You can also use opteron.crc.nd.edu or stats.crc.nd.edu.
Download, Build, and Install CCTools
Navigate to the download page in your browser to review the most recent versions: http://www.nd.edu/~ccl/software/download.shtml
Set Up a Sandbox for this Tutorial and Download a Copy of CCTools 3.6.1
> mkdir ~/cctools-tutorial
> wget http://nd.edu/~ccl/software/files/cctools-3.6.1-source.tar.gz
...
> tar xzf cctools-3.6.1-source.tar.gz
Build and Install CCTools
> cd ~/cctools-3.6.1-source
> ./configure
...
> make install
...
Set Environment Variables
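You will need to add your CCTools directory to your $PATH. The command below uses csh/tcsh syntax; in bash, the equivalent is export PATH=~/cctools/bin:${PATH}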
> setenv PATH ~/cctools/bin:${PATH}
Makeflow Example
Write a Makeflow to check if Shakespearean language is still used
In this example, we will set up and run a Makeflow script that analyzes five Shakespearean plays and determines which of their words are still in use in modern English.
At the conclusion of this example, participants will be able to:
- Identify components of a workflow
- Execute Makeflow scripts on multiple systems
Setup
> mkdir ~/cctools-tutorial/makeflow
> cd ~/cctools-tutorial/makeflow
Download the following:
- word-compare.py: our application executable for this exercise. This checks the system dictionary for every word in the input file, and prints each word it finds.
- Makeflow: the script that defines the workflow.
- shakespeare-text.tgz: an archive containing the words in Hamlet, Macbeth, Othello, Julius Caesar, and King Lear.
> wget http://nd.edu/~ccl/software/tutorials/ndtut12/makeflow/word-compare.py
...
> wget http://nd.edu/~ccl/software/tutorials/ndtut12/makeflow/Makeflow
...
> wget http://nd.edu/~ccl/software/tutorials/ndtut12/makeflow/shakespeare-text.tgz
...
Next, unpack shakespeare-text.tgz.
> tar xzf shakespeare-text.tgz
The Makeflow script should look like:
> cat Makeflow
hamlet.checks: word-compare.py hamlet.txt
	python word-compare.py hamlet.txt > hamlet.checks
macbeth.checks: word-compare.py macbeth.txt
	python word-compare.py macbeth.txt > macbeth.checks
othello.checks: word-compare.py othello.txt
	python word-compare.py othello.txt > othello.checks
julius-caesar.checks: word-compare.py julius-caesar.txt
	python word-compare.py julius-caesar.txt > julius-caesar.checks
king-lear.checks: word-compare.py king-lear.txt
	python word-compare.py king-lear.txt > king-lear.checks
This Makeflow script contains five rules. Each rule checks the system dictionary for every word in one of Shakespeare's most famous plays and saves the result into a file. The dependencies for each rule include both the word comparison script and the input data to be analyzed.
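To make the components of a rule concrete, the following is a minimal Python sketch that splits Make-style rules into targets, sources, and a command. It is not part of CCTools and is not how Makeflow parses its input; the function name parse_rules is hypothetical, and the sketch assumes every rule is a "targets: sources" line followed by exactly one command line, as in the file above.

```python
def parse_rules(text):
    """Parse Make-style rules into dicts with targets, sources, and command.

    Assumes each rule is a 'targets: sources' line followed by exactly one
    command line, as in this tutorial's Makeflow file.
    """
    rules = []
    # Drop blank lines so rules can be paired up as (header, command).
    lines = [line for line in text.splitlines() if line.strip()]
    for header, command in zip(lines[0::2], lines[1::2]):
        targets, sources = header.split(":", 1)
        rules.append({
            "targets": targets.split(),
            "sources": sources.split(),
            "command": command.strip(),
        })
    return rules


# Two of the five rules from the tutorial's Makeflow file.
EXAMPLE = """\
hamlet.checks: word-compare.py hamlet.txt
\tpython word-compare.py hamlet.txt > hamlet.checks
macbeth.checks: word-compare.py macbeth.txt
\tpython word-compare.py macbeth.txt > macbeth.checks
"""

rules = parse_rules(EXAMPLE)
for rule in rules:
    print(rule["targets"], "<-", rule["sources"])
```

Each rule names the file it produces, the files it needs, and the command that turns one into the other. No rule lists another rule's target as a source, so all five are independent of each other.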
Running with Local (Multiprocess) Execution
Here we're going to tell Makeflow to dispatch the jobs using regular local processes (no distributed computing!). This is basically the same as regular Unix Make using the -j flag.
> makeflow -T local
If everything worked out correctly, you should see:
> makeflow -T local
python word-compare.py king-lear.txt > king-lear.checks
python word-compare.py julius-caesar.txt > julius-caesar.checks
python word-compare.py othello.txt > othello.checks
python word-compare.py macbeth.txt > macbeth.checks
python word-compare.py hamlet.txt > hamlet.checks
nothing left to do.
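Because the five rules do not depend on each other, local execution amounts to running several processes side by side. The sketch below illustrates that idea in plain Python; it is an analogy only, not Makeflow's implementation, and the names run_all and max_parallel are hypothetical. The echo commands stand in for the real word-compare.py invocations.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_parallel=4):
    """Run independent shell commands concurrently; return exit codes in order."""
    def run(command):
        # Each command runs in its own local process via the shell.
        return subprocess.run(command, shell=True).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))


# Placeholder commands; in the tutorial these would be the five rule commands.
codes = run_all(["echo first task", "echo second task"])
print(codes)
```

The order in which the commands finish can vary from run to run, which is why the order of lines in Makeflow's own output should not be relied upon.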
Running with CRC's SGE
NOTE: To do this section, you MUST be logged into a CRC machine (newcell.crc.nd.edu, opteron.crc.nd.edu, etc.).
The following code tells Makeflow to dispatch jobs using the SGE batch submission system (qsub, qdel, qstat, etc.).
> makeflow -T sge
You will get as output:
> makeflow -T sge
nothing left to do.
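Well... that's not right. Nothing was run! Makeflow found the output files and log left over from the local run and concluded that the workflow was already complete. We need to clean out the generated output files and logs so Makeflow starts from a clean slate again: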
> makeflow -c
We see it deleted the files we generated in the last run:
> makeflow -c
deleted file king-lear.checks
deleted file julius-caesar.checks
deleted file othello.checks
deleted file macbeth.checks
deleted file hamlet.checks
deleted file ./Makeflow.makeflowlog
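The earlier "nothing left to do" result follows from Make-style reasoning: a rule only needs to run if its output is not already there. The sketch below is a deliberately simplified model of that check. It ignores the Makeflow.makeflowlog file that Makeflow also maintains, and the function name pending_targets is hypothetical.

```python
import os


def pending_targets(targets, exists=os.path.exists):
    """Return the targets whose files are missing and so still need to be built."""
    return [t for t in targets if not exists(t)]


targets = ["hamlet.checks", "macbeth.checks"]

# Simulate the state after the local run: every output file is present.
after_local_run = {"hamlet.checks", "macbeth.checks"}
print(pending_targets(targets, exists=after_local_run.__contains__))

# Simulate the state after `makeflow -c`: no output file is present.
print(pending_targets(targets, exists=set().__contains__))
```

In the first case nothing is pending, which corresponds to the run that did no work; in the second case every target is pending again.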
Now let's try again:
> makeflow -T sge
We get the output we expect:
> makeflow -T sge
python word-compare.py king-lear.txt > king-lear.checks
python word-compare.py julius-caesar.txt > julius-caesar.checks
python word-compare.py othello.txt > othello.checks
python word-compare.py macbeth.txt > macbeth.checks
python word-compare.py hamlet.txt > hamlet.checks
nothing left to do.
Notice that the output is no different from using local execution. Makeflow is built to be execution-engine agnostic: the same workflow runs unchanged whether its tasks execute locally or remotely.
In this case, we can confirm that the job was run on another host by looking at the output produced by the word-compare.py script:
> head king-lear.checks
Running on host dqcneh081.crc.nd.edu
Starting 2012 23 Oct 14:40:09
a
abated
abatement
abhorred
abjure
able
abode
Here we see that the task ran on node dqcneh081.crc.nd.edu.
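The host and start-time lines at the top of the file are printed by the script itself, before the word list. The actual word-compare.py is the one downloaded during setup; the sketch below only shows how a script of this kind could be written. The dictionary path /usr/share/dict/words, the function names, and the timestamp format are all assumptions, not taken from the real script.

```python
import re
import socket
import time

# Assumption: a common location of the system word list on Linux.
DICT_PATH = "/usr/share/dict/words"


def load_dictionary(path=DICT_PATH):
    """Load the word list into a set of lowercase words."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}


def find_known_words(text, dictionary):
    """Return the sorted, de-duplicated words of text that appear in dictionary."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & dictionary)


def report(input_path, dictionary):
    """Print a host/time header followed by every known word in the file."""
    print("Running on host", socket.gethostname())
    print("Starting", time.strftime("%Y %d %b %H:%M:%S"))
    with open(input_path) as f:
        for word in find_known_words(f.read(), dictionary):
            print(word)
```

With these definitions, calling report("king-lear.txt", load_dictionary()) would print a header naming the machine it ran on, followed by one dictionary word per line in alphabetical order.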