Cooperative Computing Lab
CCL | Software | Install | Manuals | Forum | Papers
CCL Home

Research

Software Community Operations

Work Queue Tutorial

  1. Getting Started
    1. Tutorial Setup
    2. Set Environment Variables
  2. Work Queue Example
    1. Setup
    2. Description
    3. Running

This tutorial will have you install CCTools into your AFS home directory and will take you through some distributed computation examples using Work Queue.

Getting Started

Tutorial Setup

Most of the setup required for this tutorial was already installed in the Makeflow tutorial.

Make sure you are logged into one of the CRC login nodes. If not, do:

> ssh newcell.crc.nd.edu

You can also use opteron.crc.nd.edu or stats.crc.nd.edu.

NOTE: If you do not have a CRC account, you can still do most of the tutorial on a local linux machine, including everything except the section where workers are submitted to CRC.

Set Environment Variables

For this tutorial, we will (again) add the CCTools directory to our $PATH:

> setenv PATH ~/cctools/bin:${PATH}

We will use both Python and Perl in this tutorial. They need to be able to find our installed packages. So set this environment variable for running the Python examples:

> setenv PYTHONPATH ~/cctools/lib/python2.4/site-packages

And this environment variable for running the Perl examples:

> setenv PERL5LIB ~/cctools/lib/perl5/site_perl

Work Queue Example

Write a Work Queue program to check if Shakespearean language still exists

It is simple to write a program in C/Perl/Python (or any language with appropriate Work Queue bindings) which can generate tasks to be queued for Work Queue. In this example, we will write a program using Work Queue to check if Shakespearean language still exists today (similar to the Makeflow tutorial) using word-compare.py.

At the conclusion of this example, participants will:

  • learn about the Work Queue API and their use
  • able to write Work Queue programs for executing workflows.

Setup

> mkdir ~/cctools-tutorial/wq > cd ~/cctools-tutorial/wq

Download the plays of Shakespeare and the Python executable to compare against words in the dictionary.

> wget http://www.nd.edu/~ccl/software/tutorials/ndtut12/wq/shakespeare-text.tgz ... > wget http://www.nd.edu/~ccl/software/tutorials/ndtut12/wq/word-compare.py ...

Unzip the archive file shakespeare-text.tgz to get the individual files - hamlet.txt, macbeth.txt, othello.txt, julius-caesar.txt, and king-lear.txt:

> tar xzf shakespeare-text.tgz

Download the Work Queue program used in this tutorial.

> wget http://www.nd.edu/~ccl/software/tutorials/ndtut12/wq/shakespeare.py ...

Description

We will now analyze the code in shakespeare.py that compares the words in each of the downloaded works of Shakespeare with the words currently in the English dictionary and identifies those words that are still used. This Work Program is implemented using the Python bindings for the Work Queue API.

> cat shakespeare.py #!/usr/bin/env python from work_queue import * import sys input_files = ["hamlet.txt", "othello.txt", "macbeth.txt", "julius-caesar.txt", "king-lear.txt"] try: Q = WorkQueue(port = 0) except: print "could not instantiate Work Queue master" sys.exit(1) print "Listening on port %d." % Q.port print "Submitting tasks..." for infile in input_file: outfile = "%s.checks" % infile command = "python word-compare.py %s > %s" % (infile, outfile) T = Task(command) T.specify_file("word-compare.py", "word-compare.py", WORK_QUEUE_INPUT, cache = True) T.specify_file(infile, infile, WORK_QUEUE_INPUT, cache = False) T.specify_file(outfile, outfile, WORK_QUEUE_OUTPUT, cache = False) taskid = Q.submit(T) print "done." print "Waiting for tasks to complete..." while not Q.empty(): T = Q.wait(5) if T: print "task (id# %d) complete: %s (return code %d)" % (T.id, T.command, T.return_status) print "done."

Here we load the work_queue Python binding. Python will look in PYTHONPATH which is setup in your environment.

This instantiates a Work Queue master to which you may submit work. Setting the port to 0 instructs Work Queue to pick an arbitrary port to listen on.

We create a task which takes a command argument. The first task created in this workflow will have the command:

T = Task("python word-compare.py hamlet.txt > hamlet.txt.checks");

Each task usually depends on a number of files to run. These include the executable and any input files. Here we specify the word-compare.py executable and its input infile. Notice that we specify both word-compare.py and infile by calling specify_file twice. The first argument is the name of the file on the master and the second argument is the name of the file we want created on the worker. Usually these filenames will be the same as in this example.

We specify the output file, outfile, which we want transferred back to the master.

At this point we have finished the description of our task and it is ready to be submitted for execution on the Work Queue workers. Q.submit submits this task.

At this point we wish to wait for all submitted tasks to complete. So long as the queue is not empty, we continue to call Q.wait waiting for the result of a task we submitted.

Here we call Q.wait(5) which takes a timeout argument. The call to wait will return a finished task which allows us to analyze the return_status or output. In this example, we set the timeout to 5 seconds which allows our application to do other things if a task is taking an inordinate amount of time to complete. We could have used the constant WORK_QUEUE_WAITFORTASK to wait indefinitely until a task completes.

Perl equivalent

The Perl equivalent of the above program is in shakespeare.pl and can be downloaded using:

> wget http://www.nd.edu/~ccl/software/tutorials/ndtut12/wq/shakespeare.pl ...

This program is shown below:

> cat shakespeare.pl #!/usr/bin/perl use work_queue; my @input_files = ('hamlet.txt', 'othello.txt', 'macbeth.txt', 'julius-caesar.txt', 'king-lear.txt'); my $q = work_queue_create(0); if (not defined($q)) { print "could not instantiate Work Queue master\n"; exit 1; } $port = work_queue_port($q); print "Listening on port $port.\n"; print "Submitting tasks..."; foreach $infile (@input_files) { my $outfile = sprintf("%s.out", $infile); my $command = "python word-compare.py $infile > $outfile"; my $t = work_queue_task_create($command); work_queue_task_specify_file($t,"word-compare.py","word-compare.py",$WORK_QUEUE_INPUT,$WORK_QUEUE_CACHE); work_queue_task_specify_file($t,$infile,$infile,$WORK_QUEUE_INPUT,$WORK_QUEUE_NOCACHE); work_queue_task_specify_file($t,$outfile,$outfile,$WORK_QUEUE_OUTPUT,$WORK_QUEUE_NOCACHE); my $taskid = work_queue_submit($q, $t); } print "done.\n"; print "Waiting for tasks to complete...\n"; while (not work_queue_empty($q)) { my $t = work_queue_wait($q, 5); if (defined($t)) { print "task (id#$t->{taskid}) complete:$t->{command_line} (return code $t->{return_status})\n"; work_queue_task_delete($t); } } print "done.\n"; work_queue_delete($q); exit 0;

We remove the created task instance when we have retrieved its output and the task instance is no longer required. (Note that the call to work_queue_task_delete is automatically done in the Python bindings when the task object goes out of scope.)

We delete the Work Queue instance when it is no longer required. That is, after all tasks in this program have finished and been retrieved. (Note that the call to work_queue_delete is automatically done in the Python bindings when the queue object goes out of scope.)

Running

First, determine the name of the host (machine) on which we will be running the Work Queue program:

> hostname -f YYYY.crc.nd.edu

We are now going to run the Work Queue program shown above for checking if Shakespearean language is in use. Pick one of the programs to run:

To run the Python program (shakespeare.py) do: To run the Perl program (shakespeare.pl) do:
> python shakespeare.py > perl shakespeare.pl

When a Work Queue program is run, it prints the port on which it is listening for connections from the workers. For example:

> python shakespeare.py Listening on port XXXX. > perl shakespeare.pl Listening on port XXXX.

We have successfully started a master and found the port and hostname on which it is listening. The master now is waiting for connections from workers so that it can dispatch the submitted tasks for execution.

To start a work_queue_worker for this master, first open another terminal window and login into one of the CRC login nodes:

> ssh opteron.crc.nd.edu

Add the CCTools directory to our $PATH as before:

> setenv PATH ~/cctools/bin:${PATH}

In this newly opened terminal, start a work_queue_worker for the master by giving it the port and hostname of the master.

> work_queue_worker -t 5 -d all YYYY.crc.nd.edu XXXX ...

replacing XXXX with the port the Work Queue master program is listening on and YYYY with the name of the machine on which it is running.

The debug output for the worker will appear on your terminal. When the tasks are finished, the worker will quit after the 5 second timeout.

If you encounter an error, be sure you did not forget to setup your environment.

Running Work Queue workers in Notre Dame CRC cluster

NOTE: To do this section, you MUST be logged into a CRC machine (newcell.crc.nd.edu, opteron.crc.nd.edu, etc...)

Alternatively, we can also submit workers to the ND CRC cluster using the sge_submit_workers executable (it is installed in ~/cctools during the intial setup).

Make sure we have the name of the host (machine) on which we will be running the Work Queue program:

> hostname -f YYYY.crc.nd.edu

Start one of the Work Queue master programs again:

> python shakespeare.py Listening on port XXXX. > perl shakespeare.pl Listening on port XXXX.

Now, goto the second terminal and start 5 workers on the ND CRC cluster with a 10 second timeout as:

> sge_submit_workers -t 10 YYYY.crc.nd.edu XXXX 5 ...

replacing XXXX with the port the Work Queue master program is listening on and YYYY with the name of the machine on which it is running.