Homework 2: Learning Work Queue

Option A: Write a Work Queue program to check if "Shakespeare speak" still exists
Option B: Configure Work Queue to run on multiple cloud platforms

This homework has two options. Pick the option you are more comfortable with. Note: Only *ONE* of the two options needs to be submitted.

Use the instructions in the Work Queue user manual to install, set up, and learn more about Work Queue.

Option A

Write a Work Queue program to check if "Shakespeare speak" still exists

Prerequisites

Requires basic knowledge of Python.
Completion of the Using Work Queue on FutureGrid tutorial.

Objectives

At the conclusion of this assignment, you will be able to:

Decompose and execute workflows as smaller concurrent tasks.
Understand how the master-worker model of distributed execution works.
Write Work Queue programs for the execution of your workflows.

The Assignment

This assignment is similar to the Makeflow homework. In this assignment, you will write a Work Queue program that compares the words found in the works of Shakespeare with the words currently in use in the English language and identifies those that are still in use.

You will compare the words of Shakespeare listed in each of these text files:

hamlet.txt (29714 words)
othello.txt (25333 words)
macbeth.txt (16912 words)
juliuscaesar.txt (19172 words)
kinglear.txt (25424 words)

Each file contains all the words found in the work of Shakespeare named in the filename. For your convenience, each word is listed on its own line so that the files are easy to scan and read.

You will compare the words in each of these files with the dictionary provided in Unix systems (/usr/share/dict/words or /usr/dict/words). If a word is found to exist in the dictionary, you should print the word followed by the string "Art spoken ever and anon!". If not, you should print the word followed by the string "Ne'er spoken ever!".
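The lookup rule above can be sketched in a few lines of Python. This is only an illustration, not the provided comparison program: the function name `classify_words` and the case-insensitive matching are assumptions, and the tiny in-memory word list stands in for the system dictionary.

```python
def classify_words(words, dictionary):
    """Return one output line per word, following the assignment's rule.

    Matching is case-insensitive here; that is an assumption of this
    sketch, not a requirement stated in the assignment.
    """
    known = set(entry.strip().lower() for entry in dictionary)
    lines = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines
        if word.lower() in known:
            lines.append(word + " Art spoken ever and anon!")
        else:
            lines.append(word + " Ne'er spoken ever!")
    return lines


# Tiny stand-in for the system dictionary, for illustration only.
sample_dictionary = ["speak", "king", "night"]
for line in classify_words(["King", "prithee", "night"], sample_dictionary):
    print(line)
```

In a real run, the two arguments would be the lines read from one of the Shakespeare files and from the dictionary file.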

To complete this assignment, you will write a Work Queue program that decomposes the problem into concurrent tasks. Your program should create and submit 5 tasks, one per file, each describing a search of the dictionary for the words in that file. You will run this master program on one of the FutureGrid head nodes as described here.

Finally, you will submit five Work Queue workers to the FutureGrid Torque batch submission system as follows:

torque_submit_workers MASTERHOST MASTERPORT 5

where MASTERHOST and MASTERPORT are the hostname and port on which your Work Queue master program is listening.

Hint:

To read each file and compare the words listed in it against the dictionary, you can use this simple Python program: shakespeare_compare.py. Your tasks will then execute this program remotely, passing it the name of the file containing the words to check.

For example, to check if the words in hamlet.txt exist in the dictionary, run:

python shakespeare_compare.py hamlet.txt

Create concurrent tasks that check each of the files by executing this program.
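One possible shape for such a master is sketched below, under several assumptions: the helper names `build_task_specs` and `run_master` and the `.out` output file names are hypothetical, the script name follows the example command above, each task's printed output is captured by shell redirection, and the dictionary is assumed to already exist on every worker. Only `build_task_specs` runs without CCTools installed; `run_master` uses the Work Queue Python bindings.

```python
FILES = ["hamlet.txt", "othello.txt", "macbeth.txt",
         "juliuscaesar.txt", "kinglear.txt"]
SCRIPT = "shakespeare_compare.py"  # name as used in the example command


def build_task_specs(files, script=SCRIPT):
    """Describe one task per file: its command, inputs, and output."""
    specs = []
    for name in files:
        output = name + ".out"  # hypothetical output file name
        specs.append({
            "command": "python %s %s > %s" % (script, name, output),
            "inputs": [script, name],
            "output": output,
        })
    return specs


def run_master(specs, port=0):
    """Submit the tasks and wait for them to finish (requires CCTools)."""
    # Imported here so the rest of the sketch loads without CCTools.
    from work_queue import WorkQueue, Task, WORK_QUEUE_INPUT, WORK_QUEUE_OUTPUT

    q = WorkQueue(port)  # port 0 asks Work Queue to choose a free port
    print("listening on port %d" % q.port)

    for spec in specs:
        t = Task(spec["command"])
        for path in spec["inputs"]:
            t.specify_file(path, path, WORK_QUEUE_INPUT)
        t.specify_file(spec["output"], spec["output"], WORK_QUEUE_OUTPUT)
        q.submit(t)

    while not q.empty():
        t = q.wait(5)  # a finished task, or None after 5 seconds
        if t:
            print("task %d exited with status %d" % (t.id, t.return_status))
```

With CCTools installed, calling `run_master(build_task_specs(FILES))` would start the master, and the port it prints is the MASTERPORT to give to the workers.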

Deliverables

You are required to submit the following for this option in the homework:

The Work Queue program that was created.

Option B

Configure Work Queue to run on multiple cloud platforms

Prerequisites

Requires access to deploy resources on FutureGrid, the HPC cluster at UA, iPlant, and Amazon EC2.
Requires access to a personal laptop.
Completion of the Using Work Queue on FutureGrid tutorial.

Objectives

At the conclusion of this assignment, you will be able to:

Set up and configure Work Queue to run on multiple platforms.
Harness resources from multiple heterogeneous platforms to run your workflow.

The Assignment

For this assignment, you will need to download the following files:

Work Queue master program
simulation.py (used by the master program)

You will run the Work Queue master program with workers on multiple platforms. The master program creates and submits 20 tasks that invoke simulation.py with different parameters for concurrent execution. To execute these 20 tasks, you will start one worker on each of these platforms: FutureGrid, the HPC cluster at UA, iPlant, Amazon EC2, and your personal laptop.
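As a rough illustration of "20 tasks with different parameters", the sketch below builds one command line per task. The function name `build_simulation_commands` is hypothetical, and passing the task index as the only argument is a placeholder; the actual parameters are whatever the provided master program passes to simulation.py.

```python
def build_simulation_commands(count=20):
    """Return one command line per task, each with a different parameter.

    Using the task index as the only argument is a placeholder; the
    real parameters are defined by the provided master program.
    """
    return ["python simulation.py %d" % i for i in range(1, count + 1)]


commands = build_simulation_commands()
print("%d tasks, first: %s, last: %s"
      % (len(commands), commands[0], commands[-1]))
```

Because the tasks are independent of one another, the master can hand each command to whichever worker is free, regardless of the platform that worker runs on.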

In this assignment, you will use the catalog server and project name features so that the workers automatically find their master and establish a connection.

To do this, the master needs a project name that it advertises to the catalog server. You will modify the Work Queue master program to set this project name through the specify_name() API. An example of its usage is given below:

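import sys
from work_queue import WorkQueue

try:
   # port = 0 lets Work Queue choose any available port
   Q = WorkQueue(port = 0)
except Exception as e:
   print("Could not create the Work Queue master: %s" % e)
   sys.exit(1)

# Advertise this master to the catalog server under a project name
Q.specify_name("dinesh-wq")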

You will then start workers for your Work Queue master by specifying the option to use the catalog server (-a option) and the project name of your master (-N option). Example:

work_queue_worker -d all -a -N dinesh-wq

To successfully run Work Queue workers on multiple platforms, you may need to build and install CCTools on those platforms, so please start early. Follow these instructions to download and install CCTools.

Deliverables

You are required to submit the following for this option in the homework:

A file containing the messages printed by your Work Queue master when it executes. To redirect these messages to a file, start the master as follows: python workqueue-hw.py > "file name here"
Files containing the debug logs of the workers running on the following platforms: FutureGrid, the HPC cluster at UA, iPlant, Amazon EC2, and your personal laptop.
The debug logs must show successful execution of a task dispatched by your Work Queue master. To write the debug logs to a file, use the "-o" option. For example, to send debug messages to a file named 'worker.ec2.log', start the worker as:
work_queue_worker -d all -o worker.ec2.log -a -N "your project name"