Homework 2: Learning Work Queue
- Option A: Write a Work Queue program to check if "Shakespeare speak" still exists
  - Prerequisites
  - Objectives
  - The Assignment
  - Deliverables
- Option B: Configure Work Queue to run on multiple cloud platforms
  - Prerequisites
  - Objectives
  - The Assignment
  - Deliverables
This homework has two options. Please pick the option that you are more
comfortable with to complete this homework.
Note: Only *ONE* of the two options needs to be submitted.
Use the instructions in the Work Queue user manual to install, set up, and
learn more about Work Queue.
Option A
Write a Work Queue program to check if "Shakespeare speak" still exists
Prerequisites
Objectives
At the conclusion of this assignment, you will be able to:
- Decompose and execute workflows as smaller concurrent tasks.
- Understand how the master-worker model of distributed execution works.
- Write Work Queue programs for the execution of your workflows.
The Assignment
This assignment is similar to the Makeflow homework. In this assignment, you
will write a Work Queue program that compares the words seen in the works of
Shakespeare with the words currently in use in the English language and
identifies those words that are still in use.
You will compare the words of Shakespeare listed in
each of these text files:
Each file contains all the words found in the work of Shakespeare that is
identified in the filename. For your convenience, each word is listed on a
separate line to make the files easier to scan and read.
You will compare the words in each of these files
with the dictionary provided in Unix systems
(/usr/share/dict/words or /usr/dict/words). If a word is found to exist in
the dictionary, you should print the word followed by the string "Art spoken
ever and anon!". If not, you should print the word followed by the string
"Ne'er spoken ever!".
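The comparison rule above can be sketched in a few lines of Python. This is only an illustration and not the script supplied with the assignment: the function names are hypothetical, matching is assumed to be exact and case-sensitive, and a single space is assumed between the word and the message.

```python
def load_dictionary(path):
    """Return the set of words in a one-word-per-line dictionary file."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def classify_words(words, dictionary):
    """Yield one output line per word, using the two strings from the assignment."""
    for word in words:
        if word in dictionary:
            yield word + " Art spoken ever and anon!"
        else:
            yield word + " Ne'er spoken ever!"


# Tiny in-memory example; a real run would use
# load_dictionary("/usr/share/dict/words") instead.
sample_dictionary = {"love", "sword"}
for line in classify_words(["love", "bodkin"], sample_dictionary):
    print(line)
```

Loading the dictionary into a set keeps each lookup cheap, which matters when every word of a play is checked against a dictionary with many thousands of entries.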
To complete this assignment, you will write a Work Queue program that
decomposes the problem into concurrent tasks. Your program
should create and submit 5 tasks, one per file, each describing a search of
the dictionary for the words in that file.
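One possible shape for the one-task-per-file decomposition is sketched below. It assumes the Work Queue Python bindings from CCTools are installed and that the comparison script is named shakespeare-compare.py as in the hint; `build_task_specs` and `run_master` are hypothetical helper names, and the list of word files is left for you to supply.

```python
def build_task_specs(word_files, script="shakespeare-compare.py"):
    """Return one (command, input_files) pair per word file."""
    specs = []
    for word_file in word_files:
        command = "python {0} {1}".format(script, word_file)
        # Each task needs both the script and its word list on the worker.
        specs.append((command, [script, word_file]))
    return specs


def run_master(word_files, port=0):
    """Submit one Work Queue task per file and print each task's output."""
    # Imported here so build_task_specs() can be tried without CCTools.
    from work_queue import WorkQueue, Task

    queue = WorkQueue(port=port)
    print("listening on port {0}".format(queue.port))

    for command, input_files in build_task_specs(word_files):
        task = Task(command)
        for name in input_files:
            task.specify_input_file(name, name)
        queue.submit(task)

    while not queue.empty():
        task = queue.wait(5)  # None if no task finished within 5 seconds
        if task:
            print(task.output)
```

Calling `run_master(["hamlet.txt"])` would submit a single task; passing all five word files gives the five tasks the assignment asks for. Keeping the command construction in a separate function makes it easy to check the generated commands before any worker is started.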
You will run this master program on one of the Future Grid head nodes as
described here.
Finally, you will submit five Work Queue workers to the Future Grid Torque
batch submission system as follows:
torque_submit_workers MASTERHOST MASTERPORT 5
where MASTERHOST and MASTERPORT refer to the hostname and port where your
Work Queue master program is listening.
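If the master is started on an arbitrary port, it can be convenient to have it print the exact command to run. The sketch below assumes a `WorkQueue` object that exposes its bound port as `queue.port`; the helper names are hypothetical.

```python
import socket


def worker_submit_command(host, port, workers=5):
    """Build the torque_submit_workers command line shown in the assignment."""
    return "torque_submit_workers {0} {1} {2}".format(host, port, workers)


def describe_master(queue):
    """Print the worker submission command for an existing WorkQueue object."""
    # socket.getfqdn() reports this machine's fully qualified hostname;
    # queue.port is the port the master is listening on.
    print(worker_submit_command(socket.getfqdn(), queue.port))
```

Calling `describe_master(queue)` right after creating the queue prints a line that can be pasted into a shell on the head node.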
Hint:
To read each file and compare the words listed in it against the dictionary,
you can use this simple Python program: shakespeare-compare.py.
Your tasks will execute this program remotely, passing it the name of the file
that contains the words to check.
For example, to check if the words in hamlet.txt exist in the dictionary, run:
python shakespeare-compare.py hamlet.txt
Create concurrent tasks that check each of the files by
executing this program.
Deliverables
You are required to submit the following for this option in the homework:
- The Work Queue program that was created.
Option B
Configure Work Queue to run on multiple cloud platforms
Prerequisites
- Requires access to deploy resources on Future Grid, the HPC cluster
at UA, iPlant, and Amazon EC2.
- Requires access to a personal laptop.
- Completion of the Using Work Queue on FutureGrid tutorial.
Objectives
At the conclusion of this assignment, you will be able to:
- Set up and configure Work Queue to run on multiple platforms.
- Harness resources from multiple heterogeneous platforms to run your workflow.
The Assignment
For this assignment, you will need to download the following files:
- Work Queue master program
- simulation.py (used by the master program)
You will run the Work Queue master program on multiple platforms. The master
program creates and submits 20 tasks that invoke simulation.py with different
parameters for concurrent execution.
To execute the 20 tasks created and submitted by the Work Queue master,
you will start one worker on each of these platforms: Future Grid, the HPC
cluster at UA, iPlant, Amazon EC2, and your personal laptop.
In this assignment, you will use the catalog server and project name features
to have the workers automatically find their master and establish a connection.
To do this, the master needs to be provided with a project name that
will be advertised to the catalog server. You will modify the Work Queue
master program to specify a project name to the catalog server through the
specify_name() API.
An example of its usage is given below:
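import sys
from work_queue import WorkQueue

try:
    # port = 0 lets Work Queue pick any available port
    Q = WorkQueue(port = 0)
except Exception:
    sys.exit(1)
# Advertise this master to the catalog server under a project name
Q.specify_name("dinesh-wq")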
You will then start workers for your Work Queue master by
specifying the option to use the catalog server (-a option) and the
project name of your master (-N option). Example:
work_queue_worker -d all -a -N dinesh-wq
To successfully run Work Queue workers on multiple platforms, you may need to
build and install CCTools on those platforms. So please start early.
Follow these instructions to download and install CCTools.
Deliverables
You are required to submit the following for this option in the homework:
- File containing the messages printed by your Work Queue
master when it executes. To redirect these messages to a file, start the
master as follows:
python workqueue-hw.py > "file name here"
- File containing the debug logs of the workers running on the
following platforms: Future Grid, HPC cluster at UA, iPlant, Amazon EC2,
and personal laptop.
The debug logs must show successful execution of a task
dispatched by your Work Queue master. To write the debug logs to a file, use
the "-o" option. For example, to send debug messages to a file named
'worker.ec2.log', start the worker as:
work_queue_worker -d all -o worker.ec2.log -a -N "your project name"
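Since one log file is needed per platform, a small helper can generate the five worker commands with consistent log names. This is a sketch only: the short platform labels and the `worker.<platform>.log` naming are assumptions modelled on the 'worker.ec2.log' example above, and the flags are the ones shown in this assignment.

```python
# Hypothetical short labels for the five platforms named in the assignment.
PLATFORMS = ["futuregrid", "hpc", "iplant", "ec2", "laptop"]


def worker_command(project, platform):
    """Build the work_queue_worker command line for one platform."""
    log_file = "worker.{0}.log".format(platform)
    return "work_queue_worker -d all -o {0} -a -N {1}".format(log_file, project)


# Print one command per platform; run each on the matching machine.
for platform in PLATFORMS:
    print(worker_command("your-project-name", platform))
```

Replace "your-project-name" with the name passed to specify_name() in your master.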