Makeflow Tutorial

This tutorial walks you through installing CCTools into your FutureGrid home directory and then running some distributed computation examples with Makeflow.

Getting Started

Log in to the FutureGrid Head Node

For this tutorial, we assume you have an open SSH connection to one of the FutureGrid login nodes. If you do not have a FutureGrid account, you will need to register for one first. In this tutorial, we will use the alamo login node:

ssh alamo.futuregrid.org

Download, Build, and Install CCTools

Navigate to the download page in your browser to review the most recent versions:

http://www.nd.edu/~ccl/software/download.shtml

Set Up a Sandbox for this Tutorial and Download a Copy of CCTools 3.5.3

mkdir ~/cctools-tutorial
wget http://www.nd.edu/~ccl/software/files/cctools-3.5.3-RC1-source.tar.gz
tar xzf cctools-3.5.3-RC1-source.tar.gz

Build and Install CCTools

cd ~/cctools-3.5.3-RC1-source
./configure --prefix ~/cctools && make install

Set Environment Variables

You will need to add the CCTools bin directory to your $PATH:

export PATH=~/cctools/bin:${PATH}

Makeflow Example

Setup

mkdir ~/cctools-tutorial/makeflow-simple
cd ~/cctools-tutorial/makeflow-simple

Download simulation.py, which is our application executable for this exercise, and the Makeflow script that defines the workflow:

wget http://www.nd.edu/~ccl/software/tutorials/acic12/simple/simulation.py
wget http://www.nd.edu/~ccl/software/tutorials/acic12/simple/Makeflow
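
The contents of simulation.py are not reproduced in this tutorial. Judging only by the sample output shown later (a host name, start and finish timestamps, and a short series of x values), it behaves roughly like the hypothetical stand-in below. This is a sketch under that assumption, not the real script: the function names, the repeated square root, and the one-second pause are all guesses made for illustration.

```python
import math
import socket
import time


def repeated_sqrt(start, steps=5):
    """Return the values produced by taking the square root `steps` times."""
    values = []
    x = float(start)
    for _ in range(steps):
        x = math.sqrt(x)
        values.append(x)
    return values


def report_lines(start, steps=5, delay=1.0):
    """Build the report: host name, start time, each value, finish time."""
    stamp = "%Y %d %b %H:%M:%S"
    lines = ["Running on host %s" % socket.gethostname()]
    lines.append("Starting %s" % time.strftime(stamp))
    for x in repeated_sqrt(start, steps):
        lines.append("x = %s" % x)
        time.sleep(delay)  # assumed stand-in for real computation time
    lines.append("Finished %s" % time.strftime(stamp))
    return lines
```

Printing the lines returned by report_lines(4) takes about five seconds with the default delay, since the sketch pauses once per value.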

The Makeflow script should look like:

$ cat Makeflow
input.txt:
	LOCAL /bin/echo Hello World > input.txt
A: simulation.py input.txt
	python simulation.py 1 < input.txt > A
B: simulation.py input.txt
	python simulation.py 2 < input.txt > B
C: simulation.py input.txt
	python simulation.py 3 < input.txt > C
D: simulation.py input.txt
	python simulation.py 4 < input.txt > D
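
Each rule in the script above follows the same Make-style pattern: a target, the files it depends on, and an indented command that produces the target. Because the four simulation rules differ only in their number and output name, a script like this could also be generated. The helper below is a minimal sketch of that idea; the function name makeflow_rules is hypothetical and not part of CCTools.

```python
def makeflow_rules(count):
    """Build Makeflow text with one simulation rule per target A, B, C, ..."""
    if not 1 <= count <= 26:
        raise ValueError("count must be between 1 and 26 (targets A to Z)")
    lines = [
        "input.txt:",
        "\tLOCAL /bin/echo Hello World > input.txt",
    ]
    for i in range(1, count + 1):
        target = chr(ord("A") + i - 1)
        # Rule header: the target, then the files it depends on.
        lines.append("%s: simulation.py input.txt" % target)
        # Command line, indented with a tab as in Make.
        lines.append("\tpython simulation.py %d < input.txt > %s" % (i, target))
    return "\n".join(lines) + "\n"
```

Writing the result of makeflow_rules(4) to a file named Makeflow should give the same ten lines as the listing above.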

Running with Local (Multiprocess) Execution

Here we tell Makeflow to dispatch the jobs as regular local processes (no distributed computing!). This is essentially what regular Unix Make does with the -j flag.

makeflow -T local

If everything worked correctly, you should see:

$ makeflow -T local
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.
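
Local execution as described above amounts to starting each independent rule as its own child process and letting several run at once. The sketch below illustrates that idea with the Python standard library. It is not how Makeflow is implemented, and the name run_all is hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_parallel=4):
    """Run independent shell commands as local processes, a few at a time."""

    def run_one(command):
        # Each command becomes its own child process, much like `make -j`.
        return subprocess.run(
            command, shell=True, capture_output=True, text=True
        )

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in the order the commands were given,
        # even though the processes may finish in a different order.
        return list(pool.map(run_one, commands))
```

The order in which jobs finish is not fixed, which is also why the Makeflow output above lists D before A.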

Running with FutureGrid's Torque

The following command tells Makeflow to dispatch jobs using the Torque batch submission system (qsub, qdel, qstat, etc.):

makeflow -T torque

You will get as output:

$ makeflow -T torque
nothing left to do.
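
Well... that's not right. Nothing was run! Makeflow found the output files and log left over from the local run and concluded that every rule was already complete. We need to clean out the generated output files and logs so that Makeflow starts from a clean slate:

makeflow -c

We see that it deleted the files generated in the last run:

$ makeflow -c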
deleted file D
deleted file C
deleted file B
deleted file A
deleted file input.txt
deleted file ./Makeflow.makeflowlog
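
Why the first Torque run had nothing to do can be illustrated with a simplified Make-style check: a rule only needs to run if its target is missing. The sketch below is an approximation for illustration only. Makeflow's real bookkeeping also involves the Makeflow.makeflowlog file removed above, and the names TARGETS and rules_to_run are hypothetical.

```python
# Targets of the tutorial workflow, in the order they appear in the script.
TARGETS = ["input.txt", "A", "B", "C", "D"]


def rules_to_run(targets, existing_files):
    """Return the targets that still need to be produced, in order."""
    return [target for target in targets if target not in existing_files]
```

With every target present, as after the local run, rules_to_run returns an empty list. After cleaning, all five targets are pending again.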

Now let's try again:

makeflow -T torque

We get the output we expect:

$ makeflow -T torque
/bin/echo Hello World > input.txt
python simulation.py 4 < input.txt > D
python simulation.py 3 < input.txt > C
python simulation.py 2 < input.txt > B
python simulation.py 1 < input.txt > A
nothing left to do.

Notice that the output is no different from using local execution. Makeflow is built to be agnostic to the execution engine: the workflow is written the same way whether the tasks execute locally or remotely. In this case, we can confirm that the job ran on another host by looking at the output produced by the simulation:

$ cat D
Running on host c056.cm.cluster
Starting 2012 9 Sep 15:48:22
x = 2.0
x = 1.41421356237
x = 1.189207115
x = 1.09050773267
x = 1.04427378243
Finished 2012 9 Sep 15:48:27

Here we see that the job ran on node c056.cm.cluster and took 5 seconds to complete.

Running Makeflow with Work Queue

The submission and wait times for the Makeflow tasks in the above case will vary because of latencies in the underlying batch submission platform (Torque). To avoid long submission and wait times, Makeflow can be run using Work Queue, which excels at handling low-latency, short-turnaround jobs. Here, we start Makeflow, which sets up a Work Queue master on an arbitrary port using -p 0. We also turn on the debugging output to see what happens as it runs.

makeflow -c
makeflow -T wq -p 0 -d all

You should see output like this:

2012/07/30 12:18:54.30 [10131] makeflow: debug: checking for duplicate targets...
2012/07/30 12:18:54.30 [10131] makeflow: debug: checking rules for consistency...
2012/07/30 12:18:54.30 [10131] makeflow: tcp: listening on port XXXX
2012/07/30 12:18:54.30 [10131] makeflow: wq: Work Queue is listening on port XXXX.
2012/07/30 12:18:54.30 [10131] makeflow: batch: started process 10132: /bin/echo Hello World > input.txt
2012/07/30 12:18:54.30 [10131] makeflow: debug: node 0 waiting -> running
...
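
The XXXX in the debug output stands for whichever port was chosen for this run. One common way for a program to obtain an arbitrary free port is to bind a socket to port 0 and ask the operating system which port it assigned. The sketch below shows that technique in general terms. Whether Makeflow picks its port in exactly this way is an assumption, and listen_on_free_port is a hypothetical name.

```python
import socket


def listen_on_free_port(host="127.0.0.1"):
    """Bind a TCP socket to port 0 and return (socket, chosen_port)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 asks the OS for any unused port
    server.listen(1)
    # getsockname() reports the address and port actually assigned.
    return server, server.getsockname()[1]
```

The caller is responsible for closing the returned socket. The port number it reports plays the role of XXXX above.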

Now, from a second terminal on the same login node, run work_queue_worker with the port the master is listening on (shown as XXXX above):

work_queue_worker -t 10 localhost XXXX

When the tasks are finished, the worker should quit because of the 10-second timeout.

Exercise

Running Makeflow with Work Queue Workers on Torque

The goal of this exercise is to set up Work Queue workers on Torque compute nodes. Here we submit the worker tasks using the torque_submit_workers executable. To learn more about the options and arguments for torque_submit_workers, run:

torque_submit_workers -h

In this exercise, we will use the catalog server and a project name so that workers can find the master without being given the master's hostname and port. We force Makeflow and the Work Queue workers to use the catalog server by specifying the -a option. We then specify a project name for the Makeflow script using the -N option, which takes a string as an argument. The workers are given the same project name, using the same -N option, so they know which master to connect to.

NOTE: Pick your own distinct project name for MYPROJECT.

makeflow -c
torque_submit_workers -t 300 -a -N MYPROJECT 5
makeflow -T wq -a -N MYPROJECT -d all -o makeflow.debug > /dev/null 2> /dev/null &
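
The role of the project name can be pictured as a simple lookup table: the master advertises where it is listening under its project name, and each worker asks the catalog for that name. The class below is a toy in-memory model of that idea only. It does not reflect the actual catalog server protocol, and ToyCatalog, its methods, and the port number in the usage note are hypothetical.

```python
class ToyCatalog:
    """In-memory stand-in for a catalog: project name -> (host, port)."""

    def __init__(self):
        self._masters = {}

    def advertise(self, project, host, port):
        # A master announces where it is listening under its project name.
        self._masters[project] = (host, port)

    def lookup(self, project):
        # A worker asks where the master for a project can be reached.
        # Returns None when no master has advertised that name.
        return self._masters.get(project)
```

For example, after advertise("MYPROJECT", "alamo.futuregrid.org", 9123), a lookup of "MYPROJECT" returns that host and port. This also shows why two users sharing one project name would collide.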