Makeflow Tutorial

Download and Installation

First, log into the ND CRC head node crcfe01.crc.nd.edu or crcfe02.crc.nd.edu using ssh, PuTTY, or a similar tool. Once you have a shell, download and install the cctools software in your home directory as follows:

    cd $HOME
    wget http://ccl.cse.nd.edu/software/files/cctools-5.4.13-source.tar.gz
    tar xvzf cctools-5.4.13-source.tar.gz
    cd cctools-5.4.13-source
    ./configure --prefix $HOME/cctools
    make
    make install
    cd $HOME

If you use bash, do this to set your path:

    export PATH=${HOME}/cctools/bin:/opt/sge/bin/lx-amd64:${PATH}

If you use tcsh instead, do this:

    setenv PATH ${HOME}/cctools/bin:/opt/sge/bin/lx-amd64:${PATH}

Now double-check that you can run the various commands, like this:

    makeflow -v
    work_queue_status
    qstat

Makeflow Example

Let's begin by using Makeflow to run a handful of simulation codes. First, make and enter a clean directory to work in:

    cd $HOME
    mkdir tutorial
    cd tutorial

Now download this program, which performs a highly sophisticated simulation of black holes colliding:

    wget http://ccl.cse.nd.edu/software/tutorials/ndreu16/simulation.py

Try running it once, just to see what it does:

    chmod 755 simulation.py
    ./simulation.py 5

Now, let's use Makeflow to run several simulations.
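Before writing the workflow, it helps to see the same work as a plain sequential loop, where each run must finish before the next begins; removing that serialization is exactly what Makeflow is for. This is only a sketch: the echo below is a stand-in for ./simulation.py, which may not be present on the machine at hand.

```shell
# Run four "simulations" one after another. Each iteration must finish
# before the next begins; Makeflow instead runs independent jobs
# concurrently. The echo is a stand-in for: ./simulation.py $i > output.$i
for i in 1 2 3 4; do
    echo "simulated run $i" > output.$i
done
cat output.1 output.2 output.3 output.4
```

With four independent runs like these, a workflow system can execute all of them at once, which is what the Makeflow file in the next step expresses.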
Create a file called example.makeflow and paste the following text into it:

    input.txt:
        LOCAL /bin/echo "Simulate Black Holes" > input.txt

    output.1: simulation.py input.txt
        ./simulation.py 1 < input.txt > output.1

    output.2: simulation.py input.txt
        ./simulation.py 2 < input.txt > output.2

    output.3: simulation.py input.txt
        ./simulation.py 3 < input.txt > output.3

    output.4: simulation.py input.txt
        ./simulation.py 4 < input.txt > output.4

To run it on your local machine, one job at a time, do this:

    makeflow -j 1 example.makeflow

Note that if you run it a second time, nothing will happen, because all of the files have already been built:

    makeflow example.makeflow
    makeflow: nothing left to do

Use the -c option to clean everything up before trying it again:

    makeflow -c example.makeflow

Of course, you are running on a machine with multiple cores. If you leave out the -j option, Makeflow will run as many jobs at once as you have cores:

    makeflow example.makeflow

If the jobs are expected to be long-running, you can tell Makeflow to submit each job to SGE instead:

    makeflow -T sge example.makeflow

After that completes, examine the output files (output.1 and so on), and you will notice that each job ran on a different machine in the cluster.

Running Makeflow with Work Queue

Sometimes, submitting jobs individually to a batch system is not convenient: each job can wait a long time in the queue before receiving service, or you may not have a batch system set up at all. Instead, you can use the Work Queue system to run the jobs. To do this, first start makeflow in Work Queue (wq) mode:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -p 0
    listening for workers on port XXXX.
    ...

Now open up another shell and run a single worker process:

    work_queue_worker localhost XXXX

Go back to your first shell and observe that the makeflow has finished.
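Conceptually, makeflow in wq mode acts as a manager holding a queue of ready tasks, and each work_queue_worker pulls tasks from it until none remain. The following is only a rough local sketch of that model, not the real protocol: directories stand in for the network connection between makeflow and its workers, and an atomic mv stands in for task dispatch.

```shell
# Sketch of the Work Queue model: a "manager" leaves tasks in a queue
# and independent "workers" pull and execute them concurrently.
mkdir -p queue done
for i in 1 2 3 4; do
    echo "task $i" > queue/task.$i      # manager enqueues four tasks
done
for w in 1 2; do                        # two workers drain the queue
    (
        for t in queue/task.*; do
            # mv is atomic, so exactly one worker claims each task
            mv "$t" done/ 2>/dev/null && echo "worker $w ran ${t##*/}"
        done
    ) &
done
wait
ls done | wc -l
```

Note that adding a second worker here needs no change to the queue, just as pointing more work_queue_worker processes at the same makeflow speeds it up without touching the workflow.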
Of course, remembering port numbers all the time gets old fast, so try the same thing again, but using a project name:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT
    listening for workers on port XXXX
    ...

Now open up another shell and run your worker with a project name:

    work_queue_worker -N MYPROJECT

When using a project name, your workflow is advertised to the catalog server and can be viewed using work_queue_status:

    work_queue_status

Running Workers in SGE

Of course, we don't really want to run workers on the head node, so let's instead start five workers using SGE:

    sge_submit_workers -N MYPROJECT 5
    Creating worker submit scripts in dthain-workers...
    Your job 18728 ("worker.sh") has been submitted
    Your job 18729 ("worker.sh") has been submitted
    Your job 18730 ("worker.sh") has been submitted
    Your job 18731 ("worker.sh") has been submitted
    Your job 18732 ("worker.sh") has been submitted

Use the qstat command to observe that they have been submitted (and are possibly running):

    qstat -u $USER
    job-ID  prior      name       user    state  submit/start at      queue
    -----------------------------------------------------------------------------------
    18728   100.49976  worker.sh  dthain  r      06/02/2016 12:04:45  long@d6copt172.crc.nd.edu
    18729   100.49976  worker.sh  dthain  r      06/02/2016 12:04:47  long@d6copt184.crc.nd.edu
    18730   100.49976  worker.sh  dthain  r      06/02/2016 12:04:47  long@d6copt025.crc.nd.edu
    18731   100.49976  worker.sh  dthain  r      06/02/2016 12:04:48  long@d6copt025.crc.nd.edu
    18732   100.49976  worker.sh  dthain  r      06/02/2016 12:04:48  long@dqcneh084.crc.nd.edu

Now restart your Makeflow, and it will use the workers already running in SGE:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT
    listening for workers on port XXXX.
    ...

You can leave the workers running there if you want to start another Makeflow. They will remain until they have been idle for fifteen minutes, and then stop automatically.
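That fifteen-minute shutdown is simply an idle timeout in the worker's main loop. Here is a small illustrative sketch of the behavior, with the timeout shortened to one second so it finishes quickly; the variable names are made up for this example and are not part of work_queue_worker.

```shell
# A worker-style loop that gives up after IDLE_TIMEOUT seconds with no
# work to do, then records why it exited. The real worker does the
# same, but with a much longer timeout.
IDLE_TIMEOUT=1
start=$(date +%s)
status=running
while [ "$status" = running ]; do
    # (a real worker would poll its manager for a task here)
    sleep 0.1
    now=$(date +%s)
    if [ $((now - start)) -ge "$IDLE_TIMEOUT" ]; then
        status=idle-timeout
    fi
done
echo "worker exiting: $status" > worker.log
cat worker.log
```

This is why idle workers left in the cluster cost little: they clean themselves up without any action from you.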
If you add the -d all option to Makeflow, it will display debugging information that shows where each task was sent, when it was returned, and so forth:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT -d all
    listening for workers on port XXXX.

(Alternate) Running Workers in Condor

Of course, we don't really want to run workers on the head node, so let's instead start five workers using Condor:

    condor_submit_workers -N MYPROJECT 5
    Creating worker submit scripts in dthain-workers...
    Submitting job(s).....
    5 job(s) submitted to cluster 258192.

Use the condor_q command to observe that they have been submitted to Condor:

    condor_q
    ID        OWNER   SUBMITTED   RUN_TIME    ST  PRI  SIZE  CMD
    258192.0  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.1  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.2  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.3  dthain  5/31 16:03  0+00:00:12  R   0    0.7   work_queue_worker
    258192.4  dthain  5/31 16:03  0+00:00:11  R   0    0.7   work_queue_worker

Now restart your Makeflow, and it will use the workers already running in Condor:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT
    listening for workers on port XXXX.
    ...

You can leave the workers running there if you want to start another Makeflow. They will remain until they have been idle for fifteen minutes, and then stop automatically. If you add the -d all option to Makeflow, it will display debugging information that shows where each task was sent, when it was returned, and so forth:

    makeflow -c example.makeflow
    makeflow -T wq example.makeflow -N MYPROJECT -d all
    listening for workers on port XXXX.

If you finish all that, go on to the Homework Assignment.