Makeflow User's Manual

Makeflow is Copyright (C) 2016- The University of Notre Dame. This software is distributed under the GNU General Public License. See the file COPYING for details.

Overview

Makeflow is a workflow engine for large scale distributed computing. It accepts a specification of a large amount of work to be performed, and runs it on remote machines in parallel where possible. In addition, Makeflow is fault-tolerant, so you can use it to coordinate very large tasks that may run for days or weeks in the face of failures. Makeflow is designed to be similar to Make, so if you can write a Makefile, then you can write a Makeflow.

Makeflow makes it easy to move a large amount of work from one facility to another. After writing a workflow, you can test it out on your local laptop, then run it at your university computing center, move it over to a national computing facility like XSEDE, and then again to a commercial cloud system. Tasks can also be run using Docker containers. Using the (bundled) Work Queue system, you can even run across multiple systems simultaneously. No matter where you run your tasks, the workflow language stays the same.

Makeflow is used in production to support large scale problems in science and engineering. Researchers in fields such as bioinformatics, biometrics, geography, and high energy physics all use Makeflow to compose workflows from existing applications.

Makeflow currently supports these batch systems:

local Local execution on the submitting machine, using multiple cores if available.
wq Work Queue distributed application system, included with Makeflow.
condor HTCondor distributed computing system.
sge Open Grid Scheduler, Univa Grid Engine, Oracle Grid Engine, all derived from Sun Grid Engine (SGE).
pbs Portable Batch System (PBS)
torque Torque Resource Manager
slurm Slurm Workload Manager
moab Moab Workload Manager
cluster Custom-defined batch submission commands, see custom drivers below for details.
chirp Active storage jobs on the Chirp filesystem.

Makeflow is part of the Cooperating Computing Tools. You can download the CCTools from this web page, follow the installation instructions, and you are ready to go.

The Makeflow Language

The Makeflow language is very similar to Make. A Makeflow script consists of a set of rules. Each rule specifies a set of target files to create, a set of source files needed to create them, and a command that generates the target files from the source files.

Makeflow attempts to generate all of the target files in a script. It examines all of the rules and determines which rules must run before others. Where possible, it runs commands in parallel to reduce the execution time.

Here is a Makeflow that uses the convert utility to make an animation. It downloads an image from the web, creates four variations of the image, and then combines them back together into an animation. The first and the last task are marked as LOCAL to force them to run on the controlling machine.

CURL=/usr/bin/curl
CONVERT=/usr/bin/convert
URL=http://ccl.cse.nd.edu/images/capitol.jpg

capitol.montage.gif: capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg $CONVERT
	LOCAL $CONVERT -delay 10 -loop 0 capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg capitol.270.jpg capitol.180.jpg capitol.90.jpg capitol.montage.gif

capitol.90.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 90 capitol.jpg capitol.90.jpg

capitol.180.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 180 capitol.jpg capitol.180.jpg

capitol.270.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 270 capitol.jpg capitol.270.jpg

capitol.360.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 360 capitol.jpg capitol.360.jpg

capitol.jpg: $CURL
	LOCAL $CURL -o capitol.jpg $URL

Note that Makeflow differs from Make in a few important ways. The complete details are given below.

Running Makeflow

To try out the example above, copy and paste it into a file named example.makeflow. To run it on your local machine:

makeflow example.makeflow

Note that if you run it a second time, nothing will happen, because all of the files are built:

makeflow example.makeflow
makeflow: nothing left to do

Use the -c option to clean everything up before trying it again:

makeflow -c example.makeflow

If you have access to a batch system like Condor, SGE, Torque, SLURM, or Moab, you can direct Makeflow to run your jobs there:

makeflow -T condor example.makeflow
makeflow -T sge example.makeflow
makeflow -T torque example.makeflow
...

Running Makeflow with Work Queue

You will notice that a workflow can run very slowly if you submit each task in the usual way, because it typically takes 30 seconds or so to start each batch job running. To get around this limitation, we provide the Work Queue system. This allows Makeflow to function as a master process that quickly dispatches work to remote worker processes.

To begin, let's assume that you are logged into a machine named barney.nd.edu. Start your Makeflow like this:

makeflow -T wq example.makeflow

Then, submit 10 worker processes to Condor like this:

condor_submit_workers barney.nd.edu 9123 10
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 298.

Or, submit 10 worker processes to SGE like this:

sge_submit_workers barney.nd.edu 9123 10

Or, you can start workers manually on any other machine you can log into:

work_queue_worker barney.nd.edu 9123

Once the workers begin running, Makeflow will dispatch multiple tasks to each one very quickly. If a worker should fail, Makeflow will retry the work elsewhere, so it is safe to submit many workers to an unreliable system.

When the Makeflow completes, your workers will still be available, so you can either run another Makeflow with the same workers, remove them from the batch system, or wait for them to expire. If you do nothing for 15 minutes, they will automatically exit.

Note that condor_submit_workers and sge_submit_workers are simple shell scripts, so you can edit them directly if you would like to change batch options or other details. Please refer to the Work Queue manual for more details.

Port Numbers

Makeflow listens on a port to which the remote workers connect. The default port number is 9123. Sometimes, however, that port may not be available on your system. You can change the default port via the -p option. For example, if you want the master to listen on port 9567, you can run the following command:

makeflow -T wq -p 9567 example.makeflow

Project Names

The best and simplest way to match workers to masters is to use project name matching. You can give the master a project name with the -N option. This works regardless of which batch system is used.

makeflow -T wq -N MyProj example.makeflow

The -N option gives the master the project name 'MyProj' and causes it to advertise its information, such as the project name, running status, hostname, and port number, to a catalog server. A worker can then identify the workload simply by its project name. By default, makeflow will use the global catalog server at catalog.cse.nd.edu, but this can be changed as described below.

To start a worker that automatically finds MyProj's master via the default Notre Dame catalog server:

work_queue_worker -N MyProj

You can also give multiple -N options to a worker. The worker will find out from the catalog server which of the specified projects are running and randomly select one to work for. When that project is done, the worker repeats the process. Thus, the worker can work for a different master without being stopped and given the new master's hostname and port. An example of specifying multiple projects:

work_queue_worker -N proj1 -N proj2 -N proj3

In addition to giving the master a project name, the -N option also triggers automatic reporting to the designated catalog server. The server address and port are taken from the environment variables CATALOG_HOST and CATALOG_PORT. If either variable is not set, then the addresses "catalog.cse.nd.edu,backup-catalog.cse.nd.edu" and port 9097 will be used.

Setting a Password

We recommend that any workflow that uses a project name also set a password. To do this, select any password and write it to a file called mypwfile. Then, run Makeflow and each worker with the --password option to indicate the password file:

makeflow --password mypwfile ...
work_queue_worker --password mypwfile ...
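
The password file itself can be created by any convenient method. A minimal sketch, with a placeholder password:

# write the shared secret and restrict its permissions
echo 'some-secret-phrase' > mypwfile
chmod 600 mypwfile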

Catalog Server

It is also possible to run your own catalog server. See Catalog Servers for details.

Suppose you have a catalog server listening at catalog.somewhere.edu:9097. To make your masters and workers contact this catalog server, simply add the -C hostname:port option to both of your master and worker:

makeflow -T wq -C catalog.somewhere.edu:9097 -N MyProj example.makeflow
work_queue_worker -C catalog.somewhere.edu:9097 -a -N MyProj

Running Makeflow with Docker

Makeflow can be used with Docker to create a precise execution environment for each task. In this mode, Makeflow will invoke Docker to set up the environment, copy the input files into the container, run the command, and then save the output files created by the command.

To do this, simply invoke Makeflow with the --docker argument, and name the container image that you wish all of the commands to use. Makeflow will ensure that the named image is pulled into each Docker node, and then execute the task within that container. For example, --docker debian will cause all tasks to be run in the container named debian.

Alternatively, if you have an exported container image, you can use the exported image via the --docker-tar option. Makeflow will load the container into each execution node as needed. This allows you to use a container without pushing it to a remote repository.
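
A sketch of this usage, assuming the image was first exported with docker save (the file name debian.tar is hypothetical):

# export the image to a tarball, then point Makeflow at it
docker save debian > debian.tar
makeflow --docker-tar debian.tar example.makeflow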

This capability can be used in addition to the selection of a batch system. For example, if you invoke makeflow -T condor --docker debian, then Makeflow will distribute tasks via Condor, and then use Docker to invoke the task at each execution node.

Running Makeflow with Singularity

In addition to Docker, Makeflow can also be used with Singularity to containerize work into a custom execution environment. When using this mode, Singularity will take an image, set up the container, and run the command inside the container. Any needed input files will be provided by Makeflow, and created files will be delivered by Makeflow.

To use this mode, invoke Makeflow with the --singularity=<img> argument. Makeflow will ensure that the given image is sent to each worker. All of the commands will be run inside a container built from that image. Unlike Docker, Singularity requires that users provide their own image.

Similarly, this capability can be used in addition to the selection of a batch system. Thus, if you launch Makeflow as makeflow -T condor --singularity=mycontainer.img, then Makeflow will distribute tasks via Condor, and then use Singularity to execute the task at each worker.

Running Makeflow with Umbrella

Makeflow allows the user to specify the execution environment for each rule via its --umbrella-spec and --umbrella-binary options. The --umbrella-spec option allows the user to specify an Umbrella specification, and the --umbrella-binary option allows the user to specify the path to an Umbrella binary. In this mode, each rule is converted into an Umbrella job, and the specified Umbrella specification and binary are added to the input file list of the job.

To test makeflow with umbrella using the local execution engine:

makeflow --umbrella-binary $(which umbrella) --umbrella-spec convert_S.umbrella example.makeflow

To test makeflow with umbrella using the wq execution engine:

makeflow -T wq --umbrella-binary $(which umbrella) --umbrella-spec convert_S.umbrella example.makeflow

To run each makeflow rule as an Umbrella job, the --umbrella-spec must be specified. However, the --umbrella-binary option is optional: when it is specified, the specified umbrella binary will be sent to the execution node; when it is not specified, the execution node is expected to have an umbrella binary available through the $PATH environment variable.

Makeflow also allows the user to specify the umbrella log file prefix via its --umbrella-log-prefix option. The umbrella log file for each rule is named using this prefix followed by the rule id. The default prefix is the name of the makeflow file followed by ".umbrella.log".

Makeflow also allows the user to specify the umbrella execution mode via its --umbrella-mode option. Currently, this option can be set to the following three modes: local, parrot, and docker. The default value of the --umbrella-mode option is local, which first tries to utilize the docker mode, and tries to utilize the parrot mode if the docker mode is not available.

Specify Umbrella in the Makeflow File

You can also specify an Umbrella specification for a group of rules in the Makeflow file by putting the following directives before the rules you want to apply the Umbrella spec to:

.MAKEFLOW CATEGORY 1
.UMBRELLA SPEC convert_S.umbrella

In this case, the specified Umbrella spec will be applied to all the following rules until a new ".MAKEFLOW CATEGORY ..." directive is declared. All the rules before the first ".MAKEFLOW CATEGORY ..." directive will use the Umbrella spec specified by the --umbrella-spec option. If the --umbrella-spec option is not specified, these rules will run without being wrapped by Umbrella.
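
For instance, a fragment of the earlier example could pin two of its rules to different specs like this (the second spec name is hypothetical):

.MAKEFLOW CATEGORY 1
.UMBRELLA SPEC convert_S.umbrella

capitol.90.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 90 capitol.jpg capitol.90.jpg

.MAKEFLOW CATEGORY 2
.UMBRELLA SPEC analysis.umbrella

capitol.180.jpg: capitol.jpg $CONVERT
	$CONVERT -swirl 180 capitol.jpg capitol.180.jpg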

More Language Details

The Makeflow language is very similar to Make, but it does have a few important differences that you should be aware of.

Get the Dependencies Right

You must be careful to accurately specify all of the files that a rule requires and creates, including any custom executables. This is because Makeflow requires all of this information to construct the environment for a remote job. For example, suppose that you have written a simulation program called mysim.exe that reads calib.data and then produces an output file. The following rule won't work, because it doesn't inform Makeflow which files are needed to execute the simulation:

# This is an incorrect rule.

output.txt:
	./mysim.exe -c calib.data -o output.txt

However, the following is correct, because the rule states all of the files needed to run the simulation. Makeflow will use this information to construct a batch job that consists of mysim.exe and calib.data, and uses them to produce output.txt:

# This is a correct rule.

output.txt: mysim.exe calib.data
	./mysim.exe -c calib.data -o output.txt

Note that when a directory is specified as an input dependency, it means that the command relies on the directory and all of its contents. So, if you have a large collection of input data, you can place it in a single directory, and then simply give the name of that directory.
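
For example, a minimal sketch with hypothetical names, where the whole dataset directory is shipped as a single dependency:

results.txt: analyze.sh dataset
	./analyze.sh dataset > results.txt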

No Phony Rules

For a similar reason, you cannot have "phony" rules that don't actually create the specified files. For example, it is common practice to define a clean rule in Make that deletes all derived files. This doesn't make sense in Makeflow, because such a rule does not actually create a file named clean. Instead use the -c option as shown above.

Just Plain Rules

Makeflow does not support all of the syntax that you find in various versions of Make. Each rule must have exactly one command to execute. If you have multiple commands, simply join them together with semicolons. Makeflow allows you to define and use variables, but it does not support pattern rules, wildcards, or special variables like $< or $@. You simply have to write out the rules longhand, or write a script in your favorite language to generate a large Makeflow.
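
For example, a rule with several steps can be written with the commands joined by semicolons on one line (names are hypothetical):

result.dat: input.dat filter.sh
	./filter.sh input.dat > tmp.dat; sort tmp.dat > result.dat; rm tmp.dat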

Local Job Execution

Certain jobs don't make much sense to distribute. For example, if you have a very fast running job that consumes a large amount of data, then it should simply run on the same machine as Makeflow. To force this, simply add the word LOCAL to the beginning of the command line in the rule.
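
For example, this hypothetical rule forces a fast, data-heavy summary step to run on the controlling machine:

summary.txt: bigdata.dat summarize.sh
	LOCAL ./summarize.sh bigdata.dat > summary.txt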

Rule Lexical Scope

Variables in Makeflow have global scope, that is, once defined, their value can be accessed from any rule. Sometimes it is useful to define a variable locally inside a rule, without affecting the global value. In Makeflow, this can be achieved by defining the variables after the rule's requirements, but before the rule's command, and prepending the name of the variable with @, as follows:

SOME_VARIABLE=original_value

target_1: source_1
	command_1

target_2: source_2
	@SOME_VARIABLE=local_value_for_2
	command_2

target_3: source_3
	command_3

In this example, SOME_VARIABLE has the value 'original_value' for rules 1 and 3, and the value 'local_value_for_2' for rule 2.

Environment Variables

Environment variables can be defined with the export keyword inside a workflow. Makeflow will communicate explicitly named environment variables to remote batch systems, where they will override whatever local setting is present. For example, suppose you want to modify the PATH for every job in the makeflow:

export PATH=/opt/tools/bin:${PATH}

If no value is given, then the current value of the environment variable is passed along to the job:

export USER

Batch Job Refinement

When executing jobs, Makeflow simply uses the default settings in your batch system. If you need to pass additional options, use the BATCH_OPTIONS variable or the -B option to Makeflow.

When using Condor, this string will be added to each submit file. For example, if you want to add Requirements and Rank lines to your Condor submit files, add this to your Makeflow:

BATCH_OPTIONS = Requirements = (Memory>1024)

When using SGE, the string will be added to the qsub options. For example, to specify that jobs should be submitted to the devel queue:

BATCH_OPTIONS = -q devel
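
The same string can instead be supplied on the command line with the -B option; a sketch for the SGE case above:

makeflow -T sge -B '-q devel' example.makeflow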

Remote File Renaming

With the Work Queue and Condor batch systems, Makeflow has a feature called remote file renaming. For example:

local_name->remote_name

indicates that the file local_name is called remote_name in the remote system. Consider the following example:

b.out: a.in myprog
	LOCAL myprog a.in > b.out

c.out->out: a.in->in1 b.out myprog->prog
	prog in1 b.out > out

The first rule runs locally, using the executable myprog and the local file a.in to locally create b.out. The second rule runs remotely, but the remote system expects a.in to be named in1, c.out to be named out, and so on. Note that we did not need to rename the file b.out. Without remote file renaming, we would have to create either a symbolic link or a copy of the files with the expected names.

Shared Filesystems

Makeflow does not assume that a shared filesystem such as HDFS is available, and by default copies all necessary dependencies for each job. Specifying dependencies on a shared filesystem can be problematic given large files that are already accessible at every execution site. To avoid unnecessary copying, use:

makeflow --shared_fs /foo example.makeflow

This flag can be specified multiple times. Any files on one of the given shared filesystems will be managed at the DAG level, so Makeflow will check existence and age to decide if other targets need to be built. Any such dependencies, however, will not be passed down to the batch system; shared files must be available by some other means at all execution and submission sites.
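
For instance, given the flag above, a rule may refer to data under /foo directly (the names here are hypothetical); Makeflow will check that the file exists and is up to date, but will not copy it to the execution site:

results.txt: /foo/reference.db analyze.sh
	./analyze.sh /foo/reference.db > results.txt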

Nested Makeflows

One Makeflow can be nested inside of another by writing a rule with the following syntax:

output-files: input-files
	MAKEFLOW makeflow-file [working-dir]

The input and output files are specified as usual, describing the files consumed and created by the child makeflow as a whole. The MAKEFLOW keyword introduces the child makeflow specification, followed by an optional working directory in which the makeflow will be executed. If not given, the current working directory is assumed.
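
As a concrete sketch (all names hypothetical), a child workflow that produces summary.dat in the directory child_dir could be written as:

summary.dat: raw.dat
	MAKEFLOW child.makeflow child_dir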

Displaying a Makeflow

There are several ways to visualize both the structure of a Makeflow as well as its progress over time. makeflow_viz can be used to convert a Makeflow into a file that can be displayed by Graphviz DOT tools like this:

makeflow_viz -D dot example.makeflow > example.dot
dot -Tgif < example.dot > example.gif

Or use a similar command to generate a Cytoscape input file. (This will also create a Cytoscape style.xml file.)

makeflow_viz -D cyto example.makeflow > example.xgmml

To observe how a makeflow runs over time, use makeflow_graph_log to convert a log file into a timeline that shows the number of tasks ready, running, and complete over time:

makeflow_graph_log example.makeflowlog example.png

Wrapper Commands

Makeflow allows a global wrapper command to be applied to every rule in the workflow. This is particularly useful for applying troubleshooting tools, or for setting up a global environment without rewriting the entire workflow. The --wrapper option will prefix a command in front of every rule, while the --wrapper-input and --wrapper-output options will specify input and output files related to the wrapper.

A few special characters are supported by wrappers. If the wrapper command or wrapper files contain two percent signs (%%), then the number of the current rule will be substituted there. If the command contains curly braces ({}), the original command will be substituted at that point. Square brackets ([]) are the same as curly braces, except that the command is quoted and escaped before substitution. If neither specifier is given, Makeflow appends /bin/sh -c [] to the wrapper command.

For example, suppose that you wish to apply the shell builtin command time to every rule in the workflow. Instead of modifying the workflow, run it like this:

makeflow --wrapper 'time -p' example.makeflow

Since the preceding wrapper did not specify where to substitute the command, it is equivalent to

makeflow --wrapper 'time -p /bin/sh -c []' example.makeflow

This way, if a single rule specifies multiple commands, the wrapper will time all of them.

The square brackets and the default behavior of running commands in a shell were added because Makeflow allows a rule to run multiple commands. The curly braces simply perform text substitution, so for example

makeflow --wrapper 'env -i {}' example.makeflow

does not work correctly if multiple commands are specified. The rule

target_1: source_1
	command_1; command_2; command_3

will be executed as

env -i command_1; command_2; command_3

Notice that only command_1's environment will be cleared; subsequent commands are not affected. Thus this wrapper should be given as

makeflow --wrapper 'env -i /bin/sh -c []' example.makeflow

or more succinctly as

makeflow --wrapper 'env -i' example.makeflow

Suppose you want to apply strace to every rule, to obtain system call traces. Since every rule would have to have its own output file for the trace, you could indicate output files like this:

makeflow --wrapper 'strace -o trace.%%' --wrapper-output 'trace.%%' example.makeflow

Suppose you want to wrap every command with a script that would set up an appropriate Java environment. You might write a script called setupjava.sh like this:

#!/bin/sh
export JAVA_HOME=/opt/java-9.8.3.6.7
export PATH=${JAVA_HOME}/bin:$PATH
echo "using java in $JAVA_HOME"
exec "$@"

And then invoke Makeflow like this:

makeflow --wrapper ./setupjava.sh --wrapper-input setupjava.sh example.makeflow

Resources and Categories

Makeflow can automatically communicate the cores, memory, and disk space requirements of jobs to the underlying batch system (currently this only works with Work Queue and Condor). Jobs are grouped into job categories, and jobs in the same category have the same cores, memory, and disk requirements.

Job categories and resources are specified with variables. Jobs are assigned to the category named in the value of the variable CATEGORY. Likewise, the values of the variables CORES, MEMORY (in MB), and DISK (in MB) describe the resource requirements for the category specified in CATEGORY.

Jobs without an explicit category are assigned to default. Jobs in the default category get their resource requirements from the value of the environment variables CORES, MEMORY, and DISK.

Consider the following example:

# These tasks are assigned to the category preprocessing.
# MEMORY and CORES are read from the environment, if defined.
CATEGORY="preprocessing"
DISK=500

one: src
	cmd

two: src
	cmd

# These tasks have the category "simulation". Note that now CORES, MEMORY, and DISK are specified.
CATEGORY="simulation"
CORES=1
MEMORY=400
DISK=400

three: src
	cmd

four: src
	cmd

# Another category switch. MEMORY is read from the environment.
CATEGORY="analysis"
CORES=4
DISK=600

five: src
	cmd

If the workflow is then run with MEMORY set in the environment:

export MEMORY=800
makeflow ...
Resources specified:
Category        Cores          Memory (MB)              Disk (MB)
preprocessing   (unspecified)  800 (from environment)   500
simulation      1              400                      400
analysis        4              800 (from environment)   600

Mountfile Support

Makeflow allows the user to specify the source and target of each input dependency of a makeflow through the --mounts mountfile option. Each line of the mountfile should have the format: target source

Here is an example of a mountfile:

curl /usr/bin/curl
convert ../bin/convert
data/file1 /home/bob/input1
1.root http://myresearch.org/1.root

To confine the behavior of a makeflow to the current working directory, the target field should satisfy the following requirements:

The source field can be a local file path or an HTTP URL. When a local file path is specified, the following requirements should be satisfied:

To execute a makeflow with a mountfile, named mountfile, specifying its input dependencies:

makeflow --mounts mountfile example.makeflow

When the --mounts option is set, Makeflow first parses each line of the mountfile before execution, copies the specified dependency from the location given by the source field into a local cache, and then links the target to the item in the cache. Makeflow also records into its log the location of the local cache and the information (source, target, filename under the cache dir, and so on) for each dependency specified in the mountfile.

To cleanup a makeflow together with the local cache and all the links created due to the mountfile:

makeflow -c example.makeflow

To only cleanup the local cache and all the links created due to the mountfile:

makeflow -ccache example.makeflow

By default, makeflow creates a unique directory under the current working directory to hold all the dependencies introduced by the mountfile. However, Makeflow also allows the user to provide a cache directory through the --cache cache_dir option; this directory may be empty, or may already contain some or all of the dependencies.

To confine the behavior of a makeflow to the current working directory, the cache_dir should satisfy the following requirements:

To execute a makeflow with a mountfile named mountfile specifying its input dependencies, and a cache directory named mycache specifying the cache location:

makeflow --mounts mountfile --cache mycache example.makeflow

Makeflow first checks the validity of the specified cache path, and then creates it if necessary. Any missing dependencies are then copied from their sources into the cache directory, and each target is linked to the corresponding item in the cache. The rest of Makeflow's execution works the same as when no --cache option is given.

Linking Dependencies

Makeflow provides a tool to collect all of the dependencies for a given workflow into one directory. By collecting all of the input files and programs contained in a workflow it is possible to run the workflow on other machines.

Currently, Makeflow copies all of the files specified as dependencies by the rules in the makeflow file, including scripts and data files. Some of the files not collected are dynamically linked libraries, executables not listed as dependencies (python, perl), and configuration files (mail.rc).

To avoid naming conflicts, files which would otherwise have an identical path are renamed when copied into the bundle:

Example usage:

makeflow_analyze -b some_output_directory example.makeflow

Garbage Collection

As the workflow execution progresses, Makeflow can automatically delete intermediate files that are no longer needed. In this context, an intermediate file is an input of some rule that is the target of another rule. Therefore, by default, garbage collection does not delete the original input files, nor final target files.

Which files are deleted can be tailored from the default by appending files to the Makeflow variables MAKEFLOW_INPUTS and MAKEFLOW_OUTPUTS. Files added to MAKEFLOW_INPUTS augment the original input files that should not be deleted. MAKEFLOW_OUTPUTS marks final target files that should not be deleted. However, unlike MAKEFLOW_INPUTS, the files specified in MAKEFLOW_OUTPUTS do not include all output files. If MAKEFLOW_OUTPUTS is not specified, then all files not used in subsequent rules are considered outputs. It is considered best practice to always specify MAKEFLOW_INPUTS and MAKEFLOW_OUTPUTS to clearly state which files are considered inputs and outputs, and to allow for better space management when garbage collection is used.
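
For example, these variables might be set near the top of the workflow; a minimal sketch with hypothetical file names:

MAKEFLOW_INPUTS=calib.data
MAKEFLOW_OUTPUTS=final_report.txt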

Makeflow offers two modes for garbage collection: reference count and on demand. With the reference count mode, intermediate files are deleted as soon as no rule lists them as an input. The on-demand mode is similar to reference count, except that files are deleted only as needed to keep the space used on the local file system below a given threshold.

To activate reference count garbage collection:

makeflow -gref_count

To activate on-demand garbage collection, with a threshold of 500MB:

makeflow -gon_demand -G500000000

Log File Format

After you have executed the example.makeflow Makeflow script, you should see a log file named example.makeflow.makeflowlog under the directory where you ran the makeflow command. The Makeflow log file records how and when every task is run by Makeflow. It exists primarily so that Makeflow can recover cleanly after a failure, but can also be used for logging and debugging.

A sample logfile might look like this:

# STARTED 1435251570723463
# 1 capitol.jpg 1435251570725086
1435251570725528 5 1 17377 5 1 0 0 0 6
# 2 capitol.jpg 1435251570876426
1435251570876486 5 2 17377 5 0 1 0 0 6
# 1 capitol.360.jpg 1435251570876521
1435251570876866 4 1 17379 4 1 1 0 0 6
# 1 capitol.270.jpg 1435251570876918
1435251570877166 3 1 17380 3 2 1 0 0 6
# 2 capitol.270.jpg 1435251570984114
1435251570984161 3 2 17380 3 1 2 0 0 6
# 1 capitol.180.jpg 1435251570984199
1435251570984533 2 1 17383 2 2 2 0 0 6
# 2 capitol.360.jpg 1435251571003847
1435251571003923 4 2 17379 2 1 3 0 0 6
# 1 capitol.90.jpg 1435251571003969
1435251571004476 1 1 17384 1 2 3 0 0 6
# 2 capitol.180.jpg 1435251571058319
1435251571058369 2 2 17383 1 1 4 0 0 6
# 2 capitol.90.jpg 1435251571094157
1435251571094214 1 2 17384 1 0 5 0 0 6
# 1 capitol.montage.gif 1435251571094257
1435251571094590 0 1 17387 0 1 5 0 0 6
# 2 capitol.montage.gif 1435251575980215
# 3 capitol.360.jpg 1435251575980270
# 3 capitol.270.jpg 1435251575980288
# 3 capitol.180.jpg 1435251575980303
# 3 capitol.90.jpg 1435251575980319
# 3 capitol.jpg 1435251575980334
1435251575980350 0 2 17387 0 0 6 0 0 6
# COMPLETED 1435251575980391

Each line in the log file represents a single action taken on a single rule in the workflow. For simplicity, rules are numbered from the beginning of the Makeflow, starting with zero. Each line contains the following items:

timestamp task_id new_state job_id tasks_waiting tasks_running tasks_complete tasks_failed tasks_aborted task_id_counter

Which are defined as follows:

In addition, lines starting with a pound sign are comments and contain additional high-level information that can be safely ignored. The logfile begins with a comment to indicate the starting time, and ends with a comment indicating whether the entire workflow completed, failed, or was aborted.

Aside from the high-level information, file states are also recorded in the log. This allows for tracking files throughout the workflow execution. This information is shown starting with the #:

# new_state filename timestamp

Each file state line records the state change and time:

Custom Drivers

For clusters that are not directly supported by Makeflow we strongly suggest using the Work Queue system and submitting workers via the cluster's normal submission mechanism.

For clusters using managers similar to Torque, SGE, and PBS that submit jobs with commands like "qsub", you can inform makeflow of those commands and use the cluster driver. For this to work, it is assumed there is a distributed filesystem (such as NFS) shared across all nodes of the cluster.

To configure a custom driver, set the following environment variables:

These will be used to construct a task submission for each makeflow rule that consists of:

$SUBMIT_COMMAND $SUBMIT_OPTIONS $CLUSTER_NAME.wrapper "<rule commandline>"

The wrapper script is a shell script that reads the command to be run as an argument and handles bookkeeping operations necessary for Makeflow.

Archiving a workflow

Makeflow allows users to archive the results of each job within a specified archive directory. This is done using the --archive option, which by default creates an archiving directory at /tmp/makeflow.archive.$UID. Both files and jobs are stored as the workflow executes. Makeflow will also check whether a job has already been archived into the archiving directory; if so, the outputs of the job will be copied to the working directory and the job will skip execution.

makeflow --archive example.makeflow

To only write to the archiving directory (and ensure that all nodes are executed rather than skipped), pass --archive-write. To only read from the archive and use the outputs of any archived job, pass --archive-read. To specify a directory to use as the archiving directory, give an optional argument as shown below:

makeflow --archive=/path/to/directory/ example.makeflow

For More Information

For the latest information about Makeflow, please visit our web site and subscribe to our mailing list.