Makeflow User's Manual

Makeflow is Copyright (C) 2017- The University of Notre Dame. This software is distributed under the GNU General Public License. See the file COPYING for details.

Table of Contents

Introduction

Overview

Makeflow is a workflow engine for large scale distributed computing. It accepts a specification of a large amount of work to be performed, and runs it on remote machines in parallel where possible. In addition, Makeflow is fault-tolerant, so you can use it to coordinate very large tasks that may run for days or weeks in the face of failures. Makeflow is designed to be similar to Make, so if you can write a Makefile, then you can write a Makeflow.

Makeflow makes it easy to move a large amount of work from one facility to another. After writing a workflow, you can test it out on your local laptop, then run it at your university computing center, move it over to a national computing facility like XSEDE, and then again to a commercial cloud system. Using the (bundled) Work Queue system, you can even run across multiple systems simultaneously. No matter where you run your tasks, the workflow language stays the same.

Makeflow is used in production to support large scale problems in science and engineering. Researchers in fields such as bioinformatics, biometrics, geography, and high energy physics all use Makeflow to compose workflows from existing applications.

Makeflow can send your jobs to a wide variety of services, such as batch systems (HTCondor, SGE, PBS, Torque), cluster managers (Mesos and Kubernetes), cloud services (Amazon EC2) and container environments like Docker and Singularity. Details for each of those systems are given in the Batch System Support section.

Installing

Makeflow is part of the Cooperating Computing Tools, which is easy to install. In most cases, you can should just build from source and install in your home directory like this: git clone https://github.com/cooperative-computing-lab/cctools cctools-source cd cctools-source ./configure make make install Then, set your path to include the appropriate directory. If you use bash, do this: export PATH=${HOME}/cctools/bin:$PATH Or if you use tcsh, do this: setenv PATH ${HOME}/cctools/bin:$PATH For more complex situations, consult the CCTools installation instructions.

Basic Usage

The Makeflow language is very similar to Make. A Makeflow script consists of a set of rules. Each rule specifies a set of target files to create, a set of source files needed to create them, and a command that generates the target files from the source files.

Makeflow attempts to generate all of the target files in a script. It examines all of the rules and determines which rules must run before others. Where possible, it runs commands in parallel to reduce the execution time.

Here is a Makeflow that uses the convert utility to make an animation. It downloads an image from the web, creates four variations of the image, and then combines them back together into an animation. The first and the last task are marked as LOCAL to force them to run on the controlling machine.

CURL=/usr/bin/curl CONVERT=/usr/bin/convert URL=http://ccl.cse.nd.edu/images/capitol.jpg capitol.anim.gif: capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg $CONVERT LOCAL $CONVERT -delay 10 -loop 0 capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg capitol.270.jpg capitol.180.jpg capitol.90.jpg capitol.anim.gif capitol.90.jpg: capitol.jpg $CONVERT $CONVERT -swirl 90 capitol.jpg capitol.90.jpg capitol.180.jpg: capitol.jpg $CONVERT $CONVERT -swirl 180 capitol.jpg capitol.180.jpg capitol.270.jpg: capitol.jpg $CONVERT $CONVERT -swirl 270 capitol.jpg capitol.270.jpg capitol.360.jpg: capitol.jpg $CONVERT $CONVERT -swirl 360 capitol.jpg capitol.360.jpg capitol.jpg: $CURL LOCAL $CURL -o capitol.jpg $URL

(Note that Makeflow differs from Make in a few subtle ways, you can learn about those in the Language Reference below.)

To try out the example above, copy and paste it into a file named example.makeflow. To run it on your local machine:

makeflow example.makeflow

Note that if you run it a second time, nothing will happen, because all of the files are built:

makeflow example.makeflow makeflow: nothing left to do

Use the --clean option to clean everything up before trying it again:

makeflow --clean example.makeflow

If you have access to a batch system like Condor, SGE, or Torque, you can direct Makeflow to run your jobs there by using the -T option:

makeflow -T condor example.makeflow or makeflow -T sge example.makeflow or makeflow -T torque example.makeflow ... To learn more about the various batch system options, see the Batch System Support section.

Resources

Some batch systems require information about exactly what resources each job needs, so as to schedule them appropriately. You can convey this by setting the variables CORES, MEMORY (in MB), and DISK (in MB) ahead of each job. Makeflow will translate this information as needed to the underlying batch system. For example:

CORES=4 MEMORY=1024 DISK=4000 output.txt: input.dat analyze input.dat > output.txt

Monitoring

A variety of tools are available to help you monitor the progress of a workflow as it runs. Makeflow itself creates a transaction log (example.makeflow.makeflowlog) which contains details of each task as it runs:
  1. makeflow_monitor reads the transaction log and produceds a continuous display that shows the overall time and progress through the workflow: makeflow_monitor example.makeflow.makeflowlog
  2. makeflow_graph_log will read the transaction log, and produce a timeline graph showing the number of jobs ready, running, and complete over time: makeflow_graph_log example.makeflow.makeflowlog example.png
  3. makeflow_viz will display the workflow in graphical form, so that you can more easily understand the structure and dependencies. (Read more about Visualization

General Advice

A few key bits of advice address the most common problems encountered when using Makeflow:

First, Makeflow works best when it has accurate information about each task that you wish to run. Make sure that you are careful to indicate exactly which input files each task needs, and which output files it produces.

Second, if Makeflow is doing something unexpected, you may find it useful to turn on the debugging stream with the -d all option. This will emit all sorts of detail about how each job is constructed and sent to the underlying batch system.

Finally, Makeflow was created by the Cooperative Computing Lab at the University of Notre Dame. We are always happy to learn more about how Makeflow is used and assist you if you have questions or difficulty.

For the latest information about Makeflow, please visit our web site and subscribe to our mailing list for more information.

Batch Systems

Makeflow supports a wide variety of batch systems. Use makeflow --help to see the current list supported. Generally speaking, simply run Makeflow with the -T option to select your desired batch system. If no option is given, then -T local is assumed.

If you need to pass additional parameters to your batch system, such as specifying a specific queue or machine category, use the -B option to Makeflow, or set the BATCH_OPTIONS variable in your Makeflow file. The interpretation of these options is slightly different with each system, as noted below.

To avoid overwhelming a batch system with an enormous number of idle jobs, Makeflow will limit the number of jobs sent to a system at once. You can control this on the command line with the --max-remote option or the MAKEFLOW_MAX_REMOTE_JOBS environment variable. Likewise, local execution can be limited with --max-local and MAKEFLOW_MAX_LOCAL_JOBS.

Local Execution

By default, Makeflow executes on the local machine. Makeflow will attempt to use all of the cores available on your machine. You can manually control the number of running jobs with the --max-local command line option.

HTCondor

Use the -T condor option to submit jobs to the HTCondor batch system. (Formerly known as Condor.)

Makeflow will automatically generate a submit file for each job. However, if you would like to customize the Condor submit file, use the -B option or BATCH_OPTIONS variable to specify text to add to the submit file.< /p>

For example, if you want to add Requirements and Rank lines to your Condor submit files, add this to your Makeflow:

BATCH_OPTIONS = Requirements = (Memory>1024)

UGE - Univa Grid Engine / OGE - Open Grid Engine / SGE - Sun Grid Engine

Use the -T sge option to submit jobs to Sun Grid Engine or systems derived from it like Open Grid Scheduler or Univa Grid Engine.

As above, Makeflow will automatically generate qsub commands. Use the -B option or BATCH_OPTIONS variable to specify text to add to the command line. For example, to specify that jobs should be submitted to the devel queue:

BATCH_OPTIONS = -q devel

PBS - Portable Batch System

Use the -T pbs option to submit jobs to the Portable Batch System or compatible systems.

Torque Batch System

Use the -T torque option to submit jobs to the Torque Resource Manager batch system.

SLURM - Simple Linux Resource Manager

Use the -T slurm option to submit jobs to the SLURM batch system.

Moab Scheduler

Use the -T moab option to submit jobs to the Moab scheduler or compatible systems.

Mesos

Makeflow can be used with Apache Mesos. To run Makeflow with Mesos, give thebatch mode via -T mesos and pass the hostname and port number of Mesos master to Makeflow with the --mesos-master option. Since the Makeflow-Mesos Scheduler is based on Mesos Python2 API, the path to Mesos Python2 library should be included in the $PATH, or one can specify a preferred Mesos Python2 API via --mesos-path option. To successfully import the Python library, you may need to indicate dependencies through --mesos-preload option.

For example, here is the command to run Makeflow on a Mesos cluster that has the master listening on port 5050 of localhost, with a user specified python library:

makeflow -T mesos --mesos-master localhost:5050 --mesos-path /path/to/mesos-0.26.0/lib/python2.6/site-packages example.makeflow ...

Amazon EC2

To run jobs on Amazon EC2, give the -T ec2 option, along with --amazon-ami to specify the machine image and --amazon-credentials to specify your Amazon credentials file. For each job to execute, Makeflow will create a virtual machine, upload the input files, run the job, download the output files, and then delete the virtual machine.

Generic Cluster Submission

For clusters that are not directly supported by Makeflow we strongly suggest using the Work Queue system and submitting workers via the cluster's normal submission mechanism.

However, if you have a system similar to Torque, SGE, or PBS which submits jobs with commands like "qsub", you can inform Makeflow of those commands and use the cluster driver. For this to work, it is assumed there is a distributed filesystem shared (like NFS) shared across all nodes of the cluster.

To configure a custom driver, set the following environment variables:

These will be used to construct a task submission for each makeflow rule that consists of:

$SUBMIT_COMMAND $SUBMIT_OPTIONS $CLUSTER_NAME.wrapper "<rule commandline>"

The wrapper script is a shell script that reads the command to be run as an argument and handles bookkeeping operations necessary for Makeflow.

Using Work Queue

Overview

As described so far, Makeflow translates each job in the workflow into a single submission to the target batch system. This works well as long as each job is relatively long running and doesn't perform a large amount of I/O. However, if each job is relatively short, the workflow may run very slowly, because it can take 30 seconds or more for the batch system to start an individual job. If common files are needed by each job, they may end up being transferred to the same node multiple times.

To get around these limitations, we provide the Work Queue system. The basic idea is to submit a number of persistent "worker" processes to an existing batch system. Makeflow communicates directly with the workers to quickly dispatch jobs and cache data, rather than interacting with the underlying batch system. This accelerates short jobs and exploits common data dependencies between jobs.

To begin, let's assume that you are logged into the head node of your cluster, called head.cluster.edu Run Makeflow in Work Queue mode like this:

makeflow -T wq example.makeflow

Then, submit 10 worker processes to Condor like this:

condor_submit_workers head.cluster.edu 9123 10 Submitting job(s).......... Logging submit event(s).......... 10 job(s) submitted to cluster 298.

Or, submit 10 worker processes to SGE like this:

sge_submit_workers head.cluster.edu 9123 10

Or, you can start workers manually on any other machine you can log into:

work_queue_worker head.cluster.edu 9123

Once the workers begin running, Makeflow will dispatch multiple tasks to each one very quickly. If a worker should fail, Makeflow will retry the work elsewhere, so it is safe to submit many workers to an unreliable system.

When the Makeflow completes, your workers will still be available, so you can either run another Makeflow with the same workers, remove them from the batch system, or wait for them to expire. If you do nothing for 15 minutes, they will automatically exit.

Note that condor_submit_workers and sge_submit_workers, are simple shell scripts, so you can edit them directly if you would like to change batch options or other details. Please refer to Work Queue manual for more details.

Port Numbers

Makeflow listens on a port which the remote workers would connect to. The default port number is 9123. Sometimes, however, the port number might be not available on your system. You can change the default port via the -p option. For example, if you want the master to listen on port 9567 by default, you can run the following command:

makeflow -T wq -p 9567 example.makeflow

Project Names

If you don't like using port numbers, an easier way to match workers to masters is to use a project name. You can give each master a project name with the -N option.

makeflow -T wq -N MyProject example.makeflow

The -N option gives the master a project name called 'MyProject', and will cause it to advertise its information such as the project name, running status, hostname and port number, to a catalog server. Then a worker can simply identify the workload by its project name.

To start a worker that automatically finds MyProject's master via the default catalog server:

work_queue_worker -N MyProject

You can also give multiple -N options to a worker. The worker will find out which ones of the specified projects are running from the catalog server and randomly select one to work for. When one project is done, the worker would repeat this process. Thus, the worker can work for a different master without being stopped and given the different master's hostname and port. An example of specifying multiple projects:

work_queue_worker -N proj1 -N proj2 -N proj3

In addition to creating a project name using the -N option, this will also trigger automatic reporting to the designated catalog server. The Port and Server address are taken from the environment variables CATALOG_HOST and CATALOG_PORT. If either variable is not set, then the addresses "catalog.cse.nd.edu,backup-catalog.cse.nd.edu" and port 9097 will be used.

It is also easy to run your own catalog server, if you prefer. For more details, see the Catalog Server Manual.

Setting a Password

We recommend that anyone using the catalog server also set a password. To do this, select any password and write it to a file called mypwfile. Then, run Makeflow and each worker with the --password option to indicate the password file:

makeflow --password mypwfile ... work_queue_worker --password mypwfile ...

Container Environments

Makeflow can interoperate with a variety of container technologies, including Docker, Singularity, and Umbrella. In each case, you name the container image on the command line, and then Makeflow will apply it to each job by creating the container, loading (or linking) the input files into the container, running the job, extracting the output files, and deleting the container.

Note that container specification is independent of batch system selection. For example, if you select HTCondor execution with -T condor and Docker support with --docker, then Makeflow will submit HTCondor batch jobs, each consisting of an invocation of docker to run the job. You can switch to any combination of batch system and container technology that you like.

Docker

If you have the Docker container enviornment installed on your cluster, Makeflow can be set up to place each job in your workflow into a Docker container. Invoke Makeflow with the --docker argument followed by the container image to use. Makeflow will ensure that the named image is pulled into each Docker node, and then execute the task within that container. For example, --docker debian will cause all tasks to be run in the container name debian.

Alternatively, if you have an exported container image, you can use the exported image via the --docker-tar option. Makeflow will load the container into each execution node as needed. This allows you to use a container without pushing it to a remote repository.

Singularity

In a similar way, Makeflow can be used with the Singularity container system. When using this mode, Singularity will take in an image, set up the container, and runs the command inside of the container. Any needed input files will be read in from Makeflow, and created files will be delivered by Makeflow.

Invoke Makeflow with the --singularity argument, followed by the path to the desired image file. Makeflow will ensure that the named image will be transferred to each job, using the appropriate mechanism for that batch system

Umbrella

Makeflow allows the user to specify the execution environment for each rule via its --umbrella-spec and --umbrella-binary options. The --umbrella-spec option allows the user to specify an Umbrella specification, the --umbrella-binary option allows the user to specify the path to an Umbrella binary. Using this mode, each rule will be converted into an Umbrella job, and the specified Umbrella specification and binary will be added into the input file list of a job.

To test makeflow with umbrella using local execution engine: makeflow --umbrella-binary $(which umbrella) --umbrella-spec convert_S.umbrella example.makeflow To test makeflow with umbrella using wq execution engine: makeflow -T wq --umbrella-binary $(which umbrella) --umbrella-spec convert_S.umbrella example.makeflow

To run each makeflow rule as an Umbrella job, the --umbrella-spec must be specified. However, the --umbrella-binary option is optional: when it is specified, the specified umbrella binary will be sent to the execution node; when it is not specified, the execution node is expected to have an umbrella binary available through the $PATH environment variable.

Makeflow also allows the user to specify the umbrella log file prefix via its --umbrella-log-prefix option. The umbrella log file is in the format of ".". The default value for the --umbrella-log-prefix option is ".umbrella.log".

Makeflow also allows the user to specify the umbrella execution mode via its --umbrella-mode option. Currently, this option can be set to the following three modes: local, parrot, and docker. The default value of the --umbrella-mode option is local, which first tries to utilize the docker mode, and tries to utilize the parrot mode if the docker mode is not available.

You can also specify an Umbrella specification for a group of rule(s) in the Makefile by putting the following directives before the rule(s) you want to apply the Umbrella spec to:

.MAKEFLOW CATEGORY 1 .UMBRELLA SPEC convert_S.umbrella

In this case, the specified Umbrella spec will be applied to all the following rules until a new ".MAKEFLOW CATEGORY ..." directive is declared. All the rules before the first ".MAKEFLOW CATEGORY ..." directive will use the Umbrella spec specified by the --umbrella-spec option. If the --umbrella-spec option is not specified, these rules will run without being wrapped by Umbrella.

Wrapper Commands

Makeflow allows a global wrapper command to be applied to every rule in the workflow. This is particularly useful for applying troubleshooting tools, or for setting up a global environment without rewriting the entire workflow. The --wrapper option will prefix a command in front of every rule, while the --wrapper-input and --wrapper-output options will specify input and output files related to the wrapper.

A few special characters are supported by wrappers. If the wrapper command or wrapper files contain two percents (%%), then the number of the current rule will be substituted there. If the command contains curly braces ({}) the original command will be substituted at that point. Square brackets ([]) are the same as curly braces, except that the command is quoted and escaped before substitution. If neither specifier is given, Makeflow appends /bin/sh -c [] to the wrapper command.

For example, suppose that you wish to shell builtin command time to every rule in the workflow. Instead of modifying the workflow, run it like this:

makeflow --wrapper 'time -p' example.makeflow

Since the preceding wrapper did not specify where to substitute the command, it is equivalent to

makeflow --wrapper 'time -p /bin/sh -c []' example.makeflow

This way, if a single rule specifies multiple commands, the wrapper will time all of them.

The square brackets and the default behavior of running commands in a shell were added because Makeflow allows a rule to run multiple commands. The curly braces simply perform text substitution, so for example

makeflow --wrapper 'env -i {}' example.makeflow does not work correctly if multiple commands are specified. target_1: source_1 command_1; command_2; command_3 will be executed as env -i command_1; command_2; command_3

Notice that only command_1's environment will be cleared; subsequent commands are not affected. Thus this wrapper should be given as

makeflow --wrapper 'env -i /bin/sh -c []' example.makeflow or more succinctly as makeflow --wrapper 'env -i' example.makeflow

Suppose you want to apply strace to every rule, to obtain system call traces. Since every rule would have to have its own output file for the trace, you could indicate output files like this:

makeflow --wrapper 'strace -o trace.%%' --wrapper-output 'trace.%%' example.makeflow

Suppose you want to wrap every command with a script that would set up an appropriate Java environment. You might write a script called setupjava.sh like this:

#!/bin/sh export JAVA_HOME=/opt/java-9.8.3.6.7 export PATH=${JAVA_HOME}/bin:$PATH echo "using java in $JAVA_HOME" exec "$@"

And then invoke Makeflow like this:

makeflow --wrapper ./setupjava.sh --wrapper-input setupjava.sh example.makeflow

Advanced Features

Shared File Systems

By default, Makeflow does not assume that your cluster has a shared filesystem like NFS or HDFS, and so, to be safe, it copies all necessary dependencies for each job. However, if you do have a shared filesystem, it can be used to efficiently access files without making remote copies. Makeflow must be told where the shared filesystem is located, so that it can take advantage of it. To enable this, invoke Makeflow with the --shared-fs option, indicating the path where the shared filesystem is mounted. (This option can be given multiple times.) Makeflow will then verify the existence of input and output files in these locations, but will not cause them to be transferred.

For example, if you use NFS to access the /home and /software directories on your cluster, then invoke makeflow like this:

makeflow --shared-fs /home --shared-fs /software example.makeflow ...

NFS Consistency Delay

After a job completes, Makeflow checks that the expected output files were actually created. However, if the output files were written via NFS (or another shared filesystem), it is possible that the outputs may not be visible for a few seconds. This is due to caching and buffering effects in many filesystems.

If you experience this problem, you can instruct Makeflow to retry the output file check for a limited amount of time, before concluding that the files are not there. Use the --wait-for-files-upto option to specify the number of seconds to wait.

Mounting Remote Files

It is often convenient to separate the logical purpose of a file from it's physical location. For example, you may have a workflow which makes use of a large reference database called ref.db which is a standard file downloaded from a common data repository whenever needed.

Makeflow allows you to map logical file names within the workflow to arbitrary locations on disk, or downloaded from URLs. This is accomplished by writing a "mount file" in which each line lists a logical file name and its physical location.

Here is an example of a mountfile:

curl /usr/bin/curl convert ../bin/convert data/file1 /home/bob/input1 1.root http://myresearch.org/1.root

Then, simply execute Makeflow with the --mounts option to apply the desred mountfile:

makeflow --mounts my.mountfile example.makeflow ...

Before execution, Makeflow first parses each line of the mountfile when the --mounts option is set, and copies the specified dependency from the location specified by source field into a local cache, and then links the target to the item under the cache. Makeflow also records the location of the local cache and the info (source, target, filename under the cache dir, and so on) of each dependencies specified in the mountfile into its log.

To cleanup a makeflow together with the local cache and all the links created due to the mountfile:

makeflow -c example.makeflow

By default, makeflow creates a unique directory under the current working directory to hold all the dependencies introduced by the mountfile. This location can be adjusted with the --cache-dir option.

To only cleanup the local cache and all the links created due to the mountfile:

makeflow -ccache example.makeflow

To limit the behavoir of a makeflow inside the current working directory, the target field should satisfy the following requirements:

The source field can be a local file path or a http URL. When a local file path is specified, the following requirements should be satisfied:

To limit the behavoir of a makeflow inside the current working directory, the cache_dir should satisfy the following requirements:

Garbage Collection

As the workflow execution progresses, Makeflow can automatically delete intermediate files that are no longer needed. In this context, an intermediate file is an input of some rule that is the target of another rule. Therefore, by default, garbage collection does not delete the original input files, nor final target files.

Which files are deleted can be tailored from the default by appending files to the Makeflow variables MAKEFLOW_INPUTS and MAKEFLOW_OUTPUTS. Files added to MAKEFLOW_INPUTS augment the original inputs files that should not be deleted. MAKEFLOW_OUTPUTS marks final target files that should not be deleted. However, different from MAKEFLOW_INPUTS, files specified in MAKEFLOW_OUTPUTS does not include all output files. If MAKEFLOW_OUTPUTS is not specified, then all files not used in subsequent rules are considered outputs. It is considered best practice to always specify MAKEFLOW_INPUTS/OUTPUTS to clearly specify which files are considered inputs and outputs and allow for better space management if garbage collection is used.

Makeflow offers two modes for garbage collection: reference count, and on demand. With the reference count mode, intermediate files are deleted as soon as no rule has them listed as input. The on-demand mode is similar to reference count, only that files are deleted until the space on the local file system is below a given threshold.

To activate reference count garbage collection:

makeflow -gref_count

To activate on-demand garbage collection, with a threshold of 500MB:

makeflow -gon_demand -G500000000

Visualization

There are several ways to visualize both the structure of a Makeflow as well as its progress over time. makeflow_viz can be used to convert a Makeflow into a file that can be displayed by Graphviz DOT tools like this:

makeflow_viz -D dot example.makeflow > example.dot dot -Tgif < example.dot > example.gif

Or use a similar command to generate a Cytoscape input file. (This will also create a Cytoscape style.xml file.)

makeflow_viz -D cyto example.makeflow > example.xgmml

To observe how a makeflow runs over time, use makeflow_graph_log to convert a log file into a timeline that shows the number of tasks ready, running, and complete over time:

makeflow_graph_log example.makeflowlog example.png

Archiving Jobs

Makeflow allows for users to archive the results of each job within a specified archive directory. This is done using the --archive option, which by default creates a archiving directory at /tmp/makeflow.archive.$UID. Both files and jobs are stored as the workflow executes. Makeflow will also check to see if a job has already been archived into the archiving directory, and if so the outputs of the job will be copied to the working directory and the job will skip execution.

makeflow --archive example.makeflow

To only write to the archiving directory (and ensure that all nodes will be executed instead), pass --archive-write. To only read from the archive and use the outputs of any archived job, pass --archive-read. To specify a directory to use for the archiving directory, give an optional argument as shown below

makeflow --archive=/path/to/directory/ example.makeflow

Linking Dependencies

Makeflow provides a tool to collect all of the dependencies for a given workflow into one directory. By collecting all of the input files and programs contained in a workflow it is possible to run the workflow on other machines.

Currently, Makeflow copies all of the files specified as dependencies by the rules in the makeflow file, including scripts and data files. Some of the files not collected are dynamically linked libraries, executables not listed as dependencies (python, perl), and configuration files (mail.rc).

To avoid naming conflicts, files which would otherwise have an identical path are renamed when copied into the bundle:

Example usage:

makeflow_analyze -b some_output_directory example.makeflow

Technical Reference

Language Reference

The Makeflow language is very similar to Make, but it does have a few important differences that you should be aware of.

Get the Dependencies Right

You must be careful to accurately specify all of the files that a rule requires and creates, including any custom executables. This is because Makeflow requires all these information to construct the environment for a remote job. For example, suppose that you have written a simulation program called mysim.exe that reads calib.data and then produces and output file. The following rule won't work, because it doesn't inform Makeflow what files are neded to execute the simulation:

# This is an incorrect rule. output.txt: ./mysim.exe -c calib.data -o output.txt

However, the following is correct, because the rule states all of the files needed to run the simulation. Makeflow will use this information to construct a batch job that consists of mysim.exe and calib.data and uses it to produce output.txt:

# This is a correct rule. output.txt: mysim.exe calib.data ./mysim.exe -c calib.data -o output.txt

Note that when a directory is specified as an input dependency, it means that the command relies on the directory and all of its contents. So, if you have a large collection of input data, you can place it in a single directory, and then simply give the name of that directory.

No Phony Rules

For a similar reason, you cannot have "phony" rules that don't actually create the specified files. For example, it is common practice to define a clean rule in Make that deletes all derived files. This doesn't make sense in Makeflow, because such a rule does not actually create a file named clean. Instead use the -c option as shown above.

Just Plain Rules

Makeflow does not support all of the syntax that you find in various versions of Make. Each rule must have exactly one command to execute. If you have multiple commands, simply join them together with semicolons. Makeflow allows you to define and use variables, but it does not support pattern rules, wildcards, or special variables like $< or $@. You simply have to write out the rules longhand, or write a script in your favorite language to generate a large Makeflow.

Local Job Execution

Certain jobs don't make much sense to distribute. For example, if you have a very fast running job that consumes a large amount of data, then it should simply run on the same machine as Makeflow. To force this, simply add the word LOCAL to the beginning of the command line in the rule.

Rule Lexical Scope

Variables in Makeflow have global scope, that is, once defined, their value can be accessed from any rule. Sometimes it is useful to define a variable locally inside a rule, without affecting the global value. In Makeflow, this can be achieved by defining the variables after the rule's requirements, but before the rule's command, and prepending the name of the variable with @, as follows:

SOME_VARIABLE=original_value target_1: source_1 command_1 target_2: source_2 @SOME_VARIABLE=local_value_for_2 command_2 target_3: source_3 command_3 In this example, SOME_VARIABLE has the value 'original_value' for rules 1 and 3, and the value 'local_value_for_2' for rule 2.

Environment Variables

Environment variables can be defined with the export keyword inside a workflow. Makeflow will communicate explicitly named environment variables to remote batch systems, where they will override whatever local setting is present. For example, suppose you want to modify the PATH for every job in the makeflow: export PATH=/opt/tools/bin:${PATH} If no value is given, then the current value of the environment variable is passed along to the job: export USER

Remote File Renaming

With the Work Queue and Condor batch systems, Makeflow has a feature called remote file renaming. For example:

local_name->remote_name

indicates that the file local_name is called remote_name in the remote system. Consider the following example:

b.out: a.in myprog LOCAL myprog a.in > b.out c.out->out: a.in->in1 b.out myprog->prog prog in1 b.out > out

The first rule runs locally, using the executable myprog and the local file a.in to locally create b.out. The second rule runs remotely, but the remote system expects a.in to be named in1, c.out, to be named out and so on. Note that we did not need to rename the file b.out. Without remote file renaming, we would have to create either a symbolic link, or a copy of the files with the expected correct names.

Nested Makeflows

One Makeflow can be nested inside of another by writing a rule with the following syntax: output-files: input-files MAKEFLOW makeflow-file [working-dir] The input and output files are specified as usual, describing the files consumed and created by the child makeflow as a whole. Then, the MAKEFLOW keyword< introduces the child makeflow specification, and an optional working directory into which the makeflow will be executed. If not given, the current working directory is assumed.

Resources and Categories

Makeflow has the capability of automatically setting the cores, memory, and disk space requirements to the underlying batch system (currently this only works with Work Queue and Condor). Jobs are grouped into job categories , and jobs in the same category have the same cores, memory, and disk requirements.

Job categories and resources are specified with variables. Jobs are assigned to the category named in the value of the variable CATEGORY. Likewise, the values of the variables CORES, MEMORY (in MB), and DISK (in MB) describe the resources requirements for the category specified in CATEGORY.

Jobs without an explicit category are assigned to default. Jobs in the default category get their resource requirements from the value of the environment variables CORES, MEMORY, and DISK.

Consider the following example: # These tasks are assigned to the category preprocessing. # MEMORY and CORES are read from the environment, if defined. CATEGORY="preprocessing" DISK=500 one: src cmd two: src cmd # These tasks have the category "simulation". Note that now CORES, MEMORY, and DISK are specified. CATEGORY="simulation" CORES=1 MEMORY=400 DISK=400 three: src cmd four: src cmd # Another category switch. MEMORY is read from the environment. CATEGORY="analysis" CORES=4 DISK=600 five: src cmd export MEMORY=800 makeflow ...

Resources Specified

CategoryCoresMemory (MB)Disk (MB)
preprocessing (unspecified) 800 (from environment) 500
simulation 1 400 400
analysis 4 800 (from environment) 600

Transaction Log

As Makeflow runs, it creates a transaction log that records the details of each job, where it was sent, how long it ran, and what resources it consumed. By default, this log is named X.makeflowlog where X is the name of the original makeflow file.

The transaction log serves several purposes:

  1. Recovery. The transaction log allows Makeflow to continue where it left off. If you must restart Makeflow after a crash, it will read the transaction log, determine what jobs are complete (or still running) and then continue forward from there.
  2. Cleanup. The --clean option relies on the transaction log to quickly determine exactly which files have been created and which jobs have been submitted, so that they can be quickly and precisely deleted and removed. (There is no need to create a clean rule by hand, as you would in traditional Make.)
  3. Monitoring. Tools like makeflow_monitor and makeflow_graph_log read the transaction log to determine the current state of the workflow and display it to the user.

A sample transaction log might look like this:

# STARTED 1435251570723463 # 1 capitol.jpg 1435251570725086 1435251570725528 5 1 17377 5 1 0 0 0 6 # 2 capitol.jpg 1435251570876426 1435251570876486 5 2 17377 5 0 1 0 0 6 # 1 capitol.360.jpg 1435251570876521 1435251570876866 4 1 17379 4 1 1 0 0 6 # 1 capitol.270.jpg 1435251570876918 1435251570877166 3 1 17380 3 2 1 0 0 6 # 2 capitol.270.jpg 1435251570984114 1435251570984161 3 2 17380 3 1 2 0 0 6 # 1 capitol.180.jpg 1435251570984199 1435251570984533 2 1 17383 2 2 2 0 0 6 # 2 capitol.360.jpg 1435251571003847 1435251571003923 4 2 17379 2 1 3 0 0 6 # 1 capitol.90.jpg 1435251571003969 1435251571004476 1 1 17384 1 2 3 0 0 6 # 2 capitol.180.jpg 1435251571058319 1435251571058369 2 2 17383 1 1 4 0 0 6 # 2 capitol.90.jpg 1435251571094157 1435251571094214 1 2 17384 1 0 5 0 0 6 # 1 capitol.anim.gif 1435251571094257 1435251571094590 0 1 17387 0 1 5 0 0 6 # 2 capitol.anim.gif 1435251575980215 # 3 capitol.360.jpg 1435251575980270 # 3 capitol.270.jpg 1435251575980288 # 3 capitol.180.jpg 1435251575980303 # 3 capitol.90.jpg 1435251575980319 # 3 capitol.jpg 1435251575980334 1435251575980350 0 2 17387 0 0 6 0 0 6 # COMPLETED 1435251575980391

Each line in the log file represents a single action taken on a single rule in the workflow. For simplicity, rules are numbered from the beginning of the Makeflow, starting with zero. Each line contains the following items:

timestamp task_id new_state job_id tasks_waiting tasks_running tasks_complete tasks_failed tasks_aborted task_id_counter

Which are defined as follows:

In addition, lines starting with a pound sign are comments and contain additional high-level information that can be safely ignored. The transaction log begins with a comment to indicate the starting time, and ends with a comment indicating whether the entire workflow completed, failed, or was aborted.

Aside from the high-level information, file states are also recorded in the log. This allows for tracking files throughout the workflow execution. This information is shown starting with the #:

# new_state filename timestamp

Each file state line records the state change and time: