The Makeflow Workflow System - Cooperative Computing Lab

Makeflow is a workflow system for executing large complex workflows on clusters, clouds, and grids.

Makeflow is easy to use. The Makeflow language is similar to traditional Make, so if you can write a Makefile, you can write a Makeflow. A workflow can be just a few commands chained together, or it can be a complex application consisting of thousands of tasks. It can have an arbitrary DAG (directed acyclic graph) structure and is not limited to specific patterns.
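
As a rough illustration of the Make-like rule format and the DAG it implies, the Python sketch below parses a hypothetical two-rule workflow and derives an order in which its commands could run. The file names, the `parse_rules` and `execution_order` helpers, and the simplified parser are assumptions made for this example. They are not Makeflow's actual parser or scheduler, and the real language supports more than is shown here.

```python
from graphlib import TopologicalSorter

# A hypothetical two-step workflow in Make-style syntax:
# a "targets: sources" header followed by a tab-indented command.
WORKFLOW = """\
sorted.txt: input.txt
\tsort input.txt > sorted.txt

unique.txt: sorted.txt
\tuniq sorted.txt > unique.txt
"""


def parse_rules(text):
    """Parse simplified Make-style rules into (targets, sources, command) tuples."""
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("\t"):
            # A command line belongs to the most recent rule header.
            targets, sources, _ = rules[-1]
            rules[-1] = (targets, sources, line.strip())
        else:
            left, _, right = line.partition(":")
            rules.append((left.split(), right.split(), None))
    return rules


def execution_order(rules):
    """Return the commands in an order that respects file dependencies."""
    # Map each output file to the index of the rule that produces it.
    producer = {t: i for i, (targets, _, _) in enumerate(rules) for t in targets}
    # A rule depends on every rule that produces one of its sources.
    graph = {
        i: {producer[s] for s in sources if s in producer}
        for i, (_, sources, _) in enumerate(rules)
    }
    return [rules[i][2] for i in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    for command in execution_order(parse_rules(WORKFLOW)):
        print(command)
```

Because the order is derived from the files each rule reads and writes, the rules can appear in any order in the file, and rules with no path between them are free to run in parallel.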

Makeflow is production-ready. Makeflow is used on a daily basis to execute complex scientific applications in fields such as data mining, high energy physics, image processing, and bioinformatics. It has run on campus clusters, the Open Science Grid, NSF XSEDE machines, NCSA Blue Waters, and Amazon Web Services. Here are some real examples of workflows used in production systems:

(Makeflow Examples Repository)

Makeflow is portable. A workflow is written in a technology-neutral way and can then be deployed to a variety of systems without modification: local execution on a single multicore machine, public cloud services such as Amazon EC2 and Amazon Lambda, batch systems such as HTCondor, SGE, PBS, Torque, and SLURM, or the bundled Work Queue system. Makeflow can also run your jobs in a container environment such as Docker or Singularity on top of an existing batch system. Because the same specification works on all of these systems, you can move your application from one system to another without rewriting it.
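
To make the portability claim concrete, the Python sketch below builds the command line for running one workflow file on several back ends. It assumes that the batch system is selected with the `makeflow -T` option and that the listed type names are accepted; both should be checked against the User's Manual or `makeflow --help` for your installation. The `makeflow_command` helper and the file name `example.makeflow` are hypothetical.

```python
import shlex

# Batch-type names assumed to be accepted by makeflow's -T option;
# verify the actual list on your installation.
ASSUMED_BATCH_TYPES = {"local", "condor", "sge", "pbs", "torque", "slurm", "wq"}


def makeflow_command(workflow_file, batch_type="local"):
    """Build the argument list for running a workflow on a given batch system."""
    if batch_type not in ASSUMED_BATCH_TYPES:
        raise ValueError(f"unknown batch type: {batch_type}")
    return ["makeflow", "-T", batch_type, workflow_file]


if __name__ == "__main__":
    # The workflow file is identical in every case; only the -T value changes.
    for batch_type in ("local", "condor", "slurm"):
        print(shlex.join(makeflow_command("example.makeflow", batch_type)))
```

The argument list can be passed to `subprocess.run` on a machine where Makeflow is installed. The sketch only constructs the commands and does not execute them.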

Makeflow is powerful. Makeflow can handle workloads of millions of jobs running on thousands of machines for months at a time. Makeflow is highly fault-tolerant: if it crashes or is killed, it will, upon resuming, reconnect to running jobs and continue where it left off. A variety of analysis tools are available to help you understand the performance of your jobs, measure the progress of a workflow, and visualize what is going on.

Install Makeflow

Getting Started

Makeflow User's Manual

Makeflow Tutorial Slides

Makeflow Example Repository

Getting Help with Makeflow

Online Introduction to Workflows

Research Publications

Tim Shaffer, Nathaniel Kremer-Herman, and Douglas Thain,
Flexible Partitioning of Scientific Workflows Using the JX Workflow Language,
Practice and Experience in Advanced Research Computing (PEARC), July, 2019. DOI: 10.1145/3332186.3338100

Qimin Zhang, Ben Tovar, Nate Kremer-Herman, and Douglas Thain,
Reduction of Workflow Resource Consumption Using a Density-based Clustering Model,
WORKS Workshop at Supercomputing, November, 2018.

Nicholas Hazekamp and Douglas Thain,
An Algebra for Robust Workflow Transformations,
IEEE International Conference on e-Science, pages 12, October, 2018. DOI: 10.1109/eScience.2018.00031

Tim Shaffer, Kyle M.D. Sweeney, Nathaniel Kremer-Herman, and Douglas Thain,
Poster: A First Look at the JX Workflow Language,
IEEE International Conference on e-Science, October, 2018. DOI: 10.1109/eScience.2018.00094

Kyle Sweeney and Douglas Thain,
Early Experience Using Amazon Batch for Scientific Workflows,
ScienceCloud Workshop at HPDC, June, 2018. DOI: 10.1145/3217880.3217885

Kyle Sweeney and Douglas Thain,
Efficient Integration of Containers into Scientific Workflows,
ScienceCloud Workshop at HPDC, June, 2018. DOI: 10.1145/3217880.3217887

Nicholas Hazekamp, Nathaniel Kremer-Herman, Benjamin Tovar, Haiyan Meng, Olivia Choudhury, Scott Emrich, and Douglas Thain,
Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows,
IEEE Transactions on Parallel and Distributed Systems, 29(2), pages 338-350, February, 2018. DOI: 10.1109/TPDS.2017.2764897

Haiyan Meng and Douglas Thain,
Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications,
The 17th International Conference on Computational Science (ICCS), June, 2017. DOI: 10.1016/j.procs.2017.05.116

Charles (Chao) Zheng, Ben Tovar, and Douglas Thain,
Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos,
17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017), May, 2017. DOI: 10.1109/CCGRID.2017.9

Dinesh Rajan and Douglas Thain,
Designing Self-Tuning Split-Map-Merge Applications for High Cost-Efficiency in the Cloud,
IEEE Transactions on Cloud Computing, 5(2), pages 303-316, April, 2017. DOI: 10.1109/TCC.2015.2415780

Patrick Donnelly and Douglas Thain,
Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows,
Concurrency and Computation: Practice and Experience, 29(4), May, 2016. DOI: 10.1002/cpe.3834

Patrick Donnelly,
Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows,
Ph.D. Thesis, University of Notre Dame, April, 2016.

Nicholas Hazekamp, Joseph Sarro, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain,
Scaling Up Bioinformatics Workflows with Dynamic Job Expansion,
IEEE International Conference on e-Science, August, 2015.

Charles (Chao) Zheng and Douglas Thain,
Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker,
Workshop on Virtualization Technologies in Distributed Computing (VTDC), June, 2015. DOI: 10.1145/2755979.2755984

Patrick Donnelly, Nicholas Hazekamp, and Douglas Thain,
Confuga: Scalable Data Intensive Computing for POSIX Workflows,
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 392-401, May, 2015. DOI: 10.1109/CCGrid.2015.95

Olivia Choudhury, Nicholas L. Hazekamp, Douglas Thain, and Scott Emrich,
Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning,
C4BIO Workshop at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), May, 2014.

Nicholas Hazekamp, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain,
Poster: Expanding Tasks of Logical Workflows into Independent Workflows for Improved Scalability,
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 548-549, January, 2014. DOI: 10.1109/CCGrid.2014.84

Casey Robinson and Douglas Thain,
Automated Packaging of Bioinformatics Workflows for Portability and Durability Using Makeflow,
Workshop on Workflows in Support of Large-Scale Science (WORKS), November, 2013. DOI: 10.1145/2534248.2534258

Peter Bui,
A Compiler Toolchain For Data Intensive Scientific Workflows,
Ph.D. Thesis, University of Notre Dame, June, 2012.

Michael Albrecht, Patrick Donnelly, Peter Bui, and Douglas Thain,
Makeflow: A Portable Abstraction for Data Intensive Computing on Clusters, Clouds, and Grids,
Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET) at ACM SIGMOD, May, 2012. DOI: 10.1145/2443416.2443417

Rory Carmichael, Patrick Braga-Henebry, Douglas Thain, and Scott Emrich,
Biocompute 2.0: An Improved Collaborative Workspace for Data Intensive Bio-Science,
Concurrency and Computation: Practice and Experience, 23(17), pages 2305-2314, December, 2011. DOI: 10.1002/cpe.1782

Peter Bui, Li Yu, Andrew Thrasher, Rory Carmichael, Irena Lanc, Patrick Donnelly, and Douglas Thain,
Scripting distributed scientific workflows using Weaver,
Concurrency and Computation: Practice and Experience, 24(15), November, 2011. DOI: 10.1002/cpe.1871

Irena Lanc, Peter Bui, Douglas Thain, and Scott Emrich,
Adapting Bioinformatics Applications for Heterogeneous Systems: A Case Study,
Emerging Computational Methods for the Life Sciences Workshop at ACM HPDC, pages 7-13, June, 2011. DOI: 10.1145/1996023.1996025

Andrew Thrasher, Rory Carmichael, Peter Bui, Li Yu, Douglas Thain, and Scott Emrich,
Taming Complex Bioinformatics Workflows with Weaver, Makeflow, and Starch,
Workshop on Workflows in Support of Large Scale Science, pages 1-6, November, 2010. DOI: 10.1109/WORKS.2010.5671858

Li Yu, Christopher Moretti, Andrew Thrasher, Scott Emrich, Kenneth Judd, and Douglas Thain,
Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions,
Journal of Cluster Computing, 13(3), pages 243-256, September, 2010. DOI: 10.1007/s10586-010-0134-7

Douglas Thain and Christopher Moretti,
Abstractions for Cloud Computing with Condor,
Syed Ahson and Mohammad Ilyas, Cloud Computing and Software Services: Theory and Techniques, pages 153-171, CRC Press, July, 2010. ISBN: 9781439803158

Rory Carmichael, Patrick Braga-Henebry, Douglas Thain, and Scott Emrich,
Biocompute: Toward a Collaborative Workspace for Data Intensive Bio-Science,
Workshop on Emerging Computational Methods for Life Sciences at ACM HPDC 2010, pages 489-498, June, 2010. DOI: 10.1145/1851476.1851547