Makeflow is a workflow system for executing large complex workflows on clusters, clouds, and grids.
Makeflow is easy to use. The Makeflow language is similar to traditional Make, so if you can write a Makefile, then you can write a Makeflow. A workflow can be just a few commands chained together, or it can be a complex application consisting of thousands of tasks. It can have an arbitrary DAG structure and is not limited to specific patterns.
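To make the Make-style rule idea concrete, here is a minimal Python sketch of how rules of the form "targets: sources" plus a tab-indented command define a DAG that can be ordered for execution. It is a toy illustration, not Makeflow's implementation: the workflow text, the file names, the `analyze` program, and the helper functions `parse_rules` and `execution_order` are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical three-rule workflow in Make-style syntax:
# "targets: sources" followed by a tab-indented command.
WORKFLOW = """\
part1.out: input.txt analyze
\t./analyze input.txt 1 > part1.out

part2.out: input.txt analyze
\t./analyze input.txt 2 > part2.out

result.txt: part1.out part2.out
\tcat part1.out part2.out > result.txt
"""


def parse_rules(text):
    """Return a list of (targets, sources, command) tuples."""
    rules = []
    lines = iter(text.splitlines())
    for line in lines:
        # Skip blank lines and comments between rules.
        if not line.strip() or line.startswith("#"):
            continue
        targets, _, sources = line.partition(":")
        # The line after a rule header is its command.
        command = next(lines).strip()
        rules.append((targets.split(), sources.split(), command))
    return rules


def execution_order(rules):
    """Order commands so each runs after the rules that produce its inputs."""
    # Map every target file to the index of the rule that creates it.
    producer = {t: i for i, (targets, _, _) in enumerate(rules) for t in targets}
    # A rule depends on the producers of its sources; files with no
    # producer (such as input.txt) are assumed to exist already.
    graph = {
        i: {producer[s] for s in sources if s in producer}
        for i, (_, sources, _) in enumerate(rules)
    }
    return [rules[i][2] for i in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    for command in execution_order(parse_rules(WORKFLOW)):
        print(command)
```

In this sketch the two `analyze` rules have no dependency on each other, so a real workflow engine would be free to run them in parallel. Only the final `cat` rule must wait for both.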
Makeflow is production-ready. Makeflow is used daily to execute complex scientific applications in fields such as data mining, high energy physics, image processing, and bioinformatics. It has run on campus clusters, the Open Science Grid, NSF XSEDE machines, NCSA Blue Waters, and Amazon Web Services.
Makeflow is portable. A workflow is written in a technology-neutral way and can then be deployed to a variety of systems without modification, including local execution on a single multicore machine, public cloud services such as Amazon EC2 and Amazon Lambda, batch systems like HTCondor, SGE, PBS, Torque, and SLURM, or the bundled Work Queue system. Makeflow can also run your jobs in a container environment like Docker or Singularity on top of an existing batch system. The same specification works for all systems, so you can move your application from one system to another without rewriting it.
Makeflow is powerful. Makeflow can handle workloads of millions of jobs running on thousands of machines for months at a time. Makeflow is highly fault tolerant: it can crash or be killed, and upon resuming, will reconnect to running jobs and continue where it left off. A variety of analysis tools are available to understand the performance of your jobs, measure the progress of a workflow, and visualize what is going on.
Related Publications
Flexible Partitioning of Scientific Workflows Using the JX Workflow Language
Tim Shaffer, Nathaniel Kremer-Herman, and Douglas Thain
In Practice and Experience in Advanced Research Computing (PEARC), 2019
@inproceedings{jx-pearc19,author={Shaffer, Tim and Kremer-Herman, Nathaniel and Thain, Douglas},title={{Flexible Partitioning of Scientific Workflows Using the JX Workflow Language}},booktitle={{Practice and Experience in Advanced Research Computing (PEARC)}},year={2019},note={{doi: 10.1145/3332186.3338100}},cclpaperid={961},keywords={makeflow, jx},}
Reduction of Workflow Resource Consumption Using a Density-based Clustering Model
Qimin Zhang, Ben Tovar, Nate Kremer-Herman, and Douglas Thain
In WORKS Workshop at Supercomputing, 2018
@inproceedings{clustering-works-2018,author={Zhang, Qimin and Tovar, Ben and Kremer-Herman, Nate and Thain, Douglas},title={{Reduction of Workflow Resource Consumption Using a Density-based Clustering Model}},booktitle={{WORKS Workshop at Supercomputing}},year={2018},cclpaperid={956},keywords={makeflow, resource_monitor},}
An Algebra for Robust Workflow Transformations
Nicholas Hazekamp and Douglas Thain
In IEEE International Conference on e-Science, 2018
@inproceedings{transformation-escience-2018,author={Hazekamp, Nicholas and Thain, Douglas},title={{An Algebra for Robust Workflow Transformations}},booktitle={{IEEE International Conference on e-Science}},pages={12},year={2018},note={{doi: 10.1109/eScience.2018.00031}},cclpaperid={953},keywords={makeflow},}
Poster: A First Look at the JX Workflow Language
Tim Shaffer, Kyle M.D. Sweeney, Nathaniel Kremer-Herman, and Douglas Thain
In IEEE International Conference on e-Science, 2018
@inproceedings{jx-escience18,author={Shaffer, Tim and Sweeney, Kyle M.D. and Kremer-Herman, Nathaniel and Thain, Douglas},title={{Poster: A First Look at the JX Workflow Language}},booktitle={{IEEE International Conference on e-Science}},year={2018},note={{doi: 10.1109/eScience.2018.00094}},cclpaperid={954},keywords={makeflow, jx},}
Early Experience Using Amazon Batch for Scientific Workflows
Kyle Sweeney and Douglas Thain
In ScienceCloud Workshop at HPDC, 2018
@inproceedings{batch-sciencecloud-2018,author={Sweeney, Kyle and Thain, Douglas},title={{Early Experience Using Amazon Batch for Scientific Workflows}},booktitle={{ScienceCloud Workshop at HPDC}},year={2018},note={{doi: 10.1145/3217880.3217885}},cclpaperid={950},keywords={makeflow},}
Efficient Integration of Containers into Scientific Workflows
Kyle Sweeney and Douglas Thain
In Science Cloud Workshop at HPDC, 2018
@inproceedings{containers-sciencecloud-2018,author={Sweeney, Kyle and Thain, Douglas},title={{Efficient Integration of Containers into Scientific Workflows}},booktitle={{Science Cloud Workshop at HPDC}},year={2018},note={{doi: 10.1145/3217880.3217887}},cclpaperid={951},keywords={makeflow},}
Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows
Nicholas Hazekamp, Nathaniel Kremer-Herman, Benjamin Tovar, Haiyan Meng, Olivia Choudhury, Scott Emrich, and Douglas Thain
IEEE Transactions on Parallel and Distributed Systems, 2018
@article{mf-storage-tpds17,author={Hazekamp, Nicholas and Kremer-Herman, Nathaniel and Tovar, Benjamin and Meng, Haiyan and Choudhury, Olivia and Emrich, Scott and Thain, Douglas},title={{Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows}},journal={{IEEE Transactions on Parallel and Distributed Systems}},volume={29},number={2},pages={338-350},year={2018},note={{doi: 10.1109/TPDS.2017.2764897}},cclpaperid={942},keywords={makeflow},}
Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications
Haiyan Meng and Douglas Thain
In The 17th International Conference on Computational Science (ICCS), 2017
@inproceedings{PAPER937,author={Meng, Haiyan and Thain, Douglas},title={{Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications}},booktitle={{The 17th International Conference on Computational Science (ICCS)}},year={2017},note={{doi: 10.1016/j.procs.2017.05.116}},cclpaperid={937},keywords={makeflow, umbrella},}
Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos
Charles (Chao) Zheng, Ben Tovar, and Douglas Thain
In 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017), 2017
@inproceedings{makeflow-mesos-ccgrid17.pdf,author={Zheng, Charles (Chao) and Tovar, Ben and Thain, Douglas},title={{Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos}},booktitle={{17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017)}},year={2017},note={{doi: 10.1109/CCGRID.2017.9}},cclpaperid={939},keywords={makeflow},}
Designing Self-Tuning Split-Map-Merge Applications for High Cost-Efficiency in the Cloud
Dinesh Rajan and Douglas Thain
IEEE Transactions on Cloud Computing, 2017
@article{tuning-tcc-2015,author={Rajan, Dinesh and Thain, Douglas},title={{Designing Self-Tuning Split-Map-Merge Applications for High Cost-Efficiency in the Cloud}},journal={{IEEE Transactions on Cloud Computing}},volume={5},number={2},pages={303-316},year={2017},note={{doi: 10.1109/TCC.2015.2415780}},cclpaperid={909},keywords={makeflow, workqueue, hecura},}
Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows
Patrick Donnelly and Douglas Thain
Concurrency and Computation: Practice and Experience, 2016
@article{ccpe-confuga,author={Donnelly, Patrick and Thain, Douglas},title={{Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows}},journal={{Concurrency and Computation: Practice and Experience}},volume={29},number={4},year={2016},note={{doi: 10.1002/cpe.3834}},cclpaperid={929},keywords={makeflow, chirp, confuga},}
Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows
Patrick Donnelly
Ph.D. Thesis, University of Notre Dame, 2016
@phdthesis{pdonnelly-thesis,author={Donnelly, Patrick},title={{Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows}},school={{University of Notre Dame}},year={2016},cclpaperid={928},keywords={makeflow, chirp, confuga},}
Scaling Up Bioinformatics Workflows with Dynamic Job Expansion
Nicholas Hazekamp, Joseph Sarro, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain
In IEEE International Conference on e-Science, 2015
@inproceedings{scaling-escience-2015,author={Hazekamp, Nicholas and Sarro, Joseph and Choudhury, Olivia and Gesing, Sandra and Emrich, Scott and Thain, Douglas},title={{Scaling Up Bioinformatics Workflows with Dynamic Job Expansion}},booktitle={{IEEE International Conference on e-Science}},year={2015},cclpaperid={920},keywords={makeflow},}
Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker
Charles (Chao) Zheng and Douglas Thain
In Workshop on Virtualization Technologies in Distributed Computing (VTDC), 2015
@inproceedings{wq-docker-vtdc15,author={Zheng, Charles (Chao) and Thain, Douglas},title={{Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker}},booktitle={{Workshop on Virtualization Technologies in Distributed Computing (VTDC)}},year={2015},note={{doi: 10.1145/2755979.2755984}},cclpaperid={910},keywords={makeflow, workqueue},}
Confuga: Scalable Data Intensive Computing for POSIX Workflows
Patrick Donnelly, Nicholas Hazekamp, and Douglas Thain
In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015
@inproceedings{confuga-ccgrid2015,author={Donnelly, Patrick and Hazekamp, Nicholas and Thain, Douglas},title={{Confuga: Scalable Data Intensive Computing for POSIX Workflows}},booktitle={{IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing}},pages={392-401},year={2015},note={{doi: 10.1109/CCGrid.2015.95}},cclpaperid={908},keywords={makeflow, chirp, confuga},}
Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning
Olivia Choudhury, Nicholas L. Hazekamp, Douglas Thain, and Scott Emrich
In C4BIO Workshop at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2014
@inproceedings{bio-partition-c4bio-grid14,author={Choudhury, Olivia and Hazekamp, Nicholas L. and Thain, Douglas and Emrich, Scott},title={{Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning}},booktitle={{C4BIO Workshop at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)}},year={2014},cclpaperid={903},keywords={makeflow, workqueue},}
Poster: Expanding Tasks of Logical Workflows into Independent Workflows for Improved Scalability
Nicholas Hazekamp, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain
In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2014
@inproceedings{workflow-expand-grid14,author={Hazekamp, Nicholas and Choudhury, Olivia and Gesing, Sandra and Emrich, Scott and Thain, Douglas},title={{Poster: Expanding Tasks of Logical Workflows into Independent Workflows for Improved Scalability}},booktitle={{IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing}},pages={548-549},year={2014},note={{doi: 10.1109/CCGrid.2014.84}},cclpaperid={901},keywords={makeflow},}
Automated Packaging of Bioinformatics Workflows for Portability and Durability Using Makeflow
Casey Robinson and Douglas Thain
In Workshop on Workflows in Support of Large-Scale Science (WORKS), 2013
@inproceedings{automated-packaging-works13,author={Robinson, Casey and Thain, Douglas},title={{Automated Packaging of Bioinformatics Workflows for Portability and Durability Using Makeflow}},booktitle={{Workshop on Workflows in Support of Large-Scale Science (WORKS)}},year={2013},note={{doi: 10.1145/2534248.2534258}},cclpaperid={899},keywords={makeflow},}
A Compiler Toolchain For Data Intensive Scientific Workflows
Peter Bui
Ph.D. Thesis, University of Notre Dame, 2012
@phdthesis{pbui-dissertation.pdf,author={Bui, Peter},title={{A Compiler Toolchain For Data Intensive Scientific Workflows}},school={{University of Notre Dame}},year={2012},cclpaperid={889},keywords={makeflow, hecura}}
Makeflow: A Portable Abstraction for Data Intensive Computing on Clusters, Clouds, and Grids
Michael Albrecht, Patrick Donnelly, Peter Bui, and Douglas Thain
In Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET) at ACM SIGMOD, 2012
@inproceedings{makeflow-sweet12,author={Albrecht, Michael and Donnelly, Patrick and Bui, Peter and Thain, Douglas},title={{Makeflow: A Portable Abstraction for Data Intensive Computing on Clusters, Clouds, and Grids}},booktitle={{Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET) at ACM SIGMOD}},year={2012},note={{doi: 10.1145/2443416.2443417}},cclpaperid={104},keywords={makeflow, hecura}}
Biocompute 2.0: An Improved Collaborative Workspace for Data Intensive Bio-Science
Rory Carmichael, Patrick Braga-Henebry, Douglas Thain, and Scott Emrich
Concurrency and Computation: Practice and Experience, 2011
@article{biocompute-ccpe,author={Carmichael, Rory and Braga-Henebry, Patrick and Thain, Douglas and Emrich, Scott},title={{Biocompute 2.0: An Improved Collaborative Workspace for Data Intensive Bio-Science}},journal={{Concurrency and Computation: Practice and Experience}},volume={23},number={17},pages={2305-2314},year={2011},note={{doi: 10.1002/cpe.1782}},cclpaperid={96},keywords={makeflow},}
Scripting distributed scientific workflows using Weaver
Peter Bui, Li Yu, Andrew Thrasher, Rory Carmichael, Irena Lanc, Patrick Donnelly, and Douglas Thain
Concurrency and Computation: Practice and Experience, 2011
@article{weaver-ccpe,author={Bui, Peter and Yu, Li and Thrasher, Andrew and Carmichael, Rory and Lanc, Irena and Donnelly, Patrick and Thain, Douglas},title={{Scripting distributed scientific workflows using Weaver}},journal={{Concurrency and Computation: Practice and Experience}},volume={24},number={15},year={2011},note={{doi: 10.1002/cpe.1871}},cclpaperid={98},keywords={makeflow},}
Taming Complex Bioinformatics Workflows with Weaver, Makeflow, and Starch
Andrew Thrasher, Rory Carmichael, Peter Bui, Li Yu, Douglas Thain, and Scott Emrich
In Workshop on Workflows in Support of Large Scale Science, 2010
@inproceedings{taming-works10.pdf,author={Thrasher, Andrew and Carmichael, Rory and Bui, Peter and Yu, Li and Thain, Douglas and Emrich, Scott},title={{Taming Complex Bioinformatics Workflows with Weaver, Makeflow, and Starch}},booktitle={{Workshop on Workflows in Support of Large Scale Science}},pages={1-6},year={2010},note={{doi: 10.1109/WORKS.2010.5671858}},cclpaperid={92},keywords={makeflow}}
Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions
Li Yu, Christopher Moretti, Andrew Thrasher, Scott Emrich, Kenneth Judd, and Douglas Thain
Journal of Cluster Computing, 2010
@article{abstr-jcc,author={Yu, Li and Moretti, Christopher and Thrasher, Andrew and Emrich, Scott and Judd, Kenneth and Thain, Douglas},title={{Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions}},journal={{Journal of Cluster Computing}},volume={13},number={3},pages={243-256},year={2010},note={{doi: 10.1007/s10586-010-0134-7}},cclpaperid={83},keywords={makeflow, workqueue, allpairs, wavefront, hecura},}