Makeflow | The Cooperative Computing Lab

Makeflow is a workflow system for executing large complex workflows on clusters, clouds, and grids.

Makeflow is easy to use. The Makeflow language is similar to traditional Make, so if you can write a Makefile, then you can write a Makeflow. A workflow can be just a few commands chained together, or it can be a complex application consisting of thousands of tasks. It can have an arbitrary DAG structure and is not limited to specific patterns.
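
As a minimal sketch of the Make-like rule shape described above (targets, a colon, sources, then a tab-indented command), the following Python snippet renders a small hypothetical workflow as rule text and derives one valid execution order from its DAG. The file names and commands (`split.py`, `analyze.py`, `merge.py`) are illustrative assumptions, not taken from the Makeflow distribution, and the ordering uses Python's standard `graphlib` module rather than Makeflow's own scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: (targets, sources, command). All names are illustrative only.
RULES = [
    (["part1.txt", "part2.txt"], ["input.txt", "split.py"],
     "python split.py input.txt"),
    (["out1.txt"], ["part1.txt", "analyze.py"],
     "python analyze.py part1.txt > out1.txt"),
    (["out2.txt"], ["part2.txt", "analyze.py"],
     "python analyze.py part2.txt > out2.txt"),
    (["result.txt"], ["out1.txt", "out2.txt", "merge.py"],
     "python merge.py out1.txt out2.txt > result.txt"),
]


def render_rules(rules):
    """Render rules in Make-style syntax: 'targets: sources', then a tab-indented command."""
    blocks = []
    for targets, sources, command in rules:
        blocks.append(f"{' '.join(targets)}: {' '.join(sources)}\n\t{command}\n")
    return "\n".join(blocks)


def execution_order(rules):
    """Return rule indices in an order that respects file dependencies."""
    # Map each produced file to the index of the rule that creates it.
    producer = {t: i for i, (targets, _, _) in enumerate(rules) for t in targets}
    # A rule depends on every rule that produces one of its sources.
    graph = {
        i: {producer[s] for s in sources if s in producer}
        for i, (_, sources, _) in enumerate(rules)
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(render_rules(RULES))
    print("One valid order of rule indices:", execution_order(RULES))
```

The two `analyze.py` rules do not depend on each other, so a workflow engine is free to run them in parallel once the split step has finished.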

Makeflow is production-ready. Makeflow is used on a daily basis to execute complex scientific applications in fields such as data mining, high energy physics, image processing, and bioinformatics. It has run on campus clusters, the Open Science Grid, NSF XSEDE machines, NCSA Blue Waters, and Amazon Web Services.

Makeflow is portable. A workflow is written in a technology-neutral way and can then be deployed to a variety of systems without modification: local execution on a single multicore machine, public cloud services such as Amazon EC2 and AWS Lambda, batch systems such as HTCondor, SGE, PBS, Torque, and SLURM, or the bundled Work Queue system. Makeflow can also run your jobs in a container environment such as Docker or Singularity on top of an existing batch system. Because the same specification works on all of these systems, you can move your application from one to another without rewriting it.
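
To illustrate the point that only the execution target changes while the workflow file stays the same, here is a small sketch that builds Makeflow command lines for a few backends. It assumes the `makeflow` command-line tool with its `-T` batch-type option and the type names `local`, `condor`, `slurm`, and `wq`; check the user manual for the names supported by your installed version. The workflow file name `example.makeflow` is hypothetical.

```python
import shlex

# Batch-type names assumed from the Makeflow manual; verify against your install.
BATCH_TYPES = ("local", "condor", "slurm", "wq")


def makeflow_command(workflow, batch_type="local"):
    """Build the argument list for running one workflow file on one backend."""
    if batch_type not in BATCH_TYPES:
        raise ValueError(f"unknown batch type: {batch_type}")
    # The workflow file is identical in every case; only the -T value differs.
    return ["makeflow", "-T", batch_type, workflow]


if __name__ == "__main__":
    for batch_type in BATCH_TYPES:
        print(shlex.join(makeflow_command("example.makeflow", batch_type)))
```

The returned list can be passed to `subprocess.run` on a machine where Makeflow is installed.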

Makeflow is powerful. Makeflow can handle workloads of millions of jobs running on thousands of machines for months at a time. Makeflow is highly fault-tolerant: it can crash or be killed and, upon resuming, will reconnect to running jobs and continue where it left off. A variety of analysis tools are available to understand the performance of your jobs, measure the progress of a workflow, and visualize its execution.

Related Publications

Flexible Partitioning of Scientific Workflows Using the JX Workflow Language

Tim Shaffer, Nathaniel Kremer-Herman, and Douglas Thain

In Practice and Experience in Advanced Research Computing (PEARC), 2019

doi: 10.1145/3332186.3338100

@inproceedings{jx-pearc19,
  author = {Shaffer, Tim and Kremer-Herman, Nathaniel and Thain, Douglas},
  title = {{Flexible Partitioning of Scientific Workflows Using the JX Workflow Language}},
  booktitle = {{Practice and Experience in Advanced Research Computing (PEARC)}},
  year = {2019},
  note = {{doi: 10.1145/3332186.3338100}},
  cclpaperid = {961},
  keywords = {makeflow, jx},
}

Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Qimin Zhang, Ben Tovar, Nate Kremer-Herman, and Douglas Thain

In WORKS Workshop at Supercomputing, 2018

@inproceedings{clustering-works-2018,
  author = {Zhang, Qimin and Tovar, Ben and Kremer-Herman, Nate and Thain, Douglas},
  title = {{Reduction of Workflow Resource Consumption Using a Density-based Clustering Model}},
  booktitle = {{WORKS Workshop at Supercomputing}},
  year = {2018},
  cclpaperid = {956},
  keywords = {makeflow, resource_monitor},
}

An Algebra for Robust Workflow Transformations

Nicholas Hazekamp and Douglas Thain

In IEEE International Conference on e-Science, 2018

doi: 10.1109/eScience.2018.00031

@inproceedings{transformation-escience-2018,
  author = {Hazekamp, Nicholas and Thain, Douglas},
  title = {{An Algebra for Robust Workflow Transformations}},
  booktitle = {{IEEE International Conference on e-Science}},
  pages = {12},
  year = {2018},
  note = {{doi: 10.1109/eScience.2018.00031}},
  cclpaperid = {953},
  keywords = {makeflow},
}

Poster: A First Look at the JX Workflow Language

Tim Shaffer, Kyle M.D. Sweeney, Nathaniel Kremer-Herman, and Douglas Thain

In IEEE International Conference on e-Science, 2018

doi: 10.1109/eScience.2018.00094

@inproceedings{jx-escience18,
  author = {Shaffer, Tim and Sweeney, Kyle M.D. and Kremer-Herman, Nathaniel and Thain, Douglas},
  title = {{Poster: A First Look at the JX Workflow Language}},
  booktitle = {{IEEE International Conference on e-Science}},
  year = {2018},
  note = {{doi: 10.1109/eScience.2018.00094}},
  cclpaperid = {954},
  keywords = {makeflow, jx},
}

Early Experience Using Amazon Batch for Scientific Workflows

Kyle Sweeney and Douglas Thain

In ScienceCloud Workshop at HPDC, 2018

doi: 10.1145/3217880.3217885

@inproceedings{batch-sciencecloud-2018,
  author = {Sweeney, Kyle and Thain, Douglas},
  title = {{Early Experience Using Amazon Batch for Scientific Workflows}},
  booktitle = {{ScienceCloud Workshop at HPDC}},
  year = {2018},
  note = {{doi: 10.1145/3217880.3217885}},
  cclpaperid = {950},
  keywords = {makeflow},
}

Efficient Integration of Containers into Scientific Workflows

Kyle Sweeney and Douglas Thain

In ScienceCloud Workshop at HPDC, 2018

doi: 10.1145/3217880.3217887

@inproceedings{containers-sciencecloud-2018,
  author = {Sweeney, Kyle and Thain, Douglas},
  title = {{Efficient Integration of Containers into Scientific Workflows}},
  booktitle = {{ScienceCloud Workshop at HPDC}},
  year = {2018},
  note = {{doi: 10.1145/3217880.3217887}},
  cclpaperid = {951},
  keywords = {makeflow},
}

Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows

Nicholas Hazekamp, Nathaniel Kremer-Herman, Benjamin Tovar, Haiyan Meng, Olivia Choudhury, Scott Emrich, and Douglas Thain

IEEE Transactions on Parallel and Distributed Systems, 2018

doi: 10.1109/TPDS.2017.2764897

@article{mf-storage-tpds17,
  author = {Hazekamp, Nicholas and Kremer-Herman, Nathaniel and Tovar, Benjamin and Meng, Haiyan and Choudhury, Olivia and Emrich, Scott and Thain, Douglas},
  title = {{Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows}},
  journal = {{IEEE Transactions on Parallel and Distributed Systems}},
  volume = {29},
  number = {2},
  pages = {338-350},
  year = {2018},
  note = {{doi: 10.1109/TPDS.2017.2764897}},
  cclpaperid = {942},
  keywords = {makeflow},
}

Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications

Haiyan Meng and Douglas Thain

In The 17th International Conference on Computational Science (ICCS), 2017

doi: 10.1016/j.procs.2017.05.116

@inproceedings{PAPER937,
  author = {Meng, Haiyan and Thain, Douglas},
  title = {{Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications}},
  booktitle = {{The 17th International Conference on Computational Science (ICCS)}},
  year = {2017},
  note = {{doi: 10.1016/j.procs.2017.05.116}},
  cclpaperid = {937},
  keywords = {makeflow, umbrella},
}

Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos

Charles (Chao) Zheng, Ben Tovar, and Douglas Thain

In 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017), 2017

doi: 10.1109/CCGRID.2017.9

@inproceedings{makeflow-mesos-ccgrid17.pdf,
  author = {Zheng, Charles (Chao) and Tovar, Ben and Thain, Douglas},
  title = {{Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos}},
  booktitle = {{17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017)}},
  year = {2017},
  note = {{doi: 10.1109/CCGRID.2017.9}},
  cclpaperid = {939},
  keywords = {makeflow},
}

Designing Self-Tuning Split-Map-Merge Applications for High Cost-Efficiency in the Cloud

Dinesh Rajan and Douglas Thain

IEEE Transactions on Cloud Computing, 2017

doi: 10.1109/TCC.2015.2415780

@article{tuning-tcc-2015,
  author = {Rajan, Dinesh and Thain, Douglas},
  title = {{Designing Self-Tuning Split-Map-Merge Applications for High Cost-Efficiency in the Cloud}},
  journal = {{IEEE Transactions on Cloud Computing}},
  volume = {5},
  number = {2},
  pages = {303-316},
  year = {2017},
  note = {{doi: 10.1109/TCC.2015.2415780}},
  cclpaperid = {909},
  keywords = {makeflow, workqueue, hecura},
}

Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows

Patrick Donnelly and Douglas Thain

Concurrency and Computation: Practice and Experience, 2016

doi: 10.1002/cpe.3834

@article{ccpe-confuga,
  author = {Donnelly, Patrick and Thain, Douglas},
  title = {{Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows}},
  journal = {{Concurrency and Computation: Practice and Experience}},
  volume = {29},
  number = {4},
  year = {2016},
  note = {{doi: 10.1002/cpe.3834}},
  cclpaperid = {929},
  keywords = {makeflow, chirp, confuga},
}

Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows

Patrick Donnelly

Ph.D. Thesis, University of Notre Dame, 2016

@phdthesis{pdonnelly-thesis,
  author = {Donnelly, Patrick},
  title = {{Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows}},
  school = {{University of Notre Dame}},
  year = {2016},
  cclpaperid = {928},
  keywords = {makeflow, chirp, confuga},
}

Scaling Up Bioinformatics Workflows with Dynamic Job Expansion

Nicholas Hazekamp, Joseph Sarro, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain

In IEEE International Conference on e-Science, 2015

@inproceedings{scaling-escience-2015,
  author = {Hazekamp, Nicholas and Sarro, Joseph and Choudhury, Olivia and Gesing, Sandra and Emrich, Scott and Thain, Douglas},
  title = {{Scaling Up Bioinformatics Workflows with Dynamic Job Expansion}},
  booktitle = {{IEEE International Conference on e-Science}},
  year = {2015},
  cclpaperid = {920},
  keywords = {makeflow},
}

Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker

Charles (Chao) Zheng and Douglas Thain

In Workshop on Virtualization Technologies in Distributed Computing (VTDC), 2015

doi: 10.1145/2755979.2755984

@inproceedings{wq-docker-vtdc15,
  author = {Zheng, Charles (Chao) and Thain, Douglas},
  title = {{Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker}},
  booktitle = {{Workshop on Virtualization Technologies in Distributed Computing (VTDC)}},
  year = {2015},
  note = {{doi: 10.1145/2755979.2755984}},
  cclpaperid = {910},
  keywords = {makeflow, workqueue},
}

Confuga: Scalable Data Intensive Computing for POSIX Workflows

Patrick Donnelly, Nicholas Hazekamp, and Douglas Thain

In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

doi: 10.1109/CCGrid.2015.95

@inproceedings{confuga-ccgrid2015,
  author = {Donnelly, Patrick and Hazekamp, Nicholas and Thain, Douglas},
  title = {{Confuga: Scalable Data Intensive Computing for POSIX Workflows}},
  booktitle = {{IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing}},
  pages = {392-401},
  year = {2015},
  note = {{doi: 10.1109/CCGrid.2015.95}},
  cclpaperid = {908},
  keywords = {makeflow, chirp, confuga},
}

Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning

Olivia Choudhury, Nicholas L. Hazekamp, Douglas Thain, and Scott Emrich

In C4BIO Workshop at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2014

@inproceedings{bio-partition-c4bio-grid14,
  author = {Choudhury, Olivia and Hazekamp, Nicholas L. and Thain, Douglas and Emrich, Scott},
  title = {{Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning}},
  booktitle = {{C4BIO Workshop at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)}},
  year = {2014},
  cclpaperid = {903},
  keywords = {makeflow, workqueue},
}

Poster: Expanding Tasks of Logical Workflows into Independent Workflows for Improved Scalability

Nicholas Hazekamp, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain

In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2014

doi: 10.1109/CCGrid.2014.84

@inproceedings{workflow-expand-grid14,
  author = {Hazekamp, Nicholas and Choudhury, Olivia and Gesing, Sandra and Emrich, Scott and Thain, Douglas},
  title = {{Poster: Expanding Tasks of Logical Workflows into Independent Workflows for Improved Scalability}},
  booktitle = {{IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing}},
  pages = {548-549},
  year = {2014},
  note = {{doi: 10.1109/CCGrid.2014.84}},
  cclpaperid = {901},
  keywords = {makeflow},
}

Automated Packaging of Bioinformatics Workflows for Portability and Durability Using Makeflow

Casey Robinson and Douglas Thain

In Workshop on Workflows in Support of Large-Scale Science (WORKS), 2013

doi: 10.1145/2534248.2534258

@inproceedings{automated-packaging-works13,
  author = {Robinson, Casey and Thain, Douglas},
  title = {{Automated Packaging of Bioinformatics Workflows for Portability and Durability Using Makeflow}},
  booktitle = {{Workshop on Workflows in Support of Large-Scale Science (WORKS)}},
  year = {2013},
  note = {{doi: 10.1145/2534248.2534258}},
  cclpaperid = {899},
  keywords = {makeflow},
}

A Compiler Toolchain For Data Intensive Scientific Workflows

Peter Bui

Ph.D. Thesis, University of Notre Dame, 2012

@phdthesis{pbui-dissertation.pdf,
  author = {Bui, Peter},
  title = {{A Compiler Toolchain For Data Intensive Scientific Workflows}},
  school = {{University of Notre Dame}},
  year = {2012},
  cclpaperid = {889},
  keywords = {makeflow, hecura},
}

Makeflow: A Portable Abstraction for Data Intensive Computing on Clusters, Clouds, and Grids

Michael Albrecht, Patrick Donnelly, Peter Bui, and Douglas Thain

In Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET) at ACM SIGMOD, 2012

doi: 10.1145/2443416.2443417

@inproceedings{makeflow-sweet12,
  author = {Albrecht, Michael and Donnelly, Patrick and Bui, Peter and Thain, Douglas},
  title = {{Makeflow: A Portable Abstraction for Data Intensive Computing on Clusters, Clouds, and Grids}},
  booktitle = {{Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET) at ACM SIGMOD}},
  year = {2012},
  note = {{doi: 10.1145/2443416.2443417}},
  cclpaperid = {104},
  keywords = {makeflow, hecura},
}

Biocompute 2.0: An Improved Collaborative Workspace for Data Intensive Bio-Science.

Rory Carmichael, Patrick Braga-Henebry, Douglas Thain, and Scott Emrich

Concurrency and Computation: Practice and Experience, 2011

doi: 10.1002/cpe.1782

@article{biocompute-ccpe,
  author = {Carmichael, Rory and Braga-Henebry, Patrick and Thain, Douglas and Emrich, Scott},
  title = {{Biocompute 2.0: An Improved Collaborative Workspace for Data Intensive Bio-Science.}},
  journal = {{Concurrency and Computation: Practice and Experience}},
  volume = {23},
  number = {17},
  pages = {2305-2314},
  year = {2011},
  note = {{doi: 10.1002/cpe.1782}},
  cclpaperid = {96},
  keywords = {makeflow},
}

Scripting distributed scientific workflows using Weaver

Peter Bui, Li Yu, Andrew Thrasher, Rory Carmichael, Irena Lanc, Patrick Donnelly, and Douglas Thain

Concurrency and Computation: Practice and Experience, 2011

doi: 10.1002/cpe.1871

@article{weaver-ccpe,
  author = {Bui, Peter and Yu, Li and Thrasher, Andrew and Carmichael, Rory and Lanc, Irena and Donnelly, Patrick and Thain, Douglas},
  title = {{Scripting distributed scientific workflows using Weaver}},
  journal = {{Concurrency and Computation: Practice and Experience}},
  volume = {24},
  number = {15},
  year = {2011},
  note = {{doi: 10.1002/cpe.1871}},
  cclpaperid = {98},
  keywords = {makeflow},
}

Taming Complex Bioinformatics Workflows with Weaver, Makeflow, and Starch

Andrew Thrasher, Rory Carmichael, Peter Bui, Li Yu, Douglas Thain, and Scott Emrich

In Workshop on Workflows in Support of Large Scale Science, 2010

doi: 10.1109/WORKS.2010.5671858

@inproceedings{taming-works10.pdf,
  author = {Thrasher, Andrew and Carmichael, Rory and Bui, Peter and Yu, Li and Thain, Douglas and Emrich, Scott},
  title = {{Taming Complex Bioinformatics Workflows with Weaver, Makeflow, and Starch}},
  booktitle = {{Workshop on Workflows in Support of Large Scale Science}},
  pages = {1-6},
  year = {2010},
  note = {{doi: 10.1109/WORKS.2010.5671858}},
  cclpaperid = {92},
  keywords = {makeflow},
}

Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions

Li Yu, Christopher Moretti, Andrew Thrasher, Scott Emrich, Kenneth Judd, and Douglas Thain

Journal of Cluster Computing, 2010

doi: 10.1007/s10586-010-0134-7

@article{abstr-jcc,
  author = {Yu, Li and Moretti, Christopher and Thrasher, Andrew and Emrich, Scott and Judd, Kenneth and Thain, Douglas},
  title = {{Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions}},
  journal = {{Journal of Cluster Computing}},
  volume = {13},
  number = {3},
  pages = {243-256},
  year = {2010},
  note = {{doi: 10.1007/s10586-010-0134-7}},
  cclpaperid = {83},
  keywords = {makeflow, workqueue, allpairs, wavefront, hecura},
}