
Makeflow is a workflow system for executing large complex workflows on clusters, clouds, and grids.

Makeflow is easy to use. The Makeflow language is similar to traditional Make, so if you can write a Makefile, then you can write a Makeflow. A workflow can be just a few commands chained together, or it can be a complex application consisting of thousands of tasks. It can have an arbitrary DAG structure and is not limited to specific patterns.
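To make the Make-like model concrete, here is a minimal Python sketch. It is not Makeflow itself and does not use any Makeflow API. It assumes a workflow is a list of rules, each with target files, source files, and a command, renders one rule in the familiar Make layout, and derives an execution order from the file dependencies. The file names, commands, and helper names (`render_rule`, `execution_order`) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules in the Make style: (targets, sources, command).
RULES = [
    (["part1.txt", "part2.txt"], ["input.txt"], "split_input input.txt"),
    (["out1.txt"], ["part1.txt"], "process part1.txt > out1.txt"),
    (["out2.txt"], ["part2.txt"], "process part2.txt > out2.txt"),
    (["result.txt"], ["out1.txt", "out2.txt"],
     "cat out1.txt out2.txt > result.txt"),
]


def render_rule(targets, sources, command):
    """Render a rule as 'targets: sources' followed by a tab-indented command."""
    return f"{' '.join(targets)}: {' '.join(sources)}\n\t{command}\n"


def execution_order(rules):
    """Return rule indices in an order that respects file dependencies."""
    # Map each produced file to the index of the rule that creates it.
    producer = {t: i for i, (targets, _, _) in enumerate(rules) for t in targets}
    # Each rule depends on the rules that produce its source files.
    # Sources with no producer (such as input.txt) are assumed to exist already.
    graph = {
        i: {producer[s] for s in sources if s in producer}
        for i, (_, sources, _) in enumerate(rules)
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for index in execution_order(RULES):
        print(render_rule(*RULES[index]))
```

The two `process` rules do not depend on each other, so a scheduler is free to run them in parallel. That independence is what a DAG expresses and a linear script does not.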

Makeflow is production-ready. It is used daily to execute complex scientific applications in fields such as data mining, high energy physics, image processing, and bioinformatics. It has run on campus clusters, the Open Science Grid, NSF XSEDE machines, NCSA Blue Waters, and Amazon Web Services.

Makeflow is portable. A workflow is written in a technology-neutral way and can then be deployed without modification to a variety of systems: local execution on a single multicore machine, public cloud services such as Amazon EC2 and Amazon Lambda, batch systems such as HTCondor, SGE, PBS, Torque, and SLURM, or the bundled Work Queue system. Makeflow can also run your jobs in a container environment such as Docker or Singularity on top of an existing batch system. Because the same specification works on all of these systems, you can move your application from one system to another without rewriting it.
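As a rough illustration of that portability, the Python sketch below builds the command line for running one workflow file on several systems. It assumes the `makeflow` command selects its execution system with the `-T` option. The batch-type names listed and the file name `example.makeflow` are assumptions for illustration, so check them against the documentation for your installed version.

```python
import shlex

# Assumed batch-type names for the -T option; verify against your installation.
BATCH_TYPES = ["local", "condor", "slurm", "wq"]


def makeflow_command(workflow_file, batch_type):
    """Build the argument list for running a workflow on one batch system."""
    return ["makeflow", "-T", batch_type, workflow_file]


if __name__ == "__main__":
    # The workflow file stays the same; only the batch type changes.
    for batch_type in BATCH_TYPES:
        print(shlex.join(makeflow_command("example.makeflow", batch_type)))
```

Each argument list can be passed to `subprocess.run` on a machine where `makeflow` is installed. The sketch only prints the commands, so it runs without Makeflow present.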

Makeflow is powerful. It can handle workloads of millions of jobs running on thousands of machines for months at a time. Makeflow is highly fault-tolerant: it can crash or be killed, and upon resuming it will reconnect to running jobs and continue where it left off. A variety of analysis tools are available to understand the performance of your jobs, measure the progress of a workflow, and visualize what is going on.

Video Introduction to Workflows

Related Publications

  1. Flexible Partitioning of Scientific Workflows Using the JX Workflow Language
    Tim Shaffer, Nathaniel Kremer-Herman, and Douglas Thain
    In Practice and Experience in Advanced Research Computing (PEARC), 2019
    doi: 10.1145/3332186.3338100
  2. Reduction of Workflow Resource Consumption Using a Density-based Clustering Model
    Qimin Zhang, Ben Tovar, Nate Kremer-Herman, and Douglas Thain
    In WORKS Workshop at Supercomputing, 2018
  3. An Algebra for Robust Workflow Transformations
    Nicholas Hazekamp and Douglas Thain
    In IEEE International Conference on e-Science, 2018
    doi: 10.1109/eScience.2018.00031
  4. Poster: A First Look at the JX Workflow Language
    Tim Shaffer, Kyle M.D. Sweeney, Nathaniel Kremer-Herman, and Douglas Thain
    In IEEE International Conference on e-Science, 2018
    doi: 10.1109/eScience.2018.00094
  5. Early Experience Using Amazon Batch for Scientific Workflows
    Kyle Sweeney and Douglas Thain
    In Science Cloud Workshop at HPDC, 2018
    doi: 10.1145/3217880.3217885
  6. Efficient Integration of Containers into Scientific Workflows
    Kyle Sweeney and Douglas Thain
    In Science Cloud Workshop at HPDC, 2018
    doi: 10.1145/3217880.3217887
  7. Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows
    Nicholas Hazekamp, Nathaniel Kremer-Herman, Benjamin Tovar, Haiyan Meng, Olivia Choudhury, Scott Emrich, and Douglas Thain
    IEEE Transactions on Parallel and Distributed Systems, 2018
    doi: 10.1109/TPDS.2017.2764897
  8. Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications
    Haiyan Meng and Douglas Thain
    In The 17th International Conference on Computational Science (ICCS), 2017
    doi: 10.1016/j.procs.2017.05.116
  9. Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos
    Charles (Chao) Zheng, Ben Tovar, and Douglas Thain
    In 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017), 2017
    doi: 10.1109/CCGRID.2017.9
  10. Designing Self-Tuning Split-Map-Merge Applications for High Cost-Efficiency in the Cloud
    Dinesh Rajan and Douglas Thain
    IEEE Transactions on Cloud Computing, 2017
    doi: 10.1109/TCC.2015.2415780
  11. Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows
    Patrick Donnelly and Douglas Thain
    Concurrency and Computation: Practice and Experience, 2016
    doi: 10.1002/cpe.3834
  12. Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows
    Patrick Donnelly
    2016
  13. Scaling Up Bioinformatics Workflows with Dynamic Job Expansion
    Nicholas Hazekamp, Joseph Sarro, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain
    In IEEE International Conference on e-Science, 2015
  14. Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker
    Charles (Chao) Zheng and Douglas Thain
    In Workshop on Virtualization Technologies in Distributed Computing (VTDC), 2015
    doi: 10.1145/2755979.2755984
  15. Confuga: Scalable Data Intensive Computing for POSIX Workflows
    Patrick Donnelly, Nicholas Hazekamp, and Douglas Thain
    In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015
    doi: 10.1109/CCGrid.2015.95
  16. Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning
    Olivia Choudhury, Nicholas L. Hazekamp, Douglas Thain, and Scott Emrich
    In C4BIO Workshop at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2014
  17. Poster: Expanding Tasks of Logical Workflows into Independent Workflows for Improved Scalability
    Nicholas Hazekamp, Olivia Choudhury, Sandra Gesing, Scott Emrich, and Douglas Thain
    In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2014
    doi: 10.1109/CCGrid.2014.84
  18. Automated Packaging of Bioinformatics Workflows for Portability and Durability Using Makeflow
    Casey Robinson and Douglas Thain
    In Workshop on Workflows in Support of Large-Scale Science (WORKS), 2013
    doi: 10.1145/2534248.2534258
  19. A Compiler Toolchain For Data Intensive Scientific Workflows
    Peter Bui
    2012
  20. Makeflow: A Portable Abstraction for Data Intensive Computing on Clusters, Clouds, and Grids
    Michael Albrecht, Patrick Donnelly, Peter Bui, and Douglas Thain
    In Workshop on Scalable Workflow Enactment Engines and Technologies (SWEET) at ACM SIGMOD, 2012
    doi: 10.1145/2443416.2443417
  21. Biocompute 2.0: An Improved Collaborative Workspace for Data Intensive Bio-Science
    Rory Carmichael, Patrick Braga-Henebry, Douglas Thain, and Scott Emrich
    Concurrency and Computation: Practice and Experience, 2011
    doi: 10.1002/cpe.1782
  22. Scripting distributed scientific workflows using Weaver
    Peter Bui, Li Yu, Andrew Thrasher, Rory Carmichael, Irena Lanc, Patrick Donnelly, and Douglas Thain
    Concurrency and Computation: Practice and Experience, 2011
    doi: 10.1002/cpe.1871
  23. Taming Complex Bioinformatics Workflows with Weaver, Makeflow, and Starch
    Andrew Thrasher, Rory Carmichael, Peter Bui, Li Yu, Douglas Thain, and Scott Emrich
    In Workshop on Workflows in Support of Large Scale Science, 2010
    doi: 10.1109/WORKS.2010.5671858
  24. Harnessing Parallelism in Multicore Clusters with the All-Pairs, Wavefront, and Makeflow Abstractions
    Li Yu, Christopher Moretti, Andrew Thrasher, Scott Emrich, Kenneth Judd, and Douglas Thain
    Journal of Cluster Computing, 2010
    doi: 10.1007/s10586-010-0134-7