Work Queue is a framework for building large distributed applications
that span thousands of machines drawn from clusters, clouds, and grids.
Work Queue applications are written in Python, Perl, or C using a simple API that allows users to
define tasks, submit them to the queue, and wait for completion.
Tasks are executed by a general worker process that can run on
any available machine. Each worker calls home to the manager process,
arranges for data transfer, and executes the tasks.
A wide variety of scheduling and resource management features are provided
to enable the efficient use of large fleets of multicore servers.
The system handles a wide variety of failures, allowing
for dynamically scalable and robust applications.
Work Queue has been used to write applications that
scale from a handful of workstations up to tens of thousands
of cores running on supercomputers.
Examples include
the Parsl workflow system,
the Coffea analysis framework, the
the Makeflow workflow engine,
SHADHO,
Lobster,
NanoReactors,
ForceBalance,
Accelerated Weighted Ensemble,
the SAND genome assembler,
and the All-Pairs and Wavefront abstractions.
The framework is easy to use, and has been used to teach courses in parallel computing,
cloud computing, distributed computing, and cyberinfrastructure at the University of Notre Dame,
the University of Arizona, the University of Wisconsin, and many other locations.
Anna Woodard, Matthias Wolf, Charles Nicholas Mueller, Ben Tovar, Patrick Donnelly, Kenyi Hurtado Anampa, Paul Brenner, Kevin Lannon, and Michael Hildreth, Exploiting Volatile Opportunistic Computing Resources with Lobster, Computing in High Energy Physics, January, 2015.
Dinesh Rajan, Andrew Thrasher, Badi Abdul-Wahid, Jesus A Izaguirre, Scott Emrich, and Douglas Thain, Case Studies in Designing Elastic Applications, 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May, 2013. DOI: 0.1109/CCGrid.2013.46
Peter Bui, Dinesh Rajan, Badi Abdul-Wahid, Jesus Izaguirre, Douglas Thain, Work Queue + Python: A Framework For Scalable Scientific Ensemble Applications, Workshop on Python for High Performance and Scientific Computing (PyHPC) at the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (Supercomputing) , November, 2011.
Douglas Thain and Christopher Moretti, Abstractions for Cloud Computing with Condor, Syed Ahson and Mohammad Ilyas, Cloud Computing and Software Services: Theory and Techniques, pages 153-171, CRC Press, July, 2010. ISBN: 9781439803158
Christopher Moretti, Michael Olson, Scott Emrich, and Douglas Thain, Scalable Modular Genome Assembly on Campus Grids, University of Notre Dame, Computer Science and Engineering Department, Technical Report 2009-04, July, 2009.