Confuga is an active storage cluster file system designed for executing DAG-structured scientific workflows. It is used as a collaborative distributed file system and as a platform for execution of scientific workflows with full data locality for all job dependencies.
Confuga is composed of a head node and multiple storage nodes. The head node acts as the metadata server and job scheduler for the cluster. Users interact with Confuga using the head node.
A Confuga cluster can be set up as an ordinary user or maintained as a dedicated service within the cluster. The head node and storage nodes run the Chirp file system service. Users may interact with Confuga using Chirp’s client toolset chirp(1), Parrot parrot_run(1), or FUSE chirp_fuse(1).
Confuga manages the details of scheduling and executing jobs for you. However, it does not concern itself with job ordering; it appears as a simple batch execution platform. We recommend using a high-level workflow execution system like Makeflow to manage your workflow and to handle the details of submitting jobs.
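To illustrate the division of labor described above, here is a minimal Python sketch of the ordering work that falls to the workflow manager rather than to Confuga. It is not Makeflow's implementation: the job names, file names, and the `submission_batches` helper are all hypothetical. It derives the DAG from each job's declared inputs and outputs and groups jobs into batches that could be submitted once their dependencies have finished.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical three-job workflow; each job names every file it reads and writes.
jobs = {
    "align": {"inputs": ["reads.fq", "ref.fa"], "outputs": ["aln.bam"]},
    "sort": {"inputs": ["aln.bam"], "outputs": ["sorted.bam"]},
    "stats": {"inputs": ["aln.bam"], "outputs": ["stats.txt"]},
}


def submission_batches(jobs):
    """Group jobs into batches whose producing jobs have already completed."""
    # Map each output file to the job that produces it.
    producer = {out: name for name, job in jobs.items() for out in job["outputs"]}
    # A job depends on the producers of its inputs; files with no producer
    # are assumed to exist in the file system already.
    graph = {
        name: {producer[f] for f in job["inputs"] if f in producer}
        for name, job in jobs.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished successfully
    return batches


batches = submission_batches(jobs)
print(batches)
```

In this sketch the batch execution platform only ever sees jobs whose inputs already exist, which is the role the text assigns to Confuga.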
Confuga is designed to exploit the characteristics of POSIX scientific workflows. Jobs are single-task POSIX applications that declare all of their input files and all of their output files. Confuga uses this restricted job specification to improve performance and to control load within the cluster.
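The following Python sketch shows, under stated assumptions, why a fully declared job specification is useful to a scheduler. It is not Confuga's scheduling algorithm: `JobSpec`, `bytes_to_transfer`, `pick_node`, the node names, and the file sizes are hypothetical. Because every input is known before the job runs, the amount of data that would have to move to each candidate node can be computed in advance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical job description: one command plus every file it touches."""
    command: str
    inputs: frozenset
    outputs: frozenset


def bytes_to_transfer(job, node_files, sizes):
    """Total size of the declared inputs that are not already on the node."""
    return sum(sizes[f] for f in job.inputs if f not in node_files)


def pick_node(job, replicas, sizes):
    """Pick the node needing the least data moved (ties broken by node name)."""
    return min(
        sorted(replicas),
        key=lambda node: bytes_to_transfer(job, replicas[node], sizes),
    )


# Illustrative data only: file sizes in bytes and which node holds which file.
sizes = {"reads.fq": 4000, "ref.fa": 1000}
replicas = {"node1": {"ref.fa"}, "node2": {"reads.fq"}}
job = JobSpec(
    command="align reads.fq ref.fa",
    inputs=frozenset({"reads.fq", "ref.fa"}),
    outputs=frozenset({"aln.bam"}),
)

chosen = pick_node(job, replicas, sizes)
print(chosen, bytes_to_transfer(job, replicas[chosen], sizes))
```

Without a complete input list, a calculation like this could only be made after the job had started opening files.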
Related Publications
Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows
Patrick Donnelly and Douglas Thain
Concurrency and Computation: Practice and Experience, 2016
@article{ccpe-confuga,
  author={Donnelly, Patrick and Thain, Douglas},
  title={{Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows}},
  journal={{Concurrency and Computation: Practice and Experience}},
  volume={29},
  number={4},
  year={2016},
  note={{doi: 10.1002/cpe.3834}},
  cclpaperid={929},
  keywords={makeflow, chirp, confuga},
}
Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows
Patrick Donnelly
Ph.D. Thesis, University of Notre Dame, 2016
@thesis{pdonnelly-thesis,
  author={Donnelly, Patrick},
  title={{Data Locality Techniques in an Active Cluster Filesystem for Scientific Workflows}},
  editor={Thesis, Ph.D.},
  booktitle={{University of Notre Dame}},
  year={2016},
  cclpaperid={928},
  keywords={makeflow, chirp, confuga},
}
Confuga: Scalable Data Intensive Computing for POSIX Workflows
Patrick Donnelly, Nicholas Hazekamp, and Douglas Thain
In IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015
@inproceedings{confuga-ccgrid2015,
  author={Donnelly, Patrick and Hazekamp, Nicholas and Thain, Douglas},
  title={{Confuga: Scalable Data Intensive Computing for POSIX Workflows}},
  booktitle={{IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing}},
  pages={392-401},
  year={2015},
  note={{doi: 10.1109/CCGrid.2015.95}},
  cclpaperid={908},
  keywords={makeflow, chirp, confuga},
}
Design of an Active Storage Cluster File System for DAG Workflows
Patrick Donnelly and Douglas Thain
In International Workshop on Data-Intensive Scalable Computing Systems, 2013
@inproceedings{confuga-discs2013,
  author={Donnelly, Patrick and Thain, Douglas},
  title={{Design of an Active Storage Cluster File System for DAG Workflows}},
  booktitle={{International Workshop on Data-Intensive Scalable Computing Systems}},
  pages={37-42},
  year={2013},
  note={{doi: 10.1145/2534645.2534656}},
  cclpaperid={900},
  keywords={chirp, confuga},
}