TaskVine is a task scheduler for building large-scale, data-intensive, dynamic workflows that run on HPC clusters, GPU clusters, and commercial clouds.
As tasks access external data sources and produce their own outputs, more and more data is pulled into local storage on workers. This data is used to accelerate future tasks and avoid re-computing existing results. Data gradually grows "like a vine" through the cluster.
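The effect of retaining data on workers can be illustrated with a toy model. The sketch below is not TaskVine's API or its scheduling algorithm; the `Worker`, `Task`, `pick_worker`, and `run` names, the file names, and the "prefer the worker holding the most inputs" policy are all hypothetical, chosen only to show how cached inputs and outputs reduce the number of transfers.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker: tracks the names of files in its local storage."""
    name: str
    cache: set = field(default_factory=set)


@dataclass
class Task:
    """Hypothetical task: consumes named input files, produces one output."""
    name: str
    inputs: list
    output: str


def pick_worker(workers, task):
    # Assumed policy: prefer the worker already holding the most inputs.
    # Ties go to the first worker in the list.
    return max(workers, key=lambda w: len(w.cache.intersection(task.inputs)))


def run(workers, tasks):
    """Run tasks in order; return how many files had to be transferred."""
    transfers = 0
    for task in tasks:
        worker = pick_worker(workers, task)
        missing = [f for f in task.inputs if f not in worker.cache]
        transfers += len(missing)        # only uncached inputs are fetched
        worker.cache.update(missing)     # fetched inputs stay on the worker
        worker.cache.add(task.output)    # outputs stay on the worker too
    return transfers


workers = [Worker("w1"), Worker("w2")]
tasks = [
    Task("t1", ["dataset.tar"], "out1"),
    Task("t2", ["dataset.tar", "out1"], "out2"),
    Task("t3", ["dataset.tar", "out2"], "out3"),
]
transfers = run(workers, tasks)
print(transfers)
```

In this toy run, only the first task has to fetch `dataset.tar`; the later tasks land on the same worker and find both the dataset and the previous output already in local storage.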
TaskVine is our third-generation workflow system, built on our twenty years of experience creating scalable applications in fields such as high energy physics, bioinformatics, molecular dynamics, and machine learning.
This work was supported in part by grant OAC #1931348 "CSSI Elements: Data Swarm: A User-Level Framework for Data Intensive Scientific Computing".
Publications
Thanh Son Phung, Colin Thomas, Logan Ward, Kyle Chard, Douglas Thain, Accelerating Function-Centric Applications by Discovering, Distributing, and Retaining Reusable Context in Workflow Systems, ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), June, 2024. DOI: 10.1145/3625549.3658663
Barry Sly-Delgado, Thanh Son Phung, Colin Thomas, David Simonetti, Andrew Hennessee, Ben Tovar, Douglas Thain, TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows, 18th Workshop on Workflows in Support of Large-Scale Science, November, 2023. DOI: 10.1145/3624062.3624277