TaskVine: A User-Level Framework for Data Intensive Scientific Applications (CSSI Element)

TaskVine is open source software for building large-scale, data-intensive dynamic workflows that run on HPC clusters, GPU clusters, and commercial clouds. As tasks access external data sources and produce their own outputs, more and more data is pulled into local storage on workers. This data is used to accelerate future tasks and avoid recomputing existing results. Data gradually grows “like a vine” through the cluster. By using a variety of pioneering techniques such as distant futures, distributed provenance, and graph pruning, TaskVine is able to outperform traditional distributed computing techniques. It has been used to build large-scale applications in scientific fields such as high energy physics, bioinformatics, molecular dynamics, and machine learning. We continue to develop new techniques within TaskVine and extend its use to new applications.

Related Publications

Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management

Thanh Son Phung and Douglas Thain

2025


@misc{phung2025scalingthroughputorientedllminference,
  title = {Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management},
  author = {Phung, Thanh Son and Thain, Douglas},
  year = {2025},
  eprint = {2509.13201},
  archiveprefix = {arXiv},
  primaryclass = {cs.DC},
  url = {https://arxiv.org/abs/2509.13201},
  keywords = {taskvine, llm, gpu}
}

Reshaping Analysis for Fast Turnaround: Leveraging Concurrency to Reduce Latency in Late-Stage LHC Analysis Workflows

Kevin Lannon, Connor Moore, Barry Sly-Delgado, Benjamin Tovar, Austin Townsend, and Jin Zhou

In 27th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2024), 2025


@inproceedings{reshaping-chep-2025,
  author = {Lannon, Kevin and Moore, Connor and Sly-Delgado, Barry and Tovar, Benjamin and Townsend, Austin and Zhou, Jin},
  title = {Reshaping Analysis for Fast Turnaround: Leveraging Concurrency to Reduce Latency in Late-Stage LHC Analysis Workflows},
  booktitle = {{27th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2024)}},
  year = {2025},
  keywords = {taskvine, hep},
}

Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine

Barry Sly-Delgado, Ben Tovar, Jin Zhou, and Douglas Thain

In ACM/IEEE Supercomputing, 2024


@inproceedings{reshaping-sc-2024,
  author = {Sly-Delgado, Barry and Tovar, Ben and Zhou, Jin and Thain, Douglas},
  title = {{Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine}},
  booktitle = {{ACM/IEEE Supercomputing}},
  pages = {1--11},
  year = {2024},
  cclpaperid = {996},
  keywords = {taskvine, hep},
  doi = {10.1109/SC41406.2024.00068}
}

Accelerating Function-Centric Applications by Discovering, Distributing, and Retaining Reusable Context in Workflow Systems

Thanh Son Phung, Colin Thomas, Logan Ward, Kyle Chard, and Douglas Thain

In ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2024


@inproceedings{function-context-hpdc-2024,
  author = {Phung, Thanh Son and Thomas, Colin and Ward, Logan and Chard, Kyle and Thain, Douglas},
  title = {{Accelerating Function-Centric Applications by Discovering, Distributing, and Retaining Reusable Context in Workflow Systems}},
  booktitle = {{ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC)}},
  year = {2024},
  cclpaperid = {995},
  keywords = {taskvine},
}

Poster: Leveraging Intermediate Data Management with Parsl/TaskVine

Colin Thomas and Douglas Thain

In Greater Chicago Area Systems Research Workshop, 2024


@inproceedings{data-gcasr-2024,
  author = {Thomas, Colin and Thain, Douglas},
  title = {{Poster: Leveraging Intermediate Data Management with Parsl/TaskVine}},
  booktitle = {{Greater Chicago Area Systems Research Workshop}},
  pages = {1},
  year = {2024},
  cclpaperid = {998},
  keywords = {taskvine},
}

Poster: Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources

Thanh Son Phùng and Douglas Thain

In Greater Chicago Area Systems Research Workshop, 2024


@inproceedings{alloc-gcasr-2024,
  author = {Phùng, Thanh Son and Thain, Douglas},
  title = {{Poster: Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources}},
  booktitle = {{Greater Chicago Area Systems Research Workshop}},
  pages = {1},
  year = {2024},
  cclpaperid = {1000},
  keywords = {taskvine},
}

Poster: Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine

Barry Sly-Delgado and Douglas Thain

In Greater Chicago Area Systems Research Workshop, 2024


@inproceedings{reshaping-gcasr-2024,
  author = {Sly-Delgado, Barry and Thain, Douglas},
  title = {{Poster: Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine}},
  booktitle = {{Greater Chicago Area Systems Research Workshop}},
  pages = {1},
  year = {2024},
  cclpaperid = {1002},
  keywords = {taskvine},
}

Maximizing Data Utility for HPC Python Workflow Execution

Thanh Son Phung, Ben Clifford, Kyle Chard, and Douglas Thain

In SC23 Workshop: High Performance Python for Science at Scale (HPPSS), 2023


@inproceedings{utility-hppss-2023,
  author = {Phung, Thanh Son and Clifford, Ben and Chard, Kyle and Thain, Douglas},
  title = {{Maximizing Data Utility for HPC Python Workflow Execution}},
  booktitle = {{SC23 Workshop: High Performance Python for Science at Scale (HPPSS)}},
  year = {2023},
  cclpaperid = {990},
  keywords = {taskvine},
}

TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows

Barry Sly-Delgado, Thanh Son Phung, Colin Thomas, David Simonetti, Andrew Hennessee, Ben Tovar, and Douglas Thain

In 18th Workshop on Workflows in Support of Large-Scale Science, 2023


@inproceedings{taskvine-works-2023,
  author = {Sly-Delgado, Barry and Phung, Thanh Son and Thomas, Colin and Simonetti, David and Hennessee, Andrew and Tovar, Ben and Thain, Douglas},
  title = {{TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows}},
  booktitle = {{18th Workshop on Workflows in Support of Large-Scale Science}},
  year = {2023},
  cclpaperid = {991},
  keywords = {taskvine},
}

Poster: Minimizing Data Movement Using Distant Futures

Barry Sly-Delgado and Douglas Thain

In ACM/IEEE Supercomputing, 2023


@inproceedings{futures-sc-2023,
  author = {Sly-Delgado, Barry and Thain, Douglas},
  title = {{Poster: Minimizing Data Movement Using Distant Futures}},
  booktitle = {{ACM/IEEE Supercomputing}},
  year = {2023},
  cclpaperid = {993},
  keywords = {taskvine},
}

Poster: TaskVine: A User-Level Framework for Data Intensive Scientific Applications

Douglas Thain

In CSSI PI Meeting, 2023


@inproceedings{taskvine-cssi-2023,
  author = {Thain, Douglas},
  title = {{Poster: TaskVine: A User-Level Framework for Data Intensive Scientific Applications}},
  booktitle = {{CSSI PI Meeting}},
  year = {2023},
  cclpaperid = {989},
  keywords = {taskvine},
}

Poster: Mixed Modality Workflows in TaskVine

David Simonetti, Ben Tovar, and Douglas Thain

In ACM High Performance Distributed Computing, 2023


@inproceedings{mixed-hpdc-2023,
  author = {Simonetti, David and Tovar, Ben and Thain, Douglas},
  title = {{Poster: Mixed Modality Workflows in TaskVine}},
  booktitle = {{ACM High Performance Distributed Computing}},
  pages = {331--332},
  year = {2023},
  cclpaperid = {988},
  keywords = {taskvine},
}