TaskVine: A User Level Framework for Data Intensive Scientific Applications (CSSI Element)

PI: Douglas Thain

TaskVine is open source software for building large scale, data intensive, dynamic workflows that run on HPC clusters, GPU clusters, and commercial clouds. As tasks access external data sources and produce their own outputs, more and more data is pulled into local storage on workers. This data is used to accelerate future tasks and avoid re-computing existing results. Data gradually grows “like a vine” through the cluster. By using a variety of pioneering techniques such as distant futures, distributed provenance, and graph pruning, TaskVine is able to outperform traditional distributed computing techniques. It has been used to build large scale applications in scientific fields such as high energy physics, bioinformatics, molecular dynamics, and machine learning. We continue to develop new techniques within TaskVine and to extend its use to new applications.
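The idea of data accumulating on workers can be illustrated with a small toy model. The sketch below is not TaskVine code and does not use the TaskVine API; the class `ToyCluster`, its methods, and the file names are hypothetical and exist only for illustration. It assumes a simple policy: a task is sent to the worker that already holds the most of its inputs, files stay on that worker afterward, and a task whose output already exists in the cluster is not run again.

```python
class ToyCluster:
    """Toy model (not TaskVine) of workers that retain files between tasks."""

    def __init__(self, worker_names):
        # Map each worker name to the set of file names held in its local storage.
        self.cache = {name: set() for name in worker_names}
        self.transfers = 0  # files that had to be fetched onto a worker
        self.executed = 0   # tasks that actually ran

    def pick_worker(self, inputs):
        # Prefer the worker already holding the most inputs.
        # Ties go to the first worker in insertion order.
        wanted = set(inputs)
        return max(self.cache, key=lambda w: len(self.cache[w] & wanted))

    def run(self, inputs, output):
        # Avoid re-computing: if the output already exists somewhere, reuse it.
        for worker, files in self.cache.items():
            if output in files:
                return worker
        worker = self.pick_worker(inputs)
        missing = set(inputs) - self.cache[worker]
        self.transfers += len(missing)   # only missing inputs are moved
        self.cache[worker] |= missing    # inputs remain cached on the worker
        self.cache[worker].add(output)   # outputs remain on the worker too
        self.executed += 1
        return worker


cluster = ToyCluster(["w1", "w2"])
first = cluster.run(["dataset.root"], "hist1")            # fetches dataset.root
second = cluster.run(["dataset.root", "hist1"], "plot")   # all inputs already local
third = cluster.run(["dataset.root"], "hist1")            # output exists, task skipped
print(first, second, third, cluster.transfers, cluster.executed)
```

In this toy run only the first task moves any data; the second finds both of its inputs on the same worker, and the third is skipped because its result is already present. A real system must also handle storage limits, worker failures, and transfers between workers, which this sketch ignores.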

Related Publications

  1. Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine
    Barry Sly-Delgado, Ben Tovar, Jin Zhou, and Douglas Thain
    In ACM/IEEE Supercomputing, 2024
  2. Accelerating Function-Centric Applications by Discovering, Distributing, and Retaining Reusable Context in Workflow Systems
    Thanh Son Phung, Colin Thomas, Logan Ward, Kyle Chard, and Douglas Thain
    In ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2024
  3. Poster: Leveraging Intermediate Data Management with Parsl/TaskVine
    Colin Thomas and Douglas Thain
    In Greater Chicago Area Systems Research Workshop, 2024
  4. Poster: Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources
    Thanh Son Phùng and Douglas Thain
    In Greater Chicago Area Systems Research Workshop, 2024
  5. Poster: Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine
    Barry Sly-Delgado and Douglas Thain
    In Greater Chicago Area Systems Research Workshop, 2024
  6. Maximizing Data Utility for HPC Python Workflow Execution
    Thanh Son Phung, Ben Clifford, Kyle Chard, and Douglas Thain
    In SC23 Workshop: High Performance Python for Science at Scale (HPPSS), 2023
  7. TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows
    Barry Sly-Delgado, Thanh Son Phung, Colin Thomas, David Simonetti, Andrew Hennessee, Ben Tovar, and Douglas Thain
    In 18th Workshop on Workflows in Support of Large-Scale Science, 2023
  8. Poster: Minimizing Data Movement Using Distant Futures
    Barry Sly-Delgado and Douglas Thain
    In ACM/IEEE Supercomputing, 2023
  9. Poster: TaskVine: A User-Level Framework for Data Intensive Scientific Applications
    Douglas Thain
    In CSSI PI Meeting, 2023
  10. Poster: Mixed Modality Workflows in TaskVine
    David Simonetti, Ben Tovar, and Douglas Thain
    In ACM High Performance Distributed Computing, 2023