TaskVine

TaskVine is our third-generation workflow system for building scalable, data-intensive applications that run on HPC clusters, cloud services, and other clusters. A TaskVine application consists of a large number of dynamically generated tasks that draw external data into the cluster and keep common results cached and shared among nodes, so that data grows "like a vine" through the cluster.

Work Queue

Work Queue is an application framework for creating and managing dynamic manager-worker style programs that scale up to tens of thousands of machines on clusters, clouds, and grids. Work Queue has many advanced features for resource management, reliability, data management, and scheduling. Applications are easy to write using Python libraries. Work Queue is used around the world to design dynamic scientific applications.
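
The manager-worker pattern described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the Work Queue API: the thread pool stands in for remote workers, and `run_task` and `manager` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(n):
    """Stand-in for a unit of work that a worker would execute."""
    return n * n


def manager(inputs, max_workers=4):
    """Submit one task per input, then collect results as workers finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending future back to the input that produced it.
        pending = {pool.submit(run_task, n): n for n in inputs}
        for future in as_completed(pending):
            results[pending[future]] = future.result()
    return results


if __name__ == "__main__":
    print(manager(range(5)))
```

In a real deployment the workers would be separate processes on other machines, and the manager would also handle data transfer, failures, and resource limits, which this sketch leaves out.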

Makeflow

Makeflow is a workflow system for parallel and distributed computing. Create massively parallel programs by joining together existing programs into large graphs. Use the classic Make-like syntax to get started, or use the JX syntax to programmatically generate large graphs. Execute workflows on your laptop or on a large cluster using HTCondor, UGE, SLURM, and other systems.
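
To show how a graph of Make-like rules exposes parallelism, here is a small sketch in plain Python. It is not Makeflow itself and does not run any commands: the rule table, file names, and the `parallel_batches` helper are assumptions made for illustration, and the standard `graphlib` module (Python 3.9+) does the dependency ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each target maps to the source files it needs.
RULES = {
    "a.out": ["input.txt"],
    "b.out": ["input.txt"],
    "final.out": ["a.out", "b.out"],
}


def parallel_batches(rules):
    """Group targets into batches whose members can run at the same time."""
    # Only sources that are themselves targets create dependencies;
    # plain input files such as input.txt are assumed to exist already.
    graph = {
        target: [s for s in sources if s in rules]
        for target, sources in rules.items()
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for batch in parallel_batches(RULES):
        print(batch)
```

Each batch contains rules with no unfinished dependencies, so a workflow engine is free to dispatch them to different machines at once.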

JX

JX (JSON Expressions) is an expression language for unstructured data. Extending the standard JSON data description language, it provides operators, variables, functions, list comprehensions, and other conveniences to generate and query complex documents. JX is used throughout the CCTools to describe and query data.
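
The idea of generating a large JSON document from a compact expression can be sketched with a Python analogue. This is not JX syntax, only an illustration of the same technique: a comprehension with a variable expands into many concrete entries. The `simulate` command, the file names, and the `generate_rules` helper are hypothetical.

```python
import json


def generate_rules(count):
    """Expand one parameterized rule into `count` concrete rules."""
    return {
        "rules": [
            {
                "command": f"simulate --seed {i} > out.{i}.txt",
                "inputs": [],
                "outputs": [f"out.{i}.txt"],
            }
            for i in range(count)
        ]
    }


if __name__ == "__main__":
    # The result is plain JSON with no variables or expressions left in it.
    print(json.dumps(generate_rules(3), indent=2))
```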

Resource Monitor

The Resource Monitor (RM) is used to accurately capture the resource consumption (CPU, RAM, I/O, disk, GPU, etc.) of applications running in distributed systems. Production applications are typically not single processes, but complex assemblies of scripts, libraries, and processes written in multiple languages. The Resource Monitor tracks all components accurately and provides the enforcement needed to execute applications reliably at scale.

Parrot

Parrot is a transparent user-level virtual filesystem that allows any ordinary program to be attached to many different remote storage services. Parrot captures the system calls (open, read, write, stat, etc.) of an application through the ptrace interface, and redirects them to remote services such as HDFS, iRODS, Chirp, and FTP. This allows one to construct custom distributed filesystems on clusters without requiring special privileges.

Chirp

Chirp is a personal user-level distributed filesystem that can be used to export existing data into distributed systems. Chirp enables unprivileged users to share space securely, efficiently, and conveniently. When combined with Parrot, Chirp allows users to create custom wide-area distributed filesystems that span high performance computing clusters.

Research Prototypes

We have also developed a number of research software prototypes that are released as open source:
  • PRUNE - The Preserving Run Environment for reproducible computing.
  • Umbrella - A configuration language for generating reproducible execution environments.
  • Confuga - An active storage cluster filesystem for scientific workflows.
  • AWE - The Accelerated Weighted Ensemble is a system for large scale molecular dynamics.
  • SAND - The Scalable Assembler at Notre Dame is a set of modules that augment the Celera genome assembler.
  • AllPairs - A specialized framework for running massive-scale pairwise comparisons found in machine learning, genomics, and biometrics.
  • Wavefront - A specialized framework for running very large dynamic programming problems found in game theory, economics, and bioinformatics.
  • FTSH - The Fault Tolerant Shell.
  • AllocFS - A kernel-level filesystem with hierarchical space allocation.
  • SubID - A user-level implementation of hierarchical user identities.
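
As an illustration of the pairwise-comparison workload named in the AllPairs entry above, the following sketch applies a comparison function to every pair drawn from two sets. It is a single-machine toy, not the AllPairs framework: the `all_pairs` helper and the Hamming-distance comparison are assumptions chosen to keep the example small.

```python
from itertools import product


def hamming(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def all_pairs(set_a, set_b, compare):
    """Evaluate compare(a, b) for every a in set_a and b in set_b."""
    return {(a, b): compare(a, b) for a, b in product(set_a, set_b)}


if __name__ == "__main__":
    scores = all_pairs(["abc", "abd"], ["abc", "xyz"], hamming)
    for pair, score in scores.items():
        print(pair, score)
```

The number of comparisons grows as the product of the two set sizes, which is why such workloads are spread across many machines at scale.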