TaskVine Insights: A Beginner's Map to the CCTools Codebase and a First Task Dispatch Patch
The easiest way to approach the CCTools repository is to start from the surface we actually use. With TaskVine, that surface is usually Python: import ndcctools.taskvine as vine, create a manager, submit work, and wait for results. Once that flow is clear, the repository layout and the C runtime become much easier to navigate.
That is the path we follow here: Python surface, repository map, then a small but real patch in taskvine/src/manager/vine_manager.c that prints which task was committed to which worker.
Start from the Python user view
The recent TaskVine Insights posts on Python task surfaces and DaskVine already show the first layer most of us meet: TaskVine Insights: Picking the Right Task Surface in Python and TaskVine Insights: DaskVine Executor for Practical Scientific Graphs. The key point is simple: the Python interface is intentionally small.
At the simplest level, the entry point looks like this:
import ndcctools.taskvine as vine
m = vine.Manager([9123, 9129])
print(f"Listening on port {m.port}")
From there, we usually choose one of four task surfaces:
-
Taskfor explicit command line tasks with explicit file edges. -
PythonTaskfor sending a Python callable and arguments. -
FunctionCallfor many small calls that benefit from a long lived worker side library. -
DaskVinefor driving TaskVine from a Dask graph while still using the same underlying scheduler.
That is the first useful mental model: the Python API is not separate from the runtime. It is the front door into the same TaskVine system.
Where TaskVine lives inside CCTools
Once we know the Python entry point, the next question is where that code lives. The answer is direct: start in cctools/taskvine. The whole repository is here: cooperative-computing-lab/cctools.
Here is the top level repository view on GitHub:
The repository is broad because CCTools is a toolkit repository, not a single application. For a new TaskVine reader, not every top level directory matters equally. A practical first pass is:
-
taskvine/is the main TaskVine implementation. -
doc/contains the manuals, including the official TaskVine manual at TaskVine User’s Manual. -
dttools/provides shared low level infrastructure used by TaskVine and other tools. -
resource_monitor/provides monitoring and resource measurement support that TaskVine uses directly. -
poncho/is relevant because TaskVine uses execution environments and packaged Python environments in real workflows. -
batch_job/matters because it is part of the broader execution and backend support layer in CCTools.
work_queue/ is worth placing correctly too. It is historically relevant as an older predecessor in the same manager-worker family, but it is not the best starting point if the goal is to understand current TaskVine behavior. For that, begin with taskvine/.
The Python layer inside cctools/taskvine
Inside taskvine/, the Python layer lives primarily in:
taskvine/src/bindings/python3/taskvine/src/bindings/python3/ndcctools/taskvine/
This is where we find the Python side of:
ManagerTaskPythonTaskFunctionCallDaskVine
For a first read, a good path is:
- Read the TaskVine manual and a small Python example.
- Read the Python API layer in
taskvine/src/bindings/python3/ndcctools/taskvine/. - Only then step down into the C manager and worker runtime.
That order is much friendlier than jumping straight into a large C file and trying to reverse engineer the programming model from internal state transitions.
A practical map of taskvine/
For day to day development, two directories matter most:
-
taskvine/src/contains the implementation. -
taskvine/test/contains focused regression and usage tests.
Inside taskvine/src/, the three most useful entries for a first pass are:
bindings/manager/worker/
taskvine/src/bindings/
This is where the user facing interfaces are assembled. In practice, the Python path under taskvine/src/bindings/python3/ndcctools/taskvine/ is the one we usually read first. A few files are especially useful:
-
manager.pyis a natural place to look when we want to understand the PythonManagersurface. -
task.pyshows how Python task objects are represented and exposed. -
file.pyhelps us understand the Python side of TaskVine file objects. -
futures.pyis useful when we want to see how the futures style interface is layered on top of the runtime. -
dask_executor.pyshows the modern DaskVine execution path. -
compat/dask_executor.pyshows the compatibility path for older Dask behavior. -
__init__.pyis a good quick check for what the package exports at the top level.
taskvine/src/manager/
This is the core manager side runtime and the most important C directory for understanding scheduling and dispatch. A few files are especially worth naming:
-
vine_manager.cis the large control flow file where we can see dispatch, completion handling, worker interactions, and runtime state updates. -
vine_schedule.cis the right place to look when we want to understand how tasks are matched to workers. -
vine_task.chelps explain how task state and task side bookkeeping are implemented in C. -
vine_worker_info.cshows how the manager tracks workers and their metadata. -
vine_file.cmatters when we want to understand declared files, cached files, and the manager side file abstractions. -
vine_txn_log.candvine_perf_log.care useful when we want to connect runtime behavior to logging and postmortem analysis.
taskvine/src/worker/
This is the worker side runtime. It matters less than manager/ for the specific patch in this post, but it becomes important as soon as we want to understand sandbox execution and what happens after a task arrives at a worker. Good entry points include:
-
vine_worker.cis the main worker program. -
vine_process.cis useful when we want to understand task process startup and execution. -
vine_sandbox.chelps explain task sandbox setup. -
vine_transfer.candvine_transfer_server.cmatter for understanding how data moves between manager and worker. -
vine_cache.candvine_cache_file.care useful when we want to understand worker side file caching behavior.
What taskvine/test/ is for
taskvine/test/ becomes important as soon as we change behavior. It contains focused tests and small usage programs, many paired with shell drivers. A few examples make the pattern clear:
-
vine_python.pyandTR_vine_python.shcover basic Python manager behavior. -
vine_python_task.pyandTR_vine_python_task.shfocus on Python task execution. -
vine_python_future_funcall.pyand related future tests exercise higher level Python interfaces. -
vine_python_serverless.pyandvine_python_serverless_failure.pycover the serverless and function call paths. -
vine_python_tag.pyis especially relevant when we think about logging task tags, because it exercises tagged tasks directly.
When we run make test from the top level cctools directory, the repository test driver walks into each package/test directory and executes the executable TR_* scripts it finds there. In other words, taskvine/test/ is part of the normal repository level validation path.
The runtime path under the Python layer
One level below the Python entry points, the architecture becomes simple. A TaskVine application needs a manager and one or more workers. The Python code drives the manager, the manager decides what to run and where, and the workers execute tasks and return results.
That core runtime lives in:
taskvine/src/manager/taskvine/src/worker/
For the specific question in this post, the interesting action is on the manager side. The main manager implementation lives in taskvine/src/manager/, which builds libtaskvine.a and is also the part linked by the Python binding.
Within that directory, taskvine/src/manager/vine_manager.c is one of the central files because it contains much of the manager control flow that users feel at runtime:
- worker connection handling
- task dispatch and commit
- task completion handling
- failure and recovery paths
- runtime bookkeeping and statistics
That makes vine_manager.c a sensible first C file once we understand the Python surface and want to see what happens after m.submit(...).
How a user actually runs TaskVine
The codebase is easier to navigate if we keep the runtime loop in mind. A normal user does something like this:
- Write a manager program.
- Run it so that it listens on a port.
- Start one or more
vine_workerprocesses. - Submit tasks and wait for results.
A tiny example is enough to anchor that flow:
import ndcctools.taskvine as vine
m = vine.Manager(9123)
task = vine.Task("echo hello from worker")
task.set_tag("hello-demo")
m.submit(task)
while not m.empty():
done = m.wait(5)
if done:
print(done.output)
Then we start a worker:
vine_worker localhost 9123
The official manual covers this workflow in more detail and should be the first reference for new readers: TaskVine User’s Manual.
How to clone and build the source
If we want to study or modify the code instead of only using packaged binaries, it helps to build from a clean Conda environment. A practical from-scratch path looks like this:
# Create an isolated development environment with a pinned Python version.
conda create -y -p cctools-dev -c conda-forge --strict-channel-priority python=3.11.12 conda-pack
# Activate the environment so the compiler, Python packages, and install prefix all point to the same place.
conda activate cctools-dev
# Install the build toolchain and development utilities used by CCTools.
conda install -y -c conda-forge --strict-channel-priority gcc_linux-64 gxx_linux-64 gdb m4 perl swig make zlib libopenssl-static openssl conda-pack packaging cloudpickle flake8 clang-format threadpoolctl
# Clone the repository and enter its root directory.
git clone https://github.com/cooperative-computing-lab/cctools.git
cd cctools
# Discover dependencies from the active Conda environment and install back into that same environment.
./configure --with-base-dir "${CONDA_PREFIX}" --prefix "${CONDA_PREFIX}"
# Remove stale build artifacts before compiling.
make clean
# Build the repository in parallel, then install it.
make -j8
make install
# Sanity check that the install path is active and the worker binary is usable.
vine_worker --version
This is consistent with the source build direction described in the repository and the official documentation, while being more explicit about the environment and toolchain: CCTools on GitHub and TaskVine User’s Manual.
How to rebuild after modifying TaskVine
Suppose we edit taskvine/src/manager/vine_manager.c. The simplest and safest rebuild path is still:
make clean
make -j
make install
That approach removes guesswork. In practice, our rule is even simpler: after changing any C source in TaskVine, go to taskvine/src/ and run a full clean rebuild:
cd taskvine/src
make clean && make -j && make install
It is slower than rebuilding only one subdirectory, but it is the least ambiguous path when we want to be sure the change has propagated through the TaskVine sources, libraries, and bindings.
One common exception is when we only change Python files in the bindings layer and do not touch any C source. Then we usually only need:
cd taskvine/src
make install
A first real patch in vine_manager.c
Now we reach the concrete example. The requested feature is simple: when a task is successfully dispatched, print which task was committed to which worker.
The target location sits inside send_one_task_with_cr() in taskvine/src/manager/vine_manager.c.
This is a good place to start because it captures one core manager step in a compact form: selecting a ready task, finding a worker that can take it, and committing that task to the worker.
Inside that function, the branch we care about is:
case VINE_SUCCESS: /* return on successful commit. */ {
committable_cores--;
total_inuse_cores++;
skip_list_remove_here(cur);
break;
}
This location already sits on the successful commit path, and the needed metadata is already available. By the time the commit succeeds, the runtime can identify:
- the task through
t->task_id - the Python level label through
t->tag - the worker host through
w->hostname - the worker address and port through
w->addrport
So the patch is straightforward:
case VINE_SUCCESS: /* return on successful commit. */ {
committable_cores--;
total_inuse_cores++;
notice(D_VINE,
"Committed task %d (%s) to worker %s (%s).",
t->task_id,
t->tag ? t->tag : "no-tag",
w->hostname ? w->hostname : "unknown-host",
w->addrport ? w->addrport : "unknown-addrport");
skip_list_remove_here(cur);
break;
}
This is a good beginner patch because it stays within the existing manager logging style, prints both machine precise and human readable identifiers, and shows how much easier the runtime becomes to navigate once we follow the Python-to-manager path.
How to verify the patch
After applying the change, we rebuild the source:
cd taskvine/src
make clean && make -j && make install
Then we run a tiny manager:
import ndcctools.taskvine as vine
m = vine.Manager(9123)
for i in range(3):
t = vine.Task(f"echo task-{i}")
t.set_tag(f"demo-{i}")
m.submit(t)
while not m.empty():
done = m.wait(5)
if done:
print(f"done: {done.id}")
In another terminal, we start a worker:
vine_worker localhost 9123
If everything is wired up correctly, the manager side output should now include lines like:
2026/03/18 16:42:26.94 vine_manager[867641]notice: Committed task 1 (demo-0) to worker condorfe.crc.nd.edu (127.0.0.1:35434).
2026/03/18 16:42:26.95 vine_manager[867641]notice: Committed task 2 (demo-1) to worker condorfe.crc.nd.edu (127.0.0.1:35434).
2026/03/18 16:42:26.95 vine_manager[867641]notice: Committed task 3 (demo-2) to worker condorfe.crc.nd.edu (127.0.0.1:35434).
The right mental model to keep
For a beginner, the most useful mental model is still the simplest one: start from the Python surface, not from the C internals. The codebase entry is cctools/taskvine. The manual explains the programming model, the bindings show how that model appears in Python, and the manager and worker runtime in C show how it is implemented.
Once we follow that order, even a first patch in vine_manager.c stops feeling mysterious. It becomes a small change inside a system whose layers already make sense.