cctools
ndcctools.taskvine.dask_executor.DaskVine Class Reference

TaskVine Manager specialized to compute dask graphs.

Public Member Functions

def get
 Execute the task graph dsk and return the results for keys in graph.

Detailed Description

TaskVine Manager specialized to compute dask graphs.

Managers created via DaskVine can be used to execute dask graphs via the method ndcctools.taskvine.dask_executor.DaskVine.get as follows:

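import dask
from ndcctools.taskvine.dask_executor import DaskVine

m = DaskVine(...)
# Initialize as any other manager. @see ndcctools.taskvine.manager.Manager
# v is any dask collection (for example, a dask.delayed object).
result = v.compute(scheduler=m.get)

# or set it temporarily as the default scheduler for dask:
with dask.config.set(scheduler=m.get):
    result = v.compute()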
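
To make the role of m.get concrete, the following is a minimal sketch of the calling convention that dask expects from a scheduler callable: a function taking a graph dsk and keys, and returning the computed values. The names is_task and toy_get are hypothetical and exist only for illustration. This serial, in-process evaluator is not how DaskVine.get is implemented, since DaskVine dispatches tasks to TaskVine workers; it only shows what "execute dsk and return the results for keys" means.

```python
from operator import add, mul


def is_task(x):
    """A dask-style task is a tuple whose first element is callable."""
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def toy_get(dsk, keys, **kwargs):
    """Serially evaluate a dask-style graph (hypothetical stand-in, not DaskVine.get).

    dsk  maps keys to literals, other keys, or tasks of the form (func, arg1, ...).
    keys is a single key or a possibly nested list of keys.
    Extra keyword arguments are accepted and ignored, as a scheduler would
    receive options passed to compute().
    """
    cache = {}

    def is_key(x):
        try:
            return x in dsk
        except TypeError:  # unhashable values such as lists are never keys
            return False

    def evaluate(x):
        if is_task(x):
            func, *args = x
            return func(*(evaluate(a) for a in args))
        if isinstance(x, list):
            return [evaluate(a) for a in x]
        if is_key(x):
            if x not in cache:  # compute each key only once
                cache[x] = evaluate(dsk[x])
            return cache[x]
        return x  # plain literal

    return evaluate(keys)


dsk = {"a": 1, "b": 2, "sum": (add, "a", "b"), "prod": (mul, "sum", 10)}
single = toy_get(dsk, "prod")
nested = toy_get(dsk, ["sum", ["prod", "a"]])
print(single, nested)
```

The shape of the returned value mirrors the shape of keys: a single key yields a single value, and a nested list of keys yields a nested list of values.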

Parameters for execution can be set as arguments to the compute function. These arguments are applied to each task executed:

my_env = m.declare_poncho("my_env.tar.gz")

with dask.config.set(scheduler=m.get):
    # Each task uses at most 1 core, runs in the my_env environment, and
    # has its allocation set to the maximum values seen.
    # If resources_mode is not None, the resource monitor is activated.
    result = v.compute(resources={"cores": 1}, resources_mode="max", environment=my_env)
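
Because these options are plain keyword arguments, it can help to check them before a long run starts. The sketch below is a hypothetical helper (make_compute_options is not part of ndcctools) that validates a resources dictionary and a resources_mode string against the values documented for DaskVine.get on this page. The exact mode spellings are an assumption taken from this page and should be checked against the installed version.

```python
# Values as documented for DaskVine.get on this page (assumed spellings).
RESOURCE_KEYS = ("cores", "memory", "disk")
DOCUMENTED_MODES = (
    "fixed",
    "max throughput",
    "max",
    "min_waste",
    "greedy bucketing",
    "exhaustive bucketing",
)


def make_compute_options(resources=None, resources_mode="fixed", retries=5):
    """Hypothetical helper: validate options and return them as a kwargs dict."""
    resources = dict(resources or {})
    unknown = sorted(set(resources) - set(RESOURCE_KEYS))
    if unknown:
        raise ValueError(f"unknown resource keys: {unknown}")
    if resources_mode not in DOCUMENTED_MODES:
        raise ValueError(f"unknown resources_mode: {resources_mode!r}")
    if retries < 1:
        raise ValueError("retries must be at least 1")
    return {
        "resources": resources,
        "resources_mode": resources_mode,
        "retries": retries,
    }


options = make_compute_options({"cores": 1, "memory": 2000}, resources_mode="max")
print(options)
```

The resulting dictionary could then be expanded into the call, for example result = v.compute(**options), inside the dask.config.set block shown above.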

Member Function Documentation

def ndcctools.taskvine.dask_executor.DaskVine.get(
  self,
  dsk,
  keys,
  environment = None,
  extra_files = None,
  lazy_transfers = False,
  low_memory_mode = False,
  checkpoint_fn = None,
  resources = None,
  resources_mode = 'fixed',
  retries = 5,
  verbose = False
)

Execute the task graph dsk and return the results for keys in graph.

Parameters
dsk              The task graph to execute.
keys             A single key, or a possibly nested list of keys, whose values are computed from dsk.
environment      A taskvine file representing an environment in which to run the tasks.
extra_files      A dictionary of {taskvine.File: "remote_name"} to add to each task.
lazy_transfers   Whether to keep intermediate results only at workers (True) or to bring each result back to the manager (False, default). True is more I/O efficient, but risks having to recompute results if workers are lost.
low_memory_mode  Split graph vertices to reduce the memory needed per function call. This removes some of the dask graph optimizations, so use it with care.
checkpoint_fn    When using lazy_transfers, a predicate with arguments (dag, key) called before submitting a task. If it returns True, the result of that task is brought back to the manager.
resources        A dictionary with optional keys cores, memory, and disk (memory and disk in MB) that sets the maximum resource usage per task.
resources_mode   Automatically resize the allocation per task. One of 'fixed' (use the value of 'resources' above), 'max throughput', 'max' (for maximum values seen), 'min_waste', 'greedy bucketing' or 'exhaustive bucketing'. This is done per function type in dsk.
retries          Number of times to attempt a task. Default is 5.
verbose          If True, emit additional debugging information.
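
As an illustration of the checkpoint_fn parameter, the sketch below builds a predicate with the documented (dag, key) signature. The factory make_checkpoint_fn and the key prefixes are hypothetical. The example assumes keys follow the common dask convention of being strings or tuples such as ("name", index), and it ignores the dag argument entirely because its interface is not described on this page.

```python
def make_checkpoint_fn(prefixes):
    """Return a (dag, key) predicate that is True for keys with a given name prefix.

    Hypothetical helper: the intent is to bring back to the manager only the
    results that would be expensive to recompute if a worker were lost.
    """
    prefixes = tuple(prefixes)

    def checkpoint_fn(dag, key):
        # Assumed dask convention: keys are strings or tuples like ("name", 0).
        name = key[0] if isinstance(key, tuple) and key else key
        return isinstance(name, str) and name.startswith(prefixes)

    return checkpoint_fn


should_checkpoint = make_checkpoint_fn(["sum-", "final"])
print(should_checkpoint(None, ("sum-abc", 0)))
print(should_checkpoint(None, "map-xyz"))
```

Such a predicate would be passed together with lazy_transfers, for example v.compute(lazy_transfers=True, checkpoint_fn=should_checkpoint) with m.get set as the scheduler.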

The documentation for this class was generated from the following file: