NAME

Work_Queue::Task - Perl Work Queue Task bindings.

SYNOPSIS

The objects and methods provided by this package correspond to the native C API in work_queue.h for task creation and manipulation. This module is automatically loaded with Work_Queue.

                use Work_Queue;

                # $command is a shell command string; $q is an existing Work_Queue object.
                my $t = Work_Queue::Task->new($command);
                $t->specify_input_file(local_name => 'some_name', remote_name => 'some_other_name');
                $t->specify_output_file('some_name');

                $q->submit($t);

                # Wait up to 5 seconds for a task to complete.
                $t = $q->wait(5);

                if($t) {
                    my $resources = $t->resources_measured;
                    print $resources->{resident_memory}, "\n";
                }

METHODS

Work_Queue::Task

`Work_Queue::Task->new('/some/command < input > output');`

Create a new task specification.

command: The shell command line to be executed by the task.

`specify_tag`

Attach a user-defined logical name to the task.

tag: The tag to be assigned.

`specify_category`

Label the task with the given category. Tasks in the same category are expected to have similar resource requirements (used, e.g., for fast abort).

name: The name of the category.

`specify_feature`

Label the task with the given user-defined feature. The task will run only on a worker that provides that feature (declared with the worker's --feature option).

name: The name of the required feature.

`clone`

Return a copy of this task.

`specify_command`

Set the command to be executed by the task.

command: The command to be executed.

`specify_algorithm`

Set the worker selection algorithm for the task.

algorithm

One of the following algorithms to use in assigning a task to a worker:

WORK_QUEUE_SCHEDULE_FCFS
WORK_QUEUE_SCHEDULE_FILES
WORK_QUEUE_SCHEDULE_TIME
WORK_QUEUE_SCHEDULE_RAND

`specify_preferred_host`

Indicate that the task would be optimally run on a given host.

hostname: The hostname to which this task would optimally be sent.

`specify_file`

Add a file to the task.

local_name

The name of the file on local disk or shared filesystem.

remote_name

The name of the file at the execution site.

type

Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT

flags

May be zero to indicate no special handling, or any of the following or'd together:

$Work_Queue::WORK_QUEUE_NOCACHE
$Work_Queue::WORK_QUEUE_CACHE
$Work_Queue::WORK_QUEUE_WATCH
$Work_Queue::WORK_QUEUE_FAILURE_ONLY

cache

Whether the file should be cached at workers. By default this is enabled.

failure_only

For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

                $t->specify_file(local_name => ...);

                $t->specify_file(local_name => ..., remote_name => ..., );
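As an illustration of how flags are combined with bitwise OR, here is a minimal sketch. The numeric values in `%FLAG` are placeholders made up for this example; in real code use the `$Work_Queue::WORK_QUEUE_*` constants listed above instead.

```perl
use strict;
use warnings;

# Placeholder bit values for illustration only. Real code should use
# $Work_Queue::WORK_QUEUE_CACHE, $Work_Queue::WORK_QUEUE_WATCH, etc.
my %FLAG = (
    NOCACHE      => 0,
    CACHE        => 1,
    WATCH        => 16,
    FAILURE_ONLY => 32,
);

# Combine two flags with bitwise OR.
my $flags = $FLAG{CACHE} | $FLAG{WATCH};

# Test whether a given flag is set with bitwise AND.
my $watching     = ($flags & $FLAG{WATCH})        ? 1 : 0;
my $failure_only = ($flags & $FLAG{FAILURE_ONLY}) ? 1 : 0;

print "flags=$flags watching=$watching failure_only=$failure_only\n";
```

The combined value would then be passed as the flags argument, for example `flags => $Work_Queue::WORK_QUEUE_CACHE | $Work_Queue::WORK_QUEUE_WATCH`.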

`specify_file_command`

Add a file to the task which will be transferred with a command at the worker.

remote_name

The name of the file at the execution site.

cmd

The shell command to transfer the file. Any occurrence of the string %% will be replaced with the internal name that Work Queue uses for the file.

type

Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT

flags

May be zero to indicate no special handling, or any of the following or'd together:

$Work_Queue::WORK_QUEUE_NOCACHE
$Work_Queue::WORK_QUEUE_CACHE
$Work_Queue::WORK_QUEUE_FAILURE_ONLY

cache

Whether the file should be cached at workers. By default this is enabled.

failure_only

For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

        $t->specify_file_command(remote_name => "my.result", cmd => "chirp_put %% chirp://somewhere/result.file", type => $Work_Queue::WORK_QUEUE_OUTPUT);
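To make the %% placeholder concrete, the following sketch performs the same kind of substitution locally. The internal name `file-abc123` is invented for the example; Work Queue chooses its own internal names, and the real substitution happens at the worker, not in your script.

```perl
use strict;
use warnings;

# Transfer command template, as it would be given in the cmd argument.
my $template = 'chirp_put %% chirp://somewhere/result.file';

# Hypothetical internal name; the real one is chosen by Work Queue.
my $internal_name = 'file-abc123';

# Replace every occurrence of %% with the internal name.
(my $expanded = $template) =~ s/%%/$internal_name/g;

print "$expanded\n";
```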

`specify_file_piece`

Add a file piece to the task.

local_name: The name of the file on local disk or shared filesystem.
remote_name: The name of the file at the execution site.
start_byte: The starting byte offset of the file piece to be transferred.
end_byte: The ending byte offset of the file piece to be transferred.
type: Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT.
flags: May be zero to indicate no special handling, or any of the flags listed under Work_Queue::Task->specify_file or'd together.
cache: Whether the file should be cached at workers. By default this is enabled.
failure_only: For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

                $t->specify_file_piece(local_name => ..., start_byte => ..., ...);

                $t->specify_file_piece(local_name => ..., remote_name => ..., ...);

`specify_input_file`

Add an input file to the task.

This is a wrapper for Work_Queue::Task->specify_file with type set to $Work_Queue::WORK_QUEUE_INPUT. If only one argument is given, it is used as both local_name and remote_name.

`specify_output_file`

Add an output file to the task.

This is a wrapper for Work_Queue::Task->specify_file with type set to $Work_Queue::WORK_QUEUE_OUTPUT. If only one argument is given, it is used as both local_name and remote_name.
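The single-argument rule for specify_input_file and specify_output_file can be sketched as follows. `normalize_file_args` is a hypothetical helper written for this example; it is not the binding's actual code and only mirrors the documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented defaulting rule:
# a single argument is used as both local_name and remote_name,
# otherwise the arguments are taken as name => value pairs.
sub normalize_file_args {
    my %args = @_ == 1 ? (local_name => $_[0], remote_name => $_[0]) : @_;
    return %args;
}

my %one   = normalize_file_args('data.txt');
my %named = normalize_file_args(local_name => 'in.dat', remote_name => 'input');

print "$one{local_name} -> $one{remote_name}\n";
print "$named{local_name} -> $named{remote_name}\n";
```

Under this rule, `$t->specify_output_file('some_name')` would be equivalent to passing `local_name => 'some_name', remote_name => 'some_name'`.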

`specify_directory`

Add a directory to the task.

local_name: The name of the directory on local disk or shared filesystem. Optional if the directory is empty.
remote_name: The name of the directory at the remote execution site.
type: Must be one of $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT.
flags: May be zero to indicate no special handling. See Work_Queue::Task->specify_file.
recursive: Indicates whether just the directory (0) or the directory and all of its contents (1) should be included.
cache: Whether the file should be cached at workers. By default this is enabled.
failure_only: For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

Returns 1 if the task directory is successfully specified, 0 if either local_name or remote_name is null, or if remote_name is an absolute path.

`specify_buffer`

Add an input buffer to the task.

buffer: The contents of the buffer to pass as input.
remote_name: The name of the remote file to create.
flags: May take the same values as Work_Queue::Task->specify_file.
cache: Whether the file should be cached at workers. By default this is enabled.
failure_only: For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

`specify_snapshot_file`

When monitoring is enabled, specify a JSON-encoded file that instructs the monitor to take snapshots of the task's resources. Snapshots appear in the JSON summary file of the task, under the key "snapshots". Snapshots are taken on events on the files described in the snapshot file, which is a JSON-encoded file with the following format:

  {
      "FILENAME": {
          "from-start":boolean,
          "from-start-if-truncated":boolean,
          "delete-if-found":boolean,
          "events": [
              {
                  "label":"EVENT_NAME",
                  "on-create":boolean,
                  "on-truncate":boolean,
                  "on-pattern":"REGEXP",
                  "count":integer
              },
              {
                  "label":"EVENT_NAME",
                  ...
              }
          ]
      },
      "FILENAME": {
          ...
      }
  }

All keys but "label" are optional:

  from-start                 If FILENAME exists when the task starts running, process from line 1. Default: false, as the task may be appending to an already existing file.
  from-start-if-truncated    If FILENAME is truncated, process from line 1. Default: true, to account for log rotations.
  delete-if-found            Delete FILENAME when found. Default: false

  events:
  label        Name that identifies the snapshot. Only alphanumeric, -, and _
               characters are allowed.
  on-create    Take a snapshot every time the file is created. Default: false
  on-truncate  Take a snapshot when the file is truncated. Default: false
  on-pattern   Take a snapshot when a line matches the regexp pattern. Default: none
  count        Maximum number of snapshots for this label. Default: -1 (no limit)

Exactly one of on-create, on-truncate, or on-pattern should be specified. For more information, consult the manual of the resource_monitor.

filename: The name of the snapshot events specification file.
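One way to produce such a specification file is to build it as a Perl data structure and encode it with the core JSON::PP module. This is only a sketch: the monitored file name `my.log`, the labels, and the regular expression are invented for illustration, and each event sets exactly one of on-create, on-truncate, or on-pattern as required above.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Temp qw(tempfile);

my $true = JSON::PP::true;

# Hypothetical monitored file and event labels, for illustration only.
my %spec = (
    'my.log' => {
        'from-start' => $true,
        'events'     => [
            { 'label' => 'file-created', 'on-create' => $true },
            { 'label' => 'step-done', 'on-pattern' => '^step \d+ done', 'count' => 10 },
        ],
    },
);

# Encode with sorted keys so the output is stable and readable.
my $json = JSON::PP->new->canonical->pretty->encode(\%spec);

# Write the specification to a temporary file.
my ($fh, $spec_file) = tempfile('snapshots-XXXXXX', SUFFIX => '.json', TMPDIR => 1);
print {$fh} $json;
close($fh) or die "Cannot close $spec_file: $!";

print "Wrote snapshot specification to $spec_file\n";
```

The resulting path would then be handed to the task with `$t->specify_snapshot_file($spec_file)`.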

`specify_max_retries`

Specify the number of times this task is retried on worker errors. If less than one, the task is retried indefinitely (this is the default). A task that did not succeed after the given number of retries is returned with result $Work_Queue::WORK_QUEUE_RESULT_MAX_RETRIES.

max_retries: Number of retries.

`specify_cores`

Specify the number of cores the task requires.

n: Number of cores.

`specify_memory`

Specify the size of the memory the task requires.

n: Memory size, in megabytes.

`specify_disk`

Specify the size of disk the task requires.

n: Disk size, in megabytes.

`specify_gpus`

Specify the number of GPUs the task requires.

n: Number of GPUs.

`specify_end_time`

Indicate the maximum end time (absolute, in microseconds from the Epoch) of this task. This is useful, for example, when the task uses certificates that expire. If less than 1, or not specified, no limit is imposed.

useconds: Number of microseconds.
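Because the end time is absolute and expressed in microseconds, it is usually computed from the current time. A minimal sketch, assuming a deadline one hour from now (the one-hour figure is arbitrary):

```perl
use strict;
use warnings;

# Current time in whole seconds since the Epoch.
my $now_s = time();

# Arbitrary example deadline: one hour from now.
my $one_hour_s = 60 * 60;

# Convert the absolute deadline from seconds to microseconds.
my $end_time_us = ($now_s + $one_hour_s) * 1_000_000;

print "end time: $end_time_us microseconds since the Epoch\n";
```

The value would then be passed as `$t->specify_end_time($end_time_us)`.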

`specify_running_time`

Indicate the maximum running time (in microseconds) for a task in a worker (relative to when the task starts to run). If less than 1, or not specified, no limit is imposed. Note: this is the same as specify_running_time_max, except that the limit is given in microseconds. Kept for backwards compatibility.

useconds: Number of microseconds.

`specify_running_time_max`

Indicate the maximum running time (in seconds) for a task in a worker (relative to when the task starts to run). If less than 1, or not specified, no limit is imposed.

seconds: Number of seconds.
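Since specify_running_time takes microseconds and specify_running_time_max takes seconds, it is easy to mix up the units. The sketch below shows the conversion for an arbitrary 90-second limit; the calls on `$t` are left as comments because they need a real task object.

```perl
use strict;
use warnings;

# Arbitrary example limit of 90 seconds.
my $limit_s = 90;

# The same limit expressed in microseconds.
my $limit_us = $limit_s * 1_000_000;

# With a task object $t, these two calls would request the same limit:
#   $t->specify_running_time_max($limit_s);   # seconds
#   $t->specify_running_time($limit_us);      # microseconds (legacy)

print "$limit_s s = $limit_us us\n";
```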

`specify_running_time_min`

Indicate the minimum running time (in seconds) the task needs (relative to when the task starts to run). If less than 1, or not specified, no minimum time is defined.

seconds: Number of seconds.

`specify_priority`

Indicate the priority of this task (larger values mean higher priority; the default is 0).

n: Integer priority.

`specify_environment_variable`

Set the environment variable to value before the task is run.

name: Name of the environment variable.
value: Value of the environment variable. Variable is unset if value is not given.

`specify_monitor_output`

Set the directory name for the resource output from the monitor.

directory: Name of the directory.

`tag`

Get the tag value of the task.

`priority`

Get the priority value of the task.

`command`

Get the command line of the task.

`algorithm`

Get the algorithm specified for this task to be dispatched.

`output`

Get the standard output of the task. Must be called only after the task completes execution.

`id`

Get the task id number.

`return_status`

Get the exit code of the command executed by the task. Must be called only after the task completes execution.

`result`

Get the result of the task as an integer (successful, failed return_status, missing input file, missing output file).

Must be called only after the task completes execution.

`result_str`

Returns a string that explains the result of a task. (SUCCESS, INPUT_MISS, OUTPUT_MISS, etc.)

`total_submissions`

Get the number of times the task has been resubmitted internally.

Must be called only after the task completes execution.

`host`

Get the address and port of the host on which the task ran. Must be called only after the task completes execution.

`hostname`

Get the name of the host on which the task ran. Must be called only after the task completes execution.

`commit_time`

Get the time at which this task was committed to a worker. Must be called only after the task completes execution.

`submit_time`

Get the time at which this task was submitted.

Must be called only after the task completes execution.

`finish_time`

Get the time at which this task was finished.

Must be called only after the task completes execution.

`time_app_delay`

Get the time spent in upper-level application (outside of work_queue_wait).

Must be called only after the task completes execution.

`send_input_start`

Get the time at which the task started to transfer input files.

Must be called only after the task completes execution.

`send_input_finish`

Get the time at which the task finished transferring input files.

Must be called only after the task completes execution.

`execute_cmd_start`

Get the time at which the task began executing.

Must be called only after the task completes execution.

`execute_cmd_finish`

Get the time at which the task finished (discovered by the manager).

Must be called only after the task completes execution.

`receive_output_start`

Get the time at which the task started to transfer output files.

Must be called only after the task completes execution.

`receive_output_finish`

Get the time at which the task finished transferring output files.

Must be called only after the task completes execution.

`total_bytes_received`

Get the number of bytes received since the task started receiving input data.

Must be called only after the task completes execution.

`total_bytes_sent`

Get the number of bytes sent since the task started sending input data.

Must be called only after the task completes execution.

`total_bytes_transferred`

Get the number of bytes transferred since the task started transferring input data.

Must be called only after the task completes execution.

`total_transfer_time`

Get the time consumed, in microseconds, for transferring total_bytes_transferred.

Must be called only after the task completes execution.
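Together, total_bytes_transferred and total_transfer_time give an average transfer rate. The sketch below uses made-up numbers in place of the values a completed task would report, and guards against a zero transfer time.

```perl
use strict;
use warnings;

# Hypothetical values standing in for $t->total_bytes_transferred
# and $t->total_transfer_time of a completed task.
my $bytes = 52_428_800;   # bytes transferred
my $usecs = 4_000_000;    # transfer time in microseconds

# Average rate in megabytes (10^6 bytes) per second.
my $mb_per_s = $usecs > 0
    ? ($bytes / 1_000_000) / ($usecs / 1_000_000)
    : 0;

printf "average transfer rate: %.2f MB/s\n", $mb_per_s;
```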

`cmd_execution_time`

Get the time spent in microseconds for executing the command on the worker.

Must be called only after the task completes execution.

`total_cmd_execution_time`

Get the time spent in microseconds for executing the command on any worker.

Must be called only after the task completes execution.

`resources_measured`

Get the resources measured when monitoring is enabled.

Must be called only after the task completes execution.

start: microseconds at the start of execution, since epoch.

                $t->resources_measured->{start};

end: microseconds at the end of execution, since epoch.

                $t->resources_measured->{end};

wall_time: microseconds spent during execution.

                $t->resources_measured->{wall_time};

cpu_time: user + system time of the execution.

                $t->resources_measured->{cpu_time};

cores: peak number of cores used.

                $t->resources_measured->{cores};

cores_avg: number of cores computed as cpu_time/wall_time.

                $t->resources_measured->{cores_avg};

max_concurrent_processes: the maximum number of processes running concurrently.

                $t->resources_measured->{max_concurrent_processes};

total_processes: count of all of the processes created.

                $t->resources_measured->{total_processes};

virtual_memory: maximum virtual memory across all processes.

                $t->resources_measured->{virtual_memory};

resident_memory: maximum resident size across all processes.

                $t->resources_measured->{memory};

swap_memory: maximum swap usage across all processes.

                $t->resources_measured->{swap_memory};

bytes_read: number of bytes read from disk.

                $t->resources_measured->{bytes_read};

bytes_written: number of bytes written to disk.

                $t->resources_measured->{bytes_written};

bytes_received: number of bytes read from the network.

                $t->resources_measured->{bytes_received};

bytes_sent: number of bytes written to the network.

                $t->resources_measured->{bytes_sent};

bandwidth: maximum network bits/s (average over one minute).

                $t->resources_measured->{bandwidth};

total_files: total maximum number of files and directories of all the working directories in the tree.

                $t->resources_measured->{total_files};

disk: size in MB of all working directories in the tree.

                $t->resources_measured->{disk};
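As a worked example of the cores_avg definition (cpu_time/wall_time), the sketch below uses a plain hash with made-up numbers in place of the object returned by resources_measured. It assumes cpu_time is reported in the same unit as wall_time, which the text above does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical measurements standing in for $t->resources_measured.
# Assumption: cpu_time and wall_time use the same unit (microseconds).
my %measured = (
    wall_time => 8_000_000,    # 8 seconds of elapsed time
    cpu_time  => 20_000_000,   # 20 seconds of user + system time
);

# Average number of cores kept busy over the run.
my $cores_avg = $measured{wall_time} > 0
    ? $measured{cpu_time} / $measured{wall_time}
    : 0;

printf "cores_avg = %.2f\n", $cores_avg;
```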

`resources_requested`

Get the resources requested by the task. See resources_measured for possible fields.

Must be called only after the task completes execution.

`resources_allocated`

Get the resources allocated to the task in its latest attempt. See resources_measured for possible fields.
