NAME

Work_Queue::Task - Perl Work Queue Task bindings.

SYNOPSIS

The objects and methods provided by this package correspond to the native C API in work_queue.h for task creation and manipulation. This module is automatically loaded with Work_Queue.

                use Work_Queue;

                my $t = Work_Queue::Task->new($command);
                $t->specify_input_file(local_name => 'some_name', remote_name => 'some_other_name');
                $t->specify_output_file('some_name');

                $q->submit($t);

                $t = $q->wait(5);

                if($t) {
                                my $resources = $t->resources_measured;
                                print $resources->{resident_memory}, "\n";
                }

METHODS

Work_Queue::Task

Work_Queue::Task->new('/some/command < input > output');

Create a new task specification.

command

The shell command line to be executed by the task.

specify_tag

Attach a user-defined logical name to the task.

tag

The tag to be assigned.

specify_category

Label the task with the given category. It is expected that tasks with the same category have similar resource requirements (e.g. for fast abort).

tag

The name of the category.

specify_feature

Label the task with the given user-defined feature. The task will only run on a worker that provides that feature (set with the worker's --feature option).

name

The name of the required feature.

clone

Return a copy of this task.

specify_command

Set the command to be executed by the task.

command

The command to be executed.

specify_algorithm

Set the worker selection algorithm for the task.

algorithm

One of the following algorithms to use in assigning a task to a worker:

WORK_QUEUE_SCHEDULE_FCFS
WORK_QUEUE_SCHEDULE_FILES
WORK_QUEUE_SCHEDULE_TIME
WORK_QUEUE_SCHEDULE_RAND

specify_preferred_host

Indicate that the task would be optimally run on a given host.

hostname

The hostname to which this task would optimally be sent.

specify_file

Add a file to the task.

local_name

The name of the file on local disk or shared filesystem.

remote_name

The name of the file at the execution site.

type

Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT

flags

May be zero to indicate no special handling, or any of the following or'd together:

$Work_Queue::WORK_QUEUE_NOCACHE
$Work_Queue::WORK_QUEUE_CACHE
$Work_Queue::WORK_QUEUE_WATCH
$Work_Queue::WORK_QUEUE_FAILURE_ONLY

cache

Whether the file should be cached at workers. By default this is enabled.

failure_only

For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

                $t->specify_file(local_name => ...);

                $t->specify_file(local_name => ..., remote_name => ..., );

specify_file_command

Add a file to the task which will be transferred with a command at the worker.

remote_name

The name of the file at the execution site.

cmd

The shell command to transfer the file. Any occurrence of the string %% will be replaced with the internal name that Work Queue uses for the file.

type

Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT

flags

May be zero to indicate no special handling, or any of the following or'd together:

$Work_Queue::WORK_QUEUE_NOCACHE
$Work_Queue::WORK_QUEUE_CACHE
$Work_Queue::WORK_QUEUE_FAILURE_ONLY

cache

Whether the file should be cached at workers. By default this is enabled.

failure_only

For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

        $t->specify_file_command(remote_name => "my.result", cmd => "chirp_put %% chirp://somewhere/result.file", type => $Work_Queue::WORK_QUEUE_OUTPUT);

specify_file_piece

Add a file piece to the task.

local_name

The name of the file on local disk or shared filesystem.

remote_name

The name of the file at the execution site.

start_byte

The starting byte offset of the file piece to be transferred.

end_byte

The ending byte offset of the file piece to be transferred.

type

Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT.

flags

May be zero to indicate no special handling, or any of the flags listed under Work_Queue::Task->specify_file or'd together.

cache

Whether the file should be cached at workers. By default this is enabled.

failure_only

For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

                $t->specify_file_piece(local_name => ..., start_byte => ..., ...);

                $t->specify_file_piece(local_name => ..., remote_name => ..., ...);

specify_input_file

Add an input file to the task.

This is just a wrapper for Work_Queue::Task->specify_file with type set to $Work_Queue::WORK_QUEUE_INPUT. If only one argument is given, it defaults to both local_name and remote_name.

specify_output_file

Add an output file to the task.

This is just a wrapper for Work_Queue::Task->specify_file with type set to $Work_Queue::WORK_QUEUE_OUTPUT. If only one argument is given, then it defaults to both local_name and remote_name.
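
The two calling conventions described above can be sketched as follows. This is only an illustration: the helper name attach_files and the file names are hypothetical, and $task is assumed to be a Work_Queue::Task object created elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: attach one input and one output file to a task.
# $task is assumed to be a Work_Queue::Task object.
sub attach_files {
    my ($task) = @_;

    # Named arguments: the file is 'data.local' on local disk and
    # 'data.in' at the execution site.
    $task->specify_input_file(
        local_name  => 'data.local',
        remote_name => 'data.in',
    );

    # Single argument: 'result.out' is used as both local_name and remote_name.
    $task->specify_output_file('result.out');

    return $task;
}
```

The named form is the one to reach for when the command expects a file name at the worker that differs from the name on local disk.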

specify_directory

Add a directory to the task.

local_name

The name of the directory on local disk or shared filesystem. Optional if the directory is empty.

remote_name

The name of the directory at the remote execution site.

type

Must be one of $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT.

flags

May be zero to indicate no special handling. See Work_Queue::Task->specify_file.

recursive

Indicates whether just the directory (0) or the directory and all of its contents (1) should be included.

cache

Whether the file should be cached at workers. By default this is enabled.

failure_only

For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

Returns 1 if the task directory is successfully specified, 0 if either local_name or remote_name is null, or if remote_name is an absolute path.

specify_buffer

Add an input buffer to the task.

buffer

The contents of the buffer to pass as input.

remote_name

The name of the remote file to create.

flags

May take the same values as Work_Queue::Task->specify_file.

cache

Whether the file should be cached at workers. By default this is enabled.

failure_only

For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.

specify_snapshot_file

When monitoring, indicates a JSON-encoded file that instructs the monitor to take a snapshot of the task resources. Snapshots appear in the JSON summary file of the task, under the key "snapshots". Snapshots are taken on events on files described in the monitor_snapshot_file. The monitor_snapshot_file is a JSON-encoded file with the following format:

  {
      "FILENAME": {
          "from-start":boolean,
          "from-start-if-truncated":boolean,
          "delete-if-found":boolean,
          "events": [
              {
                  "label":"EVENT_NAME",
                  "on-create":boolean,
                  "on-truncate":boolean,
                  "on-pattern":"REGEXP",
                  "count":integer
              },
              {
                  "label":"EVENT_NAME",
                  ...
              }
          ]
      },
      "FILENAME": {
          ...
      }
  }

All keys but "label" are optional:

  from-start                 If FILENAME exists when the task starts running, process from line 1. Default: false, as the task may be appending to an already existing file.
  from-start-if-truncated    If FILENAME is truncated, process from line 1. Default: true, to account for log rotations.
  delete-if-found            Delete FILENAME when found. Default: false

  events:
  label        Name that identifies the snapshot. Only alphanumeric, -, and _
               characters are allowed. 
  on-create    Take a snapshot every time the file is created. Default: false
  on-truncate  Take a snapshot when the file is truncated.    Default: false
  on-pattern   Take a snapshot when a line matches the regexp pattern.    Default: none
  count        Maximum number of snapshots for this label. Default: -1 (no limit)

Exactly one of on-create, on-truncate, or on-pattern should be specified. For more information, consult the manual of the resource_monitor.

filename

The name of the snapshot events specification.
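
A specification in the format described above could be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the monitored file name app.log, the event labels, the regular expression, and both helper names are hypothetical. JSON::PP ships with Perl and is used only to produce the file.

```perl
use strict;
use warnings;
use JSON::PP;

# Build a snapshot specification for a hypothetical log file 'app.log'.
# Each event sets exactly one of on-create, on-truncate, or on-pattern.
sub snapshot_spec_json {
    my %spec = (
        'app.log' => {
            'from-start' => JSON::PP::true(),
            'events'     => [
                { 'label' => 'log-created', 'on-create' => JSON::PP::true() },
                {
                    'label'      => 'step-done',
                    'on-pattern' => '^step [0-9]+ done',
                    'count'      => 10,
                },
            ],
        },
    );
    return JSON::PP->new->canonical->pretty->encode(\%spec);
}

# Hypothetical helper: write the specification to $path and attach it to $task,
# which is assumed to be a Work_Queue::Task object.
sub attach_snapshot_file {
    my ($task, $path) = @_;
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    print {$fh} snapshot_spec_json();
    close($fh) or die "Cannot close $path: $!";
    $task->specify_snapshot_file($path);
    return $path;
}
```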

specify_max_retries

Specify the number of times this task is retried on worker errors. If less than one, the task is retried indefinitely (this is the default). A task that did not succeed after the given number of retries is returned with result $WORK_QUEUE_RESULT_MAX_RETRIES.

max_retries

Number of retries.

specify_cores

Specify the number of cores the task requires.

n

Number of cores.

specify_memory

Specify the size of the memory the task requires.

n

Memory size, in megabytes.

specify_disk

Specify the size of disk the task requires.

n

Disk size, in megabytes.

specify_gpus

Specify the number of GPUs the task requires.

n

Number of GPUs.
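
The four resource requests above are typically set together. A minimal sketch, assuming $task is a Work_Queue::Task object; the helper name request_resources and the particular values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: declare what a task needs from a worker.
sub request_resources {
    my ($task) = @_;
    $task->specify_cores(2);        # number of cores
    $task->specify_memory(1024);    # megabytes
    $task->specify_disk(2048);      # megabytes
    $task->specify_gpus(0);         # number of GPUs
    return $task;
}
```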

specify_end_time

Indicate the maximum end time (absolute, in microseconds from the Epoch) of this task. This is useful, for example, when the task uses certificates that expire. If less than 1, or not specified, no limit is imposed.

useconds

Number of microseconds.
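
Because the end time is absolute and in microseconds, a deadline given as "seconds from now" has to be converted first. A minimal sketch of that arithmetic; the helper names end_time_usecs and limit_to_one_hour are hypothetical, and $task is assumed to be a Work_Queue::Task object.

```perl
use strict;
use warnings;

# Convert "N seconds from now" into an absolute time in microseconds
# since the Epoch. $now is optional and defaults to the current time.
sub end_time_usecs {
    my ($seconds_from_now, $now) = @_;
    $now = time() unless defined $now;    # whole seconds since the Epoch
    return ($now + $seconds_from_now) * 1_000_000;
}

# Hypothetical helper: require $task to end within the next hour.
sub limit_to_one_hour {
    my ($task) = @_;
    $task->specify_end_time(end_time_usecs(3600));
    return $task;
}
```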

specify_running_time

Indicate the maximum running time (in microseconds) for a task in a worker (relative to when the task starts to run). If less than 1, or not specified, no limit is imposed. Note: Same as specify_running_time_max, but specified in microseconds. Kept for backwards compatibility.

useconds

Number of microseconds.

specify_running_time_max

Indicate the maximum running time (in seconds) for a task in a worker (relative to when the task starts to run). If less than 1, or not specified, no limit is imposed.

seconds

Number of seconds.

specify_running_time_min

Indicate the minimum running time (in seconds) the task needs (relative to when the task starts to run). If less than 1, or not specified, no minimum time is defined.

seconds

Number of seconds.
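
The difference in units between the running-time methods is easy to get wrong. A minimal sketch, assuming $task is a Work_Queue::Task object; the helper names limit_running_time and seconds_to_usecs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: bound the running time of $task, both limits in seconds.
sub limit_running_time {
    my ($task, $min_seconds, $max_seconds) = @_;
    $task->specify_running_time_min($min_seconds);
    $task->specify_running_time_max($max_seconds);
    return $task;
}

# Conversion needed only for the older specify_running_time,
# which takes microseconds instead of seconds.
sub seconds_to_usecs {
    my ($seconds) = @_;
    return $seconds * 1_000_000;
}
```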

specify_priority

Indicate the priority of this task (larger values mean higher priority; the default is 0).

n

Integer priority.

specify_environment_variable

Set the environment variable to value before the task is run.

name

Name of the environment variable.

value

Value of the environment variable. Variable is unset if value is not given.
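
Setting and unsetting variables can be sketched as below. The helper name set_environment is hypothetical and $task is assumed to be a Work_Queue::Task object. Variables whose value is undef are passed without a value, which according to the description above unsets them.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a hash of environment variables to a task.
# A value of undef means "unset this variable".
sub set_environment {
    my ($task, %env) = @_;
    for my $name (sort keys %env) {
        if (defined $env{$name}) {
            $task->specify_environment_variable($name, $env{$name});
        }
        else {
            # No value given: the variable is unset before the task runs.
            $task->specify_environment_variable($name);
        }
    }
    return $task;
}
```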

specify_monitor_output

Set the directory name for the resource output from the monitor.

directory

Name of the directory.

tag

Get the tag value of the task.

priority

Get the priority value of the task.

command

Get the command line of the task.

algorithm

Get the algorithm specified for this task to be dispatched.

output

Get the standard output of the task. Must be called only after the task completes execution.

id

Get the task id number.

return_status

Get the exit code of the command executed by the task. Must be called only after the task completes execution.

result

Get the result of the task as an integer (successful, failed return_status, missing input file, missing output file).

Must be called only after the task completes execution.

result_str

Get a string that explains the result of the task (SUCCESS, INPUT_MISS, OUTPUT_MISS, etc.).
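
The accessors above are usually read together once a task comes back from the queue. A minimal sketch, assuming $task is a completed Work_Queue::Task object; the helper name describe_outcome and the message format are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one-line summary of a completed task.
sub describe_outcome {
    my ($task) = @_;
    return sprintf(
        "task %d: %s (exit code %d)",
        $task->id,
        $task->result_str,
        $task->return_status,
    );
}
```

Note that result describes what Work Queue did with the task, while return_status is the exit code of the command itself, so a task can be returned successfully and still carry a non-zero exit code.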

total_submissions

Get the number of times the task has been resubmitted internally.

Must be called only after the task completes execution.

host

Get the address and port of the host on which the task ran. Must be called only after the task completes execution.

hostname

Get the name of the host on which the task ran. Must be called only after the task completes execution.

commit_time

Get the time at which this task was committed to a worker. Must be called only after the task completes execution.

submit_time

Get the time at which this task was submitted.

Must be called only after the task completes execution.

finish_time

Get the time at which this task was finished.

Must be called only after the task completes execution.

time_app_delay

Get the time spent in the upper-level application (outside of work_queue_wait).

Must be called only after the task completes execution.

send_input_start

Get the time at which the task started to transfer input files.

Must be called only after the task completes execution.

send_input_finish

Get the time at which the task finished transferring input files.

Must be called only after the task completes execution.

execute_cmd_start

Get the time at which the task began.

Must be called only after the task completes execution.

execute_cmd_finish

Get the time at which the task finished (discovered by the manager).

Must be called only after the task completes execution.

receive_output_start

Get the time at which the task started to transfer output files.

Must be called only after the task completes execution.

receive_output_finish

Get the time at which the task finished transferring output files.

Must be called only after the task completes execution.

total_bytes_received

Get the number of bytes received since the task started receiving input data.

Must be called only after the task completes execution.

total_bytes_sent

Get the number of bytes sent since the task started sending input data.

Must be called only after the task completes execution.

total_bytes_transferred

Get the number of bytes transferred since the task started transferring input data.

Must be called only after the task completes execution.

total_transfer_time

Get the time consumed in microseconds for transferring total_bytes_transferred.

Must be called only after the task completes execution.

cmd_execution_time

Get the time spent in microseconds for executing the command on the worker.

Must be called only after the task completes execution.
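
Since the transfer and execution times are reported in microseconds, a small conversion step is needed to get familiar units. A minimal sketch, assuming $task is a completed Work_Queue::Task object; the helper name transfer_summary and the keys of the returned hash are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: derive throughput and execution time in seconds.
sub transfer_summary {
    my ($task) = @_;
    my $bytes = $task->total_bytes_transferred;
    my $usecs = $task->total_transfer_time;    # microseconds

    # Guard against division by zero when nothing was transferred.
    my $rate = $usecs > 0 ? $bytes / ($usecs / 1_000_000) : 0;

    return {
        bytes_per_second  => $rate,
        execution_seconds => $task->cmd_execution_time / 1_000_000,
    };
}
```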

total_cmd_execution_time

Get the time spent in microseconds for executing the command on any worker.

Must be called only after the task completes execution.

resources_measured

Get the resources measured when monitoring is enabled.

Must be called only after the task completes execution.

                start:                     microseconds at the start of execution, since epoch.

                $t->resources_measured->{start};

                end:                       microseconds at the end of execution, since epoch.

                $t->resources_measured->{end};

                wall_time:                 microseconds spent during execution

                $t->resources_measured->{wall_time};

                cpu_time:                  user + system time of the execution

                $t->resources_measured->{cpu_time};

                cores:                     peak number of cores used

                $t->resources_measured->{cores};

                cores_avg:                 number of cores computed as cpu_time/wall_time

                $t->resources_measured->{cores_avg};

                max_concurrent_processes:  the maximum number of processes running concurrently

                $t->resources_measured->{max_concurrent_processes};

                total_processes:           count of all of the processes created

                $t->resources_measured->{total_processes};

                virtual_memory:            maximum virtual memory across all processes

                $t->resources_measured->{virtual_memory};

                resident_memory:           maximum resident size across all processes

                $t->resources_measured->{resident_memory};

                swap_memory:               maximum swap usage across all processes

                $t->resources_measured->{swap_memory};

                bytes_read:                number of bytes read from disk

                $t->resources_measured->{bytes_read};

                bytes_written:             number of bytes written to disk

                $t->resources_measured->{bytes_written};

                bytes_received:            number of bytes read from the network

                $t->resources_measured->{bytes_received};

                bytes_sent:                number of bytes written to the network

                $t->resources_measured->{bytes_sent};

                bandwidth:                 maximum network bits/s (average over one minute)

                $t->resources_measured->{bandwidth};

                total_files:               total maximum number of files and directories of all the working directories in the tree

                $t->resources_measured->{total_files};

                disk:                      size in MB of all working directories in the tree

                $t->resources_measured->{disk};
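
Several of the fields listed above can be read in one pass. A minimal sketch, assuming $task is a completed Work_Queue::Task object and that resources_measured returns a hash reference as in the SYNOPSIS; the helper name report_measured and the output format are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: format selected measured resources as name=value
# strings. Fields that were not measured are shown as 'n/a'.
sub report_measured {
    my ($task, @fields) = @_;
    my $measured = $task->resources_measured;
    return map { sprintf("%s=%s", $_, $measured->{$_} // 'n/a') } @fields;
}
```

For example, report_measured($t, qw(cores wall_time resident_memory)) would return three strings suitable for a log line.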

resources_requested

Get the resources requested by the task. See resources_measured for possible fields.

Must be called only after the task completes execution.

resources_allocated

Get the resources allocated to the task in its latest attempt. See resources_measured for possible fields.
