Work_Queue::Task - Perl Work Queue Task bindings.
The objects and methods provided by this package correspond to the native C API in work_queue.h for task creation and manipulation. This module is automatically loaded with Work_Queue.
use Work_Queue;
# Assumes $q is an existing Work_Queue manager object and $command holds a shell command line.
my $t = Work_Queue::Task->new($command);
$t->specify_input_file(local_name => 'some_name', remote_name => 'some_other_name');
$t->specify_output_file('some_name');
$q->submit($t);
$t = $q->wait(5);
if($t) {
    my $resources = $t->resources_measured;
    print $resources->{resident_memory}, "\n";
}
Work_Queue::Task->new('/some/command < input > output');
Create a new task specification.
The shell command line to be executed by the task.
specify_tag
Attach a user defined logical name to the task.
The tag to be assigned.
specify_category
Label the task with the given category. Tasks in the same category are expected to have similar resource requirements (e.g. for fast abort).
The name of the category.
specify_feature
Label the task with the given user-defined feature. The task will only run on a worker that provides this feature (see the worker's --feature option).
The name of the required feature.
clone
Return a copy of this task.
specify_command
Set the command to be executed by the task.
The command to be executed.
specify_algorithm
Set the worker selection algorithm for the task.
One of the following algorithms to use in assigning a task to a worker:
specify_preferred_host
Indicate that the task would be optimally run on a given host.
The hostname to which this task would optimally be sent.
specify_file
Add a file to the task.
The name of the file on local disk or shared filesystem.
The name of the file at the execution site.
Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT
May be zero to indicate no special handling, or any of the following or'd together:
Whether the file should be cached at workers. By default this is enabled.
For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.
$t->specify_file(local_name => ...);
$t->specify_file(local_name => ..., remote_name => ..., );
specify_file_command
Add a file to the task that will be transferred by running a command at the worker.
The name of the file at the execution site.
The shell command to transfer the file. Any occurrence of the string %% will be replaced with the internal name that Work Queue uses for the file.
Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT
May be zero to indicate no special handling, or any of the following or'd together:
Whether the file should be cached at workers. By default this is enabled.
For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.
$t->specify_file_command("my.result", "chirp_put %% chirp://somewhere/result.file", type => $Work_Queue::WORK_QUEUE_OUTPUT);
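To make the %% rule concrete, the sketch below performs the same textual substitution in plain Perl. The helper expand_transfer_command and the internal name file-123-my.result are hypothetical: the real internal name is chosen by Work Queue and the substitution happens at the worker, so this only illustrates what the resulting command line looks like.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Work_Queue): mimic the documented
# replacement of every "%%" in a transfer command with the file's
# internal name. The internal name used below is made up.
sub expand_transfer_command {
    my ($command, $internal_name) = @_;
    (my $expanded = $command) =~ s/%%/$internal_name/g;
    return $expanded;
}

my $cmd      = 'chirp_put %% chirp://somewhere/result.file';
my $expanded = expand_transfer_command($cmd, 'file-123-my.result');
print "$expanded\n";
```

Every occurrence is replaced, so a command such as "cp %% %%.bak" would refer to the internal name twice.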
specify_file_piece
Add a file piece to the task.
The name of the file on local disk or shared filesystem.
The name of the file at the execution site.
The starting byte offset of the file piece to be transferred.
The ending byte offset of the file piece to be transferred.
Must be one of the following values: $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT.
May be zero to indicate no special handling, or any of the following or'd together (see Work_Queue::Task->specify_file):
Whether the file should be cached at workers. By default this is enabled.
For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.
$t->specify_file_piece(local_name => ..., start_byte => ..., ...);
$t->specify_file_piece(local_name => ..., remote_name => ..., ...);
specify_input_file
Add an input file to the task.
This is a wrapper for Work_Queue::Task->specify_file with type set to $Work_Queue::WORK_QUEUE_INPUT. If only one argument is given, it is used as both local_name and remote_name.
specify_output_file
Add an output file to the task.
This is a wrapper for Work_Queue::Task->specify_file with type set to $Work_Queue::WORK_QUEUE_OUTPUT. If only one argument is given, it is used as both local_name and remote_name.
specify_directory
Add a directory to the task.
The name of the directory on local disk or shared filesystem. Optional if the directory is empty.
The name of the directory at the remote execution site.
Must be one of $Work_Queue::WORK_QUEUE_INPUT or $Work_Queue::WORK_QUEUE_OUTPUT.
Indicates whether just the directory (0) or the directory and all of its contents (1) should be included.
Whether the file should be cached at workers. By default this is enabled.
For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.
Returns 1 if the task directory is successfully specified, or 0 if local_name or remote_name is undefined or if remote_name is an absolute path.
specify_buffer
Add an input buffer to the task.
The contents of the buffer to pass as input.
The name of the remote file to create.
May take the same values as Work_Queue::Task->specify_file.
Whether the file should be cached at workers. By default this is enabled.
For output files only, whether the file should be retrieved only when the task fails. On successful executions the file will not be retrieved.
specify_snapshot_file
When monitoring, indicates a JSON-encoded file that instructs the monitor to take snapshots of the task resources. Snapshots appear in the JSON summary file of the task, under the key "snapshots". Snapshots are taken on events on the files described in the monitor_snapshot_file, a JSON-encoded file with the following format:
{
    "FILENAME": {
        "from-start":boolean,
        "from-start-if-truncated":boolean,
        "delete-if-found":boolean,
        "events": [
            {
                "label":"EVENT_NAME",
                "on-create":boolean,
                "on-truncate":boolean,
                "pattern":"REGEXP",
                "count":integer
            },
            {
                "label":"EVENT_NAME",
                ...
            }
        ]
    },
    "FILENAME": {
        ...
    }
}
All keys but "label" are optional:
from-start If FILENAME exists when the task starts running, process it from line 1. Default: false, as the task may be appending to an already existing file.
from-start-if-truncated If FILENAME is truncated, process from line 1. Default: true, to account for log rotations.
delete-if-found Delete FILENAME when found. Default: false
events:
label Name that identifies the snapshot. Only alphanumeric, -, and _ characters are allowed.
on-create Take a snapshot every time the file is created. Default: false
on-truncate Take a snapshot when the file is truncated. Default: false
on-pattern Take a snapshot when a line matches the regexp pattern. Default: none
count Maximum number of snapshots for this label. Default: -1 (no limit)
Exactly one of on-create, on-truncate, or on-pattern should be specified. For more information, consult the manual of the resource_monitor.
The name of the snapshot events specification file.
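As a sketch of how such a specification might be produced, the example below builds one with the core JSON::PP module and writes it to disk. The watched file name output.log, the labels, and the temporary directory are illustrative assumptions; the temporary directory only keeps the sketch self-contained, and a real workflow would keep the file until the task is submitted.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical example: watch "output.log" and take a snapshot when it is
# first created and whenever it is truncated.
my %spec = (
    'output.log' => {
        # Optional keys; these values match the documented defaults.
        'from-start'      => $JSON::PP::false,
        'delete-if-found' => $JSON::PP::false,
        events => [
            # Each event uses exactly one trigger. Labels use only
            # alphanumerics, '-' and '_'.
            { label => 'log_created',   'on-create'   => $JSON::PP::true, count => 1 },
            { label => 'log_truncated', 'on-truncate' => $JSON::PP::true },
        ],
    },
);

# canonical() sorts keys so the file contents are stable across runs.
my $json = JSON::PP->new->canonical->pretty->encode(\%spec);

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'snapshots.json');
open my $fh, '>', $path or die "cannot write $path: $!";
print {$fh} $json;
close $fh or die "cannot close $path: $!";

# With a real task object, the file would then be attached with:
# $t->specify_snapshot_file($path);
```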
specify_max_retries
Specify the number of times this task is retried on worker errors. If less than one, the task is retried indefinitely (this is the default). A task that does not succeed after the given number of retries is returned with result $WORK_QUEUE_RESULT_MAX_RETRIES.
Number of retries.
specify_cores
Specify the number of cores the task requires.
Number of cores.
specify_memory
Specify the size of the memory the task requires.
Memory size, in megabytes.
specify_disk
Specify the size of disk the task requires.
Disk size, in megabytes.
specify_gpus
Specify the number of GPUs the task requires.
Number of GPUs.
specify_end_time
Indicate the maximum end time (absolute, in microseconds from the Epoch) of this task. This is useful, for example, when the task uses certificates that expire. If less than 1, or not specified, no limit is imposed.
Number of microseconds.
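Because the end time is absolute and in microseconds, it usually has to be computed from the current time. A minimal sketch, assuming a two-hour window (an arbitrary example value, e.g. the remaining lifetime of a certificate), using the core Time::HiRes module:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Deadline two hours from now, in microseconds since the Epoch.
# The two-hour window is only an example.
my $lifetime_s  = 2 * 60 * 60;
my $end_time_us = int((time() + $lifetime_s) * 1_000_000);

# With a real task object:
# $t->specify_end_time($end_time_us);
print "$end_time_us\n";
```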
specify_running_time
Indicate the maximum running time (in microseconds) for a task in a worker (relative to when the task starts to run). If less than 1, or not specified, no limit is imposed. Note: Same as specify_running_time_max, but specified in microseconds. Kept for backwards compatibility.
Number of microseconds.
specify_running_time_max
Indicate the maximum running time (in seconds) for a task in a worker (relative to when the task starts to run). If less than 1, or not specified, no limit is imposed.
Number of seconds.
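Since specify_running_time and specify_running_time_max express the same limit in different units, migrating old code is a unit conversion. The sketch below uses a hypothetical helper, running_time_us_to_s, that rounds up so the new limit is never shorter than the old one, and keeps values below 1 meaning "no limit".

```perl
use strict;
use warnings;

# Hypothetical migration helper: convert a limit written for
# specify_running_time (microseconds) into whole seconds for
# specify_running_time_max. Assumes integer input.
sub running_time_us_to_s {
    my ($us) = @_;
    return 0 if $us < 1;                      # no limit stays no limit
    return int(($us + 999_999) / 1_000_000);  # round up to whole seconds
}

# Old style: $t->specify_running_time(90_000_000);
# New style: $t->specify_running_time_max(running_time_us_to_s(90_000_000));
print running_time_us_to_s(90_000_000), "\n";
```

Rounding up is a design choice: truncating instead could turn a 1.5 second limit into 1 second and kill tasks that previously completed.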
specify_running_time_min
Indicate the minimum running time (in seconds) the task needs (relative to when the task starts to run). If less than 1, or not specified, no minimum time is defined.
Number of seconds.
specify_priority
Indicate the priority of this task (larger means better priority, default is 0).
Integer priority.
specify_environment_variable
Set the environment variable to value before the task is run.
Name of the environment variable.
Value of the environment variable. Variable is unset if value is not given.
specify_monitor_output
Set the directory name for the resource output from the monitor.
Name of the directory.
tag
Get the tag value of the task.
priority
Get the priority value of the task.
command
Get the command line of the task.
algorithm
Get the worker selection algorithm specified for this task.
output
Get the standard output of the task. Must be called only after the task completes execution.
id
Get the task id number.
return_status
Get the exit code of the command executed by the task. Must be called only after the task completes execution.
result
Get the result of the task as an integer (e.g. successful, failed return_status, missing input file, missing output file).
Must be called only after the task completes execution.
result_str
Returns a string that explains the result of a task. (SUCCESS, INPUT_MISS, OUTPUT_MISS, etc.)
total_submissions
Get the number of times the task has been resubmitted internally.
Must be called only after the task completes execution.
host
Get the address and port of the host on which the task ran. Must be called only after the task completes execution.
hostname
Get the name of the host on which the task ran. Must be called only after the task completes execution.
commit_time
Get the time at which this task was committed to a worker. Must be called only after the task completes execution.
submit_time
Get the time at which this task was submitted.
Must be called only after the task completes execution.
finish_time
Get the time at which this task was finished.
Must be called only after the task completes execution.
time_app_delay
Get the time spent in upper-level application (outside of work_queue_wait).
Must be called only after the task completes execution.
send_input_start
Get the time at which the task started to transfer input files.
Must be called only after the task completes execution.
send_input_finish
Get the time at which the task finished transferring input files.
Must be called only after the task completes execution.
execute_cmd_start
Get the time at which the task began executing its command.
Must be called only after the task completes execution.
execute_cmd_finish
Get the time at which the task finished (discovered by the manager).
Must be called only after the task completes execution.
receive_output_start
Get the time at which the task started to transfer output files.
Must be called only after the task completes execution.
receive_output_finish
Get the time at which the task finished transferring output files.
Must be called only after the task completes execution.
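The timestamp accessors above can be combined into per-phase durations for a completed task. This sketch uses a hypothetical helper, phase_durations, and made-up values; it assumes all timestamps share one unit (microseconds here), which this page does not state for these fields.

```perl
use strict;
use warnings;

# Hypothetical helper: per-phase durations from a finished task's
# timestamps. All inputs are assumed to use the same unit.
sub phase_durations {
    my (%ts) = @_;
    return (
        send_input     => $ts{send_input_finish}     - $ts{send_input_start},
        execute        => $ts{execute_cmd_finish}    - $ts{execute_cmd_start},
        receive_output => $ts{receive_output_finish} - $ts{receive_output_start},
        turnaround     => $ts{finish_time}           - $ts{submit_time},
    );
}

# With a real task, each value would come from the accessor of the same
# name, e.g. send_input_start => $t->send_input_start. Values are made up.
my %times = (
    submit_time           => 1_000_000,
    send_input_start      => 1_200_000,
    send_input_finish     => 1_500_000,
    execute_cmd_start     => 1_600_000,
    execute_cmd_finish    => 4_600_000,
    receive_output_start  => 4_700_000,
    receive_output_finish => 4_900_000,
    finish_time           => 5_000_000,
);

my %d = phase_durations(%times);
printf "%-15s %d\n", $_, $d{$_} for sort keys %d;
```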
total_bytes_received
Get the number of bytes received since the task started receiving input data.
Must be called only after the task completes execution.
total_bytes_sent
Get the number of bytes sent since the task started sending input data.
Must be called only after the task completes execution.
total_bytes_transferred
Get the number of bytes transferred since the task started transferring input data.
Must be called only after the task completes execution.
total_transfer_time
Get the time consumed, in microseconds, for transferring total_bytes_transferred.
Must be called only after the task completes execution.
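Because total_transfer_time is in microseconds, an average throughput is total_bytes_transferred divided by that time converted to seconds. A minimal sketch with a hypothetical helper, transfer_rate_bytes_per_s, that returns undef when no transfer time was recorded:

```perl
use strict;
use warnings;

# Hypothetical helper: average transfer throughput in bytes per second.
# $time_us is in microseconds, so divide by 1e6 to get seconds.
sub transfer_rate_bytes_per_s {
    my ($bytes, $time_us) = @_;
    return unless $time_us && $time_us > 0;   # avoid division by zero
    return $bytes / ($time_us / 1_000_000);
}

# With a real task:
# my $rate = transfer_rate_bytes_per_s($t->total_bytes_transferred, $t->total_transfer_time);
my $rate = transfer_rate_bytes_per_s(10_000_000, 2_000_000);  # 10 MB in 2 s
printf "%.0f bytes/s\n", $rate;
```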
cmd_execution_time
Get the time spent in microseconds for executing the command on the worker.
Must be called only after the task completes execution.
total_cmd_execution_time
Get the time spent in microseconds for executing the command on any worker.
Must be called only after the task completes execution.
resources_measured
Get the resources measured when monitoring is enabled.
Must be called only after the task completes execution.
start: microseconds at the start of execution, since the Epoch.
$t->resources_measured->{start};
end: microseconds at the end of execution, since the Epoch.
$t->resources_measured->{end};
wall_time: microseconds spent during execution
$t->resources_measured->{wall_time};
cpu_time: user + system time of the execution
$t->resources_measured->{cpu_time};
cores: peak number of cores used
$t->resources_measured->{cores};
cores_avg: number of cores computed as cpu_time/wall_time
$t->resources_measured->{cores_avg};
max_concurrent_processes: the maximum number of processes running concurrently
$t->resources_measured->{max_concurrent_processes};
total_processes: count of all of the processes created
$t->resources_measured->{total_processes};
virtual_memory: maximum virtual memory across all processes
$t->resources_measured->{virtual_memory};
resident_memory: maximum resident size across all processes
$t->resources_measured->{memory};
swap_memory: maximum swap usage across all processes
$t->resources_measured->{swap_memory};
bytes_read: number of bytes read from disk
$t->resources_measured->{bytes_read};
bytes_written: number of bytes written to disk
$t->resources_measured->{bytes_written};
bytes_received: number of bytes read from the network
$t->resources_measured->{bytes_received};
bytes_sent: number of bytes written to the network
$t->resources_measured->{bytes_sent};
bandwidth: maximum network bits/s (average over one minute)
$t->resources_measured->{bandwidth};
total_files: maximum total number of files and directories in all the working directories in the tree
$t->resources_measured->{total_files};
disk: size in MB of all working directories in the tree
$t->resources_measured->{disk};
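Several of these fields are related. The sketch below recomputes the wall time from start and end and derives the average core usage as cpu_time/wall_time, as described above. The hash is a hypothetical stand-in for the value of $t->resources_measured with made-up numbers, and cpu_time is assumed to use the same unit as wall_time (microseconds).

```perl
use strict;
use warnings;

# Hypothetical stand-in for $t->resources_measured; numbers are made up.
my $r = {
    start    => 1_700_000_000_000_000,   # microseconds since the Epoch
    end      => 1_700_000_010_000_000,
    cpu_time => 25_000_000,              # assumed microseconds
    cores    => 4,
};

# Wall time is expected to correspond to end - start.
my $wall_us = $r->{end} - $r->{start};

# Average cores used, computed as cpu_time / wall_time.
my $cores_avg = $wall_us > 0 ? $r->{cpu_time} / $wall_us : 0;

printf "wall %.1f s, avg cores %.2f (peak %d)\n",
    $wall_us / 1_000_000, $cores_avg, $r->{cores};
```

Comparing cores_avg with the peak cores value is one way to spot tasks that request more cores than they keep busy.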
resources_requested
Get the resources requested by the task. See resources_measured for possible fields.
Must be called only after the task completes execution.
resources_allocated
Get the resources allocated to the task in its latest attempt. See resources_measured for possible fields.