resource_monitor - monitors the cpu, memory, io, and disk usage of a tree of processes.


resource_monitor [options] -- command [command-options]
resource_monitorv [options] -- command [command-options]


resource_monitor is a tool to monitor the computational resources used by the process created by the command given as an argument, and by all of its descendants. The monitor works 'indirectly', that is, by observing how the environment changed while a process was running. All the information reported should therefore be considered only an estimate (in contrast with direct methods, such as ptrace). It has been tested on Linux, FreeBSD, and Darwin, and can be used automatically by makeflow and work queue applications.

Additionally, the user can specify maximum resource limits in the form of a file, or as a string given at the command line. If one of the resources goes over its specified limit, the monitor terminates the task and reports which resource exceeded its limit.

On systems that support it, resource_monitor wraps some libc functions to obtain a better estimate of the resources used. In contrast, resource_monitorv disables this wrapping, which means, among other things, that it can only monitor the root process, but not its descendants.

Currently, the monitor does not support interactive applications. That is, if a process issues a read call from standard input, and standard input has not been redirected, then the process tree is terminated. This is likely to change in future versions of the tool.

resource_monitor generates up to three log files: a summary file encoded as JSON with the maximum values of the resources used, a time-series log that shows the resources used at given time intervals, and a list of files that were opened during execution. The summary file has the following fields:
command:                   [the command line given as an argument]
start:                     [microseconds at the start of execution, since the epoch, int]
end:                       [microseconds at the end of execution, since the epoch,   int]
exit_type:                 [one of normal, signal or limit,                       string]
signal:                    [number of the signal that terminated the process.
                            Only present if exit_type is signal                      int]
cores:                     [number of cores. Computed as cpu_time/wall_time,       float]
limits_exceeded:           [resources over the limit. Only present if
                            exit_type is limit,                                   string]
exit_status:               [final status of the parent process,                      int]
max_concurrent_processes:  [the maximum number of processes running concurrently,    int]
total_processes:           [count of all of the processes created,                   int]
wall_time:                 [microseconds spent during execution, end - start,        int]
cpu_time:                  [user + system time of the execution, in microseconds,    int]
virtual_memory:            [maximum virtual memory across all processes, in MB,      int]
memory:                    [maximum resident size across all processes, in MB,       int]
swap_memory:               [maximum swap usage across all processes, in MB,          int]
bytes_read:                [amount of data read from disk, in MB,                    int]
bytes_written:             [amount of data written to disk, in MB,                   int]
total_files:               [total maximum number of files and directories of
                            all the working directories in the tree,                 int]
disk:                      [size in MB of all working directories in the tree,       int]
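
The relationships between the summary fields can be checked with a short script. The following is a minimal Python sketch, assuming the summary file holds a single JSON object whose keys are the field names listed above. The function names derived_fields and describe_exit, and all the values in the example, are hypothetical and only serve as illustration; they are not output from a real run.

```python
import json


def derived_fields(summary):
    """Recompute wall_time and cores from the raw fields of a summary."""
    # wall_time is documented as end - start, in microseconds.
    wall_time = summary["end"] - summary["start"]
    # cores is documented as cpu_time / wall_time.
    cores = summary["cpu_time"] / wall_time if wall_time > 0 else 0.0
    return {"wall_time": wall_time, "cores": cores}


def describe_exit(summary):
    """Return a one-line description based on exit_type."""
    exit_type = summary["exit_type"]
    if exit_type == "signal":
        # signal is only present when exit_type is signal.
        return "terminated by signal %d" % summary["signal"]
    if exit_type == "limit":
        # limits_exceeded is only present when exit_type is limit.
        return "exceeded limits: %s" % summary["limits_exceeded"]
    return "exited with status %d" % summary["exit_status"]


# Hypothetical summary contents, made up for illustration only.
example = json.loads("""
{
  "command": "sleep 4",
  "start": 1000000,
  "end": 5000000,
  "exit_type": "limit",
  "limits_exceeded": "memory",
  "exit_status": 1,
  "wall_time": 4000000,
  "cpu_time": 6000000
}
""")

print(derived_fields(example))
print(describe_exit(example))
```

In practice the dictionary would come from json.load() on the summary file written by the monitor, rather than from an inline string.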
The time-series log has a row per time sample. For each row, the columns have the following meaning:
wall_clock                [the sample time, since the epoch, in microseconds,      int]
cpu_time                  [accumulated user + kernel time, in microseconds,        int]
concurrent                [concurrent processes at the time of the sample,         int]
virtual                   [current virtual memory size, in MB,                     int]
resident                  [current resident memory size, in MB,                    int]
swap                      [current swap usage, in MB,                              int]
bytes_read                [accumulated number of bytes read,                       int]
bytes_written             [accumulated number of bytes written,                    int]
files                     [current number of files and directories, across all
                           working directories in the tree,                        int]
disk                      [current size of working directories in the tree, in MB, int]
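
The column layout above can be loaded into named records for further analysis. The following is a minimal Python sketch under two assumptions that this page does not confirm: that columns are separated by whitespace and appear in the order listed, and that lines starting with '#' are comments. The helper names parse_series, peak, and average_cores, as well as the two sample rows, are hypothetical.

```python
COLUMNS = [
    "wall_clock", "cpu_time", "concurrent", "virtual", "resident",
    "swap", "bytes_read", "bytes_written", "files", "disk",
]


def parse_series(lines):
    """Turn time-series rows into a list of dicts keyed by column name."""
    rows = []
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no samples.
        if not line or line.startswith("#"):
            continue
        values = [int(v) for v in line.split()]
        if len(values) != len(COLUMNS):
            raise ValueError("expected %d columns, got %d"
                             % (len(COLUMNS), len(values)))
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def peak(rows, column):
    """Largest value observed for a column across all samples."""
    return max(row[column] for row in rows)


def average_cores(prev, curr):
    """Average cores used between two samples (both times in microseconds)."""
    elapsed = curr["wall_clock"] - prev["wall_clock"]
    if elapsed <= 0:
        return 0.0
    return (curr["cpu_time"] - prev["cpu_time"]) / elapsed


# Hypothetical sample rows, made up for illustration only.
sample = [
    "# wall_clock cpu_time concurrent virtual resident swap ...",
    "1000000 0 1 10 5 0 0 0 3 1",
    "2000000 500000 2 20 12 0 4096 0 4 2",
]
rows = parse_series(sample)
print(peak(rows, "resident"))
print(average_cores(rows[0], rows[1]))
```

Because cpu_time, bytes_read, and bytes_written are accumulated values, per-interval rates come from differences between consecutive rows, as average_cores does for CPU time.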


-d,--debug <subsystem>
Enable debugging for this subsystem.
-o,--debug-file <file>
Write debugging output to this file. By default, debugging is sent to stderr (":stderr"). You may specify logs be sent to stdout (":stdout"), to the system syslog (":syslog"), or to the systemd journal (":journal").
-i,--interval <n>
Interval between observations, in seconds (default=1).
-c,--sh <str>
Read command line from <str>, and execute as '/bin/sh -c <str>'.
-l,--limits-file <file>
Use <file>, containing a list of var: value pairs, for resource limits.
-L,--limits <string>
String of the form "var: value, var: value" to specify resource limits. (May be specified multiple times.)
-f,--child-in-foreground
Keep the monitored process in foreground (for interactive use).
-O,--with-output-files <template>
Specify template for log files (default=resource-pid-).
--with-time-series Write resource time series to