The resource_monitor is a tool that monitors the computational resources
used by complex, multi-process applications. This capability is essential for running
large-scale applications reliably on clusters, clouds, and grids.
It works on Linux, FreeBSD, and OSX, and can be used as a standalone tool
or automatically with distributed systems like Makeflow and Work Queue.
When invoked, the resource monitor tracks all of the processes and threads
created by the subject program, and monitors their individual resource and I/O behavior.
It generates up to three report files: a summary file with the maximum value
of each resource used, a time series that shows the resources used at given time
intervals, and a list of the files that were opened during execution, together with
the count of read and write operations on each.
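
To illustrate how the summary relates to the time series, here is a minimal Python sketch that reduces a list of periodic samples to per-resource maximum values. The function name `summarize`, the resource names, and the sample values are all hypothetical and chosen for illustration; they do not reflect the resource_monitor's actual file formats or field names.

```python
def summarize(time_series):
    """Return the maximum observed value of each resource.

    time_series is a list of (timestamp, usage) pairs, where usage maps a
    resource name to the value measured at that timestamp.
    """
    peaks = {}
    for _timestamp, usage in time_series:
        for resource, value in usage.items():
            # Keep the largest value seen so far for this resource.
            peaks[resource] = max(value, peaks.get(resource, value))
    return peaks


# Made-up samples, for illustration only (not real measurements).
time_series = [
    (0.0, {"memory_mb": 120, "cores": 0.5}),
    (1.0, {"memory_mb": 310, "cores": 1.9}),
    (2.0, {"memory_mb": 250, "cores": 1.2}),
]

print(summarize(time_series))
```

A real monitor may track peaks more finely than the reporting interval, so a summary derived only from interval samples like this one could understate short spikes.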
Additionally, the monitor can be used as a watchdog. Maximum resource limits
can be specified, and if any resource exceeds its limit, the
monitor terminates the task and reports which resource exceeded
the limit.
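
The watchdog idea can be sketched in a few lines of Python using only the standard library. This is not the resource_monitor's implementation: it enforces a single limit (wall time), it watches only the direct child rather than every descendant process and thread, and the function name `run_with_wall_time_limit` and the returned field names are assumptions made for this example.

```python
import subprocess
import sys
import time


def run_with_wall_time_limit(cmd, limit_seconds, poll_interval=0.05):
    """Run cmd and kill it if its wall time exceeds limit_seconds.

    Returns a dict with the exit status, the measured wall time, and the
    name of the exceeded resource (None if the limit was respected).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    exceeded = None
    # Poll until the child exits on its own or goes over the limit.
    while proc.poll() is None:
        if time.monotonic() - start > limit_seconds:
            exceeded = "wall_time"
            proc.kill()   # terminate the task
            proc.wait()   # reap the child so returncode is set
            break
        time.sleep(poll_interval)
    return {
        "exit_status": proc.returncode,
        "wall_time": time.monotonic() - start,
        "limit_exceeded": exceeded,
    }


if __name__ == "__main__":
    # A child that sleeps for 5 seconds, limited to 0.3 seconds of wall time.
    report = run_with_wall_time_limit(
        [sys.executable, "-c", "import time; time.sleep(5)"], 0.3
    )
    print(report["limit_exceeded"])
```

Polling keeps the sketch simple and portable; a production watchdog would also need to follow child processes and check memory, disk, and other limits.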
The resource_monitor_visualizer creates a series of webpages
summarizing the logs produced by the resource_monitor. It generates
histograms for each resource and each group. For example, one such histogram
shows the distribution of CPU usage of a workflow with 5,000 tasks. To
use the resource_monitor_visualizer, specify the location of the
resource logs and the location for the output.
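
As a rough illustration of the kind of per-resource histogram described above, the following Python sketch bins one value per task into fixed-width bins. The function name `histogram` and the sample values are hypothetical; the sketch does not read real resource_monitor logs or reproduce the visualizer's output.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; the bin starting at b covers [b, b + bin_width)."""
    counts = Counter(int(v // bin_width) for v in values)
    # Map each bin's lower edge to the number of values that fell in it.
    return {k * bin_width: counts[k] for k in sorted(counts)}


# Made-up peak core usage per task, for illustration only.
peak_cores = [0.4, 0.9, 1.1, 1.8, 2.0, 2.5, 0.7]

print(histogram(peak_cores, 1))
```

Grouping tasks first (for example, by a category label) and calling the function once per group would give one histogram per resource and group.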
For More Information
Resource Monitor User's Manual
Download the Resource Monitor
Getting Help with the Resource Monitor
Publications
Tim Shaffer, Zhuozhao Li, Ben Tovar, Yadu Babuji, TJ Dasso, Zoe Surma, Kyle Chard, Ian Foster, and Douglas Thain, Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications, IEEE International Parallel and Distributed Processing Symposium, May, 2021. DOI: 10.1109/IPDPS49936.2021.00088
Benjamin Tovar, Rafael Ferreira da Silva, Gideon Juve, Ewa Deelman, William Allcock, Douglas Thain, and Miron Livny, A Job Sizing Strategy for High-Throughput Scientific Workflows, IEEE Transactions on Parallel and Distributed Systems, 29(2), pages 240-253, February, 2018. DOI: 10.1109/TPDS.2017.2762310
Gideon Juve, Benjamin Tovar, Rafael Ferreira da Silva, Dariusz Krol, Douglas Thain, Ewa Deelman, William Allcock, and Miron Livny, Practical Resource Monitoring for Robust High Throughput Computing, Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications at IEEE Cluster Computing, September, 2015.