Network Monitoring

From CompBio

Goals

Monitoring should allow interested parties to make informed, proactive decisions about infrastructure.

Monitoring should also provide a detailed, real-time view of system usage and health.

Perspectives

Some people care whether a specific piece of hardware has failed; others care only about how long their job is going to run. These concerns are related at a high level, but they differ in the details. Because we design our clusters to be resilient to component failure, one or two failures should not affect the average performance of a cluster.

Performance monitoring and data trending give engineers and management valuable insight into a broad range of system-level issues. For job performance, the questions include: Did changes in code have an adverse impact on performance? Are clusters sized appropriately for expectations? Have more jobs been submitted to the same resource pool?
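As an illustration of the first question, the sketch below compares mean job runtimes recorded before and after a code change. It is a minimal example under stated assumptions, not a description of any tool used on these clusters: the function names, the sample runtimes, and the 10% threshold are all hypothetical.

```python
from statistics import mean


def runtime_change(before, after):
    """Return the relative change in mean runtime (after vs. before)."""
    baseline = mean(before)
    return (mean(after) - baseline) / baseline


def is_regression(before, after, threshold=0.10):
    """Flag a slowdown when mean runtime grows by more than `threshold`.

    The 10% default is an arbitrary, illustrative value.
    """
    return runtime_change(before, after) > threshold


# Hypothetical job runtimes in seconds, before and after a code change.
before = [100.0, 110.0, 90.0]
after = [120.0, 130.0, 125.0]

change = runtime_change(before, after)
print(f"mean runtime change: {change:+.1%}")  # prints "mean runtime change: +25.0%"
print("regression" if is_regression(before, after) else "no regression")
```

A comparison of means is the simplest possible trend check. In practice, runtimes would come from the scheduler's accounting data, and a single mean can hide variation caused by cluster load, which is why the other two questions (sizing and submission volume) matter when interpreting the result.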

Tools
