Key Network Metrics for Reliability
For senior engineers, network monitoring goes beyond checking if a host is up. We focus on the USE Method (Utilization, Saturation, Errors) for resources and the RED Method (Rate, Errors, Duration) for services.
- Throughput: Bytes sent and received per second (bandwidth utilization).
- Latency: Round Trip Time (RTT), connection establishment time.
- Packet Loss & Errors: Retransmissions, dropped packets, CRC errors.
- Saturation: Conntrack table usage and file descriptor usage, each measured against its limit.
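To make the metrics above concrete, here is a minimal Python sketch of how throughput, utilization, and error or saturation ratios might be derived from raw counter readings. The names `CounterSample`, `rate_per_second`, `link_utilization`, and `ratio` are hypothetical helpers introduced for illustration, and the counter-reset handling is one simple assumption, not the only reasonable policy.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a monotonically increasing counter (hypothetical type)."""
    timestamp: float  # seconds since some reference point
    value: float      # counter value, e.g. bytes or TCP segments


def rate_per_second(prev: CounterSample, curr: CounterSample) -> float:
    """Average per-second increase of a counter between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.value - prev.value
    if delta < 0:
        # Assumption: a decrease means the counter was reset (e.g. a reboot),
        # so the current value is treated as the increase since the reset.
        delta = curr.value
    return delta / elapsed


def link_utilization(bytes_per_second: float, link_speed_bits: float) -> float:
    """Fraction of link capacity in use (1.0 means a fully used link)."""
    return (bytes_per_second * 8) / link_speed_bits


def ratio(part: float, whole: float) -> float:
    """Generic ratio, e.g. retransmitted/sent segments or conntrack used/limit."""
    return part / whole if whole > 0 else 0.0
```

For example, two byte-counter samples taken 10 seconds apart that differ by 125,000,000 bytes give 12,500,000 bytes per second, which is 10% utilization on a 1 Gbit/s link. The same `ratio` helper covers both an error signal (retransmitted segments over sent segments) and a saturation signal (conntrack entries over the table limit).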
Prometheus & Exporters
Prometheus is a widely adopted open-source system for metric collection. For network data, we primarily rely on:
1. Node Exporter
Exposes hardware and OS metrics provided by *NIX kernels.
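Node Exporter serves these metrics as plain text in the Prometheus exposition format. As a rough illustration of what that output looks like and how a per-interface counter could be read from it, the sketch below parses a small hand-written excerpt. The function `parse_device_metric` and the sample values are hypothetical, and the regular expressions only cover simple label sets. A real deployment would normally let Prometheus scrape and parse the endpoint itself.

```python
import re

# Matches: metric_name{optional="labels"} value [optional timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
# Matches one key="value" pair inside the braces, allowing escaped characters.
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hand-written excerpt with made-up values, for illustration only.
SAMPLE_TEXT = """\
# HELP node_network_receive_bytes_total Network device statistic receive_bytes.
# TYPE node_network_receive_bytes_total counter
node_network_receive_bytes_total{device="eth0"} 1.5e+09
node_network_receive_bytes_total{device="lo"} 4096
node_network_transmit_bytes_total{device="eth0"} 7.5e+08
"""


def parse_device_metric(text: str, wanted_name: str) -> dict:
    """Return {device label: value} for every sample of one metric name."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None or match.group("name") != wanted_name:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        results[labels.get("device", "")] = float(match.group("value"))
    return results


if __name__ == "__main__":
    print(parse_device_metric(SAMPLE_TEXT, "node_network_receive_bytes_total"))
```

In practice the text would come from the exporter's `/metrics` endpoint (port 9100 by default) rather than from an inline string. Two such readings taken some seconds apart supply the counter values needed to compute a throughput rate.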