mirror of https://github.com/rcourtman/Pulse.git synced 2026-02-18 00:17:39 +01:00

Files

rcourtman 2b48b0a459 feat: add --kube-include-all-deployments flag for Kubernetes agent

Adds IncludeAllDeployments option to show all deployments, not just
problem ones (where replicas don't match desired). This provides parity
with the existing --kube-include-all-pods flag.

- Add IncludeAllDeployments to kubernetesagent.Config
- Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var
- Update collectDeployments to respect the new flag
- Add test for IncludeAllDeployments functionality
- Update UNIFIED_AGENT.md documentation

Addresses feedback from PR #855

2025-12-18 20:58:30 +00:00

1.8 KiB

Raw Blame History

📊 Prometheus Metrics

Pulse exposes metrics at /metrics (default port 9091).

Example scrape target:

http://<pulse-host>:9091/metrics

This listener is separate from the main UI/API port (7655). In Docker and Kubernetes you must expose 9091 explicitly if you want to scrape it from outside the container/pod.

🌐 HTTP Ingress

Metric	Type	Description
`pulse_http_request_duration_seconds`	Histogram	Latency buckets by `method`, `route`, `status`.
`pulse_http_requests_total`	Counter	Total requests.
`pulse_http_request_errors_total`	Counter	4xx/5xx errors.

🔄 Polling & Nodes

Metric	Type	Description
`pulse_monitor_node_poll_duration_seconds`	Histogram	Per-node poll latency.
`pulse_monitor_node_poll_total`	Counter	Success/error counts per node.
`pulse_monitor_node_poll_staleness_seconds`	Gauge	Seconds since last success.
`pulse_monitor_poll_queue_depth`	Gauge	Global queue depth.

🧠 Scheduler Health

Metric	Type	Description
`pulse_scheduler_queue_depth`	Gauge	Queue depth per instance type.
`pulse_scheduler_dead_letter_depth`	Gauge	DLQ depth per instance.
`pulse_scheduler_breaker_state`	Gauge	`0`=Closed, `1`=Half-Open, `2`=Open.

⚡ Diagnostics Cache

Metric	Type	Description
`pulse_diagnostics_cache_hits_total`	Counter	Cache hits.
`pulse_diagnostics_refresh_duration_seconds`	Histogram	Refresh latency.

🚨 Alerting Examples

High Error Rate: rate(pulse_http_request_errors_total[5m]) > 0.05
Stale Node: pulse_monitor_node_poll_staleness_seconds > 300
Breaker Open: pulse_scheduler_breaker_state == 2

1.8 KiB Raw Blame History