Files
Pulse/docs/monitoring/ADAPTIVE_POLLING.md
rcourtman 0ca6001bad docs: update documentation after sensor proxy deprecation
Update docs to reflect the simplified temperature monitoring architecture:
- Remove references to pulse-sensor-proxy throughout
- Update TEMPERATURE_MONITORING.md to focus on unified agent approach
- Update CONFIGURATION.md, DEPLOYMENT_MODELS.md, FAQ.md
- Remove SECURITY_CHANGELOG.md (proxy-specific security notes)
- Clarify current recommended setup in various guides
2026-01-21 12:00:59 +00:00

3.6 KiB

📉 Adaptive Polling

Pulse uses an adaptive scheduler to optimize polling based on instance health and activity.

🧠 Architecture

  • Scheduler: Calculates intervals based on health/staleness.
  • Priority Queue: Min-heap keyed by NextRun.
  • Circuit Breaker: Prevents hot loops on failing instances using success/failure counters.
  • Backoff: Exponential retry delays (5s min to 5m max).
  • Worker Pool: One worker per configured instance (PVE/PBS/PMG), capped at 10.
  • Global Concurrency Cap: At most 2 polling cycles run at once to avoid resource spikes.

🔬 Implementation Details (Developer Info)

Staleness Scoring

The AdaptiveScheduler (internal/monitoring/scheduler.go) relies on the StalenessTracker to compute a StalenessScore (0.0 to 1.0) based on how long it has been since the last successful poll.

  • 0.0 = fresh (recent success)
  • 1.0 = very stale or never succeeded

The staleness score is normalized against AdaptivePollingMaxInterval (default 5 minutes).

The scheduler applies Exponential Smoothing (alpha 0.6) and a small jitter (5%) to avoid oscillation.

Additional influences:

  • Error penalty: retries tighten the interval based on the error count.
  • Queue stretch: large queues gently stretch intervals to avoid overload.

Circuit Breaker Recovery

The circuitBreaker (internal/monitoring/circuit_breaker.go) follows a standard state machine:

  1. Closing the Circuit: One successful poll moves Half-OpenClosed and resets failure count.
  2. Backoff Calculation: Retries use exponential backoff starting at 5s (multiplier 2, jitter 0.2) capped at 5m.
  3. Transient vs. Permanent:
    • Transient errors (retryable) are retried up to 5 times before moving to the Dead Letter Queue.
    • Permanent errors move directly to the Dead Letter Queue.

Note: When AdaptivePollingMaxInterval is set to 15 seconds or less, the retry backoff is shortened (750ms initial, 4s max) to keep fast feedback loops during tight polling windows.

⚙️ Configuration

Adaptive polling is disabled by default.

UI

There is currently no dedicated UI for adaptive polling in v5.

Environment Variables

Variable Default Description
ADAPTIVE_POLLING_ENABLED false Enable/disable.
ADAPTIVE_POLLING_BASE_INTERVAL 10s Healthy poll rate.
ADAPTIVE_POLLING_MIN_INTERVAL 5s Active/busy rate.
ADAPTIVE_POLLING_MAX_INTERVAL 5m Idle/backoff rate.

system.json

You can also set adaptivePollingEnabled (and related interval fields) in system.json and restart Pulse.

📊 Metrics

Exposed at :9091/metrics.

Metric Type Description
pulse_monitor_poll_total Counter Total poll attempts.
pulse_monitor_poll_duration_seconds Histogram Poll latency.
pulse_monitor_poll_staleness_seconds Gauge Age since last success.
pulse_monitor_poll_queue_depth Gauge Queue size.
pulse_monitor_poll_errors_total Counter Error counts by category.
pulse_scheduler_queue_due_soon Gauge Tasks due in the next 12 seconds.

Circuit Breaker

State Trigger Recovery
Closed Normal operation.
Open ≥3 failures. Backoff (max 5m).
Half-open Retry window elapsed. Success = Closed; Fail = Open.

Dead Letter Queue: After 5 transient or 1 permanent failure, tasks move to DLQ (30m retry).

🩺 Health API

GET /api/monitoring/scheduler/health (Auth required)

Returns:

  • Queue depth & breakdown.
  • Dead-letter tasks.
  • Circuit breaker states.
  • Per-instance staleness.