# 🌡️ Temperature Monitoring
Monitor real-time CPU and NVMe temperatures for your Proxmox nodes.
> **Deprecation notice (v5):** `pulse-sensor-proxy` is deprecated and not recommended for new deployments. Temperature monitoring should be done via the unified agent (`pulse-agent --enable-proxmox`). Existing proxy installs can continue during the migration window, but plan to migrate to the agent. In v5, legacy sensor-proxy endpoints are disabled by default unless `PULSE_ENABLE_SENSOR_PROXY=true` is set on the Pulse server.
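If you must keep the deprecated proxy reachable during the migration window, the flag goes in the Pulse server's environment. A minimal sketch, assuming your deployment reads environment variables from an `.env` file such as `/data/.env` in Docker (as described in the migration steps below):
```bash
# Opt-in flag for legacy sensor-proxy endpoints - migration window only.
# Remove this line once every host runs pulse-agent.
PULSE_ENABLE_SENSOR_PROXY=true
```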
## Recommended: Pulse Agent
For new installations, prefer the unified agent on Proxmox hosts. It reads sensors locally and reports temperatures to Pulse without SSH keys or proxy wiring.
```bash
curl -fsSL http://<pulse-ip>:7655/install.sh | \
bash -s -- --url http://<pulse-ip>:7655 --token <api-token> --enable-proxmox
```
If you use the agent method, the rest of this document (which covers the deprecated sensor proxy) is optional. See `docs/security/TEMPERATURE_MONITORING.md` for an overview of the security model.
## Migration: pulse-sensor-proxy → pulse-agent
If you already deployed `pulse-sensor-proxy`, migrate to the agent to avoid proxy maintenance and remove SSH-from-container complexity:
1. Install `lm-sensors` on each Proxmox host (if not already): `apt install lm-sensors && sensors-detect`
2. Install the agent on each Proxmox host:
```bash
curl -fsSL http://<pulse-ip>:7655/install.sh | \
bash -s -- --url http://<pulse-ip>:7655 --token <api-token> --enable-proxmox
```
3. Confirm temperatures are updating in the dashboard.
4. Disable the proxy service on hosts where it was installed:
```bash
sudo systemctl disable --now pulse-sensor-proxy
```
5. If your Pulse container had a proxy socket mount, remove the mount and delete `PULSE_SENSOR_PROXY_SOCKET` from the Pulse `.env` (for example `/data/.env` in Docker), then restart Pulse; a Docker teardown sketch follows.
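For a Docker deployment like the one described later in this document, the teardown looks roughly like this (the `PULSE_SENSOR_PROXY_SOCKET` value shown is an assumption; check your own `.env`):
```bash
# Remove the socket bind mount from docker-compose.yml:
#   - /mnt/pulse-proxy:/run/pulse-sensor-proxy:ro
# Delete the proxy socket variable from /data/.env (value is illustrative):
#   PULSE_SENSOR_PROXY_SOCKET=/run/pulse-sensor-proxy/pulse-sensor-proxy.sock
docker compose up -d   # recreate the Pulse container without the proxy wiring
```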
## 🚀 Quick Start
### 1. Install the agent on Proxmox hosts
Install the unified agent on each Proxmox host with Proxmox integration enabled (example in the section above).
### 2. Enable temperature monitoring (optional)
Go to **Settings → Proxmox → [Node] → Advanced Monitoring** and enable "Temperature monitoring" if you want to collect temperatures for that node.
## Deprecated: pulse-sensor-proxy (existing installs only)
This section is retained for existing installations during the migration window.
If you are starting fresh on Pulse v5, do not deploy `pulse-sensor-proxy`. Use the agent method above.
If you already have the proxy deployed:
- Keep it running while you migrate to `pulse-agent --enable-proxmox`.
- Expect future removal in a major release. Do not treat the proxy as a long-term solution.
## 📦 Docker Setup (Manual)
If running Pulse in Docker, you must install the proxy on the host and share the socket.
1. **Install Proxy on Host**:
```bash
curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \
sudo bash -s -- --standalone --pulse-server http://<pulse-ip>:7655
```
2. **Update `docker-compose.yml`**:
Add the socket volume to your Pulse service:
```yaml
volumes:
  - /mnt/pulse-proxy:/run/pulse-sensor-proxy:ro
```
> **Note**: The standalone installer creates the socket at `/mnt/pulse-proxy` on the host. Map it to `/run/pulse-sensor-proxy` inside the container.
3. **Restart Pulse**: `docker compose up -d`
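Putting the pieces together, a minimal `docker-compose.yml` sketch (image, port, and volume names taken from elsewhere in this document; adjust to your deployment):
```yaml
services:
  pulse:
    image: rcourtman/pulse:latest
    ports:
      - "7655:7655"
    volumes:
      - pulse-data:/data
      - /mnt/pulse-proxy:/run/pulse-sensor-proxy:ro  # proxy socket, read-only
volumes:
  pulse-data:
```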
## 🌐 Multi-Server Proxmox Setup
If you have Pulse running on **Server A** and want to monitor temperatures on **Server B** (a separate Proxmox host without Pulse):
1. **Run Installer on Server B** (the remote Proxmox host):
```bash
curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \
sudo bash -s -- --ctid <PULSE_CONTAINER_ID> --pulse-server http://<pulse-ip>:7655
```
Replace `<PULSE_CONTAINER_ID>` with the LXC container ID where Pulse runs on Server A (e.g., `100`).
2. The installer will detect that the container doesn't exist locally and install in **host monitoring only** mode:
```
[WARN] Container 100 does not exist on this node
[WARN] Will install sensor-proxy for host temperature monitoring only
```
3. **Verify**: `systemctl status pulse-sensor-proxy`
> **Note**: The `--standalone --http-mode` flags shown in the Pulse UI quick-setup are for Docker deployments, not bare Proxmox hosts. For multi-server Proxmox setups, use the `--ctid` approach above.
## 🔧 Troubleshooting
| Issue | Solution |
| :--- | :--- |
| **No Data** | Check **Settings → Diagnostics** (Temperature Proxy section). |
| **Proxy Unreachable** | Ensure port `8443` is open on the remote node. |
| **"Permission Denied"** | Re-run the installer to fix permissions or SSH keys. |
| **LXC Issues** | Ensure the container has the bind mount: `lxc.mount.entry: /run/pulse-sensor-proxy ...` |
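The full bind-mount line referenced in the table (it also appears under Failure Modes below) goes in `/etc/pve/lxc/<CTID>.conf` on the Proxmox host:
```bash
# In /etc/pve/lxc/<CTID>.conf:
#   lxc.mount.entry: /run/pulse-sensor-proxy run/pulse-sensor-proxy none bind,create=dir 0 0
# Then verify the socket is visible from inside the container:
pct exec <CTID> -- ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
```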
### Check Proxy Status
On the Proxmox host:
```bash
systemctl status pulse-sensor-proxy
```
### View Logs
```bash
journalctl -u pulse-sensor-proxy -f
```
## 🧠 How It Works
1. **Pulse Sensor Proxy**: A lightweight service runs on the Proxmox host.
2. **Secure Access**: It reads sensors (via `lm-sensors`) and exposes them securely.
3. **Transport**:
* **Local**: Uses a Unix socket (`/run/pulse-sensor-proxy`) for fast, local, secure access.
* **Remote**: Uses mutual TLS over HTTPS (port 8443).
4. **No SSH Keys**: Pulse containers no longer need SSH keys to read temperatures.
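Both transports can be sanity-checked with standard tooling (socket path and port as documented above):
```bash
# Local transport: the Unix socket the proxy exposes on the Proxmox host
ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# Remote transport: the TLS listener on port 8443, if HTTPS mode is enabled
ss -ltn | grep 8443
```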
---
## 🔧 Advanced Configuration
#### Manual Configuration (No Script)
If you can't run the installer script, create the configuration manually:
**1. Download binary:**
```bash
curl -L https://github.com/rcourtman/Pulse/releases/latest/download/pulse-sensor-proxy-linux-amd64 \
-o /tmp/pulse-sensor-proxy
install -D -m 0755 /tmp/pulse-sensor-proxy /usr/local/bin/pulse-sensor-proxy
```
**2. Create service user:**
```bash
useradd --system --user-group --no-create-home --shell /usr/sbin/nologin pulse-sensor-proxy
usermod -aG www-data pulse-sensor-proxy # For pvecm access
```
**3. Create directories:**
```bash
install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0750 /var/lib/pulse-sensor-proxy
install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0700 /var/lib/pulse-sensor-proxy/ssh
install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0755 /etc/pulse-sensor-proxy
```
**4. Create config (optional, for Docker):**
```yaml
# /etc/pulse-sensor-proxy/config.yaml
allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml
allowed_peer_uids: [1000] # Docker container UID
allow_idmapped_root: true
allowed_idmap_users:
  - root
```
Allowed nodes live in `/etc/pulse-sensor-proxy/allowed_nodes.yaml`; change them via `pulse-sensor-proxy config set-allowed-nodes` so the proxy can lock and validate the file safely. Control-plane settings are added automatically when you register via Pulse, but you can supply them manually if you cannot reach the API (`pulse_control_plane.url`, `.token_file`, `.refresh_interval`).
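For example, to append two nodes to the allow-list (flags as documented in the Configuration Management section below):
```bash
pulse-sensor-proxy config set-allowed-nodes --merge 192.168.1.10 --merge pve2.local
```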
**5. Install systemd service:**
```bash
# Download from: https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh
# Extract the systemd unit from the installer (ExecStartPre/ExecStart typically uses /usr/local/bin/pulse-sensor-proxy)
systemctl daemon-reload
systemctl enable --now pulse-sensor-proxy
```
**6. Verify:**
```bash
systemctl status pulse-sensor-proxy
ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
```
#### Configuration File Format
The proxy reads `/etc/pulse-sensor-proxy/config.yaml` plus an allow-list in `/etc/pulse-sensor-proxy/allowed_nodes.yaml`:
```yaml
allowed_source_subnets:
  - 192.168.1.0/24
  - 10.0.0.0/8
# Capability-based access control (legacy UID/GID lists still work)
allowed_peers:
  - uid: 0
    capabilities: [read, write, admin]
  - uid: 1000
    capabilities: [read]
allowed_peer_uids: []
allowed_peer_gids: []
allow_idmapped_root: true
allowed_idmap_users:
  - root
log_level: info
metrics_address: default
read_timeout: 5s
write_timeout: 10s
max_ssh_output_bytes: 1048576
require_proxmox_hostkeys: false
# Allow list persistence (managed by installer/control-plane/CLI)
allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml
strict_node_validation: false
# Rate limiting (per calling UID)
rate_limit:
  per_peer_interval_ms: 1000
  per_peer_burst: 5
# HTTPS mode (for remote nodes)
http_enabled: false
http_listen_addr: ":8443"
http_tls_cert: /etc/pulse-sensor-proxy/tls/server.crt
http_tls_key: /etc/pulse-sensor-proxy/tls/server.key
http_auth_token: "" # Populated during registration
# Control-plane sync (keeps allowed_nodes.yaml updated)
pulse_control_plane:
  url: https://pulse.example.com:7655
  token_file: /etc/pulse-sensor-proxy/.pulse-control-token
  refresh_interval: 60
  insecure_skip_verify: false
```
`allowed_nodes.yaml` is the source of truth for valid nodes. Avoid editing it directly—use `pulse-sensor-proxy config set-allowed-nodes` so the proxy can lock, dedupe, and write atomically. `allowed_peers` scopes socket access; legacy UID/GID lists remain for backward compatibility and imply full capabilities.
**Environment Variable Overrides:**
Config values can also be set via environment variables (useful for containerized proxy deployments):
```bash
# Add allowed subnets (comma-separated, appends to config file values)
PULSE_SENSOR_PROXY_ALLOWED_SUBNETS=192.168.1.0/24,10.0.0.0/8
# Allow/disallow ID-mapped root (overrides config file)
PULSE_SENSOR_PROXY_ALLOW_IDMAPPED_ROOT=true
# HTTP listener controls
PULSE_SENSOR_PROXY_HTTP_ENABLED=true
PULSE_SENSOR_PROXY_HTTP_ADDR=":8443"
PULSE_SENSOR_PROXY_HTTP_TLS_CERT=/etc/pulse-sensor-proxy/tls/server.crt
PULSE_SENSOR_PROXY_HTTP_TLS_KEY=/etc/pulse-sensor-proxy/tls/server.key
PULSE_SENSOR_PROXY_HTTP_AUTH_TOKEN="$(cat /etc/pulse-sensor-proxy/.http-auth-token)"
```
Additional overrides include `PULSE_SENSOR_PROXY_ALLOWED_PEER_UIDS`, `PULSE_SENSOR_PROXY_ALLOWED_PEER_GIDS`, `PULSE_SENSOR_PROXY_ALLOWED_NODES`, `PULSE_SENSOR_PROXY_READ_TIMEOUT`, `PULSE_SENSOR_PROXY_WRITE_TIMEOUT`, `PULSE_SENSOR_PROXY_METRICS_ADDR`, and `PULSE_SENSOR_PROXY_STRICT_NODE_VALIDATION`.
Example systemd override:
```ini
# /etc/systemd/system/pulse-sensor-proxy.service.d/override.conf
[Service]
Environment="PULSE_SENSOR_PROXY_ALLOWED_SUBNETS=192.168.1.0/24"
```
**Note:** Socket path, SSH key directory, and audit log path are configured via command-line flags (see main.go), not the YAML config file.
#### Re-running After Changes
The installer is idempotent and safe to re-run:
```bash
# After adding a new Proxmox node to cluster
bash install-sensor-proxy.sh --standalone --pulse-server http://pulse:7655 --quiet
# Verify installation
systemctl status pulse-sensor-proxy
```
### Legacy SSH Security Concerns
SSH-based temperature collection from inside containers is unsafe. Pulse blocks this by default for container deployments.
In legacy/non-container setups where you intentionally use SSH, the main risks are:
- Compromised container = exposed SSH keys
- Even with forced commands, keys could be extracted
- Required manual hardening (key rotation, IP restrictions, etc.)
### Hardening Recommendations (Legacy/Native Installs Only)
#### 1. Key Rotation
Rotate SSH keys periodically (e.g., every 90 days):
```bash
# On Pulse server
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_new -N ""
# Update all nodes' authorized_keys
# Test connectivity
ssh -i ~/.ssh/id_ed25519_new node "sensors -j"
# Replace old key (move both the private and public halves)
mv ~/.ssh/id_ed25519_new ~/.ssh/id_ed25519
mv ~/.ssh/id_ed25519_new.pub ~/.ssh/id_ed25519.pub
```
#### 2. Secret Mounts (Docker)
Mount SSH keys from secure volumes:
```yaml
version: '3'
services:
  pulse:
    image: rcourtman/pulse:latest
    volumes:
      - pulse-ssh-keys:/home/pulse/.ssh:ro  # Read-only
      - pulse-data:/data
volumes:
  pulse-ssh-keys:
    driver: local
    driver_opts:
      type: tmpfs  # Memory-only, not persisted
      device: tmpfs
```
#### 3. Monitoring & Alerts
Enable SSH audit logging on Proxmox nodes:
```bash
# Install auditd
apt-get install auditd
# Watch SSH access
auditctl -w /root/.ssh -p wa -k ssh_access
# Monitor for unexpected commands
tail -f /var/log/audit/audit.log | grep ssh
```
#### 4. IP Restrictions
Limit SSH access to your Pulse server IP in `/etc/ssh/sshd_config`:
```ssh
Match User root Address 192.168.1.100
    ForceCommand sensors -j
    PermitOpen none
    AllowAgentForwarding no
    AllowTcpForwarding no
```
### Verifying Proxy Installation
To check if your deployment is using the secure proxy:
```bash
# On Proxmox host - check proxy service
systemctl status pulse-sensor-proxy
# Check if socket exists
ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# View proxy logs
journalctl -u pulse-sensor-proxy -f
```
Forward these logs off-host for retention by following
[operations/SENSOR_PROXY_LOGS.md](operations/SENSOR_PROXY_LOGS.md).
In the Pulse container, check the logs at startup:
```bash
# Should see: "Temperature proxy detected - using secure host-side bridge"
journalctl -u pulse | grep -i proxy
```
### Disabling Temperature Monitoring
To remove SSH access:
```bash
# On each Proxmox node
sed -i '/pulse@/d' /root/.ssh/authorized_keys
# Or remove just the forced command entry
sed -i '/command="sensors -j"/d' /root/.ssh/authorized_keys
```
Temperature data will stop appearing in the dashboard after the next polling cycle.
## Operations & Troubleshooting
### Managing the Proxy Service
The pulse-sensor-proxy service runs on the Proxmox host (outside the container).
**Service Management:**
```bash
# Check service status
systemctl status pulse-sensor-proxy
# Restart the proxy
systemctl restart pulse-sensor-proxy
# Stop the proxy (disables temperature monitoring)
systemctl stop pulse-sensor-proxy
# Start the proxy
systemctl start pulse-sensor-proxy
# Enable proxy to start on boot
systemctl enable pulse-sensor-proxy
# Disable proxy autostart
systemctl disable pulse-sensor-proxy
```
### Log Locations
**Proxy Logs (on Proxmox host):**
```bash
# Follow proxy logs in real-time
journalctl -u pulse-sensor-proxy -f
# View last 50 lines
journalctl -u pulse-sensor-proxy -n 50
# View logs since last boot
journalctl -u pulse-sensor-proxy -b
# View logs with timestamps
journalctl -u pulse-sensor-proxy --since "1 hour ago"
```
**Pulse Logs (in container):**
```bash
# Check if proxy is being used
journalctl -u pulse | grep -i "proxy\|temperature"
# Should see: "Temperature proxy detected - using secure host-side bridge"
```
### SSH Key Rotation
Rotate SSH keys periodically for security (recommended every 90 days).
**Automated Rotation (Recommended):**
The `pulse-proxy-rotate-keys.sh` helper script handles rotation safely with staging, verification, and rollback support:
```bash
# 1. Dry-run first (recommended)
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/pulse-proxy-rotate-keys.sh | \
sudo bash -s -- --dry-run
# 2. Perform rotation
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/pulse-proxy-rotate-keys.sh | sudo bash
```
**What the script does:**
- Generates new Ed25519 keypair in staging directory
- Pushes new key to all cluster nodes via proxy RPC
- Verifies SSH connectivity with new key on each node
- Atomically swaps keys (current → backup, staging → active)
- Preserves old keys for rollback
**If rotation fails, rollback:**
```bash
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/pulse-proxy-rotate-keys.sh | \
sudo bash -s -- --rollback
```
**Manual Rotation (Fallback):**
If the automated script fails or is unavailable:
```bash
# 1. On Proxmox host, backup old keys
cd /var/lib/pulse-sensor-proxy/ssh/
cp id_ed25519 id_ed25519.backup
cp id_ed25519.pub id_ed25519.pub.backup
# 2. Generate new keypair
ssh-keygen -t ed25519 -f id_ed25519 -N "" -C "pulse-sensor-proxy-rotated"
# 3. Re-run setup to push keys to cluster
curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \
bash -s -- --ctid <your-container-id>
# 4. Verify temperature data still works in Pulse UI
```
### Automatic Cleanup When Nodes Are Removed
SSH keys are automatically removed when you delete a node from Pulse:
1. **When you remove a node** in Pulse (**Settings → Proxmox**), Pulse signals the temperature proxy
2. **The proxy creates a cleanup request** file at `/var/lib/pulse-sensor-proxy/cleanup-request.json`
3. **A systemd path unit detects the request** and triggers the cleanup service
4. **The cleanup script automatically:**
- SSHs to the specified node (or localhost if it's local)
- Removes the SSH key entries (`# pulse-managed-key` and `# pulse-proxy-key`)
- Logs the cleanup action via syslog
**Automatic cleanup works for:**
- ✅ **Cluster nodes** - Full automatic cleanup (Proxmox clusters have unrestricted passwordless SSH)
- ⚠️ **Standalone nodes** - Cannot auto-cleanup due to forced command security (see below)
**Standalone Node Limitation:**
Standalone nodes use forced commands (`command="sensors -j"`) for security. This same restriction prevents the cleanup script from running `sed` to remove keys. This is a **security feature, not a bug** - adding a workaround would defeat the forced command protection.
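For illustration, the entry on a standalone node has roughly this shape (key material elided; the `command=` value and trailing comment marker are as used in this document, while the extra restriction options are an assumption consistent with the "no shell, no forwarding" behaviour described below):
```bash
# /root/.ssh/authorized_keys on a standalone node (illustrative shape only)
# command="sensors -j",no-port-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAA... # pulse-proxy-key
```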
For standalone nodes:
- Keys remain after removal (but they're **read-only** - only `sensors -j` access)
- **Low security risk** - no shell access, no write access, no port forwarding
- **Auto-cleanup on re-add** - Setup script removes old keys when node is re-added
- **Manual cleanup if needed:**
```bash
ssh root@standalone-node "sed -i '/# pulse-proxy-key$/d' /root/.ssh/authorized_keys"
```
**Monitoring Cleanup:**
```bash
# Watch cleanup operations in real-time
journalctl -u pulse-sensor-cleanup -f
# View cleanup history
journalctl -u pulse-sensor-cleanup --since "1 week ago"
# Check if cleanup system is active
systemctl status pulse-sensor-cleanup.path
```
**Manual Cleanup (if needed):**
If automatic cleanup fails or you need to manually revoke access:
```bash
# On the node being removed, remove all Pulse SSH keys
ssh root@old-node "sed -i -e '/# pulse-managed-key\$/d' -e '/# pulse-proxy-key\$/d' /root/.ssh/authorized_keys"
# Or remove them locally
sed -i -e '/# pulse-managed-key$/d' -e '/# pulse-proxy-key$/d' /root/.ssh/authorized_keys
# No restart needed - proxy will fail gracefully for that node
# Temperature monitoring will continue for remaining nodes
```
### Failure Modes
**Proxy Not Running:**
- Symptom: No temperature data in Pulse UI
- Check: `systemctl status pulse-sensor-proxy` on Proxmox host
- Fix: `systemctl start pulse-sensor-proxy`
**Socket Not Accessible in Container:**
- Symptom: Pulse logs show "Temperature proxy not available - using direct SSH"
- Check: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock` in container
- Fix: Verify bind mount in LXC config (`/etc/pve/lxc/<CTID>.conf`)
- Should have: `lxc.mount.entry: /run/pulse-sensor-proxy run/pulse-sensor-proxy none bind,create=dir 0 0`
**pvecm Not Available:**
- Symptom: Proxy fails to discover cluster nodes
- Cause: Pulse runs on non-Proxmox host
- Fallback: Use legacy direct SSH method (native installation)
**Pulse Running Off-Cluster:**
- Symptom: Proxy discovers local host but not remote cluster nodes
- Limitation: Proxy requires passwordless SSH between cluster nodes
- Solution: Ensure Proxmox host running Pulse has SSH access to all cluster nodes
**Unauthorized Connection Attempts:**
- Symptom: Proxy logs show "Unauthorized connection attempt"
- Cause: Process with non-root UID trying to access socket
- Normal: Only root (UID 0) or proxy's own user can access socket
- Check: Look for suspicious processes trying to access the socket
### Monitoring the Proxy
**Manual Monitoring (v1):**
The proxy service includes systemd restart-on-failure, which handles most issues automatically. For additional monitoring:
```bash
# Check proxy health
systemctl is-active pulse-sensor-proxy && echo "Proxy is running" || echo "Proxy is down"
# Monitor logs for errors
journalctl -u pulse-sensor-proxy --since "1 hour ago" | grep -i error
# Verify socket exists and is accessible
test -S /run/pulse-sensor-proxy/pulse-sensor-proxy.sock && echo "Socket OK" || echo "Socket missing"
```
**Alerting:**
- Rely on systemd's automatic restart (`Restart=on-failure`)
- Monitor via journalctl for persistent failures
- Check Pulse UI for missing temperature data
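A minimal cron-able health check, sketched from the commands above (script name and logger tag are illustrative):
```bash
#!/usr/bin/env bash
# pulse-proxy-check.sh - exit non-zero and log if the proxy or its socket is down
set -u
if ! systemctl is-active --quiet pulse-sensor-proxy; then
  logger -t pulse-proxy-check "pulse-sensor-proxy service is not active"
  exit 1
fi
if ! test -S /run/pulse-sensor-proxy/pulse-sensor-proxy.sock; then
  logger -t pulse-proxy-check "proxy socket is missing"
  exit 1
fi
```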
**Future:** Integration with pulse-watchdog is planned for automated health checks and alerting (see #528).
### Known Limitations
**Single Proxy = Single Point of Failure:**
- Each Proxmox host runs one pulse-sensor-proxy instance
- If the proxy service dies, temperature monitoring stops for all containers on that host
- This is acceptable for read-only telemetry, but be aware of the failure mode
- Systemd auto-restart (`Restart=on-failure`) mitigates most outages
- If multiple Pulse containers run on same host, they share the same proxy
**Sensors Output Parsing Brittleness:**
- Pulse depends on `sensors -j` JSON output format from lm-sensors
- Changes to sensor names, structure, or output format could break parsing
- Consider adding schema validation and instrumentation to detect issues early
- Monitor proxy logs for parsing errors: `journalctl -u pulse-sensor-proxy | grep -i "parse\|error"`
**Cluster Discovery Limitations:**
- Proxy uses `pvecm status` to discover cluster nodes (requires Proxmox IPC access)
- If Proxmox hardens IPC access or cluster topology changes unexpectedly, discovery may fail
- Standalone Proxmox nodes work but only monitor that single node
- Fallback: re-run the proxy installer script to reconfigure cluster access
**Rate Limiting & Scaling** (updated in commit 46b8b8d):
**What changed:** pulse-sensor-proxy now defaults to 1 request per second with a burst of 5 per calling UID. Earlier builds throttled after two calls every five seconds, which caused temperature tiles to flicker or fall back to `--` as soon as clusters reached three or more nodes.
**Symptoms of saturation:**
- Temperature widgets flicker between values and `--`, or entire node rows disappear after adding new hardware
- `Settings → System → Updates` shows no proxy restarts, yet scheduler health reports breaker openings for temperature pollers
- Proxy logs include `limiter.rejection` or `Rate limit exceeded` entries for the container UID
**Diagnose:**
1. Check scheduler health for temperature pollers:
```bash
curl -s http://localhost:7655/api/monitoring/scheduler/health | jq '
  .instances[]
  | select(.key | contains("temperature"))
  | {key, lastSuccess: .pollStatus.lastSuccess, breaker: .breaker.state, deadLetter: .deadLetter.present}'
```
Breakers that remain `open` or repeated dead letters indicate the proxy is rejecting calls.
2. Inspect limiter metrics on the host:
```bash
curl -s http://127.0.0.1:9127/metrics \
| grep -E 'pulse_proxy_limiter_(rejects|penalties)_total'
```
A rising counter confirms the limiter is backing off callers.
3. Review logs for throttling:
```bash
journalctl -u pulse-sensor-proxy -n 100 | grep -i "rate limit"
```
**Tuning guidance:** Add a `rate_limit` block to `/etc/pulse-sensor-proxy/config.yaml` (see `cmd/pulse-sensor-proxy/config.example.yaml`) when clusters grow beyond the defaults. Use the formula `per_peer_interval_ms = polling_interval_ms / node_count` and set `per_peer_burst ≥ node_count` to allow one full sweep per polling window.
| Deployment size | Nodes | interval_ms (10 s poll) | Suggested burst | Notes |
| --- | --- | --- | --- | --- |
| Small | 1–3 | 1000 (default) | 5 | Works for most single Proxmox hosts. |
| Medium | 4–10 | 500 | 10 | Halves wait time; keep burst ≥ node count. |
| Large | 10–20 | 250 | 20 | Monitor CPU on proxy; consider staggering polls. |
| XL | 30+ | 100–150 | 30–50 | Only enable after validating proxy host capacity. |
**Security note:** Lower intervals increase throughput and reduce UI staleness, but they also allow untrusted callers to issue more RPCs per second. Keep `per_peer_interval_ms ≥ 100` in production and continue to rely on UID allow-lists plus audit logs when raising limits.
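As a worked example of the formula above: a 20-node cluster polled every 10 s gives `per_peer_interval_ms = 10000 / 20 = 500` with `per_peer_burst ≥ 20`. A sketch of applying it, assuming no `rate_limit` block exists yet in the config (otherwise edit the existing block instead of appending):
```bash
# Append a tuned rate_limit block, then restart the proxy to pick it up
cat >> /etc/pulse-sensor-proxy/config.yaml <<'EOF'
rate_limit:
  per_peer_interval_ms: 500
  per_peer_burst: 20
EOF
systemctl restart pulse-sensor-proxy
```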
**SSH latency monitoring:**
- Monitor SSH latency metrics: `curl -s http://127.0.0.1:9127/metrics | grep pulse_proxy_ssh_latency`
**Requires Proxmox Cluster Membership:**
- Proxy requires passwordless root SSH between cluster nodes
- Standard for Proxmox clusters, but hardened environments may differ
- Alternative: Create dedicated service account with sudo access to `sensors`
**No Cross-Cluster Support:**
- Proxy only manages the cluster its host belongs to
- Cannot bridge temperature monitoring across multiple disconnected clusters
- Each cluster needs its own Pulse instance with its own proxy
### Common Issues
**Temperature Data Stops Appearing:**
1. Check proxy service: `systemctl status pulse-sensor-proxy`
2. Check proxy logs: `journalctl -u pulse-sensor-proxy -n 50`
3. Test SSH manually: `ssh root@node "sensors -j"`
4. Verify socket exists: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock`
**New Cluster Node Not Showing Temperatures:**
1. Ensure lm-sensors installed: `ssh root@new-node "sensors -j"`
2. Proxy auto-discovers on next poll (may take up to 1 minute)
3. Re-run the proxy installer script to configure SSH keys on the new node: `curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | bash -s -- --ctid <CTID>`
**Permission Denied Errors:**
1. Verify socket permissions: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock`
2. Should be: `srw-rw---- 1 root root`
3. Check Pulse runs as root in container: `pct exec <CTID> -- whoami`
**Proxy Service Won't Start:**
1. Check logs: `journalctl -u pulse-sensor-proxy -n 50`
2. Verify binary exists: `ls -l /usr/local/bin/pulse-sensor-proxy`
3. Test manually: `/usr/local/bin/pulse-sensor-proxy --version`
4. Check socket directory: `ls -ld /var/run`
## Configuration Management
The sensor proxy includes a built-in CLI for safe configuration management. It uses locking and atomic writes to prevent config corruption.
### Quick Reference
```bash
# Validate config files
pulse-sensor-proxy config validate
# Add nodes to allowed list
pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.1 --merge node1.local
# Replace entire allowed list
pulse-sensor-proxy config set-allowed-nodes --replace --merge 192.168.0.1
```
**Key benefits:**
- Atomic writes with file locking prevent corruption
- Automatic deduplication and normalization
- systemd validation prevents startup with bad config
- Installer uses CLI (no more shell/Python divergence)
**See also:**
- [Sensor Proxy Config Management Guide](operations/SENSOR_PROXY_CONFIG.md) - Complete runbook
- [Sensor Proxy CLI Reference](../cmd/pulse-sensor-proxy/README.md) - Full command documentation
## Control-Plane Sync & Migration
The sensor proxy can register with Pulse and sync its authorized node list via `/api/temperature-proxy/authorized-nodes`. This avoids manual `allowed_nodes` maintenance and reduces reliance on `/etc/pve` access.
### New installs
Always pass the Pulse URL when installing:
```bash
curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \
sudo bash -s -- --ctid <pulse-lxc-id> --pulse-server http://<pulse-ip>:7655
```
The installer now:
- Registers the proxy with Pulse (even for socket-only mode)
- Saves `/etc/pulse-sensor-proxy/.pulse-control-token`
- Appends a `pulse_control_plane` block to `/etc/pulse-sensor-proxy/config.yaml`
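To inspect what the proxy syncs, you can query the authorized-nodes endpoint yourself. A hedged sketch, assuming the saved control token is accepted as a bearer token (verify the auth scheme for your Pulse version):
```bash
curl -s \
  -H "Authorization: Bearer $(cat /etc/pulse-sensor-proxy/.pulse-control-token)" \
  http://<pulse-ip>:7655/api/temperature-proxy/authorized-nodes
```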
### Migrating existing hosts
If your proxy was installed without control-plane sync enabled, run the migration helper on each host:
```bash
curl -fsSL http://<pulse-ip>:7655/api/install/migrate-sensor-proxy-control-plane.sh | \
sudo bash -s -- --pulse-server http://<pulse-ip>:7655
```
The script registers the existing proxy, writes the control token, updates the config, and restarts the service (use `--skip-restart` if you prefer to bounce it yourself). Once migrated, temperatures for every node defined in Pulse will continue working even if the proxy can't reach `/etc/pve` or Corosync IPC.
After migration you should see `Temperature data fetched successfully` entries for each node in `journalctl -u pulse-sensor-proxy`, and Settings → Diagnostics will show the last control-plane sync time.
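For example:
```bash
# Confirm per-node fetches after migration (log string as quoted above)
journalctl -u pulse-sensor-proxy --since "15 minutes ago" \
  | grep "Temperature data fetched successfully"
```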
### Getting Help
If temperature monitoring isn't working:
1. **Collect diagnostic info:**
```bash
# On Proxmox host
systemctl status pulse-sensor-proxy
journalctl -u pulse-sensor-proxy -n 100 > /tmp/proxy-logs.txt
ls -la /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# In Pulse container
journalctl -u pulse -n 100 | grep -i temp > /tmp/pulse-temp-logs.txt
```
2. **Test manually:**
```bash
# On Proxmox host - test SSH to a cluster node
ssh root@cluster-node "sensors -j"
```
3. **Check GitHub Issues:** https://github.com/rcourtman/Pulse/issues
4. **Include in bug report:**
- Pulse version
- Deployment type (LXC/Docker/native)
- Proxy logs
- Pulse logs
- Output of manual SSH test