diff --git a/SECURITY.md b/SECURITY.md index ceec98fc4..85e8a77e2 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -50,11 +50,10 @@ Preferred option (no SSH keys, no proxy wiring): sudo bash -s -- --url http://pulse.example.com:7655 --token --enable-proxmox ``` -Deprecated option (existing installs only): +Legacy sensor proxy (removed): -- `pulse-sensor-proxy` is deprecated in Pulse v5 and is not recommended for new deployments. In v5, legacy sensor-proxy endpoints are disabled by default unless `PULSE_ENABLE_SENSOR_PROXY=true` is set on the Pulse server. -- Existing installs continue to work during the migration window, but plan to move to `pulse-agent --enable-proxmox`. -- Canonical temperature docs: `docs/TEMPERATURE_MONITORING.md` +- `pulse-sensor-proxy` is no longer supported. Migrate to `pulse-agent --enable-proxmox` or SSH-based collection. +- Cleanup steps are in `docs/TEMPERATURE_MONITORING.md`. #### Removing Old SSH Keys @@ -297,7 +296,7 @@ for sensitive data. - Rollback actions are logged with timestamps and metadata - Scheduler health escalations recorded in audit trail - Runtime logging configuration changes tracked - - Security status uses `PULSE_AUDIT_LOG=true` (or legacy `AUDIT_LOG_ENABLED=true`) to mark audit logging as active in the UI + - Security status reflects whether persistent audit logging is active (Pulse Pro) ### What's Encrypted in Exports - Node credentials (passwords, API tokens) diff --git a/docs/API.md b/docs/API.md index 3deca604d..9d77077fb 100644 --- a/docs/API.md +++ b/docs/API.md @@ -25,6 +25,8 @@ Standard browser session cookie (used by the UI). Public endpoints include: - `GET /api/health` - `GET /api/version` +- `GET /api/agent/version` (agent update checks) +- `GET /api/setup-script` (requires a setup token) ## πŸ” Scopes and Admin Access @@ -33,6 +35,7 @@ Some endpoints require admin privileges and/or scopes. 
Common scopes include: - `settings:read` - `settings:write` - `host-agent:config:read` +- `host-agent:manage` Endpoints that require admin access are noted below. @@ -44,13 +47,28 @@ Endpoints that require admin access are noted below. `GET /api/health` Check if Pulse is running. ```json -{ "status": "healthy", "uptime": 3600 } +{ + "status": "healthy", + "timestamp": 1700000000, + "uptime": 3600, + "devModeSSH": false +} ``` ### System State `GET /api/state` Returns the complete state of your infrastructure (Nodes, VMs, Containers, Storage, Alerts). This is the main endpoint used by the dashboard. +### Unified Resources +`GET /api/resources` +Returns a unified, flattened resource list. Requires `monitoring:read`. + +`GET /api/resources/stats` +Summary counts and health rollups. + +`GET /api/resources/{id}` +Fetch a single resource by ID. + ### Version Info `GET /api/version` Returns version, build time, and update status. @@ -110,6 +128,34 @@ Request body: --- +## 🧭 Setup & Discovery + +### Setup Script (Public) +`GET /api/setup-script` +Returns the Proxmox/PBS setup script. Requires a temporary setup token (`auth_token`) in the query. + +### Setup Script URL +`POST /api/setup-script-url` (auth) +Generates a one-time setup token and URL for `/api/setup-script`. + +### Auto-Register (Public) +`POST /api/auto-register` +Auto-registers a node using the temporary setup token. + +### Agent Install Command +`POST /api/agent-install-command` (auth) +Generates an API token and install command for agent-based Proxmox setup. + +### Discovery +`GET /api/discover` (auth) +Runs network discovery. + +### Test Notification +`POST /api/test-notification` (auth) +Broadcasts a WebSocket test event. + +--- + ## πŸ“Š Metrics & Charts ### Chart Data @@ -328,6 +374,17 @@ Returns scheduler health, DLQ, and breaker status. Requires `monitoring:read`. 
- `GET /api/updates/history` - `GET /api/updates/history/entry?id=` +### Infrastructure Updates +- `GET /api/infra-updates` (requires `monitoring:read`) +- `GET /api/infra-updates/summary` (requires `monitoring:read`) +- `POST /api/infra-updates/check` (requires `monitoring:write`) +- `GET /api/infra-updates/host/{hostId}` (requires `monitoring:read`) +- `GET /api/infra-updates/{resourceId}` (requires `monitoring:read`) + +### Diagnostics +- `GET /api/diagnostics` (auth) +- `POST /api/diagnostics/docker/prepare-token` (admin, `settings:write`) + --- ## πŸ”‘ OIDC / SSO @@ -483,6 +540,12 @@ Returns stats for the persistent metrics store (SQLite-backed). `GET /api/metrics-store/history` Returns historical metric series for a resource and time range. +Query params: +- `resourceType` (required): `node`, `vm`, `container`, `storage`, `dockerHost`, `dockerContainer` +- `resourceId` (required) +- `metric` (optional): `cpu`, `memory`, `disk`, etc. Omit for all metrics +- `range` (optional): `1h`, `6h`, `12h`, `24h`, `7d`, `30d`, `90d` (default `24h`) + --- ## πŸ€– Agent Endpoints @@ -500,6 +563,10 @@ The unified agent combines host, Docker, and Kubernetes monitoring. Use `--enabl See [UNIFIED_AGENT.md](UNIFIED_AGENT.md) for installation instructions. +### Agent Version +`GET /api/agent/version` +Returns the current server version for agent update checks. + ### Unified Agent Installer Script `GET /install.sh` Serves the universal `install.sh` used to install `pulse-agent` on target machines. @@ -560,17 +627,4 @@ Updates server-side config for an agent (e.g., `commandsEnabled`). --- -## 🌑️ Temperature Proxy (Legacy) - -These endpoints are only available when legacy `pulse-sensor-proxy` support is enabled. 
- -- `POST /api/temperature-proxy/register` (proxy registration) -- `GET /api/temperature-proxy/authorized-nodes` (proxy sync) -- `DELETE /api/temperature-proxy/unregister` (admin) -- `GET /api/temperature-proxy/install-command` (admin, `settings:write`) -- `GET /api/temperature-proxy/host-status` (admin, `settings:read`) - -Legacy migration helper: -- `GET /api/install/migrate-temperature-proxy.sh` - > **Note**: This is a summary of the most common endpoints. For a complete list, inspect the network traffic of the Pulse dashboard or check the source code in `internal/api/router.go`. diff --git a/docs/AUTO_UPDATE.md b/docs/AUTO_UPDATE.md index 1b18300a2..3a8c4bbb0 100644 --- a/docs/AUTO_UPDATE.md +++ b/docs/AUTO_UPDATE.md @@ -49,7 +49,7 @@ In **Settings β†’ System β†’ Updates**: | Setting | Description | |---------|-------------| | **Update Channel** | Stable (recommended) or Release Candidate | -| **Auto-Check** | Stored UI preference (server currently checks for updates hourly regardless) | +| **Auto-Check** | Background update check interval (hours); `0` disables | ### Stored Settings (system.json) @@ -64,7 +64,7 @@ Auto-update preferences are stored in `system.json` and edited via the UI. } ``` -**Note:** `autoUpdateTime` is stored for UI reference. The systemd timer still runs on its own schedule (02:00 + jitter). In-app update checks are driven by `autoUpdateCheckInterval`. +**Note:** `autoUpdateTime` is stored for UI reference. The systemd timer still runs on its own schedule (02:00 + jitter). Background update checks follow `autoUpdateCheckInterval`. ## Manual Update Methods diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md index edfd8a867..23d0f0621 100644 --- a/docs/CONFIGURATION.md +++ b/docs/CONFIGURATION.md @@ -42,7 +42,6 @@ All files are located in `/etc/pulse/` (Systemd) or `/data/` (Docker/Kubernetes) Path overrides: - `PULSE_DATA_DIR` sets the base directory for `system.json`, encrypted files, and the bootstrap token. 
-- `PULSE_AUTH_CONFIG_DIR` sets the directory for `.env` (auth-only) if you need auth on a separate volume. --- @@ -101,11 +100,17 @@ Environment overrides (lock the corresponding UI fields): | `OIDC_ALLOWED_GROUPS` | Allowed groups (space or comma-separated) | | `OIDC_ALLOWED_DOMAINS` | Allowed email domains (space or comma-separated) | | `OIDC_ALLOWED_EMAILS` | Allowed emails (space or comma-separated) | -| `OIDC_GROUP_ROLE_MAPPINGS` | Group-to-role mappings (Pro). Format: `group1=role1,group2=role2` | +| `OIDC_GROUP_ROLE_MAPPINGS` | Comma-separated group=role mappings (Pulse Pro) | | `OIDC_CA_BUNDLE` | Custom CA bundle path | +Legacy token flag (backwards compatibility): + +| Variable | Description | +| ---------- | ------------- | +| `API_TOKEN_ENABLED` | Legacy toggle for API token auth (defaults to enabled when tokens exist) | + > **Note**: `API_TOKEN` / `API_TOKENS` are legacy and will be migrated into `api_tokens.json` on startup. > Manage API tokens in the UI for long-term support. @@ -113,7 +118,7 @@ Environment overrides (lock the corresponding UI fields): ## 🖥️ System Settings (`system.json`) -Controls runtime behavior like ports, logging, and polling intervals. Most of these can be changed in **Settings → System**. +Controls runtime behavior like logging, polling intervals, and UI preferences. Legacy port fields in `system.json` are ignored; use `FRONTEND_PORT` instead.
Example system.json @@ -122,7 +127,7 @@ Controls runtime behavior like ports, logging, and polling intervals. Most of th { "pvePollingInterval": 10, // Seconds "backendPort": 3000, // Legacy (unused) - "frontendPort": 7655, // Public port + "frontendPort": 7655, // Legacy (ignored; use FRONTEND_PORT) "logLevel": "info", // debug, info, warn, error "autoUpdateEnabled": false, // Enable auto-update checks "adaptivePollingEnabled": false, // Smart polling for large clusters @@ -144,10 +149,12 @@ Environment variables take precedence over `system.json`. | ---------- | ------------- | --------- | | `FRONTEND_PORT` | Public listening port | `7655` | | `PORT` | Legacy alias for `FRONTEND_PORT` | *(unset)* | -| `BACKEND_HOST` | Bind host for the HTTP server and metrics listener (advanced) | *(unset)* | -| `BACKEND_PORT` | Legacy internal API port (unused) | `3000` | | `LOG_LEVEL` | Log verbosity (see below) | `info` | | `LOG_FORMAT` | Log output format (`auto`, `json`, `console`) | `auto` | +| `LOG_FILE` | Log file path (enables file logging) | *(unset)* | +| `LOG_MAX_SIZE` | Log rotation size (MB) | `100` | +| `LOG_MAX_AGE` | Keep rotated logs for N days (`0` disables cleanup) | `30` | +| `LOG_COMPRESS` | Gzip rotated logs | `true` | #### Log Levels @@ -179,12 +186,11 @@ Environment variables take precedence over `system.json`. 
| `DISCOVERY_SCAN_GATEWAYS` | Include gateway IPs in discovery (`true`/`false`) | `true` | | `DISCOVERY_DIAL_TIMEOUT_MS` | TCP dial timeout (ms) | `1000` | | `DISCOVERY_HTTP_TIMEOUT_MS` | HTTP probe timeout (ms) | `2000` | -| `PULSE_ENABLE_SENSOR_PROXY` | Enable legacy `pulse-sensor-proxy` endpoints (deprecated, unsupported) | `false` | | `PULSE_AUTH_HIDE_LOCAL_LOGIN` | Hide username/password form | `false` | | `DEMO_MODE` | Enable read-only demo mode | `false` | | `PULSE_TRUSTED_PROXY_CIDRS` | Comma-separated IPs/CIDRs trusted to supply `X-Forwarded-For`/`X-Real-IP` | *(unset)* | | `PULSE_TRUSTED_NETWORKS` | Comma-separated CIDRs treated as trusted local networks (does not bypass auth) | *(unset)* | -| `PULSE_SENSOR_PROXY_SOCKET` | Legacy sensor-proxy socket override (deprecated) | *(unset)* | +| `ALLOW_UNPROTECTED_EXPORT` | Allow unauthenticated config export on public networks when no auth is configured (use with caution) | `false` | ### Iframe Embedding (system.json) @@ -202,13 +208,12 @@ When `allowEmbedding` is `false`, Pulse sends `X-Frame-Options: DENY` and `frame | `PVE_POLLING_INTERVAL` | PVE metrics polling frequency | `10s` | | `PBS_POLLING_INTERVAL` | PBS metrics polling frequency | `60s` | | `PMG_POLLING_INTERVAL` | PMG metrics polling frequency | `60s` | -| `CONCURRENT_POLLING` | Enable concurrent polling for multi-node clusters | `true` | | `CONNECTION_TIMEOUT` | API connection timeout | `60s` | | `BACKUP_POLLING_CYCLES` | Poll cycles between backup checks | `10` | | `ENABLE_BACKUP_POLLING` | Enable backup job monitoring | `true` | | `BACKUP_POLLING_INTERVAL` | Backup polling frequency | `0` (Auto) | | `ENABLE_TEMPERATURE_MONITORING` | Enable temperature monitoring (where supported) | `true` | -| `SSH_PORT` | SSH port for legacy SSH-based temperature collection | `22` | +| `SSH_PORT` | SSH port for temperature collection over SSH | `22` | | `ADAPTIVE_POLLING_ENABLED` | Enable smart polling for large clusters | `false` | | 
`ADAPTIVE_POLLING_BASE_INTERVAL` | Base interval for adaptive polling | `10s` | | `ADAPTIVE_POLLING_MIN_INTERVAL` | Minimum adaptive polling interval | `5s` | @@ -219,18 +224,8 @@ When `allowEmbedding` is `false`, Pulse sends `X-Frame-Options: DENY` and `frame | `GUEST_METADATA_MAX_CONCURRENT` | Max concurrent guest metadata fetches | `4` | | `DNS_CACHE_TIMEOUT` | Cache TTL for DNS lookups | `5m` | | `MAX_POLL_TIMEOUT` | Maximum time per polling cycle | `3m` | -| `WEBHOOK_BATCH_DELAY` | Delay before sending batched webhooks | `10s` | | `PULSE_DISABLE_DOCKER_UPDATE_ACTIONS` | Hide Docker update buttons (read-only mode) | `false` | -### Logging Overrides - -| Variable | Description | Default | -| ---------- | ------------- | --------- | -| `LOG_FILE` | Log file path (empty = stdout) | *(unset)* | -| `LOG_MAX_SIZE` | Log file max size (MB) | `100` | -| `LOG_MAX_AGE` | Log file retention (days) | `30` | -| `LOG_COMPRESS` | Compress rotated logs | `true` | - ### Update Settings (system.json) These are stored in `system.json` and managed via the UI. @@ -239,9 +234,12 @@ These are stored in `system.json` and managed via the UI. | ----- | ------------- | --------- | | `updateChannel` | Update channel (`stable` or `rc`) | `stable` | | `autoUpdateEnabled` | Allow one-click updates | `false` | -| `autoUpdateCheckInterval` | Stored UI preference (server currently checks hourly) | `24` | +| `autoUpdateCheckInterval` | Background update check interval in hours (`0` disables) | `24` | | `autoUpdateTime` | Stored UI preference (systemd timer has its own schedule) | `03:00` | + +> **Note**: Update settings are stored in `system.json`. Legacy `.env` entries (`UPDATE_CHANNEL`, `AUTO_UPDATE_ENABLED`, `AUTO_UPDATE_CHECK_INTERVAL`, `AUTO_UPDATE_TIME`) are kept in sync for backwards compatibility but are not read at runtime. + ### Auto-Import (Bootstrap) You can auto-import an encrypted backup on first startup. This is useful for automated provisioning and test environments. 
@@ -263,6 +261,7 @@ These are primarily for development or test harnesses and should not be used in | `PULSE_UPDATE_SERVER` | Override update server base URL (testing only) | *(unset)* | | `PULSE_UPDATE_STAGE_DELAY_MS` | Adds artificial delays between update stages (testing only) | *(unset)* | | `PULSE_ALLOW_DOCKER_UPDATES` | Expose update UI/actions in Docker (debug only) | `false` | +| `PULSE_DEV_ALLOW_CONTAINER_SSH` | Allow SSH-based temperature collection from containers (dev/test only) | `false` | | `PULSE_AI_ALLOW_LOOPBACK` | Allow AI tool HTTP fetches to loopback addresses | `false` | | `PULSE_LICENSE_PUBLIC_KEY` | Override embedded license public key (base64, dev only) | *(unset)* | | `PULSE_LICENSE_DEV_MODE` | Skip license verification (development only) | `false` | @@ -351,6 +350,8 @@ API tokens provide scoped, revocable access to Pulse. Manage tokens in **Setting | `kubernetes:report` | Kubernetes agent telemetry submission | | `kubernetes:manage` | Kubernetes cluster management | | `host-agent:report` | Host agent metrics submission | +| `host-agent:config:read` | Read host-agent config payloads | +| `host-agent:manage` | Manage host agents (unlink/delete/config) | | `settings:read` | Read configuration | | `settings:write` | Modify configuration | diff --git a/docs/DEPLOYMENT_MODELS.md b/docs/DEPLOYMENT_MODELS.md index 9701c5ce6..a85795ddb 100644 --- a/docs/DEPLOYMENT_MODELS.md +++ b/docs/DEPLOYMENT_MODELS.md @@ -31,6 +31,12 @@ Pulse uses a split config model: - **Legacy token suppressions**: `env_token_suppressions.json` - **AI config**: `ai.enc` (encrypted) - **AI patrol data**: `ai_findings.json`, `ai_patrol_runs.json`, `ai_usage_history.json` +- **AI baseline data**: `baselines.json` +- **AI correlation data**: `ai_correlations.json` +- **AI pattern data**: `ai_patterns.json` +- **AI remediation data**: `ai_remediations.json` +- **AI incident tracking**: `ai_incidents.json` +- **Audit log database**: `audit.db` (Pulse Pro, SQLite) - **Pulse Pro 
license**: `license.enc` (encrypted) - **Host metadata**: `host_metadata.json` - **Docker metadata**: `docker_metadata.json` diff --git a/docs/FAQ.md b/docs/FAQ.md index 36048884e..6165b0427 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -55,13 +55,12 @@ Yes! If Pulse detects Ceph storage, it automatically queries cluster health, OSD Yes. Go to **Alerts → Thresholds** and set any value to `-1` to disable it. You can do this globally or per-resource (VM/Node). ### How do I monitor temperature? -Install the unified agent on your Proxmox hosts with Proxmox integration enabled: +Recommended: install the unified agent on your Proxmox hosts with Proxmox integration enabled: 1. Install `lm-sensors` on the host (`apt install lm-sensors && sensors-detect`) 2. Install `pulse-agent` with `--enable-proxmox` -`pulse-sensor-proxy` is deprecated in v5 and is not recommended for new deployments. -See [Temperature Monitoring](TEMPERATURE_MONITORING.md). +If you do not run the agent, Pulse can collect temperatures over SSH. See [Temperature Monitoring](TEMPERATURE_MONITORING.md). --- diff --git a/docs/PULSE_PRO.md b/docs/PULSE_PRO.md index 4da0dbd1b..65e49920a 100644 --- a/docs/PULSE_PRO.md +++ b/docs/PULSE_PRO.md @@ -8,9 +8,10 @@ Pulse Pro unlocks advanced AI automation features on top of the free Pulse platf - Persistent audit trail with SQLite storage and HMAC signing. - Queryable via `/api/audit` and verified per event in the Security → Audit Log UI. - Supports filtering, verification badges, and signature checks for tamper detection. -- Configure with `PULSE_AUDIT_SIGNING_KEY`, `PULSE_AUDIT_RETENTION_DAYS`, and `PULSE_AUDIT_CLEANUP_INTERVAL_HOURS`. +- Signing uses an auto-generated HMAC key stored (encrypted) at `.audit-signing.key` in the Pulse data directory. +- Retention defaults to 90 days (not currently configurable via environment variables). - API reference: `docs/API.md`. -- If no signing key is set, events are stored without signatures and verification will fail.
+- If signing is disabled (for example, encryption is unavailable), events are stored without signatures and verification will fail. ### Audit Webhooks - real-time delivery of audit events to external endpoints (SIEM, ELK, etc.). diff --git a/docs/README.md b/docs/README.md index 44ac10cd2..a4ebcfaf0 100644 --- a/docs/README.md +++ b/docs/README.md @@ -57,7 +57,7 @@ Pulse Pro unlocks **LLM-backed AI Patrol** β€” automated background monitoring t - **[Centralized Agent Management (Pro)](CENTRALIZED_MANAGEMENT.md)** – Agent profiles and remote config. - **[Proxmox Backup Server](PBS.md)** – PBS integration, direct API vs PVE passthrough, token setup. - **[VM Disk Monitoring](VM_DISK_MONITORING.md)** – Enabling QEMU Guest Agent for disk stats. -- **[Temperature Monitoring](TEMPERATURE_MONITORING.md)** – Agent-based temperature monitoring (`pulse-agent --enable-proxmox`). Sensor proxy is deprecated in v5. +- **[Temperature Monitoring](TEMPERATURE_MONITORING.md)** – Agent-based temperature monitoring (`pulse-agent --enable-proxmox`). Sensor proxy has been removed. - **[Webhooks](WEBHOOKS.md)** – Custom notification payloads. ## πŸ’» Development diff --git a/docs/SECURITY_CHANGELOG.md b/docs/SECURITY_CHANGELOG.md deleted file mode 100644 index 432ce9338..000000000 --- a/docs/SECURITY_CHANGELOG.md +++ /dev/null @@ -1,399 +0,0 @@ -# Security Changelog - Pulse Sensor Proxy - -> **Deprecated in v5:** `pulse-sensor-proxy` is deprecated and not recommended for new deployments. This changelog is retained for existing installations and historical reference. - -## 2025-11-07: Critical Security Hardening - -### Summary - -Comprehensive security audit and hardening of the pulse-sensor-proxy architecture. Four critical vulnerabilities were identified and fixed, significantly improving the security posture against container compromise scenarios. - -### Security Fixes - -#### 1. 
**Read-Only Socket Mount (CRITICAL)** ✅ FIXED - -**Vulnerability:** Socket directory was mounted read-write into containers, allowing compromised containers to: -- Unlink the socket and create man-in-the-middle proxies -- Fill `/run/pulse-sensor-proxy/` to exhaust tmpfs -- Race the proxy service on restart to hijack the socket path - -**Fix:** Changed all socket mounts to read-only (`:ro`) -- **Files Modified:** `docker-compose.yml`, `docs/TEMPERATURE_MONITORING.md` -- **Impact:** Breaking change for existing deployments (must update mount to `:ro`) -- **Migration:** Change `:/run/pulse-sensor-proxy:rw` to `:/run/pulse-sensor-proxy:ro` - -**Security Benefit:** Compromised containers can no longer tamper with socket infrastructure. - ---- - -#### 2. **Node Allowlist Validation (CRITICAL)** ✅ FIXED - -**Vulnerability:** Proxy would SSH to ANY hostname/IP that passed format validation, enabling: -- Internal network reconnaissance via SSH handshakes -- Port scanning using the proxy as a relay -- Resource exhaustion via slow-loris SSH attacks -- Complete bypass of network security controls - -**Fix:** Multi-layer node validation system -- **New Files:** `cmd/pulse-sensor-proxy/validation.go` -- **Modified Files:** `cmd/pulse-sensor-proxy/config.go`, `cmd/pulse-sensor-proxy/main.go`, `cmd/pulse-sensor-proxy/metrics.go` -- **Features:** - - Configurable `allowed_nodes` list (supports hostnames, IPs, CIDR ranges) - - Automatic cluster membership validation on Proxmox hosts - - 5-minute cache of cluster membership to reduce pvecm overhead - - `strict_node_validation` option for strict vs.
permissive modes - - Prometheus metric: `pulse_proxy_node_validation_failures_total` - -**Configuration Example:** -```yaml -# Only allow specific nodes -allowed_nodes: - - "pve1" - - "pve2.example.com" - - "192.168.1.0/24" - -# Require cluster membership validation -strict_node_validation: true -```text - -**Default Behavior:** If `allowed_nodes` is empty and proxy runs on Proxmox host, automatically validates against cluster membership (secure by default). - -**Security Benefit:** Eliminates SSRF attack vector completely. Containers can only request temperatures from approved nodes. - ---- - -#### 3. **Read/Write Deadlines (CRITICAL)** βœ… FIXED - -**Vulnerability:** No read deadline allowed attackers to: -- Hold connection slots indefinitely by connecting but not sending data -- Starve legitimate requests (4 UIDs could consume all 8 global slots) -- Trivial DoS with minimal resources - -**Fix:** Comprehensive deadline management -- **Modified Files:** `cmd/pulse-sensor-proxy/config.go`, `cmd/pulse-sensor-proxy/main.go`, `cmd/pulse-sensor-proxy/metrics.go` -- **Features:** - - Configurable `read_timeout` (default: 5s) and `write_timeout` (default: 10s) - - Read deadline set before request parsing, cleared before handler execution - - Write deadline set before response transmission - - Automatic penalty applied on timeout - - Prometheus metrics: `pulse_proxy_read_timeouts_total`, `pulse_proxy_write_timeouts_total` - -**Configuration Example:** -```yaml -read_timeout: 5s # Max time to wait for request -write_timeout: 10s # Max time to send response -``` - -**Security Benefit:** Connection slot exhaustion attacks no longer possible. Slow/stalled clients automatically disconnected. - ---- - -#### 4. 
**Range-Based Rate Limiting (HIGH PRIORITY)** ✅ FIXED - -**Vulnerability:** Rate limiting was per-UID, easily bypassed by: -- Creating multiple users in container (each mapped to unique host UID) -- 100+ subordinate UIDs available in typical ID-mapping (100000-165535) -- Each UID got separate rate limit quota -- Attackers could drive proxy to 100% CPU with parallel requests - -**Fix:** Range-based rate limiting for containers -- **Modified Files:** `cmd/pulse-sensor-proxy/throttle.go`, `cmd/pulse-sensor-proxy/main.go`, `cmd/pulse-sensor-proxy/auth.go`, `cmd/pulse-sensor-proxy/metrics.go` -- **Features:** - - Automatic detection of ID-mapped UID ranges from `/etc/subuid` and `/etc/subgid` - - Rate limits applied per-range for container UIDs - - Rate limits applied per-UID for host UIDs (backwards compatible) - - Metrics show `peer="range:100000-165535"` or `peer="uid:0"` - -**Technical Details:** -- `identifyPeer()` checks if BOTH UID AND GID are in mapped ranges -- If in range: all UIDs in that range share rate limits -- If NOT in range: legacy per-UID limiting (for host processes) - -**Security Benefit:** Multi-UID bypass attacks no longer possible. Entire container limited as single entity. - ---- - -#### 5. **GID Authorization Fix (MEDIUM PRIORITY)** ✅ FIXED - -**Vulnerability:** `allowed_peer_gids` populated from config but never checked: -- Created false sense of security for administrators -- GID-based policies silently ignored -- No way to authorize by group membership - -**Fix:** Implemented proper GID authorization -- **Modified Files:** `cmd/pulse-sensor-proxy/auth.go` -- **New Files:** `cmd/pulse-sensor-proxy/auth_test.go` -- **Features:** - - Peer authorized if UID **OR** GID matches allowlist - - Debug logging shows which rule granted access - - Full test coverage - -**Security Benefit:** GID-based policies now actually enforced as administrators expect. - ---- - -#### 6.
**SSH Output Size Limits (MEDIUM PRIORITY)** βœ… FIXED - -**Vulnerability:** No cap on SSH command output size: -- Malicious remote node could stream gigabytes -- Memory exhaustion possible -- CPU spike during parsing - -**Fix:** Implemented configurable output size limits -- **Modified Files:** `cmd/pulse-sensor-proxy/config.go`, `cmd/pulse-sensor-proxy/ssh.go`, `cmd/pulse-sensor-proxy/metrics.go` -- **New Files:** `cmd/pulse-sensor-proxy/ssh_test.go` -- **Features:** - - `max_ssh_output_bytes` config option (default: 1MB) - - Stream with `io.LimitReader` to cap size - - Error returned if limit exceeded - - Prometheus metric: `pulse_proxy_ssh_output_oversized_total{node}` - -**Configuration Example:** -```yaml -max_ssh_output_bytes: 1048576 # 1MB default -``` - -**Security Benefit:** Remote nodes cannot exhaust proxy memory or CPU via oversized outputs. - ---- - -#### 7. **Improved Host Key Management (MEDIUM PRIORITY)** βœ… FIXED - -**Vulnerability:** Trust-On-First-Use (TOFU) via ssh-keyscan: -- Trusts whatever key remote offers on first contact -- No administrator approval for new fingerprints -- Vulnerable to MITM if container influences routing -- No alerting on fingerprint changes - -**Fix:** Multi-phase host key hardening -- **Modified Files:** `internal/ssh/knownhosts/manager.go`, `cmd/pulse-sensor-proxy/ssh.go`, `cmd/pulse-sensor-proxy/config.go`, `cmd/pulse-sensor-proxy/metrics.go` -- **New Files:** `internal/ssh/knownhosts/manager_test.go` -- **Features:** - - Seed host keys from Proxmox cluster store (`/etc/pve/priv/known_hosts`) - - Falls back to ssh-keyscan only if Proxmox unavailable (with WARN) - - Fingerprint change detection with ERROR logging - - `require_proxmox_hostkeys` config option for strict mode - - Prometheus metric: `pulse_proxy_hostkey_changes_total{node}` - -**Configuration Example:** -```yaml -require_proxmox_hostkeys: false # true = strict mode (reject unknown hosts) -``` - -**Security Benefit:** Significantly reduces MITM attack 
surface. Administrators can detect and respond to fingerprint changes. - ---- - -#### 8. **Capability-Based Authorization (MEDIUM PRIORITY)** βœ… FIXED - -**Vulnerability:** Any UID in allowlist could call privileged methods: -- No separation between read-only and admin capabilities -- If another service's UID in list, inherits full host-level control - -**Fix:** Comprehensive capability system -- **New Files:** `cmd/pulse-sensor-proxy/capabilities.go` -- **Modified Files:** `cmd/pulse-sensor-proxy/config.go`, `cmd/pulse-sensor-proxy/auth.go`, `cmd/pulse-sensor-proxy/main.go` -- **Features:** - - Three capability levels: `read`, `write`, `admin` - - Per-UID capability assignment - - Privileged methods require `admin` capability - - Backwards compatible with legacy `allowed_peer_uids` format - -**Configuration Example:** -```yaml -allowed_peers: - - uid: 0 - capabilities: [read, write, admin] # Root gets everything - - uid: 1000 - capabilities: [read] # Docker user: read-only - - uid: 1001 - capabilities: [read, write] # Temperature access but not key distribution -``` - -**Security Benefit:** Proper least-privilege model. Services can be granted only the capabilities they need. - ---- - -#### 9. **Additional Systemd Hardening (LOW PRIORITY)** βœ… FIXED - -**Gap:** Additional systemd hardening directives available but not enabled: -- `MemoryDenyWriteExecute` (prevents RWX memory) -- `RestrictRealtime` (denies realtime scheduling) -- `ProtectHostname` (hostname protection) -- `ProtectKernelLogs` (kernel log protection) -- `SystemCallArchitectures` (native only) - -**Fix:** Enhanced systemd unit file -- **Modified Files:** `scripts/pulse-sensor-proxy.service` -- **Added Directives:** - - `MemoryDenyWriteExecute=true` - - `RestrictRealtime=true` - - `ProtectHostname=true` - - `ProtectKernelLogs=true` - - `SystemCallArchitectures=native` - -**Security Benefit:** Defense in depth. Additional layers to slow/prevent post-compromise exploitation. 
- ---- - -### Additional Improvements - -#### Enhanced Metrics - -New Prometheus metrics for security monitoring: -```text -pulse_proxy_node_validation_failures_total{reason} -pulse_proxy_read_timeouts_total -pulse_proxy_write_timeouts_total -pulse_proxy_rate_limit_hits_total -pulse_proxy_limiter_rejections_total{reason, peer} -pulse_proxy_limiter_penalties_total{reason, peer} -pulse_proxy_global_concurrency_inflight -``` - -#### Better Logging - -- Node validation failures logged at WARN with "potential SSRF attempt" -- Read timeouts logged with "slow client or attack" -- All security events include correlation IDs for tracing -- Peer identification shows "range:X-Y" for containers - -#### Configuration Flexibility - -All new features have sensible defaults and can be tuned via: -- YAML config file (`/etc/pulse-sensor-proxy/config.yaml`) -- Environment variables (e.g., `PULSE_SENSOR_PROXY_READ_TIMEOUT`) -- Command-line flags - ---- - -### Migration Guide - -#### For Existing Deployments - -**1. Update Socket Mounts (REQUIRED):** - -Docker: -```yaml -# OLD: -- /run/pulse-sensor-proxy:/run/pulse-sensor-proxy:rw - -# NEW: -- /run/pulse-sensor-proxy:/run/pulse-sensor-proxy:ro -``` - -LXC (Proxmox): -```bash -# Mounts created by install script are already correct -# If manually configured, ensure mount is read-only -``` - -**2. Optional Configuration:** - -Create `/etc/pulse-sensor-proxy/config.yaml`: -```yaml -# Restrict nodes (optional, auto-detects cluster by default) -allowed_nodes: - - "10.0.0.0/24" # Your cluster network - -# Adjust timeouts if needed (defaults are good for most) -read_timeout: 5s -write_timeout: 10s - -# Tune rate limits if necessary (defaults are reasonable) -rate_limit: - per_peer_interval_ms: 1000 - per_peer_burst: 5 -``` - -**3. 
Update Monitoring:** - -Add new metrics to your Prometheus alerts: -```yaml -# Alert on SSRF attempts -- alert: PulseSensorSSRFAttempt - expr: rate(pulse_proxy_node_validation_failures_total[5m]) > 0 - -# Alert on read timeout attacks -- alert: PulseSensorReadTimeouts - expr: rate(pulse_proxy_read_timeouts_total[5m]) > 1 -``` - -**4. Restart Proxy:** - -```bash -systemctl restart pulse-sensor-proxy -``` - ---- - -### Backwards Compatibility - -**Preserved:** -- Empty `allowed_nodes` + Proxmox host = auto-validate cluster (secure default) -- Empty `allowed_nodes` + non-Proxmox = allow all (legacy behavior) -- Host UID rate limiting unchanged -- All existing config files continue to work - -**Breaking Changes:** -- Socket mounts MUST be changed to `:ro` (security fix) -- Containers with multiple users now share rate limits (security fix) - ---- - -### Testing - -All fixes include comprehensive tests: -```bash -# Run test suite -go test ./cmd/pulse-sensor-proxy -v - -# Build binary -go build ./cmd/pulse-sensor-proxy - -# Test configuration -./pulse-sensor-proxy --config /etc/pulse-sensor-proxy/config.yaml version -``` - ---- - -### Security Impact Assessment - -**Before Fixes:** -- **SSRF:** Trivially exploitable, full internal network access -- **DoS:** 4 UIDs could completely starve service -- **Container Bypass:** 100+ UIDs available for rate limit bypass -- **Socket Tampering:** Compromised container could MITM all proxy traffic - -**After Fixes:** -- **SSRF:** βœ… Eliminated (node validation) -- **DoS:** βœ… Eliminated (read deadlines) -- **Container Bypass:** βœ… Eliminated (range-based limiting) -- **Socket Tampering:** βœ… Eliminated (read-only mount) - -**Overall Risk Reduction:** Critical vulnerabilities eliminated. System now resilient to container compromise scenarios. 
- ---- - -### References - -- **Temperature Monitoring Overview:** `docs/security/TEMPERATURE_MONITORING.md` -- **Sensor Proxy Hardening:** Standardized security controls for legacy deployments. - ---- - -### Credits - -Security audit performed by Claude + Codex collaboration. - -Issues identified: -1. Socket directory tampering (Codex) -2. Unrestricted SSRF (Codex) -3. Missing read deadline (Codex) -4. Multi-UID rate limit bypass (Codex) - -All fixes implemented and tested 2025-11-07. - ---- - -**For questions or security concerns, file issues at:** diff --git a/docs/TEMPERATURE_MONITORING.md b/docs/TEMPERATURE_MONITORING.md index 98d4c81dd..fe76c95b9 100644 --- a/docs/TEMPERATURE_MONITORING.md +++ b/docs/TEMPERATURE_MONITORING.md @@ -1,795 +1,86 @@ -# 🌡️ Temperature Monitoring +# Temperature Monitoring -Monitor real-time CPU and NVMe temperatures for your Proxmox nodes. +Pulse can collect host temperatures in two supported ways: -> **Deprecation notice (v5):** `pulse-sensor-proxy` is deprecated and not recommended for new deployments. Temperature monitoring should be done via the unified agent (`pulse-agent --enable-proxmox`). Existing proxy installs can continue during the migration window, but plan to migrate to the agent. In v5, legacy sensor-proxy endpoints are disabled by default unless `PULSE_ENABLE_SENSOR_PROXY=true` is set on the Pulse server. +- Pulse agent on Proxmox hosts (recommended) +- SSH-based collection from the Pulse server (fallback or for non-agent hosts) -## Recommended: Pulse Agent +If you are upgrading from older releases that used `pulse-sensor-proxy`, see the legacy cleanup section below. The sensor proxy is no longer supported in Pulse. -For new installations, prefer the unified agent on Proxmox hosts. It reads sensors locally and reports temperatures to Pulse without SSH keys or proxy wiring. +## Recommended: Pulse Agent (Proxmox) + +The unified agent runs on each Proxmox host and reports temperatures locally with no SSH keys needed.
```bash curl -fsSL http://:7655/install.sh | \ bash -s -- --url http://:7655 --token --enable-proxmox ``` -If you use the agent method, the rest of this document (sensor proxy) is optional. +Notes: +- Install `lm-sensors` on each host (`apt install lm-sensors && sensors-detect --auto`). +- Temperatures appear automatically once the agent reports. -## Migration: pulse-sensor-proxy → pulse-agent +## SSH-Based Collection (Fallback) -If you already deployed `pulse-sensor-proxy`, migrate to the agent to avoid proxy maintenance and remove SSH-from-container complexity: +Pulse can also collect temperatures by SSHing into each host and running `sensors -j`, with a fallback to `/sys/class/thermal/thermal_zone0/temp` when available (for example, on Raspberry Pi). -1. Install `lm-sensors` on each Proxmox host (if not already): `apt install lm-sensors && sensors-detect` -2. Install the agent on each Proxmox host: - ```bash - curl -fsSL http://:7655/install.sh | \ - bash -s -- --url http://:7655 --token --enable-proxmox - ``` -3. Confirm temperatures are updating in the dashboard. -4. Disable the proxy service on hosts where it was installed: - ```bash - sudo systemctl disable --now pulse-sensor-proxy - ``` -5. If your Pulse container had a proxy socket mount, remove the mount and remove `PULSE_SENSOR_PROXY_SOCKET` from the Pulse `.env` (for example `/data/.env` in Docker) before restarting Pulse. +### Requirements -## 🚀 Quick Start +- SSH connectivity from the Pulse server to each host +- `lm-sensors` installed and `sensors -j` returning JSON on the host +- A restricted SSH key entry that only allows `sensors -j` -### 1. Install the agent on Proxmox hosts -Install the unified agent on each Proxmox host with Proxmox integration enabled (example in the section above). +### Setup -### 2.
Enable temperature monitoring (optional) -Go to **Settings → Proxmox → [Node] → Advanced Monitoring** and enable "Temperature monitoring" if you want to collect temperatures for that node. +1. Generate the node setup command from the UI: + **Settings -> Proxmox -> Add Node** +2. Run the command on each Proxmox host. The setup script can: + - Create the required API user and permissions + - Add a restricted SSH key entry for temperature collection + - Install `lm-sensors` (optional) ---- +The SSH entry added to `authorized_keys` is restricted to `sensors -j`, for example: -## Troubleshooting - -**No temperature data appearing:** -1. Ensure `lm-sensors` is installed: `apt install lm-sensors && sensors-detect` -2. Verify the agent is running: `systemctl status pulse-agent` -3. Check agent logs: `journalctl -u pulse-agent -f` -4. Confirm `--enable-proxmox` flag is set - -**Temperatures show as `--` or missing:** -1. Run `sensors` on the host to verify sensor detection -2. Some hardware may not expose temperature sensors -3. Check if the agent has permission to read sensor data - ---- - -
-Legacy: pulse-sensor-proxy (deprecated, click to expand) - -## Deprecated: pulse-sensor-proxy (existing installs only) - -This section is retained for existing installations during the migration window. - -If you are starting fresh on Pulse v5, do not deploy `pulse-sensor-proxy`. Use the agent method above. - -If you already have the proxy deployed: - -- Keep it running while you migrate to `pulse-agent --enable-proxmox`. -- Expect future removal in a major release. Do not treat the proxy as a long-term solution. - -## 📦 Docker Setup (Manual) - -If running Pulse in Docker, you must install the proxy on the host and share the socket. - -1. **Install Proxy on Host**: - ```bash - curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \ - sudo bash -s -- --standalone --pulse-server http://:7655 - ```text - -2. **Update `docker-compose.yml`**: - Add the socket volume to your Pulse service: - ```yaml - volumes: - - /mnt/pulse-proxy:/run/pulse-sensor-proxy:ro - ```text - > **Note**: The standalone installer creates the socket at `/mnt/pulse-proxy` on the host. Map it to `/run/pulse-sensor-proxy` inside the container. - -3. **Restart Pulse**: `docker compose up -d` - -## 🌐 Multi-Server Proxmox Setup - -If you have Pulse running on **Server A** and want to monitor temperatures on **Server B** (a separate Proxmox host without Pulse): - -1. **Run Installer on Server B** (the remote Proxmox host): - ```bash - curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \ - sudo bash -s -- --ctid --pulse-server http://:7655 - ``` - Replace `` with the LXC container ID where Pulse runs on Server A (e.g., `100`). - -2. The installer will detect that the container doesn't exist locally and install in **host monitoring only** mode: - ```text - [WARN] Container 100 does not exist on this node - [WARN] Will install sensor-proxy for host temperature monitoring only - ``` - -3.
**Verify**: `systemctl status pulse-sensor-proxy` - -> **Note**: The `--standalone --http-mode` flags shown in the Pulse UI quick-setup are for Docker deployments, not bare Proxmox hosts. For multi-server Proxmox setups, use the `--ctid` approach above. - -## 🔧 Troubleshooting - -| Issue | Solution | -| :--- | :--- | -| **No Data** | Check **Settings → Diagnostics** (Temperature Proxy section). | -| **Proxy Unreachable** | Ensure port `8443` is open on the remote node. | -| **"Permission Denied"** | Re-run the installer to fix permissions or SSH keys. | -| **LXC Issues** | Ensure the container has the bind mount: `lxc.mount.entry: /run/pulse-sensor-proxy ...` | - -### Check Proxy Status -On the Proxmox host: -```bash -systemctl status pulse-sensor-proxy +```text +command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty # pulse-sensors ``` -### View Logs -```bash -journalctl -u pulse-sensor-proxy -f -``` +If you use a non-standard SSH port, set `SSH_PORT` (system-wide) or configure it in **Settings -> System**. -## 🧠 How It Works +### Containerized Pulse -1. **Pulse Sensor Proxy**: A lightweight service runs on the Proxmox host. -2. **Secure Access**: It reads sensors (via `lm-sensors`) and exposes them securely. -3. **Transport**: - - **Local**: Uses a Unix socket (`/run/pulse-sensor-proxy`) for zero-latency, secure access. - - **Remote**: Uses mutual TLS over HTTPS (port 8443). -4. **No SSH Keys**: Pulse containers no longer need SSH keys to read temperatures. - ---- - -## 🔧 Advanced Configuration - -### Manual Configuration (No Script) - -If you can't run the installer script, create the configuration manually: - -**1. Download binary:** -```bash -curl -L https://github.com/rcourtman/Pulse/releases/latest/download/pulse-sensor-proxy-linux-amd64 \ - -o /tmp/pulse-sensor-proxy -install -D -m 0755 /tmp/pulse-sensor-proxy /usr/local/bin/pulse-sensor-proxy -``` - -**2.
Create service user:** -```bash -useradd --system --user-group --no-create-home --shell /usr/sbin/nologin pulse-sensor-proxy -usermod -aG www-data pulse-sensor-proxy # For pvecm access -``` - -**3. Create directories:** -```bash -install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0750 /var/lib/pulse-sensor-proxy -install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0700 /var/lib/pulse-sensor-proxy/ssh -install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0755 /etc/pulse-sensor-proxy -``` - -**4. Create config (optional, for Docker):** -```yaml -# /etc/pulse-sensor-proxy/config.yaml -allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml -allowed_peer_uids: [1000] # Docker container UID -allow_idmapped_root: true -allowed_idmap_users: - - root -``` -Allowed nodes live in `/etc/pulse-sensor-proxy/allowed_nodes.yaml`; change them via `pulse-sensor-proxy config set-allowed-nodes` so the proxy can lock and validate the file safely. Control-plane settings are added automatically when you register via Pulse, but you can supply them manually if you cannot reach the API (`pulse_control_plane.url`, `.token_file`, `.refresh_interval`). - -**5. Install systemd service:** -```bash -# Download from: https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh -# Extract the systemd unit from the installer (ExecStartPre/ExecStart typically uses /usr/local/bin/pulse-sensor-proxy) -systemctl daemon-reload -systemctl enable --now pulse-sensor-proxy -``` - -**6. 
Verify:** -```bash -systemctl status pulse-sensor-proxy -ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock -``` - -#### Configuration File Format - -The proxy reads `/etc/pulse-sensor-proxy/config.yaml` plus an allow-list in `/etc/pulse-sensor-proxy/allowed_nodes.yaml`: - -```yaml -allowed_source_subnets: - - 192.168.1.0/24 - - 10.0.0.0/8 - -# Capability-based access control (legacy UID/GID lists still work) -allowed_peers: - - uid: 0 - capabilities: [read, write, admin] - - uid: 1000 - capabilities: [read] -allowed_peer_uids: [] -allowed_peer_gids: [] -allow_idmapped_root: true -allowed_idmap_users: - - root - -log_level: info -metrics_address: default -read_timeout: 5s -write_timeout: 10s -max_ssh_output_bytes: 1048576 -require_proxmox_hostkeys: false - -# Allow list persistence (managed by installer/control-plane/CLI) -allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml -strict_node_validation: false - -# Rate limiting (per calling UID) -rate_limit: - per_peer_interval_ms: 1000 - per_peer_burst: 5 - -# HTTPS mode (for remote nodes) -http_enabled: false -http_listen_addr: ":8443" -http_tls_cert: /etc/pulse-sensor-proxy/tls/server.crt -http_tls_key: /etc/pulse-sensor-proxy/tls/server.key -http_auth_token: "" # Populated during registration - -# Control-plane sync (keeps allowed_nodes.yaml updated) -pulse_control_plane: - url: https://pulse.example.com:7655 - token_file: /etc/pulse-sensor-proxy/.pulse-control-token - refresh_interval: 60 - insecure_skip_verify: false -``` - -`allowed_nodes.yaml` is the source of truth for valid nodes. Avoid editing it directlyβ€”use `pulse-sensor-proxy config set-allowed-nodes` so the proxy can lock, dedupe, and write atomically. `allowed_peers` scopes socket access; legacy UID/GID lists remain for backward compatibility and imply full capabilities. 
- -**Environment Variable Overrides:** - -Config values can also be set via environment variables (useful for containerized proxy deployments): +SSH-based collection from inside a container is not recommended for production. Prefer the agent method or run Pulse on the host. For dev/test, you can allow SSH from the container with: ```bash -# Add allowed subnets (comma-separated, appends to config file values) -PULSE_SENSOR_PROXY_ALLOWED_SUBNETS=192.168.1.0/24,10.0.0.0/8 - -# Allow/disallow ID-mapped root (overrides config file) -PULSE_SENSOR_PROXY_ALLOW_IDMAPPED_ROOT=true - -# HTTP listener controls -PULSE_SENSOR_PROXY_HTTP_ENABLED=true -PULSE_SENSOR_PROXY_HTTP_ADDR=":8443" -PULSE_SENSOR_PROXY_HTTP_TLS_CERT=/etc/pulse-sensor-proxy/tls/server.crt -PULSE_SENSOR_PROXY_HTTP_TLS_KEY=/etc/pulse-sensor-proxy/tls/server.key -PULSE_SENSOR_PROXY_HTTP_AUTH_TOKEN="$(cat /etc/pulse-sensor-proxy/.http-auth-token)" -``` -Additional overrides include `PULSE_SENSOR_PROXY_ALLOWED_PEER_UIDS`, `PULSE_SENSOR_PROXY_ALLOWED_PEER_GIDS`, `PULSE_SENSOR_PROXY_ALLOWED_NODES`, `PULSE_SENSOR_PROXY_READ_TIMEOUT`, `PULSE_SENSOR_PROXY_WRITE_TIMEOUT`, `PULSE_SENSOR_PROXY_METRICS_ADDR`, and `PULSE_SENSOR_PROXY_STRICT_NODE_VALIDATION`. - -Example systemd override: -```ini -# /etc/systemd/system/pulse-sensor-proxy.service.d/override.conf -[Service] -Environment="PULSE_SENSOR_PROXY_ALLOWED_SUBNETS=192.168.1.0/24" +PULSE_DEV_ALLOW_CONTAINER_SSH=true ``` -**Note:** Socket path, SSH key directory, and audit log path are configured via command-line flags (see main.go), not the YAML config file. 
+### Verification -#### Re-running After Changes - -The installer is idempotent and safe to re-run: +From the Pulse server, verify that SSH and sensors output work: ```bash -# After adding a new Proxmox node to cluster -bash install-sensor-proxy.sh --standalone --pulse-server http://pulse:7655 --quiet - -# Verify installation -systemctl status pulse-sensor-proxy +ssh -i /path/to/key root@node "sensors -j" ``` -### Legacy SSH Security Concerns - -SSH-based temperature collection from inside containers is unsafe. Pulse blocks this by default for container deployments. - -In legacy/non-container setups where you intentionally use SSH, the main risks are: - -- Compromised container = exposed SSH keys -- Even with forced commands, keys could be extracted -- Required manual hardening (key rotation, IP restrictions, etc.) - -### Hardening Recommendations (Legacy/Native Installs Only) - -#### 1. Key Rotation -Rotate SSH keys periodically (e.g., every 90 days): +For platforms that expose a thermal zone file: ```bash -# On Pulse server -ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_new -N "" - -# Update all nodes' authorized_keys -# Test connectivity -ssh -i ~/.ssh/id_ed25519_new node "sensors -j" - -# Replace old key -mv ~/.ssh/id_ed25519_new ~/.ssh/id_ed25519 +ssh -i /path/to/key root@node "cat /sys/class/thermal/thermal_zone0/temp" ``` -#### 2. Secret Mounts (Docker) -Mount SSH keys from secure volumes: +### Troubleshooting -```yaml -version: '3' -services: - pulse: - image: rcourtman/pulse:latest - volumes: - - pulse-ssh-keys:/home/pulse/.ssh:ro # Read-only - - pulse-data:/data -volumes: - pulse-ssh-keys: - driver: local - driver_opts: - type: tmpfs # Memory-only, not persisted - device: tmpfs -``` +- If `sensors -j` returns empty output, run `sensors-detect --auto` and retry. +- If temperatures show as unavailable, confirm the host actually exposes sensor data. +- Ensure the SSH key entry is present and restricted to `sensors -j`. -#### 3. 
Monitoring & Alerts -Enable SSH audit logging on Proxmox nodes: +## Legacy Cleanup (If Upgrading) + +If you still have the old sensor proxy installed from prior releases, remove it manually: ```bash -# Install auditd -apt-get install auditd - -# Watch SSH access -auditctl -w /root/.ssh -p wa -k ssh_access - -# Monitor for unexpected commands -tail -f /var/log/audit/audit.log | grep ssh +sudo systemctl disable --now pulse-sensor-proxy || true +sudo rm -f /usr/local/bin/pulse-sensor-proxy +sudo rm -rf /etc/pulse-sensor-proxy /var/lib/pulse-sensor-proxy /run/pulse-sensor-proxy ``` - -#### 4. IP Restrictions -Limit SSH access to your Pulse server IP in `/etc/ssh/sshd_config`: - -```ssh -Match User root Address 192.168.1.100 - ForceCommand sensors -j - PermitOpen none - AllowAgentForwarding no - AllowTcpForwarding no -``` - -### Verifying Proxy Installation - -To check if your deployment is using the secure proxy: - -```bash -# On Proxmox host - check proxy service -systemctl status pulse-sensor-proxy - -# Check if socket exists -ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock - -# View proxy logs -journalctl -u pulse-sensor-proxy -f -``` - -Forward these logs off-host for retention by following standard rsyslog/syslog practices. - -In the Pulse container, check the logs at startup: -```bash -# Should see: "Temperature proxy detected - using secure host-side bridge" -journalctl -u pulse | grep -i proxy -``` - -### Disabling Temperature Monitoring - -To remove SSH access: - -```bash -# On each Proxmox node -sed -i '/pulse@/d' /root/.ssh/authorized_keys - -# Or remove just the forced command entry -sed -i '/command="sensors -j"/d' /root/.ssh/authorized_keys -``` - -Temperature data will stop appearing in the dashboard after the next polling cycle. - -## Operations & Troubleshooting - -### Managing the Proxy Service - -The pulse-sensor-proxy service runs on the Proxmox host (outside the container). 
- -**Service Management:** -```bash -# Check service status -systemctl status pulse-sensor-proxy - -# Restart the proxy -systemctl restart pulse-sensor-proxy - -# Stop the proxy (disables temperature monitoring) -systemctl stop pulse-sensor-proxy - -# Start the proxy -systemctl start pulse-sensor-proxy - -# Enable proxy to start on boot -systemctl enable pulse-sensor-proxy - -# Disable proxy autostart -systemctl disable pulse-sensor-proxy -``` - -### Log Locations - -**Proxy Logs (on Proxmox host):** -```bash -# Follow proxy logs in real-time -journalctl -u pulse-sensor-proxy -f - -# View last 50 lines -journalctl -u pulse-sensor-proxy -n 50 - -# View logs since last boot -journalctl -u pulse-sensor-proxy -b - -# View logs with timestamps -journalctl -u pulse-sensor-proxy --since "1 hour ago" -``` - -**Pulse Logs (in container):** -```bash -# Check if proxy is being used -journalctl -u pulse | grep -i "proxy\|temperature" - -# Should see: "Temperature proxy detected - using secure host-side bridge" -``` - -### SSH Key Rotation - -Rotate SSH keys periodically for security (recommended every 90 days). - -**Automated Rotation (Recommended):** - -The `pulse-proxy-rotate-keys.sh` helper script handles rotation safely with staging, verification, and rollback support: - -```bash -# 1. Dry-run first (recommended) -curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/pulse-proxy-rotate-keys.sh | \ - sudo bash -s -- --dry-run - -# 2. 
Perform rotation -curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/pulse-proxy-rotate-keys.sh | sudo bash -``` - -**What the script does:** -- Generates new Ed25519 keypair in staging directory -- Pushes new key to all cluster nodes via proxy RPC -- Verifies SSH connectivity with new key on each node -- Atomically swaps keys (current β†’ backup, staging β†’ active) -- Preserves old keys for rollback - -**If rotation fails, rollback:** -```bash -curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/pulse-proxy-rotate-keys.sh | \ - sudo bash -s -- --rollback -``` - -**Manual Rotation (Fallback):** - -If the automated script fails or is unavailable: - -```bash -# 1. On Proxmox host, backup old keys -cd /var/lib/pulse-sensor-proxy/ssh/ -cp id_ed25519 id_ed25519.backup -cp id_ed25519.pub id_ed25519.pub.backup - -# 2. Generate new keypair -ssh-keygen -t ed25519 -f id_ed25519 -N "" -C "pulse-sensor-proxy-rotated" - -# 3. Re-run setup to push keys to cluster -curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \ - bash -s -- --ctid - -# 4. Verify temperature data still works in Pulse UI -``` - -### Automatic Cleanup When Nodes Are Removed - -SSH keys are automatically removed when you delete a node from Pulse: - -1. **When you remove a node** in Pulse (**Settings β†’ Proxmox**), Pulse signals the temperature proxy -2. **The proxy creates a cleanup request** file at `/var/lib/pulse-sensor-proxy/cleanup-request.json` -3. **A systemd path unit detects the request** and triggers the cleanup service -4. 
**The cleanup script automatically:** - - SSHs to the specified node (or localhost if it's local) - - Removes the SSH key entries (`# pulse-managed-key` and `# pulse-proxy-key`) - - Logs the cleanup action via syslog - -**Automatic cleanup works for:** -- βœ… **Cluster nodes** - Full automatic cleanup (Proxmox clusters have unrestricted passwordless SSH) -- ⚠️ **Standalone nodes** - Cannot auto-cleanup due to forced command security (see below) - -**Standalone Node Limitation:** - -Standalone nodes use forced commands (`command="sensors -j"`) for security. This same restriction prevents the cleanup script from running `sed` to remove keys. This is a **security feature, not a bug** - adding a workaround would defeat the forced command protection. - -For standalone nodes: -- Keys remain after removal (but they're **read-only** - only `sensors -j` access) -- **Low security risk** - no shell access, no write access, no port forwarding -- **Auto-cleanup on re-add** - Setup script removes old keys when node is re-added -- **Manual cleanup if needed:** - ```bash - ssh root@standalone-node "sed -i '/# pulse-proxy-key$/d' /root/.ssh/authorized_keys" - ``` - -**Monitoring Cleanup:** -```bash -# Watch cleanup operations in real-time -journalctl -u pulse-sensor-cleanup -f - -# View cleanup history -journalctl -u pulse-sensor-cleanup --since "1 week ago" - -# Check if cleanup system is active -systemctl status pulse-sensor-cleanup.path -``` - -**Manual Cleanup (if needed):** - -If automatic cleanup fails or you need to manually revoke access: - -```bash -# On the node being removed, remove all Pulse SSH keys -ssh root@old-node "sed -i -e '/# pulse-managed-key\$/d' -e '/# pulse-proxy-key\$/d' /root/.ssh/authorized_keys" - -# Or remove them locally -sed -i -e '/# pulse-managed-key$/d' -e '/# pulse-proxy-key$/d' /root/.ssh/authorized_keys - -# No restart needed - proxy will fail gracefully for that node -# Temperature monitoring will continue for remaining nodes -``` - -### 
Failure Modes - -**Proxy Not Running:** -- Symptom: No temperature data in Pulse UI -- Check: `systemctl status pulse-sensor-proxy` on Proxmox host -- Fix: `systemctl start pulse-sensor-proxy` - -**Socket Not Accessible in Container:** -- Symptom: Pulse logs show "Temperature proxy not available - using direct SSH" -- Check: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock` in container -- Fix: Verify bind mount in LXC config (`/etc/pve/lxc/.conf`) -- Should have: `lxc.mount.entry: /run/pulse-sensor-proxy run/pulse-sensor-proxy none bind,create=dir 0 0` - -**pvecm Not Available:** -- Symptom: Proxy fails to discover cluster nodes -- Cause: Pulse runs on non-Proxmox host -- Fallback: Use legacy direct SSH method (native installation) - -**Pulse Running Off-Cluster:** -- Symptom: Proxy discovers local host but not remote cluster nodes -- Limitation: Proxy requires passwordless SSH between cluster nodes -- Solution: Ensure Proxmox host running Pulse has SSH access to all cluster nodes - -**Unauthorized Connection Attempts:** -- Symptom: Proxy logs show "Unauthorized connection attempt" -- Cause: Process with non-root UID trying to access socket -- Normal: Only root (UID 0) or proxy's own user can access socket -- Check: Look for suspicious processes trying to access the socket - -### Monitoring the Proxy - -**Manual Monitoring (v1):** - -The proxy service includes systemd restart-on-failure, which handles most issues automatically. 
For additional monitoring: - -```bash -# Check proxy health -systemctl is-active pulse-sensor-proxy && echo "Proxy is running" || echo "Proxy is down" - -# Monitor logs for errors -journalctl -u pulse-sensor-proxy --since "1 hour ago" | grep -i error - -# Verify socket exists and is accessible -test -S /run/pulse-sensor-proxy/pulse-sensor-proxy.sock && echo "Socket OK" || echo "Socket missing" -``` - -**Alerting:** -- Rely on systemd's automatic restart (`Restart=on-failure`) -- Monitor via journalctl for persistent failures -- Check Pulse UI for missing temperature data - -**Future:** Integration with pulse-watchdog is planned for automated health checks and alerting (see #528). - -### Known Limitations - -**Single Proxy = Single Point of Failure:** -- Each Proxmox host runs one pulse-sensor-proxy instance -- If the proxy service dies, temperature monitoring stops for all containers on that host -- This is acceptable for read-only telemetry, but be aware of the failure mode -- Systemd auto-restart (`Restart=on-failure`) mitigates most outages -- If multiple Pulse containers run on same host, they share the same proxy - -**Sensors Output Parsing Brittleness:** -- Pulse depends on `sensors -j` JSON output format from lm-sensors -- Changes to sensor names, structure, or output format could break parsing -- Consider adding schema validation and instrumentation to detect issues early -- Monitor proxy logs for parsing errors: `journalctl -u pulse-sensor-proxy | grep -i "parse\|error"` - -**Cluster Discovery Limitations:** -- Proxy uses `pvecm status` to discover cluster nodes (requires Proxmox IPC access) -- If Proxmox hardens IPC access or cluster topology changes unexpectedly, discovery may fail -- Standalone Proxmox nodes work but only monitor that single node -- Fallback: re-run the proxy installer script to reconfigure cluster access - -**Rate Limiting & Scaling** (updated in commit 46b8b8d): - -**What changed:** pulse-sensor-proxy now defaults to 1 request per 
second with a burst of 5 per calling UID. Earlier builds throttled after two calls every five seconds, which caused temperature tiles to flicker or fall back to `--` as soon as clusters reached three or more nodes. - -**Symptoms of saturation:** -- Temperature widgets flicker between values and `--`, or entire node rows disappear after adding new hardware -- `Settings β†’ System β†’ Updates` shows no proxy restarts, yet scheduler health reports breaker openings for temperature pollers -- Proxy logs include `limiter.rejection` or `Rate limit exceeded` entries for the container UID - -**Diagnose:** -1. Check scheduler health for temperature pollers: - ```bash - curl -s http://localhost:7655/api/monitoring/scheduler/health \ - | jq '.instances[] | select(.key | contains("temperature")) \ - | {key, lastSuccess: .pollStatus.lastSuccess, breaker: .breaker.state, deadLetter: .deadLetter.present}' - ``` - Breakers that remain `open` or repeated dead letters indicate the proxy is rejecting calls. -2. Inspect limiter metrics on the host: - ```bash - curl -s http://127.0.0.1:9127/metrics \ - | grep -E 'pulse_proxy_limiter_(rejects|penalties)_total' - ``` - A rising counter confirms the limiter is backing off callers. -3. Review logs for throttling: - ```bash - journalctl -u pulse-sensor-proxy -n 100 | grep -i "rate limit" - ``` - -**Tuning guidance:** Add a `rate_limit` block to `/etc/pulse-sensor-proxy/config.yaml` (see `cmd/pulse-sensor-proxy/config.example.yaml`) when clusters grow beyond the defaults. Use the formula `per_peer_interval_ms = polling_interval_ms / node_count` and set `per_peer_burst β‰₯ node_count` to allow one full sweep per polling window. - -| Deployment size | Nodes | 10 s poll interval β†’ interval_ms | Suggested burst | Notes | -| --- | --- | --- | --- | --- | -| Small | 1–3 | 1000 (default) | 5 | Works for most single Proxmox hosts. | -| Medium | 4–10 | 500 | 10 | Halves wait time; keep burst β‰₯ node count. 
| -| Large | 10–20 | 250 | 20 | Monitor CPU on proxy; consider staggering polls. | -| XL | 30+ | 100–150 | 30–50 | Only enable after validating proxy host capacity. | - -**Security note:** Lower intervals increase throughput and reduce UI staleness, but they also allow untrusted callers to issue more RPCs per second. Keep `per_peer_interval_ms β‰₯ 100` in production and continue to rely on UID allow-lists plus audit logs when raising limits. - -**SSH latency monitoring:** -- Monitor SSH latency metrics: `curl -s http://127.0.0.1:9127/metrics | grep pulse_proxy_ssh_latency` - -**Requires Proxmox Cluster Membership:** -- Proxy requires passwordless root SSH between cluster nodes -- Standard for Proxmox clusters, but hardened environments may differ -- Alternative: Create dedicated service account with sudo access to `sensors` - -**No Cross-Cluster Support:** -- Proxy only manages the cluster its host belongs to -- Cannot bridge temperature monitoring across multiple disconnected clusters -- Each cluster needs its own Pulse instance with its own proxy - -### Common Issues - -**Temperature Data Stops Appearing:** -1. Check proxy service: `systemctl status pulse-sensor-proxy` -2. Check proxy logs: `journalctl -u pulse-sensor-proxy -n 50` -3. Test SSH manually: `ssh root@node "sensors -j"` -4. Verify socket exists: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock` - -**New Cluster Node Not Showing Temperatures:** -1. Ensure lm-sensors installed: `ssh root@new-node "sensors -j"` -2. Proxy auto-discovers on next poll (may take up to 1 minute) -3. Re-run the proxy installer script to configure SSH keys on the new node: `curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | bash -s -- --ctid ` - -**Permission Denied Errors:** -1. Verify socket permissions: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock` -2. Should be: `srw-rw---- 1 root root` -3. 
Check Pulse runs as root in container: `pct exec -- whoami` - -**Proxy Service Won't Start:** -1. Check logs: `journalctl -u pulse-sensor-proxy -n 50` -2. Verify binary exists: `ls -l /usr/local/bin/pulse-sensor-proxy` -3. Test manually: `/usr/local/bin/pulse-sensor-proxy --version` -4. Check socket directory: `ls -ld /var/run` - -## Configuration Management - -The sensor proxy includes a built-in CLI for safe configuration management. It uses locking and atomic writes to prevent config corruption. - -### Quick Reference - -```bash -# Validate config files -pulse-sensor-proxy config validate - -# Add nodes to allowed list -pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.1 --merge node1.local - -# Replace entire allowed list -pulse-sensor-proxy config set-allowed-nodes --replace --merge 192.168.0.1 -``` - -**Key benefits:** -- Atomic writes with file locking prevent corruption -- Automatic deduplication and normalization -- systemd validation prevents startup with bad config -- Installer uses CLI (no more shell/Python divergence) - -**See also:** -- [Sensor Proxy CLI Reference](../cmd/pulse-sensor-proxy/README.md) - Full command documentation - -## Control-Plane Sync & Migration - -The sensor proxy can register with Pulse and sync its authorized node list via `/api/temperature-proxy/authorized-nodes`. This avoids manual `allowed_nodes` maintenance and reduces reliance on `/etc/pve` access. 
- -### New installs - -Always pass the Pulse URL when installing: - -```bash -curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install-sensor-proxy.sh | \ - sudo bash -s -- --ctid --pulse-server http://:7655 -``` - -The installer now: - -- Registers the proxy with Pulse (even for socket-only mode) -- Saves `/etc/pulse-sensor-proxy/.pulse-control-token` -- Appends a `pulse_control_plane` block to `/etc/pulse-sensor-proxy/config.yaml` - -### Migrating existing hosts - -If your proxy was installed without control-plane sync enabled, run the migration helper on each host: - -```bash -curl -fsSL http://:7655/api/install/migrate-sensor-proxy-control-plane.sh | \ - sudo bash -s -- --pulse-server http://:7655 -``` - -The script registers the existing proxy, writes the control token, updates the config, and restarts the service (use `--skip-restart` if you prefer to bounce it yourself). Once migrated, temperatures for every node defined in Pulse will continue working even if the proxy can’t reach `/etc/pve` or Corosync IPC. - -After migration you should see `Temperature data fetched successfully` entries for each node in `journalctl -u pulse-sensor-proxy`, and Settings → Diagnostics will show the last control-plane sync time. - -### Getting Help - -If temperature monitoring isn't working: - -1. **Collect diagnostic info:** - ```bash - # On Proxmox host - systemctl status pulse-sensor-proxy - journalctl -u pulse-sensor-proxy -n 100 > /tmp/proxy-logs.txt - ls -la /run/pulse-sensor-proxy/pulse-sensor-proxy.sock - - # In Pulse container - journalctl -u pulse -n 100 | grep -i temp > /tmp/pulse-temp-logs.txt - ``` - -2. **Test manually:** - ```bash - # On Proxmox host - test SSH to a cluster node - ssh root@cluster-node "sensors -j" - ``` - -3. **Check GitHub Issues:** -4. **Include in bug report:** - - Pulse version - - Deployment type (LXC/Docker/native) - - Proxy logs - - Pulse logs - - Output of manual SSH test - -
diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md index 80769667b..08e6214b0 100644 --- a/docs/TROUBLESHOOTING.md +++ b/docs/TROUBLESHOOTING.md @@ -45,8 +45,8 @@ sudo pulse bootstrap-token #### Audit Log verification shows unsigned events - **Symptom**: Audit Log entries show “Unsigned” or verification fails in the UI. -- **Root cause**: `PULSE_AUDIT_SIGNING_KEY` is not set, so events are stored without signatures. -- **Fix**: Set `PULSE_AUDIT_SIGNING_KEY` and restart Pulse Pro. Newly created events will be signed; existing unsigned events remain unsigned. +- **Root cause**: Audit signing is disabled (crypto manager unavailable), so events are stored without signatures. +- **Fix**: Ensure `.encryption.key` is present and Pulse Pro audit logging is enabled, then restart Pulse to regenerate `.audit-signing.key`. Newly created events will be signed; existing unsigned events remain unsigned. #### Audit Log is empty - **Symptom**: Audit Log shows zero events or "Console Logging Only." @@ -55,8 +55,8 @@ sudo pulse bootstrap-token #### Audit Log verification fails for older events - **Symptom**: Older events fail verification while newer events pass. -- **Root cause**: The signing key changed or was rotated, so signatures no longer match. -- **Fix**: Keep `PULSE_AUDIT_SIGNING_KEY` stable. If rotated intentionally, expect older events to fail verification. +- **Root cause**: The audit signing key changed (for example, `.audit-signing.key` was regenerated), so signatures no longer match. +- **Fix**: Restore the previous `.audit-signing.key` from backup to verify older events. If rotated intentionally, expect older events to fail verification. ### Monitoring Data diff --git a/docs/UNIFIED_AGENT.md b/docs/UNIFIED_AGENT.md index 32d472150..6260a292f 100644 --- a/docs/UNIFIED_AGENT.md +++ b/docs/UNIFIED_AGENT.md @@ -2,7 +2,7 @@ The unified agent (`pulse-agent`) combines host, Docker, and Kubernetes monitoring into a single binary.
It replaces the separate `pulse-host-agent` and `pulse-docker-agent` for simpler deployment and management. -> Note: In v5, temperature monitoring should be done via `pulse-agent --enable-proxmox`. `pulse-sensor-proxy` is deprecated and retained only for existing installs during the migration window. +> Note: For temperature monitoring, use `pulse-agent --enable-proxmox` (recommended) or SSH-based collection. The legacy sensor proxy has been removed. See `docs/TEMPERATURE_MONITORING.md`. ## Quick Start @@ -77,14 +77,13 @@ curl -fsSL http://:7655/install.sh | \ | `--hostname` | `PULSE_HOSTNAME` | Override hostname | *(OS hostname)* | | `--agent-id` | `PULSE_AGENT_ID` | Unique agent identifier | *(machine-id)* | | `--report-ip` | `PULSE_REPORT_IP` | Override reported IP (multi-NIC) | *(auto)* | +| `--disable-ceph` | `PULSE_DISABLE_CEPH` | Disable local Ceph status polling | `false` | | `--tag` | `PULSE_TAGS` | Apply tags (repeatable or CSV) | *(none)* | | `--log-level` | `LOG_LEVEL` | Log verbosity (`debug`, `info`, `warn`, `error`) | `info` | | `--health-addr` | `PULSE_HEALTH_ADDR` | Health/metrics server address | `:9191` | **Token resolution order**: `--token` → `--token-file` → `PULSE_TOKEN` → `/var/lib/pulse-agent/token`. -Legacy env var: `PULSE_KUBE_INCLUDE_ALL_POD_FILES` is still accepted for backward compatibility.
- ## Auto-Detection Auto-detection behavior: @@ -117,7 +116,7 @@ curl -fsSL http://:7655/install.sh | \ ### Disable Docker (even if detected) ```bash curl -fsSL http://:7655/install.sh | \ - bash -s -- --url http://:7655 --token --disable-docker + bash -s -- --url http://:7655 --token --enable-docker=false ``` ### Host + Kubernetes Monitoring diff --git a/docs/UPGRADE_v5.md b/docs/UPGRADE_v5.md index e226fff74..a54b14d4b 100644 --- a/docs/UPGRADE_v5.md +++ b/docs/UPGRADE_v5.md @@ -58,15 +58,15 @@ If you reset auth (for example by deleting `.env`), Pulse may require a bootstra ### Temperature monitoring in containers -If Pulse runs in a container and you are relying on SSH-based temperature collection, v5 blocks that in hardened configurations. +If Pulse runs in a container and you are relying on SSH-based temperature collection, move to the agent or run Pulse on the host. SSH-based collection from containers is intended for dev/test only (use `PULSE_DEV_ALLOW_CONTAINER_SSH=true` if you must). Preferred option: - Install the unified agent (`pulse-agent`) on Proxmox hosts with `--enable-proxmox` -Deprecated option (existing installs only): +Alternative option: -- `pulse-sensor-proxy` continues to work for now, but it is deprecated in v5 and not recommended for new installs. Plan to migrate to the unified agent. +- Run Pulse outside a container and use SSH-based temperature collection (restricted `sensors -j` keys) ### Backups not showing after upgrade (v4 → v5) diff --git a/docs/WEBHOOKS.md b/docs/WEBHOOKS.md index 09bb5b278..74bcdb190 100644 --- a/docs/WEBHOOKS.md +++ b/docs/WEBHOOKS.md @@ -33,6 +33,9 @@ For generic webhooks, use Go templates to format the JSON payload.
- `{{.Message}}`, `{{.Value}}`, `{{.Threshold}}`, `{{.Duration}}`, `{{.Timestamp}}` - `{{.Instance}}` (Pulse public URL if configured) - `{{.CustomFields.}}` (user-defined fields in the UI) +- `{{.Metadata}}` (alert metadata map) +- `{{.AlertCount}}`, `{{.Alerts}}` (grouped alerts) +- `{{.Mention}}` (platform-specific mention, if configured) **Convenience fields:** - `{{.ValueFormatted}}`, `{{.ThresholdFormatted}}` @@ -68,4 +71,4 @@ Pulse Pro supports dedicated audit webhooks for security event compliance. Unlik 2. Add your endpoint URL (e.g., `https://siem.corp.local/ingest/pulse`). ### Security -Audit webhooks are dispatched asynchronously. The payload includes a `signature` field which can be verified using your `PULSE_AUDIT_SIGNING_KEY` to ensure the event has not been tampered with in transit. +Audit webhooks are dispatched asynchronously. The payload includes a `signature` field which can be verified using the per-instance HMAC key stored (encrypted) at `.audit-signing.key` in the Pulse data directory. There is no `PULSE_AUDIT_SIGNING_KEY` override. diff --git a/docs/monitoring/ADAPTIVE_POLLING.md b/docs/monitoring/ADAPTIVE_POLLING.md index 39c268ab0..c91472d5f 100644 --- a/docs/monitoring/ADAPTIVE_POLLING.md +++ b/docs/monitoring/ADAPTIVE_POLLING.md @@ -34,7 +34,7 @@ The `circuitBreaker` (`internal/monitoring/circuit_breaker.go`) follows a standa - **Transient** errors (retryable) are retried up to 5 times before moving to the Dead Letter Queue. - **Permanent** errors move directly to the Dead Letter Queue. -**Note:** When `AdaptivePollingMaxInterval` is set to 15 seconds or less, the retry backoff is shortened (750ms initial, 6s max) to keep fast feedback loops during tight polling windows. +**Note:** When `AdaptivePollingMaxInterval` is set to 15 seconds or less, the retry backoff is shortened (750ms initial, 4s max) to keep fast feedback loops during tight polling windows. ## ⚙️ Configuration Adaptive polling is **disabled by default**.
diff --git a/docs/security/TEMPERATURE_MONITORING.md b/docs/security/TEMPERATURE_MONITORING.md index 0f210b2ba..f2a5fdcf3 100644 --- a/docs/security/TEMPERATURE_MONITORING.md +++ b/docs/security/TEMPERATURE_MONITORING.md @@ -1,65 +1,35 @@ -# 🌡️ Temperature Monitoring +# Temperature Monitoring Security -This page describes the recommended v5 approach for temperature monitoring and the security tradeoffs between approaches. - -For the full sensor-proxy setup guide (socket mounts, HTTP mode, troubleshooting), see: -`docs/TEMPERATURE_MONITORING.md`. - -> **Deprecation notice (v5):** `pulse-sensor-proxy` is deprecated and not recommended for new deployments. Use `pulse-agent --enable-proxmox` for temperature monitoring. The sensor-proxy section below is retained for existing installations during the migration window. In v5, legacy sensor-proxy endpoints are disabled by default unless `PULSE_ENABLE_SENSOR_PROXY=true` is set on the Pulse server. +Pulse supports two temperature collection paths: the unified agent (recommended) and SSH-based collection from the Pulse server. This page summarizes the security tradeoffs. ## Recommended: Pulse Agent -The simplest and most feature-rich method is installing the Pulse agent on your Proxmox nodes: +The unified agent (`pulse-agent --enable-proxmox`) runs locally on each Proxmox host and reports temperature metrics directly to Pulse. No SSH keys are stored on the server, and access is scoped to the agent token. -```bash -curl -fsSL http://your-pulse-server:7655/install.sh | bash -s -- \ - --url http://your-pulse-server:7655 \ - --token YOUR_TOKEN \ - --enable-proxmox -``` +Benefits: +- Local sensor access only +- No inbound SSH requirement +- Standard agent auth and transport -**Benefits:** -- ✅ One-command setup -- ✅ Temperature monitoring built-in -- ✅ No SSH keys or proxy configuration required +See [docs/TEMPERATURE_MONITORING.md](../TEMPERATURE_MONITORING.md) for setup.
-The agent runs `sensors -j` locally and reports temperatures directly to Pulse. +## SSH-Based Collection ---- +SSH-based temperature monitoring uses a restricted key entry that only allows `sensors -j` to run. This limits the blast radius if a key leaks. -## Deprecated: Sensor Proxy (Host Service) +Recommended restrictions: -`pulse-sensor-proxy` is deprecated in v5 and is not recommended for new deployments. This section is retained for existing installations during the migration window. - -### 🛡️ Security Model -- **Isolation**: SSH keys live on the host, not in the container. -- **Least Privilege**: Proxy runs as `pulse-sensor-proxy` (no shell). -- **Verification**: Container identity verified via `SO_PEERCRED`. - -### 🏗️ Components -1. **Pulse Backend**: Connects to Unix socket `/mnt/pulse-proxy/pulse-sensor-proxy.sock`. -2. **Sensor Proxy**: Validates request, executes SSH to node. -3. **Target Node**: Accepts SSH key restricted to `sensors -j`. - -### 🔒 Key Restrictions -SSH keys deployed to nodes are locked down: ```text -command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty +command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty # pulse-sensors ``` -### 🚦 Rate Limiting -- **Per Peer**: ~12 req/min. -- **Concurrency**: Max 2 parallel requests per peer. -- **Global**: Max 8 concurrent requests. +Additional notes: +- Use a dedicated key for temperature collection only. +- Avoid running Pulse in a container for SSH-based collection. If you must for dev/test, set `PULSE_DEV_ALLOW_CONTAINER_SSH=true` and keep access tightly scoped. -### 📝 Auditing -All requests logged to system journal: -```bash -journalctl -u pulse-sensor-proxy -``` -Logs include: `uid`, `pid`, `method`, `node`, `correlation_id`. +See [docs/TEMPERATURE_MONITORING.md](../TEMPERATURE_MONITORING.md) for the full setup flow.
-### Related Docs +## Related Docs -- Unified Agent Security: [`docs/AGENT_SECURITY.md`](../AGENT_SECURITY.md) -- Repository Security Policy: [`/SECURITY.md`](../../SECURITY.md) +- Unified Agent Security: [docs/AGENT_SECURITY.md](../AGENT_SECURITY.md) +- Repository Security Policy: [SECURITY.md](../../SECURITY.md)
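Illustration only, not part of the patch: the restricted key entry described under "SSH-Based Collection" can be assembled with a small helper. This is a minimal sketch under stated assumptions. The function name `restricted_key_entry` and the sample key are hypothetical, and placing the public key between the option string and the `# pulse-sensors` comment is an assumption; only the option string itself comes from the doc text above.

```bash
#!/usr/bin/env bash
# Sketch only: the helper name and sample key below are hypothetical.
# The option string is the one quoted in the SSH-Based Collection section.

restricted_key_entry() {
  # $1: public key (type + base64 material) from a dedicated temperature-only key
  local pubkey="$1"
  local opts='command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
  # Assumed layout: options, then the key, then the trailing marker comment.
  printf '%s %s # pulse-sensors\n' "$opts" "$pubkey"
}

# Print the entry for a placeholder key (not a real key).
restricted_key_entry "ssh-ed25519 AAAAPLACEHOLDER"
```

The printed line is what would be appended to the target node's `authorized_keys`. Forcing `command="sensors -j"` means the key can run nothing else, which is the blast-radius limit the section describes.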