diff --git a/DEEP_DIVE_SUMMARY.md b/DEEP_DIVE_SUMMARY.md
deleted file mode 100644
index 249b9e3f0..000000000
--- a/DEEP_DIVE_SUMMARY.md
+++ /dev/null
@@ -1,159 +0,0 @@
-# Pulse Deep Dive Summary
-**Status:** ✅ Complete
-**Updated:** 2026-01-02
-
----
-
-## 📡 Area: Adaptive Polling (`internal/monitoring`)
-**Verified Files:** `scheduler.go`, `circuit_breaker.go`, `monitor.go`, `task_queue.go`
-
-### ✅ Verification Checklist
-- [x] **Logic Accuracy**: Verified that `BuildPlan` correctly calculates `NextRun` times based on staleness and error history.
-- [x] **Jitter & Smoothing**: Confirmed that the `adaptiveIntervalSelector` adds noise to prevent "thundering herd" patterns.
-- [x] **Circuit Breaker**: Verified the exponential backoff (min 5s, max 5m) for failing nodes.
-- [x] **Resource Protection**: Verified that the Worker Pool (default 10) prevents polling from overwhelming the host.
-- [x] **Error Handling**: Confirmed that permanent errors bypass retries and go to the Dead Letter Queue.
-
----
-
-## 🤖 Area: AI Patrol & Findings (`internal/ai`)
-**Verified Files:** `patrol.go`, `findings.go`, `service.go`, `config/ai.go`
-
-### ✅ Verification Checklist
-- [x] **Prompt Logic**: Verified that dismissed findings are injected into LLM prompts as "Operational Memory".
-- [x] **Escalation Logic**: Confirmed the system overrides dismissals if issue severity worsens.
-- [x] **Default Auditing**: Identified and updated the documentation to reflect the 6-hour default patrol interval.
-- [x] **Free Tier Fallback**: Verified that "Heuristic Patrol" provides value without requiring a Pro LLM.
-- [x] **Persistence**: Confirmed findings are stored in `ai_findings.enc` and survive service restarts.
- ---- - -## ⚑ Area: Real-time Engine (`internal/websocket`) -**Verified Files:** `hub.go`, `router.go`, `state_snapshot.go` - -### βœ… Verification Checklist -- [x] **Coalescing**: Verified that rapid state updates are squashed into a 100ms window to save client bandwidth. -- [x] **Concurrency Safety**: Confirmed deep cloning of alerts (`cloneAlertData`) to prevent data races during broadcast. -- [x] **Initial Sync**: Verified the "Welcome -> InitialState" handshake sequence for new connections. -- [x] **Proxy Awareness**: Confirmed support for `X-Forwarded-*` headers for robust origin validation. -- [x] **Sanitization**: Verified recursive NaN/Inf cleanup to prevent JSON marshalling errors on numeric metrics. - ---- - -## πŸ“‰ Area: Metrics Persistence (`internal/metrics`) -**Verified Files:** `store.go`, `docs/METRICS_HISTORY.md` - -### βœ… Verification Checklist -- [x] **Tiered Storage**: Verified that the system correctly rolls up data into Raw, Minute, Hourly, and Daily resolutions. -- [x] **Automatic Tier Selection**: Confirmed that queries automatically select the optimal granularity based on the requested time range. -- [x] **Batch Logging**: Verified that writes are buffered (100 records or 5s) and committed via WAL mode for performance. -- [x] **Data Integrity**: Verified that `Min`, `Max`, and `AVG` are preserved during rollups, allowing for high-quality historical charts. -- [x] **Retention Pruning**: Confirmed the background worker hourly cleanup process works as expected. - ---- - -## πŸ”” Area: Alerting Framework (`internal/alerts`) -**Verified Files:** `alerts.go`, `CONFIGURATION.md` - -### βœ… Verification Checklist -- [x] **Hysteresis**: Verified that separately configurable Trigger/Clear thresholds prevent "jitter" alerts. -- [x] **Time Thresholds**: Confirmed that metrics must exceed thresholds for a sustained period (default 5s) before firing. 
-- [x] **Flapping Detection**: Verified the state-change tracking logic (5 changes in 5 min) that silences noisy alerts.
-- [x] **Escalation & Quiet Hours**: Confirmed that quiet hours can selectively target performance vs. offline alerts, and escalations trigger correctly on timers.
-- [x] **Rate Limiting**: Verified the 10-alerts-per-hour limit and 2% minimum delta suppression logic.
-
----
-
-## 🏠 Area: Host Agent Integration (`internal/api/host_agents.go`)
-**Verified Files:** `host_agents.go`, `UNIFIED_AGENT.md`, `AGENT_SECURITY.md`
-
-### ✅ Verification Checklist
-- [x] **Zero-Config Pairing**: Verified the hostname-based lookup logic that allows agents to auto-bind to existing PVE nodes.
-- [x] **Bi-Directional Communication**: Confirmed that agents receive configuration overrides (like `commandsEnabled`) in the response to their metric reports.
-- [x] **Security Scoping**: Verified that agents are restricted to their own host records via API token validation.
-- [x] **Smart Deduplication**: Confirmed that the system prefers Agent metrics over Proxmox API metrics when both are available (for higher accuracy).
-- [x] **Self-Unregistration**: Verified the `/uninstall` endpoint allowing for clean removal of infrastructure records.
-
----
-
-## 🔐 Area: Security & Auth (`internal/api/auth.go`)
-**Verified Files:** `auth.go`, `OIDC.md`, `PROXY_AUTH.md`
-
-### ✅ Verification Checklist
-- [x] **Sliding Sessions**: Verified that active dashboard use extends session life via `ValidateAndExtendSession`.
-- [x] **OIDC Background Refresh**: Confirmed that OIDC tokens are automatically refreshed in the background 5 minutes before expiry.
-- [x] **Proxy Authentication**: Verified that Pulse can trust external SSO headers like `X-Proxy-Secret` and map roles to admin privileges.
-- [x] **Brute-Force Protection**: Confirmed built-in IP and User lockout logic (15min cooldown) and rate limiting.
-- [x] **Cookie Intelligence**: Verified that Pulse automatically adjusts `SameSite` and `Secure` flags based on proxy/HTTPS detection (e.g., Cloudflare Tunnel support).
-
----
-
-## 🔌 Area: Proxmox Client (`pkg/proxmox/client.go`)
-**Verified Files:** `client.go`, `cluster_client.go`
-
-### ✅ Verification Checklist
-- [x] **Defensive Parsing**: Verified that `FlexInt` and `coerceUint64` handle Proxmox API inconsistencies (strings vs numbers, and scientific notation).
-- [x] **Auth Fallback**: Confirmed that the client automatically falls back from JSON to Form-Encoded auth for older PVE versions.
-- [x] **Smart Ticket Refresh**: Verified that session tickets are automatically refreshed every 2 hours without interrupting polling.
-- [x] **Memory Correction**: Confirmed the `EffectiveAvailable()` calculation accurately reflects reclaimable memory (Free + Buffers + Cached).
-- [x] **Error Guidance**: Verified that the client parses 403/595 errors into actionable advice for users (e.g., reminding them to set permissions on the User, not just the Token).
-
----
-
-## 🐳 Area: Docker Agent (`internal/dockeragent`)
-**Verified Files:** `collect.go`, `container_update.go`, `registry.go`
-
-### ✅ Verification Checklist
-- [x] **Memory Accuracy**: Verified subtraction of reclaimable cache (cgroup v1/v2) to match `docker stats`.
-- [x] **Safe Updates**: Confirmed the "Rename -> Pull -> Create -> Health Check" update lifecycle with automatic 5s rollback guard.
-- [x] **Registry Intelligence**: Verified multi-arch manifest resolution and anonymous token handling for Docker Hub/GHCR.
-- [x] **Mode Awareness**: Confirmed that `Unified` mode correctly aligns machine IDs with the host agent to prevent token conflicts.
- ---- - -## πŸ”„ Area: Update System (`internal/updates`) -**Verified Files:** `manager.go`, `adapter_installsh.go`, `version.go` - -### βœ… Verification Checklist -- [x] **Reliable Discovery**: Verified the GitHub API + RSS/Atom feed fallback for update discovery. -- [x] **Atomic Updates**: Confirmed the secure pipe-delivery of `install.sh` and verification of its SHA256 checksum. -- [x] **Health-Aware Deployment**: Verified that updates are only considered successful if the `/api/health` endpoint recovers within 30s. -- [x] **History & Recovery**: Confirmed that backup paths are persisted in history, allowing for full state rollback of both binary and config. - ---- - -## πŸ“’ Area: Notification System (`internal/notifications`) -**Verified Files:** `notifications.go`, `email_template.go`, `queue.go` - -### βœ… Verification Checklist -- [x] **SSRF Protection**: Verified that `ValidateWebhookURL` blocks loopback, private IP ranges, and dangerous redirects. -- [x] **Smart Grouping**: Confirmed `groupWindow` logic correctly bundles multiple alerts into a single delivery. -- [x] **Contextual Cooldown**: Verified that cooldown uses `AlertID` + `StartTime`, allowing immediate re-notifications for new events while suppressing noise for ongoing ones. -- [x] **Persistent Queue**: Confirmed that notifications survive service restarts via the background disk-backed queue. - ---- - -## πŸ” Area: Network Discovery (`pkg/discovery`) -**Verified Files:** `discovery.go`, `envdetect/detect.go` - -### βœ… Verification Checklist -- [x] **Environment Awareness**: Confirmed that the scanner automatically detects if it's in Docker/LXC to adjust scan targets. -- [x] **Confidence-Based Phases**: Verified that lower-likelihood subnets are scanned with lower priority and skipped if time budget is low. -- [x] **Fingerprint Accuracy**: Confirmed that the probe goes beyond port checking, using TLS and API fingerprinting with a weighted scoring engine (0.7+ threshold). 
- ---- - -## πŸ€– Area: AI Agent & Command Execution (`internal/agentexec`) -**Verified Files:** `policy.go`, `server.go` - -### βœ… Verification Checklist -- [x] **Security Boundary**: Verified the `CommandPolicy` engine correctly categorizes commands into Auto-Approve, Require-Approval, and Blocked. -- [x] **Evasion Resistance**: Confirmed `sudo` normalization prevents simple policy bypass attempts. -- [x] **RPC Reliability**: Verified the 5s heartbeat and "3-strike" policy for managing agent connection lifecycles. - ---- - -## 🏁 Overall Conclusion -The Pulse architecture is highly optimized for performance and reliability. The monitoring engine uses adaptive logic to protect resources, the AI system maintains a long-term memory to reduce noise, and the WebSocket hub ensures the frontend stays responsive without flooding the network. - -Documentation is generally very strong, with minor discrepancies identified in areas where defaults were recently changed for token efficiency or performance. diff --git a/README.md b/README.md index 1ed21fa08..83e8714c8 100644 --- a/README.md +++ b/README.md @@ -57,6 +57,8 @@ Run this one-liner on your Proxmox host to create a lightweight LXC container: curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash ``` +Note: this installs the Pulse **server**. Agent installs use the command generated in **Settings β†’ Agents β†’ Installation commands** (served from `/install.sh` on your Pulse server). + ### Option 2: Docker ```bash docker run -d \ @@ -77,7 +79,7 @@ Access the dashboard at `http://:7655`. - **[API Reference](docs/API.md)**: Integrate Pulse with your own tools. - **[Architecture](ARCHITECTURE.md)**: High-level system design and data flow. - **[Troubleshooting](docs/TROUBLESHOOTING.md)**: Solutions to common issues. -- **[Agent Security](docs/AGENT_SECURITY.md)**: Details on signed updates and verification. 
+- **[Agent Security](docs/AGENT_SECURITY.md)**: Details on checksum-verified updates and verification.
 - **[Docker Monitoring](docs/DOCKER.md)**: Setup and management of Docker agents.
 ## 🌐 Community Integrations
@@ -88,18 +90,21 @@ Community-maintained integrations and addons:
 ## 🚀 Pulse Pro
-**[Pulse Pro](https://pulserelay.pro)** unlocks **AI Patrol** — automated background monitoring that catches issues before they become outages.
+**[Pulse Pro](https://pulserelay.pro)** unlocks **LLM-backed AI Patrol** — automated background monitoring that catches issues before they become outages.
 | Feature | Free | Pro |
 |---------|------|-----|
 | Real-time dashboard | ✅ | ✅ |
 | Threshold alerts | ✅ | ✅ |
 | AI Chat (BYOK) | ✅ | ✅ |
-| **AI Patrol** (automated scans) | — | ✅ |
-| Root cause analysis | — | ✅ |
+| Heuristic Patrol (local rules) | ✅ | ✅ |
+| **LLM-backed AI Patrol** | — | ✅ |
+| Alert-triggered AI analysis | — | ✅ |
+| Kubernetes AI analysis | — | ✅ |
+| Auto-fix + autonomous mode | — | ✅ |
 | Priority support | — | ✅ |
-AI Patrol runs on your schedule (every 15 minutes to every 24 hours) and finds:
+AI Patrol runs on your schedule (every 10 minutes to every 7 days, default 6 hours) and finds:
 - ZFS pools approaching capacity
 - Backup jobs that silently failed
 - VMs stuck in restart loops
@@ -108,8 +113,8 @@ AI Patrol runs on your schedule (every 15 minutes to every 24 hours) and finds:
 Technical highlights:
 - Cross-system context (nodes, VMs, backups, containers, and metrics history)
-- Noise reduction via correlation and trend-aware checks
-- Actionable findings with remediation hints
+- LLM analysis for high-impact findings + alert-triggered deep dives
+- Optional auto-fix with command safety policies and audit trail
 **[Try the live demo →](https://demo.pulserelay.pro)** or **[learn more at pulserelay.pro](https://pulserelay.pro)**
diff --git a/SECURITY.md b/SECURITY.md
index ad45574a1..ae91705f1 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -44,7 +44,7 @@ environment where `PULSE_DOCKER=true`/`/.dockerenv` is detected.
 Preferred option (no SSH keys, no proxy wiring):
 1. Install the unified agent (`pulse-agent`) on each Proxmox host with Proxmox integration enabled.
- - Use the UI to generate an install command in **Settings → Agents**, or run:
+ - Use the UI to generate an install command in **Settings → Agents → Installation commands**, or run:
 ```bash
 curl -fsSL http://pulse.example.com:7655/install.sh | \
 sudo bash -s -- --url http://pulse.example.com:7655 --token --enable-proxmox
@@ -159,9 +159,8 @@ Environment="PULSE_TRUSTED_NETWORKS=192.168.1.0/24,10.0.0.0/24"
 ```
 When configured:
-- Access from trusted networks: no auth required
-- Access from outside: authentication enforced
-- Useful for: mixed home/remote access scenarios
+- Access still requires authentication (no bypass).
+- The trusted list only influences security posture warnings and diagnostics.
 ## Security Warning System
@@ -193,8 +192,8 @@ If you're comfortable with your security setup, you can dismiss warnings:
 ### Security Features
 - **Logs**: token values masked with `***` in all outputs
 - **API**: frontend receives only `hasToken: true`, never actual values
-- **Export**: requires a valid API token (`X-API-Token` header or `token`
- parameter) to extract credentials
+- **Export**: requires authentication (session, proxy auth, or `X-API-Token`
+ header) to extract credentials
 - **Migration**: use passphrase-protected export/import (see [Migration Guide](docs/MIGRATION.md))
 - **Auto-migration**: unencrypted configs automatically migrate to encrypted
@@ -205,7 +204,7 @@ If you're comfortable with your security setup, you can dismiss warnings:
 By default, configuration export/import is blocked. You have two options:
 ### Option 1: Create an API Token (Recommended)
-Create a token in **Settings → Security → API Tokens**, then use it for exports.
+Create a token in **Settings → API Tokens**, then use it for exports.
 For automation-only environments, you can seed tokens via environment variables (legacy) and they will be persisted to `api_tokens.json` on startup.
@@ -246,7 +245,7 @@ for sensitive data.
 - **Encryption**: credentials encrypted at rest (AES-256-GCM)
 - **Export protection**: exports always encrypted with a passphrase
 - **Minimum passphrase**: 12 characters required for exports
-- **Security tab**: check status in *Settings → Security*
+- **Security tab**: check status in *Settings → Security → Overview*
 ### Enterprise Security (When Authentication Enabled)
 - **Password security**
@@ -263,12 +262,18 @@ for sensitive data.
 - API-only mode supported (no password auth required)
 - **CSRF protection**: all state-changing operations require CSRF tokens
 - **Rate limiting**
- - Auth endpoints: 10 attempts/minute per IP (returns `Retry-After` header)
+ - Auth endpoints: 10 attempts/minute per IP
+ - Config changes: 30 requests/minute per IP
+ - Exports: 5 requests per 5 minutes per IP
+ - Recovery operations: 3 requests per 10 minutes per IP
+ - Update checks/actions: 60 requests/minute per IP
+ - WebSocket connects: 30 requests/minute per IP
 - General API: 500 requests/minute per IP
- - Real-time endpoints exempt for functionality
- - All responses include rate limit headers:
+ - Public endpoints: 1000 requests/minute per IP
+ - 429 responses include rate limit headers:
 - `X-RateLimit-Limit`: Maximum requests per window
 - `X-RateLimit-Remaining`: Requests remaining in current window
+ - `X-RateLimit-Reset`: Window reset timestamp
 - `Retry-After`: Seconds to wait before retrying (on 429 responses)
 - **Account lockout**
 - Locks after 5 failed login attempts
@@ -278,11 +283,11 @@ for sensitive data.
 - Manual reset available via API for admins
 - **Session management**
 - Secure HttpOnly cookies
- - 24-hour session expiry
+ - 24-hour session expiry (30 days when "Remember me" is enabled)
 - Session invalidation on password change
 - **Security headers**
 - Content-Security-Policy
- - X-Frame-Options: DENY
+ - X-Frame-Options: `DENY` by default (adjusted when `allowEmbedding` is enabled in system settings)
 - X-Content-Type-Options: nosniff
 - X-XSS-Protection: 1; mode=block
 - Referrer-Policy: strict-origin-when-cross-origin
@@ -292,6 +297,7 @@ for sensitive data.
 - Rollback actions are logged with timestamps and metadata
 - Scheduler health escalations recorded in audit trail
 - Runtime logging configuration changes tracked
+ - Security status uses `PULSE_AUDIT_LOG=true` (or legacy `AUDIT_LOG_ENABLED=true`) to mark audit logging as active in the UI
 ### What's Encrypted in Exports
 - Node credentials (passwords, API tokens)
@@ -315,8 +321,8 @@ together.
 ### Password Authentication
 #### Quick Security Setup (Recommended)
-1. Navigate to *Settings → Security*.
-2. Click **Enable Security Now**.
+1. Navigate to *Settings → Security → Authentication*.
+2. Click **Setup**.
 3. Enter username and password.
 4. Save the generated API token (shown only once!).
 5. Security is enabled immediately (no restart needed).
@@ -331,12 +337,12 @@ This automatically:
 #### Manual Setup (Advanced)
 ```bash
-# Using systemd (password will be hashed automatically)
+# Using systemd (plain text will be auto-hashed)
 sudo systemctl edit pulse
 # Add:
 [Service]
 Environment="PULSE_AUTH_USER=admin"
-Environment="PULSE_AUTH_PASS=$2a$12$..." # Use bcrypt hash, not plain text!
+Environment="PULSE_AUTH_PASS=$2a$12$..." # Prefer bcrypt hash for production; plain text is auto-hashed.
 # Docker (credentials persist in volume via .env file)
 # IMPORTANT: Always quote bcrypt hashes to prevent shell expansion!
@@ -348,7 +354,7 @@ docker run -e PULSE_AUTH_USER=admin -e PULSE_AUTH_PASS='$2a$12$...'
rcourtman/pu
 #### Features
 - Web UI login required when authentication enabled
-- Change/remove password from Settings → Security
+- Change/remove password from Settings → Security → Authentication
 - Passwords ALWAYS hashed with bcrypt (cost 12)
 - Session-based authentication with secure HttpOnly cookies
 - 24-hour session expiry
@@ -394,10 +400,16 @@ docker run -e API_TOKENS=ansible-token,docker-agent-token rcourtman/pulse:latest
 # Include the ORIGINAL token (not hash) in X-API-Token header
 curl -H "X-API-Token: your-original-token" http://localhost:7655/api/health
-# or in Authorization header (preferred for shared tooling)
-curl -H "Authorization: Bearer your-original-token" http://localhost:7655/api/export
+# Export config requires auth + passphrase (min 12 chars)
+curl -X POST \
+ -H "Content-Type: application/json" \
+ -H "X-API-Token: your-original-token" \
+ -d '{"passphrase":"use-a-strong-passphrase"}' \
+ http://localhost:7655/api/config/export
 ```
+Most API endpoints also accept `Authorization: Bearer `, but export/import uses the `X-API-Token` header.
+
 ### Auto-Registration Security
 #### Default Mode
@@ -451,12 +463,13 @@ docker run \
 ## CORS (Cross-Origin Resource Sharing)
-By default, Pulse allows all origins (`ALLOWED_ORIGINS=*`). This is convenient for local setups,
-but should be restricted in production.
+By default, Pulse does **not** enable CORS (same-origin only). Configure allowed origins only when
+you need cross-origin access (for example, a separate UI domain or external tooling).
 ### Configuring CORS for External Access
-If you need to access Pulse API from a different domain:
+If you need to access the Pulse API from a different domain, configure **Settings → System → Network**
+or use environment overrides:
 ```bash
 # Docker
@@ -475,6 +488,7 @@ Notes:
 - `ALLOWED_ORIGINS` supports a single origin or `*` (it is written directly to `Access-Control-Allow-Origin`).
 - In production, set a specific origin to avoid exposing the API to arbitrary sites.
+- For local dev, Pulse auto-allows `http://localhost:5173` and `http://localhost:7655` when `NODE_ENV=development` or `PULSE_DEV=true`.
 ## Monitoring and Observability
@@ -572,7 +586,7 @@ curl -X POST http://localhost:7655/api/security/reset-lockout \
 **Rate limited?** Wait 1 minute and try again
 **Can't login?** Check `PULSE_AUTH_USER` and `PULSE_AUTH_PASS` environment variables
 **API access denied?** Verify the token you supplied matches one of the values created in *Settings → API Tokens* (use the original token, not the hash)
-**CORS errors?** Configure `ALLOWED_ORIGINS` for your domain
-**Forgot password?** Start fresh – delete your Pulse data and restart
+**CORS errors?** Configure Allowed Origins in the UI or set `ALLOWED_ORIGINS` for your domain
+**Forgot password?** Remove `.env` and restart Pulse, then use the bootstrap token to set new credentials
 ---
diff --git a/cmd/pulse-sensor-proxy/README.md b/cmd/pulse-sensor-proxy/README.md
index 9899712fa..da7004729 100644
--- a/cmd/pulse-sensor-proxy/README.md
+++ b/cmd/pulse-sensor-proxy/README.md
@@ -150,11 +150,11 @@ all HTTP access attempts to the audit log.
 - Format: JSON with hash chaining (`prev_hash`, `event_hash`, `seq`)
 - Access: Owned by `pulse-sensor-proxy`, `0640`, `chattr +a`
-Follow `docs/operations/audit-log-rotation.md` for rotation (remove `+a`,
+Follow `docs/operations/AUDIT_LOG_ROTATION.md` for rotation (remove `+a`,
 truncate, restart service, reapply `+a`). Also consider forwarding with
 `scripts/setup-log-forwarding.sh`; see
-`docs/operations/sensor-proxy-log-forwarding.md` for RELP/TLS forwarding
-instructions and verification steps.
+`docs/operations/SENSOR_PROXY_LOGS.md` for log forwarding and verification
+steps.
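The hash-chained audit log mentioned in this hunk (`prev_hash`, `event_hash`, `seq`) can be illustrated with a small Python sketch. It is not the sensor proxy's actual scheme: the digest construction (SHA-256 over a sorted-key JSON object) and the `payload` field are assumptions made only to show why tampering with one entry invalidates the chain.

```python
import hashlib
import json


def event_digest(prev_hash, seq, payload):
    # Assumed scheme: SHA-256 over a canonical JSON object. The real proxy
    # may hash different fields in a different encoding.
    material = json.dumps(
        {"prev_hash": prev_hash, "seq": seq, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def append_event(log, payload):
    # Each new entry records the previous entry's hash and the next sequence number.
    prev_hash = log[-1]["event_hash"] if log else ""
    seq = len(log) + 1
    log.append({
        "seq": seq,
        "prev_hash": prev_hash,
        "payload": payload,  # hypothetical field name for the event body
        "event_hash": event_digest(prev_hash, seq, payload),
    })


def verify_chain(log):
    # Walk the log in order; any gap, reordering, or edited payload fails.
    prev_hash = ""
    for expected_seq, entry in enumerate(log, start=1):
        if entry["seq"] != expected_seq or entry["prev_hash"] != prev_hash:
            return False
        if entry["event_hash"] != event_digest(prev_hash, entry["seq"], entry["payload"]):
            return False
        prev_hash = entry["event_hash"]
    return True
```

Because every entry commits to its predecessor, rotation has to start a fresh chain rather than edit the file in place, which is consistent with the documented remove `+a`, truncate, restart, reapply `+a` procedure.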
 ## Metrics & Monitoring
@@ -234,5 +234,5 @@ If you suspect config corruption (service won't start, temperatures stopped):
 sudo systemctl start pulse-sensor-proxy
 ```
-For additional hardening steps, read `docs/PULSE_SENSOR_PROXY_HARDENING.md` and
-`docs/TEMPERATURE_MONITORING_SECURITY.md`.
+For additional hardening steps, read `docs/security/SENSOR_PROXY_HARDENING.md` and
+`docs/security/TEMPERATURE_MONITORING.md`.
diff --git a/deploy/helm/pulse/values.schema.json b/deploy/helm/pulse/values.schema.json
index 85b70ddd0..9c9c31ef0 100644
--- a/deploy/helm/pulse/values.schema.json
+++ b/deploy/helm/pulse/values.schema.json
@@ -153,7 +153,7 @@
 "properties": {
 "enabled": {
 "type": "boolean",
- "description": "Enable Docker monitoring agent"
+ "description": "Enable legacy pulse-docker-agent workload (deprecated)"
 },
 "kind": {
 "type": "string",
@@ -235,7 +235,7 @@
 },
 "path": {
 "type": "string",
- "description": "Metrics endpoint path"
+ "description": "Metrics endpoint path on the main HTTP service (metrics listener is separate)"
 }
 }
 }
diff --git a/deploy/helm/pulse/values.yaml b/deploy/helm/pulse/values.yaml
index 6060998b7..8ff54d5e4 100644
--- a/deploy/helm/pulse/values.yaml
+++ b/deploy/helm/pulse/values.yaml
@@ -157,6 +157,8 @@ agent:
 failureThreshold: 3
 # Monitoring configuration
+# Note: The ServiceMonitor targets the main HTTP service (7655). Prometheus metrics are
+# served on 9091 by the Pulse server, so scraping requires an additional Service.
 monitoring:
 serviceMonitor:
 enabled: false
diff --git a/docs/AGENT_SECURITY.md b/docs/AGENT_SECURITY.md
index 41c7c7e2f..5c462583b 100644
--- a/docs/AGENT_SECURITY.md
+++ b/docs/AGENT_SECURITY.md
@@ -10,13 +10,24 @@ The agent's self-update mechanism is critical for security and stability. To pre
 The agent verifies a SHA-256 checksum of the downloaded binary. The server must provide `X-Checksum-Sha256`; updates are rejected if the header is missing or mismatched.
-### 2.
Pre-Flight Checks
-To prevent "brick-updates"—bad updates that crash immediately and require manual recovery—the agent performs a pre-flight check before replacing the running executable.
+### 2. Signature Verification (Optional)
+The legacy Docker agent supports optional Ed25519 signature verification when the server provides `X-Signature-Ed25519`. The unified agent relies on checksum verification only. Missing signatures are logged as a warning where supported.
+
+### 3. Pre-Flight Checks
+To prevent "brick-updates"—bad updates that crash immediately and require manual recovery—agents perform pre-flight validation before replacing the running executable.
+
+Unified agent (`pulse-agent`):
+1. Download new binary.
+2. Verify checksum (required).
+3. Validate binary magic (ELF/Mach-O/PE) and size limits (100MB max).
+4. Make executable and swap atomically.
+
+Legacy Docker agent (`pulse-docker-agent`):
 1. Download new binary.
 2. Verify checksum (required).
 3. Make executable.
-4. **Execute with `--self-test`**: The agent attempts to run the new binary with a special flag that loads the configuration and verifies basic functionality.
-5. If the self-test fails (exit code != 0), the update is aborted.
+4. **Execute with `--self-test`** to validate startup.
+5. If the self-test fails, the update is aborted.
 ## API Security
diff --git a/docs/AI.md b/docs/AI.md
index ffdd7e3c9..d082ac578 100644
--- a/docs/AI.md
+++ b/docs/AI.md
@@ -1,6 +1,6 @@
 # Pulse AI
-Pulse Pro unlocks **AI Patrol** for continuous, automated health checks. Learn more at https://pulserelay.pro.
+Pulse Pro unlocks **AI Patrol** for continuous, automated health checks. Learn more at https://pulserelay.pro or see the technical overview in [PULSE_PRO.md](PULSE_PRO.md).
 ## What Patrol Actually Does (Technical)
@@ -38,7 +38,7 @@ Alerts are threshold-based and narrow. Patrol is context-based and cross-system.
 ## Controls and Limits
-- **Schedule**: from 15 minutes to 24 hours.
+- **Schedule**: from 10 minutes to 7 days (default 6 hours).
 - **Scope**: only configured resources and connected agents.
 - **Safety**: command execution remains disabled by default.
 - **Cost control**: Pro uses model selection and rate limits; free tier uses heuristic-only Patrol.
@@ -114,15 +114,20 @@ Patrol is **intentionally conservative** to avoid noise:
 ## Features
 - **Interactive chat**: Ask questions about current cluster state and get AI-assisted troubleshooting.
-- **Patrol**: Background checks periodically (default: 15 minutes) that generate findings. Interval is fully configurable down to 15 minutes.
-- **Alert analysis**: Optional token-efficient analysis when alerts fire.
+- **Patrol**: Background checks periodically (default: 6 hours) that generate findings. Interval is configurable from 10 minutes to 7 days, or set to 0 to disable.
+- **Alert-triggered analysis (Pro)**: Optional token-efficient analysis when alerts fire.
+- **Kubernetes AI analysis (Pro)**: Deep cluster analysis beyond basic monitoring.
 - **Command execution**: When enabled, AI can run commands via connected agents.
 - **Finding management**: Dismiss, resolve, or suppress findings to prevent recurrence.
 - **Cost tracking**: Tracks token usage and supports monthly budget limits.
+Alert-triggered analysis runs attach a timeline event to the alert, so investigations remain auditable alongside acknowledgements and remediation steps.
+
+> **License note**: Kubernetes AI analysis is gated by the `kubernetes_ai` Pulse Pro feature.
+
 ## Configuration
-Configure in the UI: **Settings → AI**
+Configure in the UI: **Settings → System → AI Assistant**
 AI settings are stored encrypted at rest in `ai.enc` under the Pulse config directory. Patrol findings and history are stored in `ai_findings.json`, `ai_patrol_runs.json`, and usage data in `ai_usage_history.json`. These files are located in `/etc/pulse` for systemd installs, or `/data` for Docker/Kubernetes.
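As a quick reference for the storage paragraph above, the following Python sketch maps an install type to the expected locations of the AI state files. The helper `ai_file_paths` and the install-type keys are hypothetical names used only for illustration; the file names and directories are the ones stated in the text.

```python
from pathlib import PurePosixPath

# Config directories as described in the documentation text.
CONFIG_DIRS = {
    "systemd": PurePosixPath("/etc/pulse"),
    "docker": PurePosixPath("/data"),
    "kubernetes": PurePosixPath("/data"),
}

# AI-related files named in the documentation text.
AI_FILES = (
    "ai.enc",
    "ai_findings.json",
    "ai_patrol_runs.json",
    "ai_usage_history.json",
)


def ai_file_paths(install_type):
    # Hypothetical helper: list where each AI file should live for an install type.
    base = CONFIG_DIRS[install_type]
    return [str(base / name) for name in AI_FILES]
```

A check like this can be handy when troubleshooting findings that do not persist, since it tells you which paths need to be writable by the Pulse process.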
@@ -151,7 +156,7 @@ You can set separate models for:
 ## Patrol Service (Pro Feature)
-Patrol runs automated health checks on a configurable schedule (default: every 15 minutes). It passes comprehensive infrastructure context to the LLM (see "Context Patrol Receives" above) and generates findings when issues are detected.
+Patrol runs automated health checks on a configurable schedule (default: every 6 hours). It passes comprehensive infrastructure context to the LLM (see "Context Patrol Receives" above) and generates findings when issues are detected.
 Pulse Pro users get full LLM-powered analysis. Free users still benefit from **Heuristic Patrol**, which uses local rule-based logic to detect common issues (offline nodes, disk exhaustion, etc.) without requiring an external AI provider. Free users also get full access to the AI Chat assistant (BYOK).
@@ -183,17 +188,25 @@ When chatting with AI about a patrol finding, the AI can:
 Pulse includes settings that control how "active" AI features are:
-- **Autonomous mode**: When enabled, AI may execute safe commands without approval.
-- **Patrol auto-fix**: Allows patrol to attempt automatic remediation.
-- **Alert-triggered analysis**: Limits AI to analyzing specific events when alerts occur.
+- **Autonomous mode (Pro)**: When enabled, AI may execute safe commands without approval.
+- **Patrol auto-fix (Pro)**: Allows patrol to attempt automatic remediation.
+- **Alert-triggered analysis (Pro)**: Limits AI to analyzing specific events when alerts occur.
 If you enable execution features, ensure agent tokens and scopes are appropriately restricted.
+### Advanced Network Restrictions
+
+Pulse blocks AI tool HTTP fetches to loopback and link-local addresses by default. For local development, you can allow loopback targets:
+
+- `PULSE_AI_ALLOW_LOOPBACK=true`
+
+Use this only in trusted environments.
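The loopback and link-local restriction described above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not Pulse's implementation: `is_blocked_target` and its `allow_loopback` parameter are hypothetical names, with `allow_loopback` standing in for the `PULSE_AI_ALLOW_LOOPBACK=true` override.

```python
import ipaddress


def is_blocked_target(host_ip, allow_loopback=False):
    # Hypothetical check: reject link-local targets always, and loopback
    # targets unless the override is set. Works for IPv4 and IPv6 literals.
    addr = ipaddress.ip_address(host_ip)
    if addr.is_link_local:
        return True
    if addr.is_loopback:
        return not allow_loopback
    return False
```

Link-local stays blocked even with the override in this sketch because that range includes cloud metadata endpoints such as `169.254.169.254`, which is a common reason for this kind of restriction. A real implementation would also need to resolve hostnames before checking, which this sketch leaves out.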
+
 ## Troubleshooting
 | Issue | Solution |
 |-------|----------|
-| AI not responding | Verify provider credentials in **Settings → AI** |
+| AI not responding | Verify provider credentials in **Settings → System → AI Assistant** |
 | No execution capability | Confirm at least one agent is connected |
 | Findings not persisting | Check Pulse has write access to `ai_findings.json` in the config directory |
 | Too many findings | This shouldn't happen - please report if it does |
diff --git a/docs/API.md b/docs/API.md
index aa9d537db..ac960ab9e 100644
--- a/docs/API.md
+++ b/docs/API.md
@@ -77,6 +77,24 @@ Returns version, build time, and update status.
 `POST /api/config/nodes/test-connection`
 Validate credentials before saving.
+### Export Configuration
+`POST /api/config/export` (admin or API token)
+Request body:
+```json
+{ "passphrase": "use-a-strong-passphrase" }
+```
+Returns an encrypted export bundle in `data`. Passphrases must be at least 12 characters.
+
+### Import Configuration
+`POST /api/config/import` (admin)
+Request body:
+```json
+{
+ "data": "",
+ "passphrase": "use-a-strong-passphrase"
+}
+```
+
 ---
 ## 📊 Metrics & Charts
@@ -84,7 +102,7 @@ Validate credentials before saving.
 ### Chart Data
 `GET /api/charts?range=1h`
 Returns time-series data for CPU, Memory, and Storage.
-**Ranges**: `1h`, `24h`, `7d`, `30d`
+**Ranges**: `5m`, `15m`, `30m`, `1h`, `4h`, `12h`, `24h`, `7d`
 ### Storage Charts
 `GET /api/storage-charts`
@@ -136,6 +154,10 @@ Triggers a test alert to all configured channels.
 ## 🛡️ Security
+### Security Status
+`GET /api/security/status`
+Returns authentication status, proxy auth state, and security posture flags.
+
 ### List API Tokens
 `GET /api/security/tokens`
@@ -157,6 +179,64 @@ Supports actions:
 `GET /api/security/recovery` returns recovery mode status.
+### Reset Account Lockout (Admin)
+`POST /api/security/reset-lockout`
+```json
+{ "identifier": "admin" }
+```
+Identifier can be a username or IP address.
+ ### Regenerate API Token (Admin) +`POST /api/security/regenerate-token` + +Returns a new raw token (shown once) and updates stored hashes: +```json +{ + "success": true, + "token": "raw-token", + "deploymentType": "systemd", + "requiresRestart": false, + "message": "New API token generated and active immediately! Save this token - it won't be shown again." +} +``` + +### Validate API Token (Admin) +`POST /api/security/validate-token` +```json +{ "token": "raw-token" } +``` +Returns: +```json +{ "valid": true, "message": "Token is valid" } +``` + +### Bootstrap Token Validation (Public) +`POST /api/security/validate-bootstrap-token` + +Provide the token via header `X-Setup-Token` or JSON body: +```json +{ "token": "bootstrap-token" } +``` + +Returns `204 No Content` on success. + +### Quick Security Setup (Public, bootstrap token required) +`POST /api/security/quick-setup` + +Requires a valid bootstrap token (header `X-Setup-Token`) or an authenticated session. + +```json +{ + "username": "admin", + "password": "StrongPass!1", + "apiToken": "token", + "enableNotifications": false, + "darkMode": false, + "force": false, + "setupToken": "optional-bootstrap-token" +} +``` + --- ## ⚙️ System Settings @@ -182,9 +262,9 @@ Returns scheduler health, DLQ, and breaker status. Requires `monitoring:read`. - `POST /api/updates/apply` - `GET /api/updates/status` - `GET /api/updates/stream` -- `GET /api/updates/plan` +- `GET /api/updates/plan?version=vX.Y.Z` (optional `channel`) - `GET /api/updates/history` -- `GET /api/updates/history/entry` +- `GET /api/updates/history/entry?id=` --- @@ -204,8 +284,29 @@ Initiate OIDC login flow.
--- +## 💳 License (Pulse Pro) + +### License Status (Admin) +`GET /api/license/status` + +### License Features (Authenticated) +`GET /api/license/features` + +### Activate License (Admin) +`POST /api/license/activate` +```json +{ "license_key": "PASTE_KEY_HERE" } +``` + +### Clear License (Admin) +`POST /api/license/clear` + +--- + ## 🤖 Pulse AI *(v5)* +**Pro gating:** endpoints labeled "(Pro)" require a Pulse Pro license and return `402 Payment Required` if the feature is not licensed. + ### Get AI Settings `GET /api/settings/ai` Returns current AI configuration (providers, models, patrol status). Requires admin + `settings:read`. @@ -218,6 +319,12 @@ Configure AI providers, API keys, and preferences. Requires admin + `settings:wr `GET /api/ai/models` Lists models available to the configured providers (queried live from provider APIs). +### OAuth (Anthropic) +- `POST /api/ai/oauth/start` (admin) +- `POST /api/ai/oauth/exchange` (admin, manual code input) +- `GET /api/ai/oauth/callback` (public, IdP redirect) +- `POST /api/ai/oauth/disconnect` (admin) + ### Execute (Chat + Tools) `POST /api/ai/execute` Runs an AI request which may return tool calls, findings, or suggested actions. @@ -226,12 +333,30 @@ Runs an AI request which may return tool calls, findings, or suggested actions. `POST /api/ai/execute/stream` Streaming variant of execute (used by the UI for incremental responses). +### Kubernetes AI Analysis (Pro) +`POST /api/ai/kubernetes/analyze` +```json +{ "cluster_id": "cluster-id" } +``` +Requires a Pulse Pro license with the `kubernetes_ai` feature enabled.
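A minimal sketch of calling the Kubernetes analysis endpoint above and interpreting the `402 Payment Required` response described under "Pro gating". The `PULSE_URL` and `PULSE_TOKEN` variables, the `X-API-Token` header, and both function names are assumptions for illustration; this section does not state which credential the endpoint accepts, so adjust to your setup.

```shell
# Sketch only: PULSE_URL, PULSE_TOKEN and the X-API-Token header are assumed,
# not taken from this document.

# Map an HTTP status code to a short explanation.
explain_status() {
  case "$1" in
    2??) echo "ok" ;;
    401|403) echo "not authorized" ;;
    402) echo "feature not licensed (Pulse Pro required)" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# POST the cluster ID and print an explanation of the returned status.
# The response body is discarded here; drop "-o /dev/null" to inspect it.
analyze_cluster() {
  cluster_id="$1"
  status=$(curl -sS -o /dev/null -w '%{http_code}' \
    -X POST "${PULSE_URL}/api/ai/kubernetes/analyze" \
    -H "X-API-Token: ${PULSE_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "{\"cluster_id\": \"${cluster_id}\"}")
  explain_status "$status"
}
```

Usage might look like `PULSE_URL=http://pulse.example:7655 PULSE_TOKEN=... analyze_cluster cluster-id`, where the host name is a placeholder.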
+ ### Patrol - `GET /api/ai/patrol/status` - `GET /api/ai/patrol/findings` +- `DELETE /api/ai/patrol/findings` (clear all findings) - `GET /api/ai/patrol/history` -- `GET /api/ai/patrol/stream` -- `POST /api/ai/patrol/run` (admin) +- `GET /api/ai/patrol/runs` +- `GET /api/ai/patrol/stream` (Pro) +- `POST /api/ai/patrol/run` (admin, Pro) +- `POST /api/ai/patrol/acknowledge` (Pro) +- `POST /api/ai/patrol/dismiss` +- `POST /api/ai/patrol/resolve` +- `POST /api/ai/patrol/snooze` (Pro) +- `POST /api/ai/patrol/suppress` (Pro) +- `GET /api/ai/patrol/suppressions` (Pro) +- `POST /api/ai/patrol/suppressions` (Pro) +- `DELETE /api/ai/patrol/suppressions/{id}` (Pro) +- `GET /api/ai/patrol/dismissed` (Pro) ### Cost Tracking - `GET /api/ai/cost/summary` @@ -240,6 +365,8 @@ Streaming variant of execute (used by the UI for incremental responses). ## 📈 Metrics Store (v5) +Auth required: `monitoring:read`. + ### Store Stats `GET /api/metrics-store/stats` Returns stats for the persistent metrics store (SQLite-backed). @@ -254,7 +381,12 @@ Returns historical metric series for a resource and time range. ### Unified Agent (Recommended) `GET /download/pulse-agent` -Downloads the unified agent binary for the current platform. +Downloads the unified agent binary. Without `arch`, Pulse serves the local binary on the server host. + +Optional query: +- `?arch=linux-amd64` (supported: `linux-amd64`, `linux-arm64`, `linux-armv7`, `linux-armv6`, `linux-386`, `darwin-amd64`, `darwin-arm64`, `windows-amd64`, `windows-arm64`, `windows-386`) + +The response includes `X-Checksum-Sha256` for verification. The unified agent combines host, Docker, and Kubernetes monitoring. Use `--enable-docker` or `--enable-kubernetes` to enable additional metrics. @@ -264,10 +396,25 @@ See [UNIFIED_AGENT.md](UNIFIED_AGENT.md) for installation instructions. `GET /install.sh` Serves the universal `install.sh` used to install `pulse-agent` on target machines.
+### Unified Agent Installer (Windows) +`GET /install.ps1` +Serves the PowerShell installer for Windows. + ### Legacy Agents (Deprecated) `GET /download/pulse-host-agent` - *Deprecated, use pulse-agent* `GET /download/pulse-docker-agent` - *Deprecated, use pulse-agent --enable-docker* +Host-agent downloads accept `?platform=&arch=` and expose a checksum endpoint: +- `/download/pulse-host-agent.sha256?platform=linux&arch=amd64` + +Legacy install/uninstall scripts: +- `GET /install-docker-agent.sh` +- `GET /install-container-agent.sh` +- `GET /install-host-agent.sh` +- `GET /install-host-agent.ps1` +- `GET /uninstall-host-agent.sh` +- `GET /uninstall-host-agent.ps1` + ### Submit Reports `POST /api/agents/host/report` - Host metrics `POST /api/agents/docker/report` - Docker container metrics @@ -275,4 +422,17 @@ Serves the universal `install.sh` used to install `pulse-agent` on target machin --- +## 🌡️ Temperature Proxy (Legacy) + +These endpoints are only available when legacy `pulse-sensor-proxy` support is enabled. + +- `POST /api/temperature-proxy/register` (proxy registration) +- `GET /api/temperature-proxy/authorized-nodes` (proxy sync) +- `DELETE /api/temperature-proxy/unregister` (admin) +- `GET /api/temperature-proxy/install-command` (admin, `settings:write`) +- `GET /api/temperature-proxy/host-status` (admin, `settings:read`) + +Legacy migration helper: +- `GET /api/install/migrate-temperature-proxy.sh` + > **Note**: This is a summary of the most common endpoints. For a complete list, inspect the network traffic of the Pulse dashboard or check the source code in `internal/api/router.go`.
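The `X-Checksum-Sha256` response header on `GET /download/pulse-agent` can be used to verify a download before running it. The following is a rough sketch under stated assumptions: `PULSE_URL` and the three function names are hypothetical, `sha256sum` is assumed to be available (GNU coreutils), and whether the download endpoint needs authentication is not covered here.

```shell
# Sketch only: PULSE_URL and all function names are illustrative assumptions.

# Extract the X-Checksum-Sha256 value from a file of dumped response headers.
checksum_from_headers() {
  grep -i '^x-checksum-sha256:' "$1" | tail -n 1 | awk '{print $2}' | tr -d '\r'
}

# Succeed only if the file's SHA-256 digest matches the expected hex string.
verify_sha256() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# Download the agent for one architecture and verify it against the header.
download_agent() {
  arch="$1"
  out="$2"
  headers=$(mktemp)
  curl -fsSL -D "$headers" -o "$out" \
    "${PULSE_URL}/download/pulse-agent?arch=${arch}" || { rm -f "$headers"; return 1; }
  sum=$(checksum_from_headers "$headers")
  rm -f "$headers"
  verify_sha256 "$out" "$sum"
}
```

For example, `download_agent linux-amd64 ./pulse-agent && chmod +x ./pulse-agent` would keep the binary non-executable unless the checksum matched. A missing header is treated as a failure rather than silently accepted.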
diff --git a/docs/AUTO_UPDATE.md b/docs/AUTO_UPDATE.md index e5fdce7b8..1b18300a2 100644 --- a/docs/AUTO_UPDATE.md +++ b/docs/AUTO_UPDATE.md @@ -49,24 +49,23 @@ In **Settings → System → Updates**: | Setting | Description | |---------|-------------| | **Update Channel** | Stable (recommended) or Release Candidate | -| **Auto-Check** | Automatically check for updates daily | +| **Auto-Check** | Stored UI preference (server currently checks for updates hourly regardless) | -### Environment Variables +### Stored Settings (system.json) -```bash -# Enable one-click updates -AUTO_UPDATE_ENABLED=true +Auto-update preferences are stored in `system.json` and edited via the UI. -# Use release candidate channel -UPDATE_CHANNEL=rc - -# Adjust automatic check cadence (duration string) -AUTO_UPDATE_CHECK_INTERVAL=24h - -# Schedule daily checks (HH:MM, 24h) -AUTO_UPDATE_TIME=03:00 +```json +{ + "autoUpdateEnabled": false, + "updateChannel": "stable", + "autoUpdateCheckInterval": 24, + "autoUpdateTime": "03:00" +} ``` +**Note:** `autoUpdateTime` is stored for UI reference. The systemd timer still runs on its own schedule (02:00 + jitter). In-app update checks are driven by `autoUpdateCheckInterval`. + ## Manual Update Methods ### Docker @@ -76,21 +75,27 @@ AUTO_UPDATE_TIME=03:00 docker pull rcourtman/pulse:latest # Restart container -docker-compose down && docker-compose up -d +docker compose down && docker compose up -d ``` +If you use the legacy `docker-compose` binary, replace `docker compose` with `docker-compose`. + ### ProxmoxVE LXC (Manual) ```bash curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash ``` +This script installs/updates the **Pulse server**. Agent updates use the `/install.sh` command generated in **Settings → Agents → Installation commands**.
+ ### Systemd Service (Manual) ```bash curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash ``` +This script installs/updates the **Pulse server**. Agent updates use the `/install.sh` command generated in **Settings → Agents → Installation commands**. + ### Source Build ```bash @@ -111,16 +116,26 @@ Pulse creates a backup before updating. If the update fails: 3. Error details are logged ### Manual Rollback -If rollback is supported for your deployment, use the **Rollback** action from the update history in **Settings → System → Updates**. +Backups created by in-app updates are stored as `backup-/` folders inside the Pulse data directory (`/etc/pulse` or `/data`). If that directory is not writable, Pulse falls back to `/tmp/pulse-backup-`. +There is no rollback UI. To revert, stop Pulse, restore the backup contents to `/opt/pulse`, then restart. -Backups are stored as `backup-/` folders inside the Pulse data directory (`/etc/pulse` or `/data`). +Example (systemd/LXC): +```bash +sudo systemctl stop pulse +sudo cp -a /etc/pulse/backup-/pulse /opt/pulse/pulse +sudo cp -a /etc/pulse/backup-/VERSION /opt/pulse/VERSION +sudo rm -rf /opt/pulse/data /opt/pulse/config +sudo cp -a /etc/pulse/backup-/data /opt/pulse/data +sudo cp -a /etc/pulse/backup-/config /opt/pulse/config +sudo cp -a /etc/pulse/backup-/.env /opt/pulse/.env +sudo systemctl start pulse +``` ## Update History -View past updates in **Settings → System → Updates → Update History**: -- Previous versions installed -- Update timestamps -- Success/failure status +History entries are stored in `update-history.jsonl` under the Pulse data directory (`/etc/pulse` or `/data`), and exposed via `GET /api/updates/history` (admin auth required). + +Systemd/LXC update runs write detailed logs to `/var/log/pulse/update-.log`. ## Troubleshooting @@ -131,7 +146,7 @@ View past updates in **Settings → System → Updates → Update History**: ### Update failed 1.
Check the error message in the progress modal -2. Review logs: `journalctl -u pulse -n 100` +2. Review logs: `journalctl -u pulse -n 100` or `/var/log/pulse/update-.log` 3. Verify disk space is available 4. Check network connectivity to GitHub diff --git a/docs/CONFIGURATION.md b/docs/CONFIGURATION.md index cbcb82e3f..0c90a0b49 100644 --- a/docs/CONFIGURATION.md +++ b/docs/CONFIGURATION.md @@ -5,6 +5,7 @@ Pulse uses a split-configuration model to ensure security and flexibility. | File | Purpose | Security Level | |------|---------|----------------| | `.env` | Authentication & Secrets | 🔒 **Critical** (Read-only by owner) | +| `.encryption.key` | Encryption key for `.enc` files | 🔒 **Critical** | | `system.json` | General Settings | 📝 Standard | | `nodes.enc` | Node Credentials | 🔒 **Encrypted** (AES-256-GCM) | | `alerts.json` | Alert Rules | 📝 Standard | @@ -13,7 +14,18 @@ Pulse uses a split-configuration model to ensure security and flexibility. | `apprise.enc` | Apprise notification config | 🔒 **Encrypted** | | `oidc.enc` | OIDC provider config | 🔒 **Encrypted** | | `api_tokens.json` | API token records (hashed) | 🔒 **Sensitive** | +| `env_token_suppressions.json` | Suppressed legacy env tokens (migration aid) | 📝 Standard | | `ai.enc` | AI settings and credentials | 🔒 **Encrypted** | +| `ai_findings.json` | AI Patrol findings | 📝 Standard | +| `ai_patrol_runs.json` | AI Patrol run history | 📝 Standard | +| `ai_usage_history.json` | AI usage history | 📝 Standard | +| `license.enc` | Pulse Pro license key | 🔒 **Encrypted** | +| `host_metadata.json` | Host notes, tags, and AI command overrides | 📝 Standard | +| `docker_metadata.json` | Docker metadata cache | 📝 Standard | +| `guest_metadata.json` | Guest notes and metadata | 📝 Standard | +| `recovery_tokens.json` | Recovery tokens (short-lived) | 🔒 **Sensitive** | +| `sessions.json` | Persistent sessions (includes OIDC refresh tokens) | 🔒 **Sensitive** | +|
`update-history.jsonl` | Update history log (in-app updates) | 📝 Standard | | `metrics.db` | Persistent metrics history (SQLite) | 📝 Standard | All files are located in `/etc/pulse/` (Systemd) or `/data/` (Docker/Kubernetes) by default. @@ -31,7 +43,7 @@ This file controls access to Pulse. It is **never** exposed to the UI. ```bash # /etc/pulse/.env -# Admin Credentials (bcrypt hashed) +# Admin Credentials (bcrypt hashed; plain text auto-hashes on startup) PULSE_AUTH_USER='admin' PULSE_AUTH_PASS='$2a$12$...' @@ -92,20 +104,25 @@ Environment overrides (lock the corresponding UI fields): Controls runtime behavior like ports, logging, and polling intervals. Most of these can be changed in **Settings → System**.
-Full Configuration Reference +Example system.json ```json { "pvePollingInterval": 10, // Seconds - "backendPort": 3000, // Internal port (default: 3000) + "backendPort": 3000, // Legacy (unused) "frontendPort": 7655, // Public port "logLevel": "info", // debug, info, warn, error "autoUpdateEnabled": false, // Enable auto-update checks - "adaptivePollingEnabled": false // Smart polling for large clusters + "adaptivePollingEnabled": false, // Smart polling for large clusters + "allowedOrigins": "", // CORS allowlist (single origin or "*") + "allowEmbedding": false, // Allow iframe embedding + "allowedEmbedOrigins": "", // Comma-separated origins for iframe embedding + "webhookAllowedPrivateCIDRs": "" // Allowlist for private webhook targets } ``` > **Note**: `logFormat` is only configurable via the `LOG_FORMAT` environment variable, not in `system.json`. +> **Note**: `autoUpdateTime` is stored by the UI, but the systemd timer uses its own schedule.
### Common Overrides (Environment Variables) @@ -114,6 +131,9 @@ Environment variables take precedence over `system.json`. | Variable | Description | Default | |----------|-------------|---------| | `FRONTEND_PORT` | Public listening port | `7655` | +| `PORT` | Legacy alias for `FRONTEND_PORT` | *(unset)* | +| `BACKEND_HOST` | Bind host for the HTTP server and metrics listener (advanced) | *(unset)* | +| `BACKEND_PORT` | Legacy internal API port (unused) | `3000` | | `LOG_LEVEL` | Log verbosity (see below) | `info` | | `LOG_FORMAT` | Log output format (`auto`, `json`, `console`) | `auto` | @@ -130,17 +150,35 @@ Environment variables take precedence over `system.json`. | Variable | Description | Default | |----------|-------------|---------| -| `PULSE_PUBLIC_URL` | URL for UI links, notifications, and OIDC. **Reverse proxy setups**: set this to the direct/internal Pulse URL (e.g., `http://192.168.1.10:7655`) so agents connect directly instead of via the proxy. | Auto-detected | +| `PULSE_PUBLIC_URL` | URL for UI links, notifications, and OIDC. For reverse proxies, keep this as the public URL and use `PULSE_AGENT_CONNECT_URL` for agent installs if you need a direct/internal address. | Auto-detected | | `PULSE_AGENT_CONNECT_URL` | Dedicated direct URL for agents (overrides `PULSE_PUBLIC_URL` for agent install commands). Alias: `PULSE_AGENT_URL`. | *(unset)* | -| `ALLOWED_ORIGINS` | CORS allowed domains | `*` | -| `IFRAME_EMBEDDING_ALLOW` | Iframe embedding policy (`SAMEORIGIN`, `ALLOWALL`, etc.) | `SAMEORIGIN` | +| `ALLOWED_ORIGINS` | CORS allowed origin (`*` or a single origin). Empty = same-origin only. 
| *(unset)* | | `DISCOVERY_ENABLED` | Auto-discover nodes | `false` | | `DISCOVERY_SUBNET` | CIDR or `auto` | `auto` | +| `DISCOVERY_ENVIRONMENT_OVERRIDE` | Force discovery environment (`auto`, `native`, `docker_host`, `docker_bridge`, `lxc_privileged`, `lxc_unprivileged`) | `auto` | +| `DISCOVERY_SUBNET_ALLOWLIST` | Comma-separated CIDRs allowed for discovery | *(empty)* | +| `DISCOVERY_SUBNET_BLOCKLIST` | Comma-separated CIDRs excluded from discovery | `169.254.0.0/16` | +| `DISCOVERY_MAX_HOSTS_PER_SCAN` | Max hosts to scan per run | `1024` | +| `DISCOVERY_MAX_CONCURRENT` | Max concurrent discovery probes | `50` | +| `DISCOVERY_ENABLE_REVERSE_DNS` | Enable reverse DNS lookup (`true`/`false`) | `true` | +| `DISCOVERY_SCAN_GATEWAYS` | Include gateway IPs in discovery (`true`/`false`) | `true` | +| `DISCOVERY_DIAL_TIMEOUT_MS` | TCP dial timeout (ms) | `1000` | +| `DISCOVERY_HTTP_TIMEOUT_MS` | HTTP probe timeout (ms) | `2000` | | `PULSE_ENABLE_SENSOR_PROXY` | Enable legacy `pulse-sensor-proxy` endpoints (deprecated, unsupported) | `false` | | `PULSE_AUTH_HIDE_LOCAL_LOGIN` | Hide username/password form | `false` | | `DEMO_MODE` | Enable read-only demo mode | `false` | | `PULSE_TRUSTED_PROXY_CIDRS` | Comma-separated IPs/CIDRs trusted to supply `X-Forwarded-For`/`X-Real-IP` | *(unset)* | -| `PULSE_TRUSTED_NETWORKS` | Comma-separated CIDRs treated as trusted local networks | *(unset)* | +| `PULSE_TRUSTED_NETWORKS` | Comma-separated CIDRs treated as trusted local networks (does not bypass auth) | *(unset)* | +| `PULSE_SENSOR_PROXY_SOCKET` | Legacy sensor-proxy socket override (deprecated) | *(unset)* | + +### Iframe Embedding (system.json) + +Embedding is controlled by `system.json` and the UI (**Settings → System → Network**): + +- `allowEmbedding` (boolean): enables iframe embedding +- `allowedEmbedOrigins` (comma-separated): restricts `frame-ancestors` when embedding is enabled + +When `allowEmbedding` is `false`, Pulse sends `X-Frame-Options: DENY` and
`frame-ancestors 'none'`. ### Monitoring Overrides @@ -150,7 +188,7 @@ Environment variables take precedence over `system.json`. | `PBS_POLLING_INTERVAL` | PBS metrics polling frequency | `60s` | | `PMG_POLLING_INTERVAL` | PMG metrics polling frequency | `60s` | | `CONCURRENT_POLLING` | Enable concurrent polling for multi-node clusters | `true` | -| `CONNECTION_TIMEOUT` | API connection timeout | `45s` | +| `CONNECTION_TIMEOUT` | API connection timeout | `60s` | | `BACKUP_POLLING_CYCLES` | Poll cycles between backup checks | `10` | | `ENABLE_BACKUP_POLLING` | Enable backup job monitoring | `true` | | `BACKUP_POLLING_INTERVAL` | Backup polling frequency | `0` (Auto) | @@ -178,14 +216,41 @@ Environment variables take precedence over `system.json`. | `LOG_MAX_AGE` | Log file retention (days) | `30` | | `LOG_COMPRESS` | Compress rotated logs | `true` | -### Update Settings +### Update Settings (system.json) + +These are stored in `system.json` and managed via the UI. + +| Key | Description | Default | +|-----|-------------|---------| +| `updateChannel` | Update channel (`stable` or `rc`) | `stable` | +| `autoUpdateEnabled` | Allow one-click updates | `false` | +| `autoUpdateCheckInterval` | Stored UI preference (server currently checks hourly) | `24` | +| `autoUpdateTime` | Stored UI preference (systemd timer has its own schedule) | `03:00` | + +### Auto-Import (Bootstrap) + +You can auto-import an encrypted backup on first startup. This is useful for automated provisioning and test environments. 
+ +| Variable | Description | +|----------|-------------| +| `PULSE_INIT_CONFIG_DATA` | Base64 or raw contents of an export bundle (auto-imports on first start) | +| `PULSE_INIT_CONFIG_FILE` | Path to an export bundle on disk (auto-imports on first start) | +| `PULSE_INIT_CONFIG_PASSPHRASE` | Passphrase for the export bundle (required) | + +> **Note**: `PULSE_INIT_CONFIG_URL` is only supported by the hidden `pulse config auto-import` command, not by the server startup auto-import. + +### Developer/Test Overrides (Environment Variables) + +These are primarily for development or test harnesses and should not be used in production. | Variable | Description | Default | |----------|-------------|---------| -| `UPDATE_CHANNEL` | Update channel (`stable` or `rc`) | `stable` | -| `AUTO_UPDATE_ENABLED` | Allow one-click updates | `false` | -| `AUTO_UPDATE_CHECK_INTERVAL` | Auto-check interval | `24h` | -| `AUTO_UPDATE_TIME` | Scheduled check time (HH:MM) | `03:00` | +| `PULSE_UPDATE_SERVER` | Override update server base URL (testing only) | *(unset)* | +| `PULSE_UPDATE_STAGE_DELAY_MS` | Adds artificial delays between update stages (testing only) | *(unset)* | +| `PULSE_ALLOW_DOCKER_UPDATES` | Expose update UI/actions in Docker (debug only) | `false` | +| `PULSE_AI_ALLOW_LOOPBACK` | Allow AI tool HTTP fetches to loopback addresses | `false` | +| `PULSE_LICENSE_PUBLIC_KEY` | Override embedded license public key (base64, dev only) | *(unset)* | +| `PULSE_LICENSE_DEV_MODE` | Skip license verification (development only) | `false` | ### Metrics Retention (Tiered) @@ -204,7 +269,7 @@ See [METRICS_HISTORY.md](METRICS_HISTORY.md) for details. Pulse uses a powerful alerting engine with hysteresis (separate trigger/clear thresholds) to prevent flapping. -**Managed via UI**: Settings → Alerts → Thresholds +**Managed via UI**: Alerts → Thresholds
Manual Configuration (JSON) diff --git a/docs/DEPLOYMENT_MODELS.md b/docs/DEPLOYMENT_MODELS.md index e7dbdcf70..ab91ad1e8 100644 --- a/docs/DEPLOYMENT_MODELS.md +++ b/docs/DEPLOYMENT_MODELS.md @@ -23,9 +23,21 @@ Docker and Kubernetes do not publish `9091` unless you explicitly expose it. Pulse uses a split config model: - **Local auth and secrets**: `.env` (managed by Quick Security Setup or environment overrides, not shown in the UI) +- **Encryption key**: `.encryption.key` (required to decrypt `.enc` files) - **System settings**: `system.json` (editable in the UI unless locked by env) - **Nodes and credentials**: `nodes.enc` (encrypted) +- **Notification config**: `email.enc`, `webhooks.enc`, `apprise.enc` (encrypted) +- **API tokens**: `api_tokens.json` +- **Legacy token suppressions**: `env_token_suppressions.json` - **AI config**: `ai.enc` (encrypted) +- **AI patrol data**: `ai_findings.json`, `ai_patrol_runs.json`, `ai_usage_history.json` +- **Pulse Pro license**: `license.enc` (encrypted) +- **Host metadata**: `host_metadata.json` +- **Docker metadata**: `docker_metadata.json` +- **Guest metadata**: `guest_metadata.json` +- **Sessions**: `sessions.json` (persistent sessions, sensitive) +- **Recovery tokens**: `recovery_tokens.json` +- **Update history**: `update-history.jsonl` - **Metrics history**: `metrics.db` (SQLite) Path mapping: diff --git a/docs/DOCKER.md b/docs/DOCKER.md index 972fc023e..535a0d99c 100644 --- a/docs/DOCKER.md +++ b/docs/DOCKER.md @@ -47,7 +47,7 @@ Run with: `docker compose up -d` ## ⚙️ Configuration -Pulse is configured via environment variables. +Pulse is configured via the UI (`system.json`) with optional environment overrides. | Variable | Description | Default | |----------|-------------|---------| @@ -56,7 +56,7 @@ Pulse is configured via environment variables.
| `PULSE_AUTH_PASS` | Admin Password | *(unset)* | | `API_TOKENS` | Comma-separated API tokens (**legacy**) | *(unset)* | | `DISCOVERY_SUBNET` | Custom CIDR to scan | *(auto)* | -| `ALLOWED_ORIGINS` | CORS allowed domains | `*` | +| `ALLOWED_ORIGINS` | CORS allowed origin (`*` or a single origin). Empty = same-origin only. | *(unset)* | | `LOG_LEVEL` | Log verbosity (`debug`, `info`, `warn`, `error`) | `info` | | `PULSE_DISABLE_DOCKER_UPDATE_ACTIONS` | Hide Docker update buttons (read-only mode) | `false` | @@ -113,7 +113,7 @@ Pulse can detect and apply updates to your Docker containers directly from the U 1. **Update Detection**: Pulse compares the local image digest with the latest digest from the container registry 2. **Visual Indicator**: Containers with available updates show a blue upward arrow icon 3. **One-Click Update**: Click the update button, confirm, and Pulse handles the rest -4. **Batch Updates**: Use the **"Update All"** button in the filter bar to update multiple containers safely in sequence +4. **Batch Updates**: Use the **"Update All"** button in the filter bar to queue updates for multiple containers ### Updating a Container @@ -125,16 +125,15 @@ Pulse can detect and apply updates to your Docker containers directly from the U - Stop the current container - Create a backup (renamed with `_pulse_backup_` suffix) - Start a new container with the same configuration - - Clean up the backup after 5 minutes + - Clean up the backup after 15 minutes (if the update succeeds) ### Batch Updates When multiple containers have updates available, an **"Update All"** button appears in the filter bar. 1. Click **"Update All"** -2. Confirm the action in the toast notification -3. Pulse queues the updates and processes them in parallel batches (default 5 concurrent updates) -4. A progress indicator shows the status of the batch operation -5. Failed updates are pushed to the end of the queue and reported in the final summary +2.
Click again within 3 seconds to confirm +3. Pulse queues update commands for each container (they run on the next agent report cycle) +4. A toast summary reports how many updates were queued or failed ### Safety Features @@ -177,7 +176,7 @@ services: To disable registry checks entirely, set `PULSE_DISABLE_DOCKER_UPDATE_CHECKS=true` on the **agent**. -You can also toggle "Hide Docker Update Buttons" from the UI: **Settings → Agents → Docker Settings**. +You can also toggle "Hide Docker Update Buttons" from the UI: **Settings → Agents** (Docker Settings card). --- diff --git a/docs/FAQ.md b/docs/FAQ.md index 14e4faf64..892293c5b 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -9,6 +9,8 @@ If you run Proxmox VE, use the official LXC installer (recommended): curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash ``` +Note: this installs the Pulse **server**. Agent installs use the command from **Settings → Agents → Installation commands** (served from `/install.sh` on your Pulse server). + If you prefer Docker: ```bash @@ -20,8 +22,8 @@ See [INSTALL.md](INSTALL.md) for all options (Docker Compose, Kubernetes, system ### How do I add a node? Go to **Settings → Proxmox**. -- **Recommended (Agent setup)**: choose **Setup mode: Agent** and run the generated install command on the Proxmox host. -- **Manual**: choose **Setup mode: Manual** and enter the credentials (password or API token) for the Proxmox API. +- **Recommended (Agent setup)**: select **Agent Install** and run the generated install command on the Proxmox host. +- **Manual**: use **Username & Password**, or select the **Manual** tab and enter API token credentials. If you want Pulse to find servers automatically, enable discovery in **Settings → System → Network** and then return to **Settings → Proxmox** to review discovered servers.
@@ -39,8 +41,8 @@ If a setting is disabled with an amber warning, it's being overridden by an envi ### What is Pulse Pro, and what does it actually do? Pulse Pro unlocks **AI Patrol** — scheduled, cross-system analysis that correlates real-time state, recent metrics history, and diagnostics to surface actionable findings. -Example output includes trend-based capacity warnings, backup regressions, and correlated container failures that simple threshold alerts miss. -See [AI Patrol](AI.md) and https://pulserelay.pro. +Example output includes trend-based capacity warnings, backup regressions, Kubernetes AI cluster analysis, and correlated container failures that simple threshold alerts miss. +See [AI Patrol](AI.md), [Pulse Pro technical overview](PULSE_PRO.md), and https://pulserelay.pro. ### Why do VMs show "-" for disk usage? Proxmox API returns `0` for VM disk usage by default. You must install the **QEMU Guest Agent** inside the VM and enable it in Proxmox (VM → Options → QEMU Guest Agent). @@ -101,7 +103,7 @@ Yes. Pulse supports OIDC in **Settings → Security → Single Sign-On** and Pro - Verify the port (default 7655) is open on your firewall. ### CORS errors? -Set `ALLOWED_ORIGINS=https://your-domain.com` environment variable if accessing Pulse from a different domain. +Pulse defaults to same-origin only. If you access the API from a different domain, set **Settings → System → Network → Allowed Origins** or use `ALLOWED_ORIGINS` (single origin, or `*` if you explicitly want all origins). ### High memory usage? If you are storing long history windows, reduce metrics retention (see [METRICS_HISTORY.md](METRICS_HISTORY.md)). Also confirm your polling intervals match your environment size.
diff --git a/docs/INSTALL.md b/docs/INSTALL.md index a400e5a33..e59671ea6 100644 --- a/docs/INSTALL.md +++ b/docs/INSTALL.md @@ -13,6 +13,8 @@ Run this on your Proxmox host: curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash ``` +> **Note**: The GitHub `install.sh` is the **server** installer. The agent installer is served from your Pulse server at `/install.sh` (see **Settings → Agents → Installation commands**). + ### Docker Ideal for containerized environments or testing. @@ -38,7 +40,6 @@ services: - "7655:7655" volumes: - pulse_data:/data - - /var/run/docker.sock:/var/run/docker.sock # Optional: Monitor local Docker environment: - PULSE_AUTH_USER=admin - PULSE_AUTH_PASS=secret123 @@ -48,6 +49,7 @@ volumes: ``` > **Note**: Plain text passwords set via `PULSE_AUTH_PASS` are auto-hashed on startup. For production, prefer Quick Security Setup or a pre-hashed bcrypt value. +> **Note**: Docker monitoring requires the unified agent on the Docker host with socket access; the Pulse server container does not need `/var/run/docker.sock`. See [UNIFIED_AGENT.md](UNIFIED_AGENT.md). --- @@ -70,6 +72,8 @@ For Linux servers (VM or bare metal), use the official installer: curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | sudo bash ``` +> **Note**: This installs the Pulse server. Use the `/install.sh` endpoint from your Pulse UI for installing `pulse-agent` on monitored hosts. +
Manual systemd install (advanced) @@ -120,7 +124,10 @@ Pulse is secure by default. On first launch, you must retrieve a **Bootstrap Tok ### Step 2: Create Admin Account 1. Open `http://:7655` 2. Paste the **Bootstrap Token**. -3. Create your **Admin Username** and **Password**. +3. Complete the **Quick Security Setup** wizard. + - Set your **Admin Username** and **Password** (or let Pulse generate one). + - Pulse generates an **API token** for agents and automations. + - Copy the credentials before leaving the page. > **Note**: If you configure authentication via environment variables (`PULSE_AUTH_USER`/`PULSE_AUTH_PASS` and/or legacy `API_TOKENS`), the bootstrap token is automatically removed and this step is skipped. @@ -128,7 +135,7 @@ Pulse is secure by default. On first launch, you must retrieve a **Bootstrap Tok ## 🔄 Updates -### Automatic Updates (Systemd only) +### Automatic Updates (Systemd/LXC only) Pulse can self-update to the latest stable version. **Enable via UI**: Settings → System → Updates @@ -143,7 +150,7 @@ Pulse can self-update to the latest stable version. ### Rollback If an update causes issues on systemd installations, backups are created automatically during the update process. -**Manual rollback**: Check for backup directories at `/etc/pulse/backup-/` created during updates. Restore the previous binary manually if needed. +**Manual rollback**: In-app updates store backups under `/etc/pulse/backup-/`. The systemd auto-update timer uses a temporary `/tmp/pulse-backup-` during the update and auto-restores on failure. --- diff --git a/docs/KUBERNETES.md b/docs/KUBERNETES.md index 5d591d739..d07afd7d0 100644 --- a/docs/KUBERNETES.md +++ b/docs/KUBERNETES.md @@ -33,9 +33,17 @@ Configure via `values.yaml` or `--set` flags.
| `ingress.enabled` | Enable Ingress | `false` | | `persistence.enabled` | Enable PVC for /data | `true` | | `persistence.size` | PVC Size | `8Gi` | -| `agent.enabled` | Enable legacy docker agent workload | `false` | +| `agent.enabled` | Enable legacy `pulse-docker-agent` workload (deprecated) | `false` | -> Note: the `agent.*` block is legacy and currently references `pulse-docker-agent`. For new deployments, prefer the unified agent (`pulse-agent`) where possible. +> Note: the `agent.*` block is legacy and references `pulse-docker-agent`. For new deployments, prefer the unified agent (`pulse-agent`) where possible. + +### Prometheus Metrics + +The Helm chart exposes only the main HTTP port (`7655`). Prometheus metrics are served on a separate listener (`9091`) and are **not** exposed by default. + +If you want to scrape metrics: +1. Expose port `9091` with an additional Service. +2. Point your `ServiceMonitor` at that service/port (the built-in ServiceMonitor targets the HTTP service by default). ### Example `values.yaml` diff --git a/docs/MAIL_GATEWAY.md b/docs/MAIL_GATEWAY.md index 1f3b9a119..f6255b6bb 100644 --- a/docs/MAIL_GATEWAY.md +++ b/docs/MAIL_GATEWAY.md @@ -19,8 +19,8 @@ Pulse 5.0 adds support for monitoring Proxmox Mail Gateway instances alongside y 4. Enter connection details: - Host: Your PMG IP or hostname - Port: 8006 (default) - - API Token ID: e.g., `root@pmg!pulse` (format: `@!`) - - API Token Secret: Your token secret (shown once when you create the token) + - Username: e.g., `root@pam` or a dedicated `api@pmg` user + - Password: the PMG account password ### Via Discovery @@ -31,14 +31,13 @@ Pulse can automatically discover PMG instances on your network: 3. PMG instances on port 8006 are detected and shown in the Proxmox discovery panels 4. Click a discovered PMG server to add it -## API Token Setup on PMG +## Service Account Setup on PMG -Create an API token on your PMG server (recommended). 
The easiest method is via the PMG web UI: +PMG does not support API tokens. Use a dedicated PMG user with read-only access if possible: -- Create a token for a user (for example `root@pmg`) -- Copy the token secret when it is displayed (it is typically shown once) - -If you see 403/permission errors, start by testing with a token for an admin user to confirm connectivity, then tighten permissions once you know which PMG endpoints your instance requires. +- Create a user in the PMG UI (or CLI) such as `api@pmg`. +- Assign the minimum permissions needed to read mail statistics and cluster status. +- Use that username and password when adding the node in Pulse. ## Dashboard @@ -60,7 +59,7 @@ The Mail Gateway tab shows: ## Alerts -Configure alerts for PMG metrics in **Settings → Alerts**: +Configure alerts for PMG metrics in **Alerts → Thresholds**: - Queue depth exceeding threshold - Spam rate spike @@ -80,7 +79,7 @@ Monitor multiple PMG instances from a single Pulse dashboard: ### Connection refused 1. Verify PMG is accessible on port 8006 2. Check firewall rules -3. Ensure API token has correct permissions +3. Ensure the PMG user/password is correct and has read permissions ### No statistics showing 1. Wait for initial data collection (may take 1-2 polling cycles) @@ -89,5 +88,5 @@ Monitor multiple PMG instances from a single Pulse dashboard: ### Cluster nodes missing 1. PMG cluster must be properly configured -2. API token needs cluster-wide permissions +2. The PMG user needs cluster-wide permissions 3. All nodes must be reachable from Pulse diff --git a/docs/METRICS_HISTORY.md b/docs/METRICS_HISTORY.md index 5e1bd1e06..6d2de4d07 100644 --- a/docs/METRICS_HISTORY.md +++ b/docs/METRICS_HISTORY.md @@ -52,6 +52,8 @@ Pulse exposes the persistent metrics store via: - `GET /api/metrics-store/stats` - `GET /api/metrics-store/history` +These endpoints require authentication with the `monitoring:read` scope.
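As a rough illustration of the authentication note for the metrics-store endpoints, the sketch below builds (but does not send) an authenticated request. It assumes the `X-API-Token` header that the Scheduler Health API docs list is also accepted here, and that the token carries the `monitoring:read` scope. The helper name `metrics_store_request`, the host, and the token value are hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import Request


def metrics_store_request(
    base_url: str,
    api_token: str,
    path: str = "/api/metrics-store/stats",
) -> Request:
    """Build a GET request for a metrics-store endpoint.

    Assumption: the token is sent via the `X-API-Token` header and has
    the `monitoring:read` scope. Nothing is sent over the network here.
    """
    req = Request(urljoin(base_url, path), method="GET")
    req.add_header("X-API-Token", api_token)
    return req


# Hypothetical host and token, for illustration only.
req = metrics_store_request("http://pulse.example.com:7655", "example-token")
```

Passing the result to `urllib.request.urlopen` would perform the call. A token without the required scope would presumably be rejected by the server.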
+ ### History Query Parameters `GET /api/metrics-store/history` supports: diff --git a/docs/MIGRATION.md b/docs/MIGRATION.md index acb0bedcb..b2346bbfc 100644 --- a/docs/MIGRATION.md +++ b/docs/MIGRATION.md @@ -28,9 +28,15 @@ Never copy `/etc/pulse` (or `/data` in Docker/Kubernetes) manually. Encryption k | Alerts & overrides | Browser sessions and local cookies | | Notifications (email, webhooks, Apprise) | Local login username/password (`.env`) | | System settings (`system.json`) | Update history/backup folders | -| API token records | | -| OIDC config | | -| Guest metadata/notes | | +| API token records | β€” | +| OIDC config | β€” | +| Guest metadata/notes | β€” | +| β€” | Host metadata (notes/tags/AI command overrides) | +| β€” | Docker metadata cache | +| β€” | AI settings and findings (`ai.enc`, `ai_findings.json`, `ai_patrol_runs.json`, `ai_usage_history.json`) | +| β€” | Pulse Pro license (`license.enc`) | +| β€” | Server sessions (`sessions.json`) | +| β€” | Update history (`update-history.jsonl`) | ## πŸ”„ Common Scenarios @@ -56,7 +62,8 @@ Because local login credentials are stored in `.env` (not part of exports), you 3. **Update Agents**: * **Unified Agent**: Update the `--token` flag in your service definition. * **Containerized agent**: Update `PULSE_TOKEN` in the agent container environment. - * *Tip: You can use the "Install New Agent" wizard to generate updated install commands.* + * *Tip: Use **Settings β†’ Agents β†’ Installation commands** to generate updated install commands.* +4. **Pulse Pro**: Re-activate your license key after migration (license files are not included in exports). ## πŸ”’ Security diff --git a/docs/OIDC.md b/docs/OIDC.md index 8c79ea86e..ec3b0a124 100644 --- a/docs/OIDC.md +++ b/docs/OIDC.md @@ -7,7 +7,7 @@ Enable Single Sign-On (SSO) with providers like Authentik, Keycloak, Okta, and A 1. **Configure Provider**: Create an OIDC application in your IdP. 
* **Redirect URI**: `https:///api/oidc/callback` * **Scopes**: `openid`, `profile`, `email` -2. **Enable in Pulse**: Go to **Settings → Security → Single sign-on (OIDC)**. +2. **Enable in Pulse**: Go to **Settings → Security → Single Sign-On**. 3. **Enter Details**: * **Issuer URL**: The base URL of your IdP (e.g., `https://auth.example.com/application/o/pulse/`). * **Client ID & Secret**: From your IdP. diff --git a/docs/PBS.md b/docs/PBS.md index 812b0919a..1b8920ded 100644 --- a/docs/PBS.md +++ b/docs/PBS.md @@ -51,29 +51,13 @@ The agent will: 3. Generate an API token 4. Register the PBS node with Pulse automatically -### Method 2: Password Setup (Best for Docker PBS) ⭐ +### Method 2: API-Only Setup Script (Best for PBS in Containers) ⭐ -**Perfect for PBS running in Docker containers** where you can't run the agent. - -1. Go to **Settings → Nodes → Add PBS Node** -2. Enter your PBS server's URL (e.g., `https://192.168.1.50:8007`) -3. Select **Username & Password** authentication -4. Enter admin credentials (e.g., `root@pam` with password) -5. Click **Save** - -Pulse will automatically: -- Connect to your PBS server -- Create a `pulse-monitor@pbs` monitoring user -- Generate an API token with Audit permissions -- Store the token (not your password) for ongoing monitoring - -> **Note:** This requires admin credentials initially, but Pulse converts them to a limited-permission token immediately. - -### Method 3: One-Click Setup Script +Use this when you can run a command on the PBS host but do not want to install the agent. From Pulse's Settings page: -1. Go to **Settings → Nodes** -2. Click **Add PBS Node** +1. Go to **Settings → Proxmox** +2. Click **Add Node** 3. Select **API Only** tab 4. Enter your PBS server's URL 5. 
Click copy to get the setup command @@ -84,7 +68,11 @@ Example (what the UI generates): curl -sSL "http://:7655/api/setup-script?type=pbs&host=https://:8007&pulse_url=http://:7655" | bash ``` -### Method 4: Manual Token Creation +The script creates a `pulse-monitor@pbs` user, generates a scoped API token, and registers the server with Pulse. + +> **Note**: API-only mode does not include temperature monitoring or AI command execution. Use **Agent Install** for full functionality. + +### Method 3: Manual Token Creation If you prefer manual setup: diff --git a/docs/PROXY_CONTROL_PLANE.md b/docs/PROXY_CONTROL_PLANE.md index 84973b5eb..7ac54c5fb 100644 --- a/docs/PROXY_CONTROL_PLANE.md +++ b/docs/PROXY_CONTROL_PLANE.md @@ -4,6 +4,8 @@ The Control Plane synchronizes `pulse-sensor-proxy` instances with the Pulse ser > **Deprecated in v5:** `pulse-sensor-proxy` (and its control-plane sync) is deprecated and not recommended for new deployments. New installs should use `pulse-agent --enable-proxmox` for temperature monitoring. +> **Important**: The control-plane endpoints are disabled by default. Set `PULSE_ENABLE_SENSOR_PROXY=true` on the Pulse server to enable legacy proxy support. + ## πŸ—οΈ Architecture ```mermaid @@ -19,7 +21,7 @@ graph LR ## πŸ”„ Workflow 1. **Install**: `install-sensor-proxy.sh` calls `/api/temperature-proxy/register`. -2. **Token Exchange**: Pulse returns a `ctrl_token` which the proxy saves to `/etc/pulse-sensor-proxy/.pulse-control-token`. +2. **Token Exchange**: Pulse returns a control-plane token which the proxy saves to `/etc/pulse-sensor-proxy/.pulse-control-token`. 3. **Polling**: The proxy polls `/api/temperature-proxy/authorized-nodes` every 60s (configurable). 4. **Update**: If the node list changes (e.g., a new node is added to Pulse), the proxy updates its internal allowlist automatically. 
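The polling and update steps of this workflow can be sketched as a small reconcile loop. This is a simplified model, not the actual `pulse-sensor-proxy` code: the `AllowlistSync` class, its `fetch_nodes` callback (standing in for the call to `/api/temperature-proxy/authorized-nodes`), and the node names are all hypothetical. The fallback to the last known good list follows the behaviour described in the Security section.

```python
from typing import Callable, Iterable, Set


class AllowlistSync:
    """Hypothetical model of the proxy-side sync step (not Pulse's code)."""

    def __init__(self, fetch_nodes: Callable[[], Iterable[str]]) -> None:
        # fetch_nodes stands in for polling the authorized-nodes endpoint.
        self._fetch_nodes = fetch_nodes
        self.allowlist: Set[str] = set()

    def refresh(self) -> bool:
        """Poll once and return True only if the allowlist changed."""
        try:
            latest = set(self._fetch_nodes())
        except OSError:
            # Control plane unreachable: keep the last known good allowlist.
            return False
        if latest == self.allowlist:
            return False
        self.allowlist = latest
        return True


def unreachable() -> Iterable[str]:
    """Simulate a control-plane outage."""
    raise ConnectionError("control plane down")


sync = AllowlistSync(lambda: ["pve-a", "pve-b"])
changed = sync.refresh()  # the first poll populates the allowlist
```

A real proxy would call `refresh()` on a timer (every 60s by default, per the workflow above) and would reload its validation rules only when the call reports a change.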
@@ -31,11 +33,11 @@ The proxy configuration in `/etc/pulse-sensor-proxy/config.yaml` handles the syn pulse_control_plane: url: https://pulse.example.com:7655 token_file: /etc/pulse-sensor-proxy/.pulse-control-token - refresh_interval: 60s + refresh_interval: 60 ``` ## πŸ›‘οΈ Security -* **Tokens**: The `ctrl_token` is unique per proxy instance. +* **Tokens**: The control-plane token is unique per proxy instance. * **Least Privilege**: The proxy only knows about nodes explicitly added to Pulse. * **Fallback**: If the control plane is unreachable, the proxy uses its last known good configuration. diff --git a/docs/PULSE_PRO.md b/docs/PULSE_PRO.md new file mode 100644 index 000000000..c45e5fc86 --- /dev/null +++ b/docs/PULSE_PRO.md @@ -0,0 +1,120 @@ +# πŸš€ Pulse Pro (Technical Overview) + +Pulse Pro unlocks advanced AI automation features on top of the free Pulse platform. It keeps the same self-hosted model while adding continuous, context-aware analysis and remediation workflows. + +## What You Get + +### AI Patrol (LLM-Backed) +Scheduled background analysis that correlates live state + metrics history to produce actionable findings. + +**Inputs:** +- Nodes, guests, storages, backups, containers, and Kubernetes resources. +- Metrics history trends and anomaly scores. +- Alert state and diagnostics. + +**Outputs:** +- Findings with severity, category, and remediation hints. +- Trend-aware capacity warnings (e.g., "storage pool will be full in 10 days"). +- Cross-system correlation (e.g., backups failing because a datastore is full). + +### Pro-Only Automations +- **LLM-backed patrol analysis**: full AI analysis instead of heuristic-only findings. +- **Alert-triggered analysis**: on-demand deep analysis when alerts fire. +- **Autonomous mode**: optional diagnostic/fix commands through connected agents. +- **Auto-fix**: guarded remediations when enabled. +- **Kubernetes AI analysis**: deep cluster analysis beyond basic monitoring (Pro-only). 
+ +### What Free Users Still Get +- **Heuristic Patrol**: local rule-based checks that surface common issues without any external AI provider. +- **AI Chat (BYOK)**: interactive troubleshooting with your own API keys. +- **Update alerts**: container/package update signals remain available in the free tier. + +### What You See In The UI +- **Patrol findings**: a prioritized list with severity, evidence, and recommended fixes. +- **Alert timelines**: AI analysis events attached to the alert history for auditability. +- **Remediation controls**: explicit toggles for autonomous mode and auto-fix workflows. + +## Pro Feature Gates (License-Enforced) + +Pulse Pro licenses enable specific server-side features. These are enforced at the API layer and in the UI: + +- `ai_patrol`: LLM-backed patrol findings and live patrol stream. +- `ai_alerts`: alert-triggered analysis runs. +- `ai_autofix`: autonomous mode and auto-fix workflows. +- `kubernetes_ai`: AI analysis for Kubernetes clusters (not basic monitoring). + +## Why It Matters (Technical Value) + +- **Cross-system correlation**: Patrol combines PVE, PBS, PMG, Docker, and Kubernetes signals into a single model context instead of isolated checks. +- **Trend-aware analysis**: Uses metrics history to detect slow-burn issues that static thresholds miss. +- **Noise control**: Suppression and dismissal memory prevent alert fatigue. +- **Actionable findings**: Each finding includes root-cause clues and next steps. +- **Auditability**: AI analysis is attached to alerts and stored with finding history, so decisions are traceable. + +## Scheduling and Controls + +- **Interval**: 10 minutes to 7 days (default 6 hours). Set to 0 to disable Patrol. +- **Scope**: Patrol only analyzes resources Pulse is already monitoring. +- **Safety**: Command execution and auto-fix are disabled by default and require explicit enablement. + +## How Licensing Works + +Pulse Pro is activated locally with a license key. + +1. 
Go to **Settings → System → Pulse Pro**. +2. Paste your license key and click **Activate License**. +3. The key is validated locally (no license server required). + +License status, expiry, and feature availability are visible in the same panel. + +The license key is stored encrypted in `license.enc` under the Pulse config directory. It is not included in export/import backups, so re-activate after migrations. + +### Feature Status API + +You can inspect license feature gates via: + +- `GET /api/license/features` (authenticated) + +This returns a feature map like `ai_patrol`, `ai_alerts`, `ai_autofix`, and `kubernetes_ai` so you can automate Pro-only workflows safely. + +## Under The Hood (Technical) + +- **Patrol context**: patrol runs build a unified snapshot from live state + `metrics.db` history, then correlate alerts, diagnostics, and resource topology. +- **Findings storage**: findings persist in `ai_findings.json` with run history in `ai_patrol_runs.json`. +- **Alert-triggered analysis**: runs per alert event and writes analysis into the alert timeline for auditability. +- **Auto-fix safety**: requires explicit toggles and uses the same agent command scopes you configure for manual runs. 
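One way an automation might use the feature map from `GET /api/license/features` is to check the gates it needs before starting a Pro-only workflow. The sketch below assumes the response can be reduced to a flat mapping of feature name to boolean. The real response shape is not shown in this document, and the helper name `missing_features` and the sample values are hypothetical.

```python
from typing import Iterable, List, Mapping


def missing_features(
    feature_map: Mapping[str, bool],
    required: Iterable[str],
) -> List[str]:
    """Return the required feature gates that are absent or disabled."""
    return [name for name in required if not feature_map.get(name, False)]


# Hypothetical feature map; the real API response may be shaped differently.
features = {
    "ai_patrol": True,
    "ai_alerts": True,
    "ai_autofix": False,
    "kubernetes_ai": False,
}

blocked = missing_features(features, ["ai_patrol", "ai_autofix"])
```

Treating unknown names as disabled keeps the check conservative: a workflow that needs a gate the server does not report is skipped rather than attempted.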
+ +## Example Finding Payload (API) + +`GET /api/ai/patrol/findings` returns structured findings you can integrate with external tooling: + +```json +{ + "id": "finding-9f7c2f5e", + "key": "storage-high-usage", + "severity": "warning", + "category": "capacity", + "resource_id": "storage:local-lvm", + "resource_name": "local-lvm", + "resource_type": "storage", + "node": "pve-1", + "title": "Storage nearing capacity", + "description": "local-lvm is at 87% and growing ~4%/day.", + "recommendation": "Review VM disks on local-lvm or expand the volume within 7 days.", + "evidence": "Used 1.74TB of 2.0TB; +4.1%/day over 7d.", + "source": "ai-analysis", + "detected_at": "2025-03-04T09:11:12Z", + "last_seen_at": "2025-03-04T15:11:12Z", + "alert_id": "alert-storage-usage-local-lvm", + "times_raised": 2, + "suppressed": false +} +``` + +Heuristic (free-tier) findings omit `source: "ai-analysis"` and include the same schema for consistent automations. + +## Privacy and Data Handling + +Patrol runs on your Pulse server. When Pro is enabled, only the minimal context needed for analysis is sent to the configured AI provider. No telemetry is sent to Pulse by default. + +For a deeper AI walkthrough, see [AI.md](AI.md). diff --git a/docs/README.md b/docs/README.md index 50252f8da..49a18c038 100644 --- a/docs/README.md +++ b/docs/README.md @@ -29,6 +29,7 @@ Welcome to the Pulse documentation portal. Here you'll find everything you need ## πŸ” Security - **[Security Policy](../SECURITY.md)** – The core security model (Encryption, Auth, API Scopes). +- **[Proxy Auth](PROXY_AUTH.md)** – Authentik/Authelia/Cloudflare proxy authentication configuration. ## ✨ New in 5.0 @@ -40,17 +41,18 @@ Welcome to the Pulse documentation portal. Here you'll find everything you need ## πŸš€ Pulse Pro -Pulse Pro unlocks **AI Patrol** β€” automated background monitoring that spots issues before they become incidents. 
+Pulse Pro unlocks **LLM-backed AI Patrol** β€” automated background monitoring that spots issues before they become incidents. - **[Learn more at pulserelay.pro](https://pulserelay.pro)** - **[AI Patrol deep dive](AI.md)** - **[Pulse Pro technical overview](PULSE_PRO.md)** -- **What you actually get**: cross-system context, trend-aware checks, and actionable findings with remediation hints. +- **What you actually get**: LLM-backed patrol analysis, alert-triggered deep dives, Kubernetes AI analysis, and optional auto-fix workflows. - **Technical highlights**: correlation across nodes/VMs/backups/containers, trend-based capacity predictions, and findings you can resolve/suppress. +- **Scheduling**: 10 minutes to 7 days (default 6 hours). ## πŸ“‘ Monitoring & Agents -- **[Unified Agent](UNIFIED_AGENT.md)** – Single binary for Host and Docker monitoring. +- **[Unified Agent](UNIFIED_AGENT.md)** – Single binary for host, Docker, and Kubernetes monitoring. - **[Proxmox Backup Server](PBS.md)** – PBS integration, direct API vs PVE passthrough, token setup. - **[VM Disk Monitoring](VM_DISK_MONITORING.md)** – Enabling QEMU Guest Agent for disk stats. - **[Temperature Monitoring](TEMPERATURE_MONITORING.md)** – Agent-based temperature monitoring (`pulse-agent --enable-proxmox`). Sensor proxy is deprecated in v5. diff --git a/docs/REVERSE_PROXY.md b/docs/REVERSE_PROXY.md index 036e29cd3..fb4bfaeb2 100644 --- a/docs/REVERSE_PROXY.md +++ b/docs/REVERSE_PROXY.md @@ -13,6 +13,8 @@ location / { proxy_set_header Connection "upgrade"; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; # Critical for WebSockets proxy_read_timeout 86400; # 24h @@ -51,5 +53,6 @@ ProxyPassReverse / http://localhost:7655/ - **"Connection Lost"**: WebSocket upgrade failed. Check `Upgrade` and `Connection` headers. - **502 Bad Gateway**: Pulse is not running on port 7655. 
-- **CORS Errors**: Do not add CORS headers in the proxy; Pulse handles them. Set `ALLOWED_ORIGINS` env var if needed. +- **CORS Errors**: Do not add CORS headers in the proxy; Pulse handles them. Set **Settings → System → Network → Allowed Origins** or use `ALLOWED_ORIGINS` if needed. +- **OIDC redirects or HTTPS detection issues**: Ensure `X-Forwarded-Proto` is set to `https`. - **Wrong client IPs**: Set `PULSE_TRUSTED_PROXY_CIDRS` to your proxy IP/CIDR so `X-Forwarded-For` is trusted. diff --git a/docs/SCREENSHOTS.md b/docs/SCREENSHOTS.md index 0033812a5..f5775da3b 100644 --- a/docs/SCREENSHOTS.md +++ b/docs/SCREENSHOTS.md @@ -1,7 +1,7 @@ # Pulse Screenshots ## Dashboard Overview (Dark Mode) -![Dashboard Overview](images/01-dashboard.png) +![Dashboard Overview](images/01-dashboard.jpg) *Real-time monitoring dashboard showing 7 Proxmox nodes with 35 VMs and 56 containers. Color-coded resource usage (CPU, RAM, storage) with quick status indicators for running/stopped guests. Automatic layout adapts to cluster size - compact cards for 5-9 nodes. Professional dark theme optimized for 24/7 monitoring setups.* ## Storage Management diff --git a/docs/SCRIPT_LIBRARY.md b/docs/SCRIPT_LIBRARY.md index 99b5d1f2a..9da24e0e8 100644 --- a/docs/SCRIPT_LIBRARY.md +++ b/docs/SCRIPT_LIBRARY.md @@ -12,7 +12,7 @@ This guide explains the shared Bash modules in `scripts/lib/` used for building **Conventions:** * **Namespaces:** Functions are exported as `module::function` (e.g., `common::run`). -* **Bundling:** `make bundle-scripts` inlines modules for distribution. +* **Bundling:** `./scripts/bundle.sh` inlines modules for distribution. * **Compatibility:** Targets Bash 5 on Debian 11+ and Ubuntu LTS. ## 🦴 Script Skeleton @@ -56,8 +56,7 @@ main "$@" ## 📦 Bundling 1. Update `scripts/bundle.manifest`. -2. Run `make bundle-scripts`. +2. Run `./scripts/bundle.sh`. 3. Verify `dist/` artifacts. **Note:** Never edit bundled artifacts manually. Always rebuild from source. 
- diff --git a/docs/SECURITY_CHANGELOG.md b/docs/SECURITY_CHANGELOG.md index f37b151d7..0b9e4cf4a 100644 --- a/docs/SECURITY_CHANGELOG.md +++ b/docs/SECURITY_CHANGELOG.md @@ -1,5 +1,7 @@ # Security Changelog - Pulse Sensor Proxy +> **Deprecated in v5:** `pulse-sensor-proxy` is deprecated and not recommended for new deployments. This changelog is retained for existing installations and historical reference. + ## 2025-11-07: Critical Security Hardening ### Summary @@ -240,11 +242,12 @@ allowed_peers: New Prometheus metrics for security monitoring: ``` -pulse_proxy_node_validation_failures_total{node, reason} +pulse_proxy_node_validation_failures_total{reason} pulse_proxy_read_timeouts_total pulse_proxy_write_timeouts_total -pulse_proxy_limiter_rejections_total{peer, reason} -pulse_proxy_limiter_penalties_total{peer, reason} +pulse_proxy_rate_limit_hits_total +pulse_proxy_limiter_rejections_total{reason, peer} +pulse_proxy_limiter_penalties_total{reason, peer} pulse_proxy_global_concurrency_inflight ``` diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md index 766f72ed3..bed76acd7 100644 --- a/docs/TROUBLESHOOTING.md +++ b/docs/TROUBLESHOOTING.md @@ -19,7 +19,7 @@ sudo pulse bootstrap-token ``` ### Port change didn't take effect -1. Check which service is running: `systemctl status pulse` (or `pulse-backend`). +1. Check which service is running: `systemctl status pulse` (legacy installs may use `pulse-backend`). 2. Verify environment override: `systemctl show pulse --property=Environment`. 3. Docker: Ensure you updated the `-p` flag (e.g., `-p 8080:7655`). @@ -40,7 +40,8 @@ sudo pulse bootstrap-token **Cannot login / 401 Unauthorized** - Clear browser cookies. -- Check if your IP is banned (wait 15 mins or restart Pulse). +- Check if your IP is locked out (wait 15 mins). +- If another admin can log in, use `POST /api/security/reset-lockout` to clear the lockout for your username or IP. 
### Monitoring Data @@ -63,13 +64,14 @@ sudo pulse bootstrap-token ### Notifications **Emails not sending** -- Check SMTP settings in **Settings → Alerts**. +- Check SMTP settings in **Alerts → Notification Destinations**. - Check logs: `docker logs pulse | grep email`. - Ensure your SMTP provider allows the connection (e.g., Gmail App Passwords). **Webhooks failing** - Verify the URL is reachable from the Pulse server. -- Check `Allowed Origins` if you are getting CORS errors. +- If targeting private IPs, allow them in **Settings → System → Network → Webhook Security**. +- Check Pulse logs for HTTP status codes and response bodies. --- diff --git a/docs/UNIFIED_AGENT.md b/docs/UNIFIED_AGENT.md index 5fe552190..e9f957e34 100644 --- a/docs/UNIFIED_AGENT.md +++ b/docs/UNIFIED_AGENT.md @@ -7,7 +7,7 @@ The unified agent (`pulse-agent`) combines host, Docker, and Kubernetes monitori ## Quick Start Generate an installation command in the UI: -**Settings > Agents > "Install New Agent"** +**Settings → Agents → Installation commands** ### Linux (systemd) ```bash @@ -21,6 +21,18 @@ curl -fsSL http://:7655/install.sh | \ bash -s -- --url http://:7655 --token ``` +### Windows (PowerShell, run as Administrator) +```powershell +irm http://:7655/install.ps1 | iex +``` + +With environment variables: +```powershell +$env:PULSE_URL="http://:7655" +$env:PULSE_TOKEN="" +irm http://:7655/install.ps1 | iex +``` + ### Synology NAS ```bash curl -fsSL http://:7655/install.sh | \ @@ -46,7 +58,7 @@ curl -fsSL http://:7655/install.sh | \ | `--enable-host` | `PULSE_ENABLE_HOST` | Enable host metrics | `true` | | `--enable-docker` | `PULSE_ENABLE_DOCKER` | Enable Docker metrics | `false` (auto-detect if not configured) | | `--docker-runtime` | `PULSE_DOCKER_RUNTIME` | Force container runtime: `auto`, `docker`, or `podman` | `auto` | -| `--enable-kubernetes` | `PULSE_ENABLE_KUBERNETES` | 
Enable Kubernetes metrics | `false` (installer auto-detect if not configured) | | `--enable-proxmox` | `PULSE_ENABLE_PROXMOX` | Enable Proxmox integration | `false` | | `--proxmox-type` | `PULSE_PROXMOX_TYPE` | Proxmox type: `pve` or `pbs` | *(auto-detect)* | | `--enable-commands` | `PULSE_ENABLE_COMMANDS` | Enable AI command execution (disabled by default) | `false` | @@ -79,11 +91,11 @@ Legacy env var: `PULSE_KUBE_INCLUDE_ALL_POD_FILES` is still accepted for backwar Auto-detection behavior: - **Host metrics**: Enabled by default. -- **Docker/Podman**: Enabled automatically if Docker/Podman is detected and `PULSE_ENABLE_DOCKER` was not explicitly set. -- **Kubernetes**: Only enabled when `--enable-kubernetes`/`PULSE_ENABLE_KUBERNETES=true` is set. -- **Proxmox**: Only enabled when `--enable-proxmox`/`PULSE_ENABLE_PROXMOX=true` is set. Type auto-detects `pve` vs `pbs` if not specified. +- **Docker/Podman**: Enabled automatically by the agent if Docker/Podman is detected and `PULSE_ENABLE_DOCKER` was not explicitly set. +- **Kubernetes**: Enabled automatically by the installer when a kubeconfig is detected and `PULSE_ENABLE_KUBERNETES` was not explicitly set. +- **Proxmox**: Enabled automatically by the installer when Proxmox is detected. Type auto-detects `pve` vs `pbs` if not specified. -To disable Docker auto-detection, set `--enable-docker=false` or `PULSE_ENABLE_DOCKER=false`. +To disable auto-detection, set the relevant flag or env var (`--disable-docker`, `--disable-kubernetes`, `--disable-proxmox`). 
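The auto-detection rules above share one pattern: an explicit flag or env var always wins, and detection only applies when nothing was set. The sketch below models that precedence. It is a simplification under stated assumptions, not the agent's or installer's code: it ignores whether the agent or the installer does the detecting, and the function name `module_enabled` and its parameters are hypothetical.

```python
from typing import Optional


def module_enabled(
    explicit: Optional[bool],
    detected: bool,
    default: bool = False,
) -> bool:
    """Resolve whether a monitoring module should run.

    explicit: value from a flag or env var, or None if the user set nothing.
    detected: whether the runtime (Docker, kubeconfig, Proxmox) was found.
    default:  value used when nothing is set and nothing is detected.
    """
    if explicit is not None:
        # An explicit setting (e.g. --disable-docker) overrides detection.
        return explicit
    return detected or default


# Docker found on the host and no explicit setting: enabled.
docker_on = module_enabled(explicit=None, detected=True)
# Docker found, but the user passed --disable-docker: disabled.
docker_off = module_enabled(explicit=False, detected=True)
# Host metrics are on by default even with nothing to detect.
host_on = module_enabled(explicit=None, detected=False, default=True)
```

Using `None` for "not configured" keeps it distinct from an explicit `false`, which is the difference that makes `--disable-docker` effective on a host where Docker is present.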
## Installation Options @@ -102,7 +114,7 @@ curl -fsSL http://:7655/install.sh | \ ### Disable Docker (even if detected) ```bash curl -fsSL http://:7655/install.sh | \ - bash -s -- --url http://:7655 --token --enable-docker=false + bash -s -- --url http://:7655 --token --disable-docker ``` ### Host + Kubernetes Monitoring diff --git a/docs/UPGRADE_v5.md b/docs/UPGRADE_v5.md index afa3fc850..e7eab7a34 100644 --- a/docs/UPGRADE_v5.md +++ b/docs/UPGRADE_v5.md @@ -20,9 +20,11 @@ If you prefer CLI, use the official installer for the target version: ```bash curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | \ - sudo bash -s -- --stable + sudo bash -s -- --version vX.Y.Z ``` +This installer updates the **Pulse server**. Agent updates use the `/install.sh` command generated in **Settings → Agents → Installation commands**. + ### Docker ```bash @@ -76,7 +78,7 @@ pveum aclmod /storage -user pulse-monitor@pam -role PVEDatastoreAdmin **Alternative** (re-run agent setup): 1. Delete the node from Pulse Settings -2. Re-run the agent setup command from Settings → Add Node +2. Re-run the agent setup command from Settings → Proxmox → Add Node 3. The new token will have correct permissions This happens because v5's agent setup grants broader permissions than the v4 manual setup scripts did. diff --git a/docs/WEBHOOKS.md b/docs/WEBHOOKS.md index 46bba9c63..44f11e3cc 100644 --- a/docs/WEBHOOKS.md +++ b/docs/WEBHOOKS.md @@ -1,10 +1,10 @@ # 🔔 Webhooks -Pulse supports Discord, Slack, Teams, Telegram, Gotify, ntfy, and generic webhooks. +Pulse includes built-in templates for popular services and a generic JSON template for custom endpoints. ## 🚀 Quick Setup -1. Go to **Alerts → Notifications**. +1. Go to **Alerts → Notification Destinations**. 2. Click **Add Webhook**. 3. Select service type and paste the URL. 
@@ -14,20 +14,37 @@ Pulse supports Discord, Slack, Teams, Telegram, Gotify, ntfy, and generic webhoo |---------|------------| | **Discord** | `https://discord.com/api/webhooks/{id}/{token}` | | **Slack** | `https://hooks.slack.com/services/...` | -| **Teams** | `https://{tenant}.webhook.office.com/...` | -| **Telegram** | `https://api.telegram.org/bot{token}/sendMessage?chat_id={id}` | +| **Teams** | `https://{tenant}.webhook.office.com/webhookb2/{webhook_path}` | +| **Teams (Adaptive Card)** | `https://{tenant}.webhook.office.com/webhookb2/{webhook_path}` | +| **Telegram** | `https://api.telegram.org/bot{bot_token}/sendMessage?chat_id={chat_id}` | +| **PagerDuty** | `https://events.pagerduty.com/v2/enqueue` | +| **Pushover** | `https://api.pushover.net/1/messages.json` | | **Gotify** | `https://gotify.example.com/message?token={token}` | | **ntfy** | `https://ntfy.sh/{topic}` | +| **Generic** | `https://example.com/webhook` | ## 🎨 Custom Templates For generic webhooks, use Go templates to format the JSON payload. -**Variables:** -- `{{.Message}}`: Alert text -- `{{.Level}}`: warning/critical -- `{{.Node}}`: Node name -- `{{.Value}}`: Metric value (e.g. 95.5) +**Variables (common):** +- `{{.ID}}`, `{{.Level}}`, `{{.Type}}` +- `{{.ResourceName}}`, `{{.ResourceID}}`, `{{.ResourceType}}`, `{{.Node}}` +- `{{.Message}}`, `{{.Value}}`, `{{.Threshold}}`, `{{.Duration}}`, `{{.Timestamp}}` +- `{{.Instance}}` (Pulse public URL if configured) +- `{{.CustomFields.}}` (user-defined fields in the UI) + +**Convenience fields:** +- `{{.ValueFormatted}}`, `{{.ThresholdFormatted}}` +- `{{.StartTime}}`, `{{.Acknowledged}}`, `{{.AckTime}}`, `{{.AckUser}}` + +**Template helpers:** `title`, `upper`, `lower`, `printf`, `urlquery`/`urlencode`, `urlpath` + +**Service-specific notes:** +- **Telegram**: include `chat_id` in the URL query string. +- **Telegram templates**: `{{.ChatID}}` is populated from the URL query string. 
+- **PagerDuty**: set `routing_key` as a custom field (or header) in the webhook config. +- **Pushover**: add `app_token` and `user_token` custom fields (required). **Example Payload:** ```json diff --git a/docs/api/SCHEDULER_HEALTH.md b/docs/api/SCHEDULER_HEALTH.md index c6a841b0b..ea23f81b6 100644 --- a/docs/api/SCHEDULER_HEALTH.md +++ b/docs/api/SCHEDULER_HEALTH.md @@ -1,7 +1,7 @@ # 🩺 Scheduler Health API **Endpoint**: `GET /api/monitoring/scheduler/health` -**Auth**: Required (Bearer token or Cookie) +**Auth**: Required (`Authorization: Bearer `, `X-API-Token`, or session cookie) Returns a real-time snapshot of the adaptive scheduler, including queue state, circuit breakers, and dead-letter tasks. @@ -16,11 +16,42 @@ Returns a real-time snapshot of the adaptive scheduler, including queue state, c "dueWithinSeconds": 2, "perType": { "pve": 4, "pbs": 2 } }, + "deadLetter": { + "count": 1, + "tasks": [ + { + "instance": "pbs-main", + "type": "pbs", + "nextRun": "2025-10-20T13:06:40Z", + "lastError": "connection timeout", + "failures": 5 + } + ] + }, + "breakers": [ + { + "instance": "pve-a", + "type": "pve", + "state": "half_open", + "failures": 3, + "retryAt": "2025-10-20T13:06:15Z" + } + ], + "staleness": [ + { + "instance": "pve-a", + "type": "pve", + "lastSuccess": "2025-10-20T13:05:10Z", + "stalenessSeconds": 32, + "stalenessScore": 0.12 + } + ], "instances": [ { "key": "pve::pve-a", "type": "pve", "displayName": "Pulse PVE Cluster", + "instance": "pve-a", "connection": "https://pve-a:8006", "pollStatus": { "lastSuccess": "2025-10-20T13:05:10Z", @@ -35,10 +66,14 @@ Returns a real-time snapshot of the adaptive scheduler, including queue state, c "breaker": { "state": "half_open", // closed, open, half_open "retryAt": "2025-10-20T13:06:15Z", - "failureCount": 3 + "failureCount": 3, + "since": "2025-10-20T12:58:10Z", + "lastTransition": "2025-10-20T13:05:40Z" }, "deadLetter": { - "present": false + "present": false, + "reason": "", + "retryCount": 0 } } ] @@ 
-57,9 +92,21 @@ The authoritative source for per-instance health. * **`breaker`**: * `state`: `closed` (healthy), `open` (failing), `half_open` (recovering). * `retryAt`: Next retry time if open/half-open. + * `since`: When the current breaker state started. + * `lastTransition`: Timestamp of the last state transition. * **`deadLetter`**: * `present`: `true` if the instance is in the DLQ (stopped polling). * `reason`: Why it was moved to DLQ (e.g., `permanent_failure`). + * `retryCount`: DLQ retry attempts. + * `nextRetry`: Next scheduled retry (if any). + +### Top-Level Queue and DLQ +* **`queue`**: Snapshot of the active task queue (depth + per-type counts). +* **`deadLetter`**: Aggregate DLQ summary plus up to 25 queued tasks. + +### Optional Summaries +* **`breakers`**: Only breakers that are not in default `closed`/zero-failure state. +* **`staleness`**: Snapshot of staleness scores (if the tracker is enabled). ## πŸ› οΈ Common Queries (jq) diff --git a/docs/monitoring/ADAPTIVE_POLLING.md b/docs/monitoring/ADAPTIVE_POLLING.md index a5144c3a1..82d97d841 100644 --- a/docs/monitoring/ADAPTIVE_POLLING.md +++ b/docs/monitoring/ADAPTIVE_POLLING.md @@ -7,25 +7,34 @@ Pulse uses an adaptive scheduler to optimize polling based on instance health an * **Priority Queue**: Min-heap keyed by `NextRun`. * **Circuit Breaker**: Prevents hot loops on failing instances using success/failure counters. * **Backoff**: Exponential retry delays (5s min to 5m max). -* **Worker Pool**: Controlled concurrency (default 10) to limit host resource usage. +* **Worker Pool**: One worker per configured instance (PVE/PBS/PMG), capped at 10. +* **Global Concurrency Cap**: At most 2 polling cycles run at once to avoid resource spikes. ## πŸ”¬ Implementation Details (Developer Info) -### Staleness Weighting -The `AdaptiveScheduler` (`internal/monitoring/scheduler.go`) calculates a `StalenessScore` (0.0 to 1.0) for every instance type. 
This score is weighted to prioritize active resources: -- **PVE (Proxmox nodes)**: High weight (1.0). Missing node data is critical. -- **VMs/Containers**: Medium weight (0.7). -- **Storage/Backups**: Lower weight (0.4). They change less frequently. +### Staleness Scoring +The `AdaptiveScheduler` (`internal/monitoring/scheduler.go`) relies on the `StalenessTracker` to compute a `StalenessScore` (0.0 to 1.0) based on **how long it has been since the last successful poll**. -The scheduler uses **Exponential Smoothing** on the intervals to prevent rapid "bobbing" between `MinInterval` and `MaxInterval` when sensors fluctuate. +- `0.0` = fresh (recent success) +- `1.0` = very stale or never succeeded + +The staleness score is normalized against `AdaptivePollingMaxInterval` (default 5 minutes). + +The scheduler applies **Exponential Smoothing** (alpha 0.6) and a small jitter (5%) to avoid oscillation. + +Additional influences: +- **Error penalty**: retries tighten the interval based on the error count. +- **Queue stretch**: large queues gently stretch intervals to avoid overload. ### Circuit Breaker Recovery -The `circuitBreaker` (`internal/monitoring/circuit_breaker.go`) follows the standard state machine but with Pulse-specific thresholds: -1. **Closing the Circuit**: It requires **one single successful poll** to transition from *Half-Open* back to *Closed*. -2. **Backoff Calculation**: Retries use `2^failures * 5s` up to the configured `MaxInterval`. +The `circuitBreaker` (`internal/monitoring/circuit_breaker.go`) follows a standard state machine: +1. **Closing the Circuit**: One successful poll moves *Half-Open* → *Closed* and resets failure count. +2. **Backoff Calculation**: Retries use exponential backoff starting at 5s (multiplier 2, jitter 0.2) capped at 5m. 3. **Transient vs. Permanent**: - - **Transient (Network, Timeout)**: Retried 5 times before moving to DLQ. 
-    - **Permanent (Auth 401, Forbidden 403)**: Bypasses immediate retries and moves straight to the Dead Letter Queue to avoid triggering IP lockouts on the target host.
+    - **Transient** errors (retryable) are retried up to 5 times before moving to the Dead Letter Queue.
+    - **Permanent** errors move directly to the Dead Letter Queue.
+
+**Note:** When `AdaptivePollingMaxInterval` is set to 15 seconds or less, the retry backoff is shortened (750ms initial, 6s max) to keep fast feedback loops during tight polling windows.
 
 ## ⚙️ Configuration
 Adaptive polling is **disabled by default**.
diff --git a/docs/monitoring/PROMETHEUS_METRICS.md b/docs/monitoring/PROMETHEUS_METRICS.md
index 36a467c59..0cc610554 100644
--- a/docs/monitoring/PROMETHEUS_METRICS.md
+++ b/docs/monitoring/PROMETHEUS_METRICS.md
@@ -8,6 +8,8 @@ Example scrape target:
 
 This listener is separate from the main UI/API port (`7655`). In Docker and Kubernetes you must expose `9091` explicitly if you want to scrape it from outside the container/pod.
 
+**Helm note:** the current chart exposes only port `7655`, so Prometheus scraping requires an additional Service that targets `9091` (and a matching ServiceMonitor).
+
 ## 🌐 HTTP Ingress
 | Metric | Type | Description |
 | :--- | :--- | :--- |
@@ -49,6 +51,17 @@ This listener is separate from the main UI/API port (`7655`). In Docker and Kube
 | `pulse_diagnostics_cache_misses_total` | Counter | Cache misses. |
 | `pulse_diagnostics_refresh_duration_seconds` | Histogram | Refresh latency. |
 
+## 🚨 Alert Lifecycle
+| Metric | Type | Description |
+| :--- | :--- | :--- |
+| `pulse_alerts_active` | Gauge | Active alerts by `level` and `type`. |
+| `pulse_alerts_fired_total` | Counter | Total alerts fired by `level` and `type`. |
+| `pulse_alerts_resolved_total` | Counter | Total alerts resolved by `type`. |
+| `pulse_alerts_acknowledged_total` | Counter | Total alerts acknowledged.
|
+| `pulse_alerts_suppressed_total` | Counter | Alerts suppressed by `reason` (quiet_hours, rate_limit, duplicate, etc.). |
+| `pulse_alerts_rate_limited_total` | Counter | Alerts suppressed due to rate limiting. |
+| `pulse_alert_duration_seconds` | Histogram | Time from alert fire to resolve (by `type`). |
+
 ## 🚨 Alerting Examples
 * **High Error Rate**: `rate(pulse_http_request_errors_total[5m]) > 0.05`
 * **Stale Node**: `pulse_monitor_node_poll_staleness_seconds > 300`
diff --git a/docs/operations/AUTO_UPDATE.md b/docs/operations/AUTO_UPDATE.md
index a12696661..f4b6fab5e 100644
--- a/docs/operations/AUTO_UPDATE.md
+++ b/docs/operations/AUTO_UPDATE.md
@@ -8,7 +8,9 @@ Manage Pulse auto-updates on host-mode installations.
 | :--- | :--- |
 | `pulse-update.timer` | Daily check (02:00 + jitter). |
 | `pulse-update.service` | Runs the update script. |
-| `pulse-auto-update.sh` | Fetches release & restarts Pulse (`/usr/local/bin/pulse-auto-update.sh`). |
+| `pulse-auto-update.sh` | Fetches release & restarts Pulse (`/opt/pulse/scripts/pulse-auto-update.sh`). |
+
+**Release channel note:** the systemd timer script tracks GitHub `releases/latest` (stable). RC channel settings only affect the in-app update checker.
 
 ## 🚀 Enable/Disable
 
@@ -34,15 +36,16 @@ journalctl -u pulse-update -f
 ```
 
 ## 🔍 Observability
-* **History**: `curl -s http://localhost:7655/api/updates/history | jq` (admin auth required)
-* **Logs**: `journalctl -u pulse-update -f` or `journalctl -t pulse-auto-update -f`
+* **History**: in-app updates are tracked via `GET /api/updates/history` (admin auth required) and stored in `update-history.jsonl` under `/etc/pulse` or `/data`. The systemd timer script does not record update history entries.
+* **Logs**: `journalctl -u pulse-update -f` or `journalctl -t pulse-auto-update -f` for timer runs. In-app updates write detailed logs to `/var/log/pulse/update-.log`.
 
 ## ↩️ Rollback
 If an update fails:
 1.
Check logs: `journalctl -u pulse-update -f` or `journalctl -t pulse-auto-update -f`.
-2. Use the **Rollback** action in **Settings → System → Updates** if available for your deployment type.
+2. The timer script keeps a temporary backup under `/tmp/pulse-backup-` during the update; failures auto-restore from that backup and then clean it up.
 3. If you need to pin a specific version, re-run the installer with a version:
     ```bash
     curl -fsSL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | \
       sudo bash -s -- --version vX.Y.Z
     ```
+    This installer updates the **Pulse server**. Agent updates use the `/install.sh` command generated in **Settings → Agents → Installation commands**.
diff --git a/docs/releases/RELEASE_NOTES_v4.md b/docs/releases/RELEASE_NOTES_v4.md
index 52c593e69..010afa362 100644
--- a/docs/releases/RELEASE_NOTES_v4.md
+++ b/docs/releases/RELEASE_NOTES_v4.md
@@ -88,12 +88,12 @@ curl -vk https://node.example:8443/health \
 - Standalone host agents now ship with guided Linux, macOS, and Windows installers that stream registration status back to Pulse, generate scoped commands from **Settings → Agents**, and feed host metrics into alerts alongside Proxmox and Docker.
 - Alert thresholds gained host-level overrides, connectivity toggles, and snapshot size guardrails so you can tune offline behaviour per host while keeping a global policy for other resources.
 - API tokens now support fine-grained scopes with a redesigned manager that previews command templates, highlights unused credentials, and makes revocation a single click.
-- Proxmox replication jobs surface in a dedicated **Settings → Hosts → Replication** view with API plumbing to track task health and bubble failures into the monitoring pipeline.
+- Proxmox replication jobs surface in a dedicated **Proxmox → Replication** view with API plumbing to track task health and bubble failures into the monitoring pipeline.
 - Docker Swarm environments now receive service/task-aware reporting with configurable scope, plus a Docker settings view that highlights manager/worker roles, stack health, rollout status, and service alert thresholds.
 
 ### Improvements
 - Dashboard loads and drawer links respond faster thanks to cached guest metadata, reduced polling allocations, and inline URL editing that no longer flashes on WebSocket updates.
-- Settings navigation is reorganized with dedicated Docker and Hosts sections, richer filters, and platform icons that make agent onboarding and discovery workflows clearer.
+- Settings navigation is reorganized with dedicated platform and agent sections, richer filters, and platform icons that make onboarding and discovery workflows clearer.
 - LXC guests now report dynamic interface IPs, configuration metadata, and queue metrics so alerting, discovery, and drawers stay accurate even during rapid container churn.
 - Notifications consolidate into a consistent toast system, with clearer feedback during agent setup, token generation, and background job state changes.