Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-02-19 07:50:43 +01:00

Author	SHA1	Message	Date
rcourtman	408e113f35	Add TrueNAS SCALE persistence for host agent (Related to #718)	2025-11-21 10:07:14 +00:00
rcourtman	f8e59839ba	Add agent-id support to host agent installers (Related to #721)	2025-11-20 18:14:18 +00:00
rcourtman	b72fc2ab79	docs: align sensor proxy config with current defaults	2025-11-20 12:40:01 +00:00
rcourtman	3c5a1b273c	Improve Windows installer arch detection (related to #723)	2025-11-20 09:37:45 +00:00
rcourtman	e39c6a3660	docs(sensor-proxy): comprehensive config management documentation Adds complete documentation for the new sensor-proxy config management CLI implemented in Phase 2. Addresses user-facing aspects of the corruption fix. New Documentation: - docs/operations/sensor-proxy-config-management.md (469 lines) - Complete operations runbook for config management - Full CLI reference with examples - Migration guide from inline config - Architecture explanation - Common operational tasks - Troubleshooting guide - Best practices and automation Updated Documentation: - cmd/pulse-sensor-proxy/README.md - Configuration Management CLI section - Allowed Nodes File format - Enhanced troubleshooting - Config corruption recovery - docs/TEMPERATURE_MONITORING.md - Config validation failure troubleshooting - Configuration Management quick reference - Cross-links to detailed docs - docs/TROUBLESHOOTING.md - Sensor proxy config validation errors - Comprehensive diagnosis steps - Automatic and manual recovery - README.md & docs/README.md - Added new runbook to operations index - Positioned for discoverability Coverage: - Both CLI commands fully documented - Phase 1 & Phase 2 architecture explained - Migration path from pre-v4.31.1 - Config corruption recovery procedures - Safe config editing practices - Automation examples - Troubleshooting all failure modes Documentation Quality: - Cross-linked from 5 different documents - Clear examples for common use cases - Target audience: system administrators - Follows project documentation style - Production-ready This completes the sensor-proxy config corruption fix by providing users with comprehensive guidance for the new config management system. Related to Phase 2 commits `3dc073a28`, `804a638ea`, `131666bc1`	2025-11-19 10:01:33 +00:00
rcourtman	9ea509ea8b	Improve host agent binary handling and docker installer purge (Related to #693)	2025-11-18 22:11:44 +00:00
rcourtman	51b368ddc1	feat: make PVE polling interval configurable (related to #467)	2025-11-18 21:30:04 +00:00
rcourtman	c176f9eb51	Document proxy control-plane refresh	2025-11-18 14:31:08 +00:00
rcourtman	f9341ae1fc	Improve temperature proxy workflow	2025-11-17 14:25:46 +00:00
rcourtman	47d5c14aef	Improve temperature proxy control-plane flow	2025-11-15 21:49:51 +00:00
rcourtman	93cde2439d	docs: highlight runbooks in index and script verification checklist	2025-11-14 10:39:10 +00:00
rcourtman	4752a9baff	docs: reference log forwarding runbook in sensor proxy guides	2025-11-14 10:37:09 +00:00
rcourtman	a4eb70af96	docs: document sensor proxy log forwarding	2025-11-14 01:12:25 +00:00
rcourtman	2850f20dad	docs: add auto-update troubleshooting	2025-11-14 01:07:32 +00:00
rcourtman	41b57aa75d	docs: clarify auto-update flow and surface proxy guide	2025-11-14 01:06:11 +00:00
rcourtman	bffc8f3f83	docs: add auto-update runbook	2025-11-14 01:05:06 +00:00
rcourtman	3c41d3960c	docs: add operations runbooks and audit fixes	2025-11-14 01:01:21 +00:00
rcourtman	25ae527c95	Clarify sensor proxy HTTPS workflow in docs	2025-11-14 00:48:41 +00:00
rcourtman	c0942d93f0	Explain HTTPS-first temperature architecture	2025-11-14 00:45:20 +00:00
rcourtman	411a448c8e	Document v4.31.0 release highlights	2025-11-14 00:43:16 +00:00
rcourtman	575b062d40	Remove installer v2 rollout doc	2025-11-13 22:36:01 +00:00
rcourtman	ccdded502d	Remove CONTRIBUTING-SCRIPTS doc	2025-11-13 22:34:21 +00:00
rcourtman	61f011af1d	Improve temperature proxy diagnostics and tests	2025-11-13 22:31:53 +00:00
rcourtman	6a1a88217f	Add release dry run workflow and API update integration test	2025-11-12 21:02:52 +00:00
rcourtman	305a5b17bc	Handle Snap Docker home restrictions (Related to #693)	2025-11-12 19:20:04 +00:00
rcourtman	95c8fe9c2d	Add Snap Docker support to install-docker-agent.sh Snap-installed Docker does not automatically create a docker group, causing permission denied errors when the pulse-docker service user tries to access /var/run/docker.sock. Changes: - Auto-detect Snap Docker installations - Create docker group if missing when Snap Docker is detected - Restart Snap Docker after group creation to refresh socket ACLs - Add socket access validation before starting the service - Handle symlinked Docker sockets in systemd unit ReadWritePaths - Document troubleshooting steps in DOCKER_MONITORING.md	2025-11-11 23:07:29 +00:00
rcourtman	3477aa3dae	Update Kubernetes docs with GitHub Pages Helm repository - Replace GHCR OCI instructions with GitHub Pages repository - Add comprehensive upgrade instructions with examples - Add rollback procedures - Add detailed uninstall instructions - Simplify installation (no authentication required)	2025-11-11 19:40:51 +00:00
rcourtman	bb7ca93c18	feat: Add mdadm RAID monitoring support for host agents Implements comprehensive mdadm RAID array monitoring for Linux hosts via pulse-host-agent. Arrays are automatically detected and monitored with real-time status updates, rebuild progress tracking, and automatic alerting for degraded or failed arrays. Key changes: Backend: - Add mdadm package for parsing mdadm --detail output - Extend host agent report structure with RAID array data - Integrate mdadm collection into host agent (Linux-only, best-effort) - Add RAID array processing in monitoring system - Implement automatic alerting: - Critical alerts for degraded arrays or arrays with failed devices - Warning alerts for rebuilding/resyncing arrays with progress tracking - Auto-clear alerts when arrays return to healthy state Frontend: - Add TypeScript types for RAID arrays and devices - Display RAID arrays in host details drawer with: - Array status (clean/degraded/recovering) with color-coded indicators - Device counts (active/total/failed/spare) - Rebuild progress percentage and speed when applicable - Green for healthy, amber for rebuilding, red for degraded Documentation: - Document mdadm monitoring feature in HOST_AGENT.md - Explain requirements (Linux, mdadm installed, root access) - Clarify scope (software RAID only, hardware RAID not supported) Testing: - Add comprehensive tests for mdadm output parsing - Test parsing of healthy, degraded, and rebuilding arrays - Verify proper extraction of device states and rebuild progress All builds pass successfully. RAID monitoring is automatic and best-effort - if mdadm is not installed or no arrays exist, host agent continues reporting other metrics normally. Related to #676	2025-11-09 16:36:33 +00:00
rcourtman	188944019a	docs: Add webhook private IP allowlist configuration guide Document the new webhook security feature that allows homelab users to configure trusted private IP ranges for webhook targets. Includes: - Overview of default security behavior - Step-by-step configuration instructions - Security considerations and best practices - Example CIDR configurations - Troubleshooting guidance for common error messages Related to #673	2025-11-09 08:36:15 +00:00
rcourtman	082c6c2201	Fix documentation typo: change 'Servers' to 'Hosts' tab (related to #661)	2025-11-08 17:24:15 +00:00
rcourtman	1a3abf7f3f	Fix pulse-host-agent temperature collection on all Linux distros (related to #661 ) The temperature collection in pulse-host-agent was broken on all Linux distributions due to an incorrect platform check. Root cause: - collectTemperatures() checked `if a.platform != "linux"` at agent.go:316 - normalisePlatform() returns the raw distro name from gopsutil (debian, ubuntu, pve) - This caused temperature collection to be skipped on ALL Linux hosts Fix: - Changed check to `if runtime.GOOS != "linux"` which correctly identifies Linux - runtime.GOOS returns "linux" regardless of distribution Also fixed documentation typo: - Changed "Servers tab" to "Hosts tab" in HOST_AGENT.md and TEMPERATURE_MONITORING.md - Reported by user in issue #661 comments Testing: - Verified build succeeds - Confirmed runtime.GOOS returns "linux" on Linux systems Related to #661	2025-11-08 10:25:01 +00:00
rcourtman	3ad35976b2	Clarify Docker agent cycling troubleshooting for cloned VMs/LXCs (related to #648 ) Enhanced the "Docker hosts cycling" troubleshooting entry to explicitly call out VM/LXC cloning as a cause of identical agent IDs. Added specific remediation steps for regenerating machine IDs on cloned systems. This addresses the resolution path discovered in discussion #648 where a user cloned a Proxmox LXC and encountered cycling behavior even with separate API tokens because the agent IDs were duplicated.	2025-11-07 22:59:19 +00:00
rcourtman	2b7492ac59	feat: Add temperature collection to pulse-host-agent (related to #661 ) Implements temperature monitoring in pulse-host-agent to support Docker-in-VM deployments where the sensor proxy socket cannot cross VM boundaries. Changes: - Create internal/sensors package with local collection and parsing - Add temperature collection to host agent (Linux only, best-effort) - Support CPU package/core, NVMe, and GPU temperature sensors - Update TEMPERATURE_MONITORING.md with Docker-in-VM setup instructions - Update HOST_AGENT.md to document temperature feature The host agent now automatically collects temperature data on Linux systems with lm-sensors installed. This provides an alternative path for temperature monitoring when running Pulse in a VM, avoiding the unix socket limitation. Temperature collection is best-effort and fails gracefully if lm-sensors is not available, ensuring other metrics continue to be reported. Related to #661	2025-11-07 22:54:40 +00:00
rcourtman	52bc23b850	docs: Fix remaining :rw mount references to :ro Updates all remaining references to read-write socket mounts in TEMPERATURE_MONITORING.md to use read-only (:ro) mounts for security. Changes: - Manual installation section - Docker-only responsibilities section - Ansible playbook example All socket mounts should be :ro to prevent container tampering.	2025-11-07 17:14:47 +00:00
rcourtman	427cb383d8	docs: Remove deployment checklist (per user request)	2025-11-07 17:11:15 +00:00
rcourtman	f9dc2f6466	docs: Add comprehensive security audit documentation Adds complete documentation for 2025-11-07 security audit and hardening: - SECURITY_AUDIT_2025-11-07.md: Full professional audit report - 9 security issues identified and fixed (4 critical, 4 medium, 1 low) - Detailed findings, remediations, and testing - Security posture improved from B+ to A - 85%+ reduction in exploitable attack surface - SECURITY_CHANGELOG.md: Detailed changelog with migration guide - Complete implementation details for all fixes - Configuration examples - Backwards compatibility notes - New metrics and features - DEPLOYMENT_CHECKLIST.md: Step-by-step deployment guide - Pre-deployment backup procedures - Deployment steps for Docker and LXC - Verification procedures - Rollback procedures - Troubleshooting guide - Success criteria - README.md: Updated with security hardening highlights - Links to audit report - Key security features added Audit performed by Claude (Sonnet 4.5) + Codex collaboration. All implementations by Codex based on Claude specifications. 100% remediation rate (9/9 issues fixed). 17 new tests added, all passing. Related to security audit 2025-11-07.	2025-11-07 17:10:21 +00:00
rcourtman	48fabdd827	Improve Docker temperature monitoring documentation for clarity (related to #600 ) Updated the Quick Start for Docker section in TEMPERATURE_MONITORING.md to be more user-friendly and address common setup issues: - Added clear explanation of why the proxy is needed (containers can't access hardware) - Provided concrete IP example instead of placeholder - Showed full docker-compose.yml context with proper YAML structure - Added sudo to commands where needed - Updated docker-compose commands to v2 syntax with note about v1 - Expanded verification steps with clearer success indicators - Added reminder to check container name in verification commands These improvements should help users who encounter blank temperature displays due to missing proxy installation or bind mount configuration.	2025-11-07 15:09:42 +00:00
rcourtman	910f2dd800	Add troubleshooting entries for Docker agent token issues (related to #648 ) Added two troubleshooting sections to DOCKER_MONITORING.md: 1. "Docker hosts cycling or appearing to replace each other" - explains why multiple agents sharing the same token cause the UI to switch between hosts instead of showing all simultaneously 2. "Agent rejected after host removal" - documents the re-enrollment process when a host is on the removal blocklist These entries make common setup issues searchable while linking to canonical setup instructions rather than duplicating them.	2025-11-07 10:55:45 +00:00
rcourtman	a1dc451ed4	Document alert reliability features and DLQ API Add comprehensive documentation for new alert system reliability features: API Documentation (docs/API.md): - Dead Letter Queue (DLQ) API endpoints - GET /api/notifications/dlq - Retrieve failed notifications - GET /api/notifications/queue/stats - Queue statistics - POST /api/notifications/dlq/retry - Retry DLQ items - POST /api/notifications/dlq/delete - Delete DLQ items - Prometheus metrics endpoint documentation - 18 metrics covering alerts, notifications, and queue health - Example Prometheus configuration - Example PromQL queries for common monitoring scenarios Configuration Documentation (docs/CONFIGURATION.md): - Alert TTL configuration - maxAlertAgeDays, maxAcknowledgedAgeDays, autoAcknowledgeAfterHours - Flapping detection configuration - flappingEnabled, flappingWindowSeconds, flappingThreshold, flappingCooldownMinutes - Usage examples and common scenarios - Best practices for preventing notification storms All new features are fully documented with examples and default values.	2025-11-06 17:34:05 +00:00
rcourtman	becda56897	Fix critical rollback download URL bug and doc inconsistencies Issues found during systematic audit after #642: 1. CRITICAL BUG - Rollback downloads were completely broken: - Code constructed: pulse-linux-amd64 (no version, no .tar.gz) - Actual asset name: pulse-v4.26.1-linux-amd64.tar.gz - This would cause 404 errors on all rollback attempts - Fixed: Construct correct tarball URL with version - Added: Extract tarball after download to get binary 2. TEMPERATURE_MONITORING.md referenced non-existent v4.27.0: - Changed to use /latest/download/ for future-proof docs 3. API.md example had wrong filename format: - Changed pulse-linux-amd64.tar.gz to pulse-v4.30.0-linux-amd64.tar.gz - Ensures example matches actual release asset naming The rollback bug would have affected any user attempting to roll back to a previous version via the UI or API.	2025-11-06 14:25:32 +00:00
rcourtman	fd3a72606f	Add standalone host-agent binaries to releases Issue: HOST_AGENT.md documented downloading pulse-host-agent binaries from GitHub releases, but those assets didn't exist. Only tarballs were available, making manual installation unnecessarily complex. Changes: - Copy standalone host-agent binaries (all architectures) to release/ directory alongside sensor-proxy binaries - Include host-agent binaries in checksum generation - Update HOST_AGENT.md to clarify available architectures - Retroactively uploaded missing binaries to v4.26.1 This enables air-gapped and manual installations without requiring an already-running Pulse server to download from.	2025-11-06 14:20:59 +00:00
rcourtman	6192e166f2	chore: prepare release v4.26.1	2025-11-06 12:13:56 +00:00
rcourtman	dfe960deb4	Fix container SSH detection and improve troubleshooting for issue #617 Related to #617 This fixes a misconfiguration scenario where Docker containers could attempt direct SSH connections (producing [preauth] log spam) instead of using the sensor proxy. Changes: - Fix container detection to check PULSE_DOCKER=true in addition to system.InContainer() heuristics (both temperature.go and config_handlers.go) - Upgrade temperature collection log from Error to Warn with actionable guidance about mounting the proxy socket - Add Info log when dev mode override is active so operators understand the security posture - Add troubleshooting section to docs for SSH [preauth] logs from containers The container detection was inconsistent - monitor.go checked both flags but temperature.go and config_handlers.go only checked InContainer(). Now all locations consistently check PULSE_DOCKER || InContainer().	2025-11-06 09:57:53 +00:00
rcourtman	88ad986877	Revert "Hide Settings tab when authentication is not configured" This reverts commit d5a1e3d07729bad61743e8645a636e2545e11038.	2025-11-05 23:21:34 +00:00
rcourtman	3d1c910daa	Hide Settings tab when authentication is not configured Related to #636 When authentication is not configured (hasAuth() returns false), the Settings tab is now automatically hidden from the web interface. This provides a cleaner monitoring-only view for unauthenticated deployments where users only need to check the health of their environment. The Settings icon beside the Alerts tab will only appear when authentication is properly configured via PULSE_AUTH_USER/PASS, API tokens, proxy auth, or OIDC. Changes: - Modified utilityTabs in App.tsx to conditionally include Settings based on hasAuth() signal - Updated CONFIGURATION.md to document this UI behavior	2025-11-05 23:10:20 +00:00
rcourtman	8ca31003a0	docs: document TLS certificate file permissions for HTTPS setup Add comprehensive documentation for HTTPS/TLS configuration including: - File ownership and permission requirements (pulse user) - Common troubleshooting steps for startup failures - Complete setup examples for systemd and Docker - Validation commands for certificate/key verification Related to discussion #634	2025-11-05 23:08:02 +00:00
rcourtman	efa1ec1cd9	docs: document per-metric alert delay configuration (addresses #433 ) Added comprehensive documentation for the per-metric alert delay feature that was requested in issue #433. This feature allows configuring different alert delays for different metrics (e.g., longer delays for CPU spikes, shorter delays for memory pressure). Key additions: - Detailed explanation of delay precedence hierarchy - JSON configuration examples for common use cases - Table of recommended delays by metric type with reasoning - UI access instructions for the Alert Delay row Also added example tests demonstrating the feature's functionality and common configuration patterns. The feature itself was already fully implemented in both backend (metricTimeThresholds support) and frontend (per-metric delay inputs in ResourceTable). This commit surfaces the feature through documentation so users know it exists and how to use it. Related to #433	2025-11-05 20:04:44 +00:00
rcourtman	545634372e	Document log_level configuration for pulse-sensor-proxy Update hardening documentation to include log_level configuration option. Users can now find examples of controlling logging verbosity through YAML config and environment variables. Related to #629	2025-11-05 19:48:42 +00:00
rcourtman	a5e3469da8	Add comprehensive automation documentation for temperature proxy installation This addresses the need for users who deploy Pulse via infrastructure-as-code tools (Ansible, Terraform, Salt, Puppet) to have scriptable, well-documented installation procedures. Changes: Comprehensive Automation Section: - Documented all installer script flags and options - Required: --ctid (LXC) or --standalone (Docker) - Optional: --quiet, --pulse-server, --version, --local-binary, --skip-restart - Documented idempotency, exit codes, and non-interactive behavior Real-World Examples: - Ansible playbook for LXC deployments - Ansible playbook for Docker deployments (includes docker-compose.yml management) - Terraform null_resource example with remote-exec - Manual step-by-step configuration (no script) Configuration Documentation: - Complete YAML config file format with all options - Environment variable overrides (PULSE_SENSOR_PROXY_ALLOWED_SUBNETS, etc.) - Example systemd service overrides - Rate limiting, metrics, ACL, and subnet configuration Quick Reference: - Added link at top of doc for automation users to jump directly to automation section - Clear examples of re-running after changes (adding nodes, upgrading versions) Key Features for Automation: - --quiet flag for non-interactive execution - Idempotent design (safe to re-run) - Verifiable exit codes - Environment variable configuration - Local binary support (no internet required) This makes it straightforward for infrastructure teams to integrate Pulse temperature monitoring into their existing automation workflows without relying on interactive scripts or manual steps.	2025-11-05 18:18:04 +00:00
rcourtman	a1fb79ae6a	Fix temperature proxy documentation and setup script for Docker vs LXC clarity This addresses confusion around temperature monitoring setup for Docker deployments where users expected a turnkey experience similar to LXC. The core issue: The setup script and documentation suggested that temperature monitoring was "automatically configured" for all containerized deployments, but in reality only LXC containers have a fully automatic setup. Docker requires manual steps. Changes: Setup Script (config_handlers.go): - Fixed "unknown environment" path to show separate instructions for LXC vs Docker - Docker instructions now correctly show --standalone flag (was incorrectly showing --ctid) - Added docker-compose.yml bind mount instructions inline - Added restart command for Docker deployments Documentation (TEMPERATURE_MONITORING.md): - Added prominent "Deployment-Specific Setup" callout at the top - Clarified that LXC is fully automatic, Docker requires manual steps - Reorganized "Setup (Automatic)" section to clearly distinguish: - LXC: Fully turnkey (no manual steps) - Docker: Manual proxy installation required - Node configuration: Works for both - Updated "Host-side responsibilities" to specify it's Docker-only - Fixed architecture benefits to reflect LXC vs Docker differences Why this matters: - LXC setup script auto-detects the container and runs install-sensor-proxy.sh --ctid - Docker deployments can't be auto-detected and require --standalone flag - Users running Docker were getting incorrect instructions (--ctid instead of --standalone) - Documentation suggested everything was automatic, leading to confusion Now the documentation and setup script accurately reflect that: - LXC = Turnkey (automatic) - Docker = Manual steps required (but well-documented) - Native = Direct SSH (no proxy) Related to GitHub Discussion #605	2025-11-05 18:18:04 +00:00
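
Commit becda56897 above describes the rollback bug as a naming mismatch: the code built `pulse-linux-amd64`, while the published asset is `pulse-v4.26.1-linux-amd64.tar.gz`. The following is a minimal sketch of that naming rule, not the project's actual code. The helper names `rollbackAssetName` and `rollbackURL` are hypothetical, and the URL layout assumes GitHub's standard release-download path with a tag equal to the version string.

```go
package main

import "fmt"

// releaseBase is an assumption: GitHub's standard release-download layout
// applied to the upstream repository named in the mirror header.
const releaseBase = "https://github.com/rcourtman/Pulse/releases/download"

// rollbackAssetName is a hypothetical helper. It builds the versioned tarball
// name the commit message gives as correct, for example
// pulse-v4.26.1-linux-amd64.tar.gz.
func rollbackAssetName(version, goos, goarch string) string {
	return fmt.Sprintf("pulse-%s-%s-%s.tar.gz", version, goos, goarch)
}

// rollbackURL is a hypothetical helper. It assumes the release tag is the
// same string as the version (for example "v4.26.1").
func rollbackURL(version, goos, goarch string) string {
	return fmt.Sprintf("%s/%s/%s", releaseBase, version, rollbackAssetName(version, goos, goarch))
}

func main() {
	fmt.Println(rollbackAssetName("v4.26.1", "linux", "amd64"))
	fmt.Println(rollbackURL("v4.26.1", "linux", "amd64"))
}
```

Per the same commit message, the downloaded file is a tarball, so the binary still has to be extracted after download; that step is omitted here.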


120 Commits