Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-02-18 00:17:39 +01:00

Author	SHA1	Message	Date
rcourtman	afad679ffd	fix(sensor-proxy): add timeouts to pmxcfs operations in installer The container config backup and pct commands could hang indefinitely when the Proxmox cluster filesystem (pmxcfs) is slow or unresponsive. This caused the installer to appear to hang after printing "Configuring socket bind mount..." with no further output. Added timeout protection to: - Container config backup cp operation - pct status check - pct config verification - Config rollback cp operation Related to #738	2025-12-15 07:04:14 +00:00
rcourtman	758560ee69	fix(sensor-proxy): add --proxy-url flag for manual URL override Closes #826 The error messages suggested using --proxy-url but the flag was never implemented. This adds the flag so users can manually specify the proxy URL when: - Auto IP detection produces malformed results - The desired IP is not the primary IP - Multi-homed hosts need a specific interface	2025-12-14 21:15:55 +00:00
rcourtman	7d733db3a8	fix: Default sensor-proxy HTTP to 0.0.0.0:8443 for IPv4 binding On systems with net.ipv6.bindv6only=1 (including some Proxmox 8 configurations), using ":8443" results in IPv6-only binding. Users reported curl to 127.0.0.1:8443 hanging while [::1]:8443 worked. Changed default from ":8443" to "0.0.0.0:8443" to explicitly bind IPv4. Related to #805	2025-12-03 20:25:08 +00:00
rcourtman	4f23cddcae	fix: Handle --http-addr with bind address in sensor-proxy installer When using --http-addr 0.0.0.0:8443 (to bind to IPv4 only), the URL construction was broken, producing URLs like https://192.168.31.110.0.0.0:8443 Now correctly extracts the port number from both ":8443" and "0.0.0.0:8443" formats using ${HTTP_ADDR##*:} instead of ${HTTP_ADDR#:} Related to #805	2025-12-03 20:16:30 +00:00
rcourtman	87eb88dd98	fix: sensor-proxy installer fails silently on containers without snapshots The SNAPSHOT_START extraction used grep in a pipeline with pipefail enabled. When a container config has no snapshot sections (no lines starting with '['), grep returns exit code 1, causing set -e to terminate the script without any error message. This affected newly created containers that hadn't been snapshotted yet, which is the common case for fresh Pulse installations via community scripts. Related to #780	2025-12-03 09:04:45 +00:00
rcourtman	b3b8081426	fix: add timeout to pmxcfs operations in install-sensor-proxy.sh Reading and writing container config from /etc/pve/lxc/ can hang indefinitely if the Proxmox cluster filesystem (pmxcfs) is slow or unresponsive. This causes the installer to appear to hang after "Configuring socket bind mount..." with no further output. Add 10-second timeouts to both cp operations and provide helpful error messages suggesting the user check cluster health with 'pvecm status'. Related to #738	2025-12-01 21:04:01 +00:00
rcourtman	f197dfc922	Fix sensor-proxy installer to download latest release by default The VERSION variable was hardcoded to v4.32.0 instead of being empty, which prevented the "fetch latest release" logic from running. When VERSION is empty, REQUESTED_VERSION defaults to "latest" which triggers proper release detection via GitHub API. Related to #738	2025-12-01 06:02:42 +00:00
rcourtman	973f1f9866	Fix SSH key collision when installing sensor-proxy on multiple cluster nodes When running install-sensor-proxy.sh on multiple nodes in a cluster, each installation was removing all existing pulse-managed-key entries before adding its own. This caused the following scenario: 1. Run script on node A: node A's key is added to all nodes 2. Run script on node B: node B's key replaces node A's key on all nodes 3. Result: node A's proxy can no longer SSH to other nodes The fix changes the behavior to: - Check if the specific SSH key already exists on the target - Only add the key if not present (idempotent) - Never remove existing pulse-managed-key entries This allows multiple sensor-proxy installations to coexist in a cluster, with each node's proxy key authorized on all nodes. Related to #738	2025-11-30 21:03:36 +00:00
rcourtman	7e990710e9	Fix indentation in cleanup section after pvesh refactor The previous commit left broken indentation and an orphaned else block in the cleanup section. This fixes the structure to properly handle the cluster nodes vs standalone node cases. Related to #738	2025-11-29 18:41:59 +00:00
rcourtman	649278bf5f	Use pvesh API for cluster node discovery in install-sensor-proxy.sh Replace brittle pvecm nodes CLI parsing with pvesh API calls. The old approach used awk field positions ($4) which breaks across Proxmox versions, locales, or output format changes. Added get_cluster_node_names() helper that: - Prefers pvesh get /cluster/status --output-format json (structured) - Falls back to pvecm nodes CLI parsing if pvesh unavailable - Uses python3 for JSON parsing (always available on Proxmox) Related to #738	2025-11-29 18:33:27 +00:00
rcourtman	a0eead95f1	Fix pvecm nodes parsing to correctly extract hostname field The awk was using $NF which returns "(local)" on the local node's line instead of the hostname. Changed to $4 which is the actual hostname field. Related to #738	2025-11-29 18:24:44 +00:00
rcourtman	81eb6b018a	Use pvecm nodes for cluster discovery to prefer management IPs For multi-network Proxmox clusters (e.g., separate corosync and management networks), the installer now uses `pvecm nodes` to get hostnames and resolves them via /etc/hosts. This automatically prefers management IPs when the cluster has proper /etc/hosts configuration. Falls back to the previous `pvecm status` IP extraction if hostname resolution doesn't yield results. Related to #738	2025-11-29 15:07:42 +00:00
rcourtman	ee9c63c880	Add jq dependency and fix secondary node support in sensor-proxy installer Related to #738 Fixes two issues discovered by k5madsen: 1. Missing jq dependency: The sensor wrapper script uses jq extensively to parse SMART data JSON from smartctl but the installer never checked if jq was installed. Added jq to REQUIRED_CMDS list. 2. Secondary node support: When running on a secondary Proxmox cluster node where the container doesn't exist locally, the script now: - Warns instead of failing with "Container does not exist" - Continues installation for host temperature monitoring - Skips container-specific socket mount configuration This allows users to run the installer on all cluster nodes (as intended) to ensure the sensor-proxy service is available when containers migrate.	2025-11-28 21:08:43 +00:00
rcourtman	6853a0ffd1	feat: serve install scripts from GitHub releases instead of main branch Scripts like install.sh and install-sensor-proxy.sh are now attached as release assets and downloaded from releases/latest/download/ URLs. This ensures users always get scripts compatible with their installed version, even while development continues on main. Changes: - build-release.sh: copy install scripts to release directory - create-release.yml: upload scripts as release assets - Updated all documentation and code references to use release URLs - Scripts reference each other via release URLs for consistency	2025-11-26 08:59:59 +00:00
courtmanr@gmail.com	bddb90229b	Improve setup script clarity: reduce verbosity and fix confusing messages	2025-11-25 10:13:20 +00:00
courtmanr@gmail.com	0c6fd01ff2	Improve setup script output by hiding irrelevant Docker/proxy info	2025-11-25 10:01:41 +00:00
courtmanr@gmail.com	c91add36d2	fix: filter out qdevice from cluster node discovery	2025-11-24 22:54:58 +00:00
courtmanr@gmail.com	a5fbe52a59	Fix pvecm status parsing for QDevice flags (#738)	2025-11-22 23:44:01 +00:00
rcourtman	d0d7a3dcbd	Fix mp mount detection pattern for pulse-sensor-proxy The grep pattern was looking for 'pulse-sensor-proxy' as a standalone string, but the actual mount line contains paths like: mp0: /run/pulse-sensor-proxy,mp=/mnt/pulse-proxy,replicate=0 This caused the removal logic to never execute, leaving the old mp mount in place and preventing the migration to lxc.mount.entry format. Changed pattern to match either path component: - /pulse-sensor-proxy (source path) - /mnt/pulse-proxy (mount point) Also removed space after colon in pattern to match actual format. This completes the fix for temperature proxy setup on LXC containers.	2025-11-22 22:34:26 +00:00
rcourtman	3858397f76	Fix LXC config modification for Proxmox pmxcfs filesystem The /etc/pve/ directory is a clustered FUSE filesystem (pmxcfs) managed by Proxmox. Direct modifications using sed -i or echo >> don't work reliably on this filesystem, and LXC config files contain snapshot sections that must be preserved. Changes: - Use temp file approach: copy config, modify temp, copy back to trigger sync - Only modify main config section (before first [snapshot] marker) - Properly handle both mp mount removal and lxc.mount.entry addition - Apply fix to both install.sh and install-sensor-proxy.sh This fixes temperature proxy setup failures where the socket mount entry wasn't being persisted to the container configuration. Related to #628	2025-11-22 22:19:00 +00:00
rcourtman	596bdbfb13	Handle standby SMART temps and capture disk identity	2025-11-22 07:35:13 +00:00
rcourtman	3b85436c0f	Related to #738: make pulse proxy mount migration-safe	2025-11-21 21:29:14 +00:00
rcourtman	f0166dcab6	fix(installer): handle legacy sensor-proxy config commands	2025-11-20 20:33:51 +00:00
courtmanr@gmail.com	37b1517bd8	feat: implement atomic config management in sensor proxy	2025-11-20 19:01:24 +00:00
courtmanr@gmail.com	c8b4d4a0d8	Implement sensor proxy installation and configuration updates	2025-11-20 13:23:21 +00:00
rcourtman	bd0c47ed1b	Improve token collision handling and installer subnet support	2025-11-20 09:45:36 +00:00
rcourtman	7d0bbaf961	WIP: Fix temperature proxy registration persistence (incomplete) This commit contains multiple fixes for temperature proxy registration, but the core issue remains unresolved. ## What's Fixed: 1. Added config pointer and reloadFunc to TemperatureProxyHandlers 2. Added SetConfig method to keep handler in sync with router config changes 3. Added config reload after registration to prevent monitor from overwriting 4. Fixed installer port conflict detection and duplicate YAML key issues 5. Added comprehensive debug logging throughout registration flow ## What's Still Broken: The TemperatureProxyURL, TemperatureProxyToken, and TemperatureProxyControlToken fields are NOT persisting to nodes.enc after SaveNodesConfig is called. Debug logs confirm: - HandleRegister correctly updates nodesConfig.PVEInstances[matchedIndex] - The correct data is passed to SaveNodesConfig (verified in logs) - SaveNodesConfig completes without errors - Config reload executes successfully - BUT after Pulse restart, the fields are empty when loaded from disk The bug is in SaveNodesConfig serialization or file writing logic itself. Related files: - internal/api/temperature_proxy.go: Registration handler - internal/config/persistence.go: SaveNodesConfig implementation - internal/config/config.go: PVEInstance struct definition	2025-11-19 20:12:19 +00:00
rcourtman	714c2b753d	fix(sensor-proxy): ensure correct config.yaml permissions after modifications Fixed bug where config.yaml would end up with root:root 600 permissions after the installer modified it, causing service startup failures with "permission denied" errors. Root cause: Two code paths modified config.yaml without resetting ownership: 1. ensure_control_plane_config() - used mktemp (creates root-owned file), then mv'd it over config.yaml without chown/chmod 2. HTTP mode configuration - appended to config.yaml without resetting perms Fix: Added chown/chmod after both modifications: - Line 1601-1602: After control-plane config update - Line 1860-1861: After HTTP mode config append Now config.yaml maintains pulse-sensor-proxy:pulse-sensor-proxy 644 permissions after all modifications, allowing the service to start correctly. This bug was discovered during repair logic testing - the service failed to start after the installer ran, even though the fmt.Sprintf argument alignment fix was working correctly.	2025-11-19 14:53:44 +00:00
rcourtman	497f94f4e8	feat(sensor-proxy): improve turnkey setup experience with Pulse restart handling - Update installer to use v4.32.0 Phase 2 binaries with file-based config - Add automatic detection of Pulse service (systemd/hot-dev/docker) - Add --restart-pulse flag for automatic Pulse restart in dev/test environments - Default behavior shows clear instructions to restart Pulse manually (safe for production) - Add prominent restart notice with command suggestions based on detected deployment - Improve UX by making restart step impossible to miss Related to Phase 2 sensor-proxy architecture improvements	2025-11-19 12:44:07 +00:00
rcourtman	d6084e29dd	fix(sensor-proxy): fix remaining unsafe config writers 1. Self-heal script: Add BINARY_PATH variable so CLI migration actually runs - Previously logged "Binary not available" and skipped migration 2. migrate-sensor-proxy-control-plane.sh: Use atomic write (temp + rename) - Prevents partial writes if script is interrupted - Reduces race window with running service These were the remaining gaps identified by Codex review. NOTE: migrate-sensor-proxy-control-plane.sh still uses Python manipulation instead of the Phase 2 CLI, but as a one-time migration script for upgrades from v4.31, the atomic write provides sufficient protection. Future versions can deprecate this script entirely.	2025-11-19 10:59:54 +00:00
rcourtman	d554c9dbb2	fix(sensor-proxy): eliminate all uncoordinated config writers Remove all code paths that manipulate config files without Phase 2 locking: 1. Installer: Remove ensure_allowed_nodes_file_reference() call (line 1674) - Migration now handled exclusively by config migrate-to-file 2. Installer: Make migration failures fatal in update_allowed_nodes() - Prevents fallback to unsafe Python manipulation 3. Daemon sanitizer: Remove os.WriteFile() call - Now only sanitizes in-memory copy, doesn't write back to disk - Logs warning instructing admin to run `config migrate-to-file` 4. Self-heal script: Replace 132 lines of Python with CLI call - sanitize_allowed_nodes() now calls `config migrate-to-file` - Eliminates uncoordinated Python-based config rewriting All config mutations now flow exclusively through Phase 2 CLI with atomic operations and file locking. No code paths remain that can create duplicate allowed_nodes blocks. Addresses Codex review feedback on Phase 2 gaps.	2025-11-19 10:55:01 +00:00
rcourtman	28cd487889	feat(sensor-proxy): complete Phase 2 with CLI-based config migration Add `config migrate-to-file` command and update installer to eliminate all shell/Python config manipulation, ensuring atomic operations throughout. Changes: - Add `config migrate-to-file` command to atomically migrate inline allowed_nodes blocks to file-based configuration - Update installer's update_allowed_nodes() to call CLI exclusively - Simplify migrate_inline_allowed_nodes_to_file() to use CLI - Remove dependency on Python/sed for config manipulation - Implement dual-file locking (config.yaml + allowed_nodes.yaml) to prevent race conditions during migration All config mutations now flow through the Phase 2 CLI with: - File locking (flock) - Atomic writes (temp + rename + fsync) - Proper YAML parsing/generation This completes Phase 2 architecture and eliminates the root cause of config corruption issues. Related to prior commits: `53dec6010`, `3dc073a28`, `804a638ea`, `131666bc1`	2025-11-19 10:35:49 +00:00
rcourtman	1162a208cc	fix(sensor-proxy): critical Phase 2 locking and validation fixes Fixes critical issues found by Codex code review: 1. Fixed file locking race condition (CRITICAL) - Lock file was being replaced by atomic rename, invalidating the lock - New approach: lock a separate `.lock` file that persists across renames - Ensures concurrent writers (installer + self-heal timer) are properly serialized - Without this fix, corruption was still possible despite Phase 2 2. Fixed validation to honor configured allowed_nodes_file path - validate command now uses loadConfig() to read actual config - Respects allowed_nodes_file setting instead of assuming default path - Prevents false positives/negatives when path is customized 3. Allow empty allowed_nodes lists - Empty lists are valid (admin may clear for security, or rely on IPC validation) - validate no longer fails on empty lists - set-allowed-nodes --replace with zero nodes now supported - Critical for operational flexibility 4. Installer error propagation - update_allowed_nodes failures now exit installer with error - Prevents silent failures that leave stale allowlists - Self-heal will abort instead of masking CLI errors Technical Details: - withLockedFile() now locks `<path>.lock` instead of target file - Lock held for entire duration of read-modify-write-rename - atomicWriteFile() completes while lock is still held - Empty lists represented as `allowed_nodes: []` in YAML Testing: ✅ Lock file created and persists across operations ✅ Empty list can be written with --replace ✅ Validation passes with empty lists ✅ Config path from allowed_nodes_file honored ✅ Concurrent operations properly serialized These fixes ensure Phase 2 actually eliminates corruption by design. Identified by Codex code review Related to Phase 2 commit `3dc073a28`	2025-11-19 09:47:43 +00:00
rcourtman	0565781655	feat(sensor-proxy): Phase 2 - atomic config management with CLI Implements bullet-proof configuration management to completely eliminate allowed_nodes corruption by design. This builds on Phase 1 (file-only mode) by replacing all shell/Python config manipulation with proper Go tooling. New Features: - `pulse-sensor-proxy config validate` - parse and validate config files - `pulse-sensor-proxy config set-allowed-nodes` - atomic node list updates - File locking via flock prevents concurrent write races - Atomic writes (temp file + rename) ensure consistency - systemd ExecStartPre validation prevents startup with bad config Architectural Changes: 1. Installer now calls config CLI instead of embedded Python/shell scripts 2. All config mutations go through single authoritative writer 3. Deduplication and normalization handled in Go (reuses existing logic) 4. Sanitizer kept as noisy failsafe (warns if corruption still occurs) Implementation Details: - New cmd/pulse-sensor-proxy/config_cmd.go with cobra commands - withLockedFile() wrapper ensures exclusive access - atomicWriteFile() uses temp + rename pattern - Installer update_allowed_nodes() simplified to CLI calls - Both systemd service modes include ExecStartPre validation Why This Works: - Single code path for all writes (no shell/Python divergence) - File locking serializes self-heal timer + manual installer runs - Validation gate prevents proxy from starting with corrupt config - CLI uses same YAML parser as the daemon (guaranteed compatibility) Phase 2 Benefits: - Corruption impossible by design (not just detected and fixed) - No more Python dependency for config management - Atomic operations prevent partial writes - Clear error messages on validation failures The defensive sanitizer remains active but now logs loudly if triggered, allowing us to confirm Phase 2 eliminates corruption in production before removing the safety net entirely. This completes the fix for the recurring temperature monitoring outages. Related to Phase 1 commit `53dec6010`	2025-11-19 09:37:49 +00:00
rcourtman	5f4143f0ab	fix(sensor-proxy): eliminate allowed_nodes config corruption Phase 1 hotfix to address recurring config file corruption that causes 99% of temperature monitoring failures. The root cause was the installer oscillating between inline and file-based allowlist modes, creating duplicate `allowed_nodes:` keys in config.yaml. Changes: - Force file-based allowlist mode exclusively (refuse versions < v4.31.1) - Add automatic migration from inline to file-based config - Remove inline mode code path from update_allowed_nodes() - Migration runs on every install/self-heal to clean up existing corruption The self-heal timer runs every 5 minutes and was the primary source of corruption when version detection failed or encountered edge cases. This eliminates the dual code paths and ensures config.yaml is never edited for allowlist changes - only /etc/pulse-sensor-proxy/allowed_nodes.yaml is modified. Phase 2 (next release) will implement proper Go-based config management with atomic writes, locking, and systemd validation to prevent corruption by design. Related to recurring temperature monitoring outages	2025-11-19 09:21:54 +00:00
rcourtman	6e77c4dbea	fix: sanitize sensor proxy config during self-heal Related to #714.	2025-11-18 22:51:40 +00:00
rcourtman	28278aa0cb	Deduplicate inline proxy allow list	2025-11-18 14:58:50 +00:00
rcourtman	1abff55feb	Improve temperature proxy detection	2025-11-18 14:25:09 +00:00
rcourtman	9d6f32a56d	Include control-plane allow list in proxy config	2025-11-18 10:42:13 +00:00
rcourtman	c25b6f4e94	Fix setup-script tokens and proxy registration timing	2025-11-18 10:22:54 +00:00
rcourtman	13daa61d1d	Harden turnkey install and proxy auto-registration	2025-11-18 00:24:50 +00:00
rcourtman	2eaeccac44	Avoid blocking self-heal start during install	2025-11-17 23:14:51 +00:00
rcourtman	c4ce9a71c0	Break self-heal recursion when proxy unregistered	2025-11-17 23:01:57 +00:00
rcourtman	2f74ff985a	Fix inline allowed_nodes cleanup	2025-11-17 22:50:25 +00:00
rcourtman	3fe6b4fe9b	Improve temp proxy install UX	2025-11-17 22:30:32 +00:00
rcourtman	b80242a571	Restore pending control-plane helpers	2025-11-17 22:04:30 +00:00
rcourtman	99ab7171e7	Fix pending control-plane helpers	2025-11-17 22:01:11 +00:00
rcourtman	825e9e75ab	Speed up proxy self-heal reconciliation	2025-11-17 21:56:21 +00:00
rcourtman	ca4c570fa1	Add automatic control-plane reconciliation	2025-11-17 21:55:47 +00:00
rcourtman	fea8380444	Improve sensor proxy installer compatibility	2025-11-17 21:38:28 +00:00
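
Two of the shell fixes in the log above (4f23cddcae and 87eb88dd98) are easy to misread from the commit text alone, so a minimal Bash sketch of both may help. This is not the installer's actual code: the function names `port_from_addr` and `first_section_line` are hypothetical, and the config layout is assumed only from what the commit messages describe.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the port from ":8443" or "0.0.0.0:8443".
# ${addr##*:} removes the longest prefix ending in ":", leaving only the port.
# By contrast, ${addr#:} strips a single leading ":" and leaves
# "0.0.0.0:8443" unchanged.
port_from_addr() {
    local addr="$1"
    printf '%s\n' "${addr##*:}"
}

# Hypothetical helper: print the line number of the first section header
# (a line starting with "["), or print nothing when the config has none.
# grep exits with status 1 when nothing matches; with pipefail and set -e
# that would end the script silently, so "|| true" absorbs the status.
first_section_line() {
    local config_file="$1"
    { grep -n '^\[' "$config_file" || true; } | head -n 1 | cut -d: -f1
}
```

A caller could then treat an empty result from `first_section_line` as "no snapshot sections, the whole file is the main config" instead of letting the script exit without a message.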

122 Commits