Commit Graph

276 Commits

Author SHA1 Message Date
rcourtman
8c440b6f54 feat: notify server during agent uninstallation
- Add /api/agents/host/uninstall endpoint for agent self-unregistration
- Update install.sh to notify server during --uninstall (reads agent ID from disk)
- Update install.ps1 with same logic for Windows
- Update frontend uninstall command to include URL/token flags

This ensures that when an agent is uninstalled, the host record is
immediately removed from Pulse and any linked PVE nodes have their
+Agent badge cleared.
2025-12-26 22:38:46 +00:00
rcourtman
a5d92d5359 fix: Add --disk-exclude support to install script
Users can now pass disk exclusion patterns during agent installation:

  curl ... | bash -s -- --disk-exclude '/mnt/*' --url ... --token ...

The flag is repeatable for multiple exclusion patterns.

Related to #896
2025-12-26 12:11:18 +00:00
rcourtman
0eb512f90d feat: Add SysV init support for legacy systems. Related to #894
Adds support for systems that use SysV init (like Asustor NAS) that don't have
systemd, OpenRC, or launchd. The installer now:

- Detects /etc/init.d as a fallback when no other init system is found
- Creates an LSB-compliant init script with start/stop/restart/status
- Uses update-rc.d (Debian) or chkconfig (RHEL) to enable on boot
- Falls back to manual rc.d symlink creation if neither tool is available
- Properly cleans up on uninstall
2025-12-24 19:40:23 +00:00
rcourtman
2420c2affb feat: Commands disabled by default, require --enable-commands to opt-in
BREAKING CHANGE: AI command execution on agents is now disabled by default.
Users who want AI auto-fix must explicitly enable it with --enable-commands
flag or PULSE_ENABLE_COMMANDS=true environment variable.

Changes:
- Add --enable-commands flag (opt-in for command execution)
- Commands disabled by default for security (defense-in-depth)
- --disable-commands is now deprecated (logs warning, no longer needed)
- PULSE_DISABLE_COMMANDS deprecated in favor of PULSE_ENABLE_COMMANDS
- Update installer script to use --enable-commands
- Backwards compatibility: PULSE_DISABLE_COMMANDS=false still enables commands

This addresses community feedback about secure defaults for arbitrary
command execution on production infrastructure.

Related to #889
2025-12-24 17:36:44 +00:00
rcourtman
73a92813f5 feat: add --disable-commands flag to installer script. Related to #889 2025-12-24 17:09:48 +00:00
rcourtman
3c3f560c4b Fix login re-auth with stale sessions and hot-dev encryption safety
- Login.tsx: Use apiClient.fetch with skipAuth to avoid auth loops
- router.go: Skip CSRF validation for /api/login endpoint
- hot-dev.sh: Detect encrypted files before generating new key to prevent data loss
2025-12-20 13:45:11 +00:00
rcourtman
7f05d87809 fix: add missing HandleLicenseFeatures method and related changes
- Add HandleLicenseFeatures handler that was missing from license_handlers.go
- Add /api/license/features route to router
- Update AI service and metadata provider
- Update frontend license API and components
- Fix CI build failure caused by tests referencing unimplemented method
2025-12-19 22:59:52 +00:00
rcourtman
13af682ce1 fix(config): add PULSE_AGENT_CONNECT_URL and improve Docker detection
- Add AgentConnectURL config option to override public URL for agents
- Improve install.sh to diagnose docker detection failures
- Update router to prioritize AgentConnectURL for agent install commands
2025-12-19 16:43:14 +00:00
rcourtman
65829983b5 v5: gate legacy sensor-proxy and prune dev docs 2025-12-18 21:51:25 +00:00
rcourtman
ab480ca489 fix: Prevent orphaned encrypted data when encryption key is deleted
- crypto.go: Add runtime validation to Encrypt() that verifies the key file
  still exists on disk before encrypting. If the key was deleted while Pulse
  is running, encryption now fails with a clear error instead of creating
  orphaned data that can never be decrypted.

- hot-dev.sh: Auto-generate encryption key for production data directory
  (/etc/pulse) when HOT_DEV_USE_PROD_DATA=true and key is missing. This
  prevents startup failures and ensures encrypted data can be created.

- Added test TestEncryptRefusesAfterKeyDeleted to verify the protection works.
2025-12-17 17:00:53 +00:00
rcourtman
b6317fe7b1 feat: add Linux agent archives to releases and improve version mismatch detection
Release improvements:
- Add standalone Linux agent archives (amd64, arm64, armv7, armv6, 386) to releases
- Previously only Darwin/Windows had standalone agent downloads

Agent version mismatch detection:
- New agentVersion.ts utility compares agent versions against server version
- DockerHostSummaryTable now shows accurate 'outdated' warnings when
  agent version is older than the connected Pulse server
- Better tooltips explain version status and upgrade instructions
2025-12-16 22:51:10 +00:00
rcourtman
5b7445f758 fix(sensor-proxy): remove pct pull fallback that causes PVE task errors
The installer had a fallback that tried to copy the sensor-proxy binary from
inside the Pulse container using `pct pull`. However, the paths it tried
(/opt/pulse/bin/pulse-sensor-proxy-linux-amd64) don't exist in Pulse containers.

When pct pull fails, Proxmox logs a task error visible in the UI, even though
the error is harmless and expected. This caused recurring "TASK ERROR: failed
to open /opt/pulse/bin/pulse-sensor-proxy-linux-amd64: No such file or directory"
messages every time the selfheal timer ran and couldn't download from GitHub.

The pct pull approach was never going to succeed for LXC-installed Pulse instances
since the binary doesn't exist at those paths inside the container. Removing it
eliminates the spurious error messages.

Related to #817
2025-12-16 00:02:34 +00:00
rcourtman
1f131b8a14 fix(sensor-proxy): correctly skip selfheal regeneration during selfheal runs
The PULSE_SENSOR_PROXY_SELFHEAL env var is set to "1", but the check
was only looking for "true", causing selfheal to regenerate itself on
every run. This meant the cached installer would overwrite the selfheal
script with its (potentially older) version, defeating any fixes in
the selfheal script.

Now correctly checks for both "true" and "1".

Related to #849
2025-12-15 14:06:40 +00:00
rcourtman
7dd036bcda fix(sensor-proxy): skip selfheal reinstall when service is healthy
The selfheal timer was running the full installer every 5 minutes even when
the service was already running and healthy. This caused unnecessary:
- Pulse service restarts
- Config migrations
- Socket setup
- 172MB memory spikes and 15s CPU usage per run

Now the selfheal exits early after checking service health, only proceeding
to reinstall logic if the service is actually failing.

Fixes #849
2025-12-15 13:31:18 +00:00
rcourtman
3594fc692a fix(sensor-proxy): skip prerelease versions in selfheal update check
The selfheal timer was downloading prerelease versions (e.g., v5.0.0-rc.1)
even when users had the stable channel selected. This happened because
fetch_latest_release_tag() didn't filter out prereleases from the GitHub
releases API response.

Now both the Python-based parser and the grep fallback skip prereleases:
- Python: checks `release.get("prerelease")` field
- Grep fallback: filters out -rc, -alpha, -beta patterns

Related to #849
2025-12-15 11:03:32 +00:00
rcourtman
5c6d1798a8 fix(sensor-proxy): prevent installer hang on control-plane sync
Fixes #819

The installer was hanging at 'Pending control-plane sync detected' because
systemctl start ran synchronously, waiting for the selfheal service to complete.
If the control-plane sync failed or took a long time (e.g., Proxmox node not
configured in Pulse yet), the installer would hang indefinitely.

Changed to use 'systemctl start --no-block' to run the selfheal service
asynchronously in the background. The sync will still be attempted, but the
installer will complete immediately and show the success message.
2025-12-15 08:39:06 +00:00
rcourtman
afad679ffd fix(sensor-proxy): add timeouts to pmxcfs operations in installer
The container config backup and pct commands could hang indefinitely
when the Proxmox cluster filesystem (pmxcfs) is slow or unresponsive.
This caused the installer to appear to hang after printing
"Configuring socket bind mount..." with no further output.

Added timeout protection to:
- Container config backup cp operation
- pct status check
- pct config verification
- Config rollback cp operation

Related to #738
2025-12-15 07:04:14 +00:00
rcourtman
6ca6f34577 fix(agent): stop running agent before TrueNAS reinstall to avoid "text file busy"
On TrueNAS, the runtime binary may be in /root/bin or /var/tmp while
the install script only checked INSTALL_DIR (/data/pulse-agent).
This left the running process using the binary when the script tried
to copy a new version, causing "Text file busy" errors.

Now explicitly stop the service and kill any pulse-agent processes
before modifying binaries on TrueNAS systems.

Related to #846
2025-12-15 04:03:06 +00:00
rcourtman
758560ee69 fix(sensor-proxy): add --proxy-url flag for manual URL override
Closes #826

The error messages suggested using --proxy-url but the flag was never
implemented. This adds the flag so users can manually specify the
proxy URL when:
- Auto IP detection produces malformed results
- The desired IP is not the primary IP
- Multi-homed hosts need a specific interface
2025-12-14 21:15:55 +00:00
rcourtman
2e06f6b966 feat: auto-detect platforms during agent install and allow multi-host tokens
- Install script now auto-detects Docker, Kubernetes, and Proxmox
- Platform monitoring is enabled automatically when detected
- Users can override with --disable-* or --enable-* flags
- Allow same token to register multiple hosts (one per hostname)
- Update tests to reflect new multi-host token behavior
- Improve CompleteStep and UnifiedAgents UI components
- Update UNIFIED_AGENT.md documentation
2025-12-14 16:21:59 +00:00
rcourtman
ee659fd645 fix: Unraid uninstall now cleans up legacy agents from go script
The previous fix added legacy cleanup for systemd/macOS but missed the
Unraid-specific section. Now removes pulse-host-agent and pulse-docker-agent
entries from /boot/config/go and cleans up /boot/config/pulse directory.
2025-12-13 22:31:50 +00:00
rcourtman
e7524d0264 feat: thorough uninstall cleans up legacy agents and all artifacts
The --uninstall flag now removes:
- Unified pulse-agent (service, binary, logs)
- Legacy pulse-host-agent (service, binary, logs)
- Legacy pulse-docker-agent (service, binary, logs)
- Agent state directory (/var/lib/pulse-agent)
- All related log files

Works on Linux (systemd), macOS (launchd), and other supported platforms.
2025-12-13 21:44:00 +00:00
rcourtman
5e3a2849b4 chore: add upgrade detection and messaging to install script
Shows version information when upgrading an existing pulse-agent installation.
Minor UX improvement - script already handled upgrades correctly.
2025-12-13 21:22:14 +00:00
rcourtman
dd6107406f fix: Add execute permissions to shell scripts 2025-12-13 15:44:51 +00:00
rcourtman
a259b67348 feat: add Kubernetes platform support 2025-12-12 21:31:11 +00:00
rcourtman
c8adbb7ae5 Add AI monitoring enhancements and host metadata features
- Add host metadata API for custom URL editing on hosts page
- Enhance AI routing with unified resource provider lookup
- Add encryption key watcher script for debugging key issues
- Improve AI service with better command timeout handling
- Update dev environment workflow with key monitoring docs
- Fix resource store deduplication logic
2025-12-09 16:27:46 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
rcourtman
53d7776d6b wip: AI chat integration with multi-provider support
- Add AI service with Anthropic, OpenAI, and Ollama providers
- Add AI chat UI component with streaming responses
- Add AI settings page for configuration
- Add agent exec framework for command execution
- Add API endpoints for AI chat and configuration
2025-12-04 20:16:53 +00:00
rcourtman
610be6914c fix: TrueNAS SCALE 24.04+ has read-only /usr/local/bin
On TrueNAS SCALE 24.04+, the root filesystem including /usr/local/bin
is read-only. The installer now tries multiple locations for the
runtime binary:

1. Execute directly from /data (if no noexec mount)
2. /usr/local/bin (older TrueNAS versions)
3. /root/bin (TrueNAS SCALE 24.04+)
4. /var/tmp (last resort)

The bootstrap script is also updated to use the determined runtime
location rather than hardcoding /usr/local/bin.

Related to #801
2025-12-03 21:02:55 +00:00
rcourtman
7d733db3a8 fix: Default sensor-proxy HTTP to 0.0.0.0:8443 for IPv4 binding
On systems with net.ipv6.bindv6only=1 (including some Proxmox 8
configurations), using ":8443" results in IPv6-only binding. Users
reported curl to 127.0.0.1:8443 hanging while [::1]:8443 worked.

Changed default from ":8443" to "0.0.0.0:8443" to explicitly bind IPv4.

Related to #805
2025-12-03 20:25:08 +00:00
rcourtman
4f23cddcae fix: Handle --http-addr with bind address in sensor-proxy installer
When using --http-addr 0.0.0.0:8443 (to bind to IPv4 only), the URL
construction was broken, producing URLs like https://192.168.31.110.0.0.0:8443

Now correctly extracts the port number from both ":8443" and "0.0.0.0:8443"
formats using ${HTTP_ADDR##*:} instead of ${HTTP_ADDR#:}

Related to #805
2025-12-03 20:16:30 +00:00
rcourtman
a11e1c1df3 fix: TrueNAS agent binary now runs from /usr/local/bin to avoid noexec
TrueNAS SCALE's /data partition may have exec=off, preventing binaries
from executing. The installer now:
- Stores the binary in /data/pulse-agent/ for persistence
- Copies it to /usr/local/bin (tmpfs, allows exec) for runtime
- Updates the bootstrap script to copy on each boot

Related to #801
2025-12-03 20:14:48 +00:00
rcourtman
774fac9edd fix: Improve TrueNAS detection for immutable filesystem installs
Added fallback detection for TrueNAS systems that may not have
/etc/truenas-version or other standard markers:

1. Check if hostname contains "truenas" (common default hostname)
2. Test if /usr/local/bin is actually writable - if not and /data
   exists, use TrueNAS installation paths

This fixes installations on TrueNAS systems where the standard
detection files are missing but the filesystem is still immutable.

Related to #801
2025-12-03 18:04:10 +00:00
rcourtman
87eb88dd98 fix: sensor-proxy installer fails silently on containers without snapshots
The SNAPSHOT_START extraction used grep in a pipeline with pipefail
enabled. When a container config has no snapshot sections (no lines
starting with '['), grep returns exit code 1, causing set -e to
terminate the script without any error message.

This affected newly created containers that hadn't been snapshotted yet,
which is the common case for fresh Pulse installations via community
scripts.

Related to #780
2025-12-03 09:04:45 +00:00
rcourtman
4b8fbe6ae2 fix: --disable-host flag now correctly disables host monitoring
The install script was not passing the --enable-host=false flag to the
agent when --disable-host was specified. Since the agent defaults to
enabling host monitoring, it was ignored.

Also adds TrueNAS SCALE support to the unified agent installer:
- Detects TrueNAS SCALE via /etc/truenas-version and other markers
- Installs to /data/pulse-agent (persists across TrueNAS upgrades)
- Creates Init/Shutdown task to restore service after TrueNAS updates
- Adds uninstall support for TrueNAS SCALE

Related to #800, #801
2025-12-03 03:04:03 +00:00
rcourtman
b3b8081426 fix: add timeout to pmxcfs operations in install-sensor-proxy.sh
Reading and writing container config from /etc/pve/lxc/ can hang
indefinitely if the Proxmox cluster filesystem (pmxcfs) is slow or
unresponsive. This causes the installer to appear to hang after
"Configuring socket bind mount..." with no further output.

Add 10-second timeouts to both cp operations and provide helpful error
messages suggesting the user check cluster health with 'pvecm status'.

Related to #738
2025-12-01 21:04:01 +00:00
rcourtman
f197dfc922 Fix sensor-proxy installer to download latest release by default
The VERSION variable was hardcoded to v4.32.0 instead of being empty,
which prevented the "fetch latest release" logic from running. When
VERSION is empty, REQUESTED_VERSION defaults to "latest" which triggers
proper release detection via GitHub API.

Related to #738
2025-12-01 06:02:42 +00:00
rcourtman
973f1f9866 Fix SSH key collision when installing sensor-proxy on multiple cluster nodes
When running install-sensor-proxy.sh on multiple nodes in a cluster, each
installation was removing all existing pulse-managed-key entries before
adding its own. This caused the following scenario:

1. Run script on node A: node A's key is added to all nodes
2. Run script on node B: node B's key replaces node A's key on all nodes
3. Result: node A's proxy can no longer SSH to other nodes

The fix changes the behavior to:
- Check if the specific SSH key already exists on the target
- Only add the key if not present (idempotent)
- Never remove existing pulse-managed-key entries

This allows multiple sensor-proxy installations to coexist in a cluster,
with each node's proxy key authorized on all nodes.

Related to #738
2025-11-30 21:03:36 +00:00
rcourtman
7e990710e9 Fix indentation in cleanup section after pvesh refactor
The previous commit left broken indentation and an orphaned else block
in the cleanup section. This fixes the structure to properly handle
the cluster nodes vs standalone node cases.

Related to #738
2025-11-29 18:41:59 +00:00
rcourtman
649278bf5f Use pvesh API for cluster node discovery in install-sensor-proxy.sh
Replace brittle pvecm nodes CLI parsing with pvesh API calls. The old
approach used awk field positions ($4) which breaks across Proxmox
versions, locales, or output format changes.

Added get_cluster_node_names() helper that:
- Prefers pvesh get /cluster/status --output-format json (structured)
- Falls back to pvecm nodes CLI parsing if pvesh unavailable
- Uses python3 for JSON parsing (always available on Proxmox)

Related to #738
2025-11-29 18:33:27 +00:00
rcourtman
a0eead95f1 Fix pvecm nodes parsing to correctly extract hostname field
The awk was using $NF which returns "(local)" on the local node's line
instead of the hostname. Changed to $4 which is the actual hostname field.

Related to #738
2025-11-29 18:24:44 +00:00
rcourtman
81eb6b018a Use pvecm nodes for cluster discovery to prefer management IPs
For multi-network Proxmox clusters (e.g., separate corosync and
management networks), the installer now uses `pvecm nodes` to get
hostnames and resolves them via /etc/hosts. This automatically
prefers management IPs when the cluster has proper /etc/hosts
configuration.

Falls back to the previous `pvecm status` IP extraction if hostname
resolution doesn't yield results.

Related to #738
2025-11-29 15:07:42 +00:00
rcourtman
b6f0d74c55 Use New-Service for Windows service creation
Switch from sc.exe create to PowerShell's New-Service cmdlet for
creating the Windows service. New-Service provides better error
handling and is more reliable. Keep sc.exe only for configuring
service recovery options (restart on failure), which New-Service
doesn't support.

Related to #776
2025-11-29 14:09:30 +00:00
rcourtman
ee9c63c880 Add jq dependency and fix secondary node support in sensor-proxy installer
Related to #738

Fixes two issues discovered by k5madsen:

1. Missing jq dependency: The sensor wrapper script uses jq extensively to
   parse SMART data JSON from smartctl but the installer never checked if
   jq was installed. Added jq to REQUIRED_CMDS list.

2. Secondary node support: When running on a secondary Proxmox cluster node
   where the container doesn't exist locally, the script now:
   - Warns instead of failing with "Container does not exist"
   - Continues installation for host temperature monitoring
   - Skips container-specific socket mount configuration

This allows users to run the installer on all cluster nodes (as intended)
to ensure the sensor-proxy service is available when containers migrate.
2025-11-28 21:08:43 +00:00
rcourtman
1d41920d91 Fix Windows agent SC CREATE binPath quoting
The binPath parameter value needs outer quotes when it contains
embedded quotes and spaces. Without this, SC.EXE parses the value
incorrectly and fails to create the service.

Related to #776
2025-11-28 18:04:24 +00:00
rcourtman
db8790a463 Fix silent sc.exe delete failures in install.ps1
- Add error handling for sc.exe delete in uninstall logic
- Add error handling for legacy agent service deletion
- Add error handling for existing service deletion before reinstall
- Show warnings when service deletion fails instead of silently continuing

Related to #735
2025-11-28 09:58:47 +00:00
rcourtman
c66e9bb0e5 Add --agent-id parameter to unified agent installers
The unified installer was missing --agent-id support that existed in
the legacy host-agent installer. This parameter allows users to specify
a custom agent identifier instead of using auto-generated IDs.

Updated both install.sh (Linux/macOS/Synology/Unraid) and install.ps1
(Windows) to accept --agent-id and pass it through to the agent binary.

Related to #772
2025-11-28 06:08:42 +00:00
rcourtman
4a27018e46 Fix silent Windows service creation failures in install.ps1
- Add Administrator privilege check at script start
- Replace silent `| Out-Null` with proper error handling for sc.exe
- Exit with error if service creation fails
- Add try/catch for Start-Service with proper error reporting

Related to #735, #760, #751
2025-11-27 21:29:51 +00:00
rcourtman
4e5977f5d9 fix(dev): export auth env vars in hot-dev script
Use load_env_file for /etc/pulse/.env to properly export
PULSE_AUTH_USER and PULSE_AUTH_PASS to the backend process.
2025-11-27 11:14:27 +00:00
rcourtman
17af64fedf security: harden Windows installer script
- Add input validation for URL (http/https), token format, and interval
- Add SHA256 checksum verification against X-Checksum-Sha256 header
- Add PE binary magic verification (MZ header)
- Add file size validation (1-100MB expected)
- Add TLS 1.2/1.3 minimum enforcement
- Add 5-minute download timeout
- Add temp file cleanup on failure
- Add binary backup/restore on installation failure
- Download to temp file before atomic move to final location
2025-11-26 13:42:45 +00:00