Commit Graph

2297 Commits

Author SHA1 Message Date
rcourtman
dcdbee3c5c feat: Add in-app help system with HelpIcon component
Add contextual help icons throughout the UI to improve feature
discoverability. Users can click (?) icons to see explanations
with examples for settings they might not understand.

- HelpIcon component with click-to-open popover
- Centralized help content registry in /content/help/
- FeatureTip component for dismissible contextual tips
- Help added to: alert delay, AI endpoints, update channel
2026-01-07 09:22:23 +00:00
rcourtman
b75b33b9fe fix: Read form values from DOM for password manager compatibility
Password managers may fill form fields programmatically without
triggering input events, causing SolidJS signals to remain empty.
This fix reads values directly from the DOM on submit, ensuring
credentials filled by password managers are properly captured.

Related to #1036
2026-01-06 22:25:11 +00:00
rcourtman
73e6a8edc5 fix: Add missing UI for physical disk polling interval setting
The previous commit (06261627) added backend support for configurable
physical disk polling intervals but didn't include the UI to configure it.

Adds a dropdown selector (5/15/30/60 minutes) that appears when physical
disk monitoring is enabled.

Related to #1007
2026-01-06 20:32:24 +00:00
rcourtman
96d06da0d7 fix: Deduplicate shared storages (NFS, RBD, PBS, etc) in cluster view
Shared storages were appearing multiple times (once per node) because
the deduplication logic only checked the Proxmox `Shared` flag. Many
storage types are inherently cluster-wide but don't set this flag:

- RBD (Ceph block storage)
- CephFS
- PBS (Proxmox Backup Server)
- GlusterFS
- NFS
- CIFS/SMB
- iSCSI

Now we detect shared storage from either the Shared flag or the
storage type. Inherently shared storage types are deduplicated and
shown once with a "cluster" node designation.

Related to #1049
2026-01-06 17:44:52 +00:00
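The detection rule in this commit can be sketched in Go roughly as follows. This is an illustration only, not the code from the commit: the Storage struct, the lowercase type strings, and the dedupe helper are assumptions made for the example.

```go
package main

import "fmt"

// Storage is a hypothetical, simplified storage record for this sketch.
type Storage struct {
	Name   string
	Node   string
	Type   string
	Shared bool // the Proxmox Shared flag
}

// sharedTypes lists storage types treated as inherently cluster-wide.
// The exact type strings are assumptions for illustration.
var sharedTypes = map[string]bool{
	"rbd": true, "cephfs": true, "pbs": true, "glusterfs": true,
	"nfs": true, "cifs": true, "iscsi": true,
}

// isShared reports whether a storage should be deduplicated: either the
// Shared flag is set or the type is inherently shared.
func isShared(s Storage) bool {
	return s.Shared || sharedTypes[s.Type]
}

// dedupe keeps one entry per shared storage name, labelled with the
// "cluster" node, and leaves node-local storages untouched.
func dedupe(in []Storage) []Storage {
	seen := make(map[string]bool)
	out := make([]Storage, 0, len(in))
	for _, s := range in {
		if isShared(s) {
			if seen[s.Name] {
				continue
			}
			seen[s.Name] = true
			s.Node = "cluster"
		}
		out = append(out, s)
	}
	return out
}

func main() {
	storages := []Storage{
		{Name: "backups", Node: "pve1", Type: "nfs"},
		{Name: "backups", Node: "pve2", Type: "nfs"},
		{Name: "local", Node: "pve1", Type: "dir"},
		{Name: "local", Node: "pve2", Type: "dir"},
	}
	for _, s := range dedupe(storages) {
		fmt.Printf("%s on %s\n", s.Name, s.Node)
	}
}
```

Keying the dedup on the storage name alone is a simplification; a real implementation may need to include the cluster or instance in the key.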
rcourtman
d3116defe3 fix: Prevent panic from send on closed websocket channel
Add atomic `closed` flag to Client struct and `safeSend()` helper method
to prevent race condition when sending to client channels. The race
occurred when a client disconnected while a goroutine was trying to send
initial state - the channel could be closed between the registration
check and the actual send.

All sends to client.send now go through safeSend() which checks the
closed flag first. The flag is set atomically before closing the channel
in all code paths (unregister, dispatchToClients, broadcast, shutdown).

Related to #1048
2026-01-06 17:41:25 +00:00
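A minimal Go sketch of the guarded-send pattern described above. The Client fields and the shutdown helper are hypothetical, and the recover is an extra safeguard added for this sketch (the commit message only mentions the atomic flag): it covers the narrow window where the channel is closed after the flag check but before the send.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Client is a hypothetical, simplified websocket client for this sketch.
type Client struct {
	send   chan []byte
	closed atomic.Bool
}

// safeSend attempts a non-blocking send and reports whether the message
// was queued. It never panics, even if the channel is already closed.
func (c *Client) safeSend(msg []byte) (ok bool) {
	if c.closed.Load() {
		return false
	}
	// Assumption of this sketch: recover guards against a close that
	// happens between the flag check above and the send below.
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	select {
	case c.send <- msg:
		return true
	default:
		return false // buffer full: drop rather than block
	}
}

// shutdown sets the flag before closing the channel and is safe to call
// more than once.
func (c *Client) shutdown() {
	if c.closed.CompareAndSwap(false, true) {
		close(c.send)
	}
}

func main() {
	c := &Client{send: make(chan []byte, 1)}
	fmt.Println("before close:", c.safeSend([]byte("state")))
	c.shutdown()
	fmt.Println("after close:", c.safeSend([]byte("state")))
}
```

Using CompareAndSwap in shutdown also prevents a double close when several code paths race to disconnect the same client.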
rcourtman
48fdff3efb fix: Preserve ackState for old acknowledged alerts during restore
When LoadActiveAlerts skipped acknowledged alerts older than 1 hour,
it was also not populating ackState. This meant that when the same
alert (e.g., backup-age) was recreated on the next poll cycle,
preserveAlertState couldn't find any acknowledgement record and
the alert would retrigger notifications.

Now ackState is populated even for skipped old acknowledged alerts,
so if they reappear, the acknowledgement will be restored.

Related to #1043
2026-01-06 11:00:36 +00:00
rcourtman
74ea90e4b3 fix: Podman sockets not prioritized when --docker-runtime=podman
When --docker-runtime=podman is explicitly set, the agent should try
Podman-specific sockets first before falling back to environment
defaults (which try /var/run/docker.sock).

Also adds /var/run/podman/podman.sock as a candidate socket path,
which is used by CoreOS and some Fedora configurations.

Related to #1045
2026-01-06 10:56:37 +00:00
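The ordering described in this commit could look roughly like the Go sketch below. The candidateSockets function and the runtime strings other than "podman" are hypothetical; only the two socket paths come from the commit message.

```go
package main

import "fmt"

// candidateSockets returns socket paths in the order they should be tried.
// Hypothetical helper: when the runtime is explicitly "podman", the
// Podman-specific socket comes first and the Docker default is the fallback.
func candidateSockets(runtime string) []string {
	podman := []string{"/var/run/podman/podman.sock"}
	defaults := []string{"/var/run/docker.sock"}
	if runtime == "podman" {
		out := make([]string, 0, len(podman)+len(defaults))
		out = append(out, podman...)
		return append(out, defaults...)
	}
	return defaults
}

func main() {
	for _, rt := range []string{"podman", "docker"} {
		fmt.Println(rt, candidateSockets(rt))
	}
}
```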
rcourtman
d7000fafb6 fix: Empty array expansion fails on macOS bash 3.2 with set -u
macOS ships with bash 3.2 (GPLv2) which has a bug where expanding
an empty array like ${array[@]} with set -u enabled throws an
"unbound variable" error, even when the array is initialized.

Use ${arr[@]+"${arr[@]}"} pattern to safely handle empty arrays.

Related to #1046
2026-01-06 10:52:44 +00:00
rcourtman
cfcba70b2b chore: Bump version to 5.0.12 2026-01-05 23:48:57 +00:00
rcourtman
d0191d136f fix: Add configurable poll timeout and handle external Ceph storage
Changes:
1. Add MAX_POLL_TIMEOUT env var for large Proxmox clusters that need
   more than 3 minutes for polling (default: 3m, minimum: 30s)
2. Handle external Ceph storage gracefully - don't mark nodes unhealthy
   when Proxmox returns 'binary not installed' (e.g., for Ceph not
   managed by Proxmox)

Related to #965
2026-01-05 23:34:33 +00:00
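A sketch of how a timeout setting with a default and a minimum might be parsed in Go. The value format is an assumption (Go duration strings such as "5m"), as are the fallback to the default on a parse error and the clamping of too-small values; the commit message states only the default (3m) and the minimum (30s).

```go
package main

import (
	"fmt"
	"os"
	"time"
)

const (
	defaultPollTimeout = 3 * time.Minute
	minPollTimeout     = 30 * time.Second
)

// pollTimeout converts a raw MAX_POLL_TIMEOUT value into a duration.
// Empty or unparsable values fall back to the default, and values below
// the minimum are raised to it (both behaviours are assumptions).
func pollTimeout(raw string) time.Duration {
	if raw == "" {
		return defaultPollTimeout
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return defaultPollTimeout
	}
	if d < minPollTimeout {
		return minPollTimeout
	}
	return d
}

func main() {
	fmt.Println("poll timeout:", pollTimeout(os.Getenv("MAX_POLL_TIMEOUT")))
}
```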
rcourtman
c6182b2ed3 feat: Add FreeBSD/OPNsense support for the Pulse agent
Added FreeBSD amd64 and arm64 build targets to the release process:
- Build host-agent and unified agent binaries for FreeBSD
- Package FreeBSD tarballs in releases
- Include FreeBSD binaries in universal tarball for download endpoint

Updated agent install script with FreeBSD support:
- Fixed architecture detection (FreeBSD reports 'amd64' not 'x86_64')
- Added FreeBSD rc.d service handler with proper daemon management
- Automatic service enabling via rc.conf

This enables users to run the Pulse agent on FreeBSD-based systems
like OPNsense, pfSense, and vanilla FreeBSD.

Fixes #1041
2026-01-05 18:18:06 +00:00
rcourtman
0826c4ddb2 fix: Show linked agents in Managed Agents table with badge
Previously, agents linked to Proxmox nodes were hidden from the
Settings > Agents > Managed Agents table, which confused users who
couldn't find their installed agents.

Now all agents are shown in the table, with linked agents displaying
an indigo 'Linked' badge that explains they're also merged with
Proxmox nodes in the Dashboard.

Fixes #1038
2026-01-05 17:57:11 +00:00
rcourtman
0b6bceb96f fix: Hide non-functional edit button for Docker hosts in thresholds table. Related to discussion #1040 2026-01-05 17:13:43 +00:00
rcourtman
e4d7f6fd3d fix: Allow querying non-PBS backup storage with Active=0
Previously, only PBS-type storages were queried when Active=0 because
querying inactive storage can return 500 errors. However, this caused
backups from datacenter backup tasks on shared storage (NFS, CIFS, etc.)
to not appear when the storage reported Active=0 on some nodes.

Now any storage with backup content is queried regardless of Active status.
If the storage is truly unavailable, GetStorageContent returns an error
which is already handled gracefully (logged and skipped).

Related to #1037
2026-01-05 14:53:40 +00:00
rcourtman
2cc9214336 feat: Make container update alerts a free feature
Update alerts for Docker containers are now available to all users,
not just Pro license holders. The feature alerts when container image
updates have been pending for longer than the configured delay
(default: 24 hours).

- Remove Pro license gating from update alerts
- Add FeatureUpdateAlerts to free tier features
- Remove obsolete license gating tests

Related to #1031
2026-01-04 23:59:29 +00:00
rcourtman
f210ef5517 Auto-update Helm chart version to 5.0.11 helm-chart-5.0.11 2026-01-04 20:01:07 +00:00
rcourtman
9388a13718 Auto-update Helm chart documentation 2026-01-04 20:01:06 +00:00
rcourtman
3b70e29b87 test: add PULSE_DATA_DIR to TestMainCmd
TestMainCmd was missing PULSE_DATA_DIR setup, causing it to try to
access /etc/pulse which fails in CI.
v5.0.11
2026-01-04 19:15:38 +00:00
rcourtman
21a819f6dc test: use t.Setenv for safer test cleanup
t.Setenv ensures environment variables are restored after test
completion, preventing race conditions where background goroutines
(like config watchers) might access unset env vars during cleanup.
2026-01-04 19:08:45 +00:00
rcourtman
fdba559167 test: skip tests requiring /etc/pulse in CI
Tests that use the default /etc/pulse data directory fail in CI
where the directory doesn't exist and can't be created.
2026-01-04 18:59:48 +00:00
rcourtman
1731489709 test: remove obsolete EnsureDirError test
The test was checking an error path that no longer exists -
NewConfigPersistence now falls back to /etc/pulse when directory
creation fails, and calls log.Fatal() only when that also fails.
2026-01-04 18:51:02 +00:00
rcourtman
37f5e12dc2 test: add encryption keys to remaining cmd/pulse tests
TestConfigImportCmd and TestConfigAutoImportCmd need encryption keys
in CI where /etc/pulse/.encryption.key doesn't exist.
2026-01-04 18:43:40 +00:00
rcourtman
a9d37eed8d test: fix TestLoad_ReadErrors encryption key 2026-01-04 18:24:39 +00:00
rcourtman
821783eef7 test: fix tests that create .enc files without encryption keys
Tests were failing in CI because they created nodes.enc files without
valid encryption keys, triggering the crypto safety check.

Added createTestEncryptionKey helper and fixed:
- TestLoad_MockEnv (config_load_test.go)
- Multiple tests in commands_test.go that create nodes.enc
2026-01-04 18:15:08 +00:00
rcourtman
f2be9b60f0 test: fix TestLoad_Errors to provide valid encryption key
Test was creating .enc files without a valid encryption key, which
triggers the crypto safety check that prevents generating new keys
when encrypted data exists.
2026-01-04 18:02:39 +00:00
rcourtman
d71b6bd756 fix: Allow qm/pct reboot/shutdown commands with approval
The blocked patterns for 'reboot' and 'shutdown' were too broad,
matching anywhere in the command string. This caused legitimate
Proxmox VM control commands like 'qm reboot 201' to be blocked
instead of requiring approval.

Fix by anchoring these patterns to only match bare system commands
(^reboot, ^shutdown, etc.) while allowing qm/pct variants through
the RequireApproval path.

Related to #1024
2026-01-04 17:57:51 +00:00
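The anchoring fix can be illustrated with a small Go sketch. The pattern strings and the classify function are hypothetical stand-ins, not the project's actual rules; they only show why an anchored pattern lets 'qm reboot 201' reach the approval path while a bare 'reboot' stays blocked.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Anchored: matches only when the command itself is reboot/shutdown.
	blocked = regexp.MustCompile(`^\s*(reboot|shutdown)\b`)
	// Proxmox guest control commands that should require approval.
	needsApproval = regexp.MustCompile(`^\s*(qm|pct)\s+(reboot|shutdown)\b`)
)

// classify returns "blocked", "approval", or "allowed" for a command.
func classify(cmd string) string {
	switch {
	case blocked.MatchString(cmd):
		return "blocked"
	case needsApproval.MatchString(cmd):
		return "approval"
	default:
		return "allowed"
	}
}

func main() {
	for _, cmd := range []string{"reboot", "qm reboot 201", "pct shutdown 105", "ls -la"} {
		fmt.Printf("%-18s %s\n", cmd, classify(cmd))
	}
}
```

With an unanchored pattern such as `reboot`, the first case would also match 'qm reboot 201', which is the behaviour the commit removes.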
rcourtman
301b2fd050 test: fix config tests failing in CI when /etc/pulse doesn't exist
Tests were calling Load() without setting PULSE_DATA_DIR, causing them
to try to create /etc/pulse which fails in CI environments.

- Skip TestLoad_Defaults if /etc/pulse doesn't exist
- Add PULSE_DATA_DIR to tests that were missing it
2026-01-04 17:50:57 +00:00
rcourtman
7a1e3e9b4e Improve test coverage for cmd/pulse-sensor-proxy 2026-01-04 16:10:34 +00:00
rcourtman
f77025fb2f test: fix flaky tests with nonexistent path assertions
Tests using /nonexistent/... paths fail in sandboxed environments
where they return 'permission denied' instead of 'not exists'.
Use /tmp/... paths instead which reliably return 'not exists'.
2026-01-04 15:38:30 +00:00
rcourtman
121adbf00a chore: bump version to 5.0.11 2026-01-04 15:27:58 +00:00
rcourtman
45d4d68127 fix: Add debug logging and response format handling for replication status
- Add comprehensive debug logging to diagnose replication status fetch failures
- Handle both array and single-object response formats from Proxmox API
- Log raw response body for easier debugging
- Log success/failure for each enrichment step

This helps diagnose issue #992 where replication last/next sync times aren't
showing. The logging will reveal if the API call is failing, returning empty
data, or returning data in an unexpected format.

Related to #992
2026-01-04 15:01:32 +00:00
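One way to accept both response shapes in Go is to inspect the first non-space byte before decoding. This is a sketch under assumptions: the ReplicationStatus struct and its JSON field names are illustrative, and the commit may handle the two formats differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// ReplicationStatus is a hypothetical subset of the fields in a
// replication status response.
type ReplicationStatus struct {
	ID       string `json:"id"`
	LastSync int64  `json:"last_sync"`
	NextSync int64  `json:"next_sync"`
}

// decodeStatuses accepts either a JSON array of statuses or a single
// status object and always returns a slice.
func decodeStatuses(data []byte) ([]ReplicationStatus, error) {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 {
		return nil, nil
	}
	if trimmed[0] == '[' {
		var list []ReplicationStatus
		if err := json.Unmarshal(trimmed, &list); err != nil {
			return nil, err
		}
		return list, nil
	}
	var one ReplicationStatus
	if err := json.Unmarshal(trimmed, &one); err != nil {
		return nil, err
	}
	return []ReplicationStatus{one}, nil
}

func main() {
	for _, body := range []string{
		`[{"id":"100-0","last_sync":1767500000}]`,
		`{"id":"100-0","next_sync":1767503600}`,
	} {
		statuses, err := decodeStatuses([]byte(body))
		fmt.Println(len(statuses), err)
	}
}
```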
rcourtman
43b5fad12c fix: Add main host URL as fallback for remote cluster access
When a Proxmox cluster is discovered, Pulse now includes the user-provided
main host URL as a fallback endpoint. This handles scenarios where Proxmox
reports internal IPs that aren't reachable from Pulse's network (e.g.,
monitoring a remote cluster across different networks).

Previously, if all cluster endpoint IPs were unreachable, the connection
would fail with no fallback. Now the ClusterClient will fall back to the
main host URL, allowing Proxmox to route API calls internally.

Related to #1028
2026-01-04 14:54:03 +00:00
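The fallback behaviour might be sketched in Go as below. Both helpers and the example URLs are hypothetical; the real ClusterClient is not shown in the commit message, and the reachability check is passed in as a function so the sketch stays self-contained.

```go
package main

import "fmt"

// withFallback returns the discovered cluster endpoints followed by the
// user-provided main host URL, unless it is already in the list.
func withFallback(discovered []string, mainHost string) []string {
	out := append([]string(nil), discovered...)
	for _, e := range out {
		if e == mainHost {
			return out
		}
	}
	return append(out, mainHost)
}

// firstReachable returns the first endpoint for which reachable is true.
func firstReachable(endpoints []string, reachable func(string) bool) (string, bool) {
	for _, e := range endpoints {
		if reachable(e) {
			return e, true
		}
	}
	return "", false
}

func main() {
	main := "https://pve.example.com:8006"
	eps := withFallback([]string{"https://10.0.0.1:8006", "https://10.0.0.2:8006"}, main)
	// Pretend only the main host URL is reachable from this network.
	ep, ok := firstReachable(eps, func(e string) bool { return e == main })
	fmt.Println(ep, ok)
}
```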
rcourtman
504f26c6f5 test(ai): improve coverage for patrol service
- Added TestPatrolService_RunPatrol_FullCoverage to test main patrol loop
- Added TestPatrolService_StartStop for lifecycle coverage
- Added TestPatrolService_Setters_Coverage for configuration setters
- Added TestPatrol_RunHeuristicAnalysis_Coverage for heuristic integration
- Mocked provider and state for deterministic AI patrol testing
- Addressed 0% coverage in internal/ai/patrol.go
2026-01-04 14:03:58 +00:00
rcourtman
90cce6d51b test(monitoring): fix failing snapshot tests and improve coverage
- Fix TestMonitor_PollGuestSnapshots_Coverage by correctly initializing State ID fields
- Improve PBS client to handle alternative datastore metric fields (total-space, etc.)
- Add comprehensive test coverage for PBS polling, auth failures, and datastore metrics
- Add various coverage tests for monitoring, alerts, and metadata handling
- Refactor Monitor to support better testing of client creation and auth handling
2026-01-04 10:29:40 +00:00
rcourtman
5d4e911298 feat: improve test coverage for pulse-sensor-proxy 2026-01-03 21:42:19 +00:00
rcourtman
fd7e80ae17 fix: Add clear warning when Docker token is already in use
When a Docker agent tries to register with a token that's already bound
to another agent, the error was logged generically as "Failed to send
docker report". Users had to dig into logs to understand the issue.

Now logs a prominent error message:
"DOCKER REGISTRATION FAILED: This API token is already used by another
Docker agent. Each Docker host requires its own unique token. Generate
a new token in Pulse Settings > Agents and reinstall with the new token."

Related to #1027
2026-01-03 20:56:04 +00:00
rcourtman
22e1cc5613 test(agent): achieve 95% coverage for pulse-agent 2026-01-03 20:52:42 +00:00
rcourtman
fa43628cde fix: Alert acknowledge/unacknowledge fails with reverse proxies
Reverse proxies (Traefik, Caddy, nginx) often normalize or reject URLs
containing %2F (encoded slash). Alert IDs contain forward slashes
(e.g., "docker-container-state-docker:abc/def"), causing acknowledge
requests to fail with 400 errors when going through a reverse proxy.

Added new body-based endpoints that accept alert ID in JSON body:
- POST /api/alerts/acknowledge {"id": "..."}
- POST /api/alerts/unacknowledge {"id": "..."}
- POST /api/alerts/clear {"id": "..."}

Updated frontend to use the new endpoints. Legacy path-based endpoints
are preserved for backwards compatibility.

Related to #1026
2026-01-03 20:51:25 +00:00
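A body-based endpoint of this kind can be sketched in Go with the standard library. The handler, the alertRequest type, and the response text are hypothetical; only the route and the {"id": "..."} body shape come from the commit message. Because the ID travels in the JSON body, its slashes never appear in the URL path.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// alertRequest mirrors the {"id": "..."} body described in the commit.
type alertRequest struct {
	ID string `json:"id"`
}

// decodeAlertID extracts the alert ID from a JSON request body.
func decodeAlertID(body []byte) (string, error) {
	var req alertRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	if req.ID == "" {
		return "", errors.New("missing alert id")
	}
	return req.ID, nil
}

// acknowledgeHandler is a hypothetical handler for POST /api/alerts/acknowledge.
func acknowledgeHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	id, err := decodeAlertID(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// A real handler would acknowledge the alert here.
	fmt.Fprintf(w, "acknowledged %s\n", id)
}

func main() {
	req := httptest.NewRequest(http.MethodPost, "/api/alerts/acknowledge",
		strings.NewReader(`{"id":"docker-container-state-docker:abc/def"}`))
	rec := httptest.NewRecorder()
	acknowledgeHandler(rec, req)
	fmt.Println(rec.Code, strings.TrimSpace(rec.Body.String()))
}
```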
rcourtman
adba448419 fix(pbs): correct API paths and achieve >95% test coverage 2026-01-03 20:45:36 +00:00
rcourtman
b039b79e4a fix: Physical disk temps showing 0°C when using host agent SMART data
The mergeNVMeTempsIntoDisks and mergeHostAgentSMARTIntoDisks functions
require nodes to have LinkedHostAgentID populated to match disks with
host agent SMART data. However, the code was passing the local modelNodes
variable which doesn't have this field set - the linking happens inside
UpdateNodesForInstance which modifies the state's copy, not the local var.

Fixed by using currentState.Nodes (from GetSnapshot()) instead of
modelNodes/modelNodesCopy in both the skip-poll path and the background
goroutine. The state snapshot contains nodes with LinkedHostAgentID
already populated, allowing proper SMART data merging.

Related to #1014
2026-01-03 19:20:31 +00:00
rcourtman
abccbcafb6 fix: Container update command incorrectly removes Docker host and revokes token
When a container update command completed successfully, the server was
incorrectly returning shouldRemove=true, which caused the Docker host to
be removed and its API token revoked. This caused 401 Unauthorized errors
for subsequent agent reports.

The fix ensures shouldRemove is only true for "stop" commands, not for
"update_container" or "check_updates" commands.

Related to #1020
2026-01-03 19:05:18 +00:00
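The corrected rule reduces to a check on the command type. In this Go sketch the shouldRemove function is a hypothetical stand-in for the server logic; the command strings are the ones named in the commit message.

```go
package main

import "fmt"

// shouldRemove reports whether completing a command should remove the
// Docker host. Only "stop" does; update and check commands must leave
// the host and its API token in place.
func shouldRemove(commandType string) bool {
	return commandType == "stop"
}

func main() {
	for _, cmd := range []string{"stop", "update_container", "check_updates"} {
		fmt.Printf("%-17s shouldRemove=%v\n", cmd, shouldRemove(cmd))
	}
}
```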
rcourtman
233278a9d2 Add Docker Swarm frontend components 2026-01-03 18:52:38 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
08661cca8e fix: Add anchor target for "Manage linked agents" link
The link in the agents list banner pointed to #linked-agents but no
element had that ID, so clicking it did nothing.

Related to #1021
2026-01-03 11:33:08 +00:00
rcourtman
a47c7803bb fix: Preserve configured runtime preference during report collection
When collecting reports, the runtime re-detection was passing RuntimeAuto
instead of the user's configured preference. This caused podman to switch
back to docker on systems like CoreOS where podman provides a docker-
compatible socket at /var/run/docker.sock.

Now the current runtime (set at init from user's --docker-runtime flag)
is passed as the preference, preventing spurious runtime switching.

Related to #1022
2026-01-03 11:30:25 +00:00
rcourtman
9e339957c6 fix: Update runtime config when toggling Docker update actions setting
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh since the API returned the stale runtime value.

Related to #1023
2026-01-03 11:14:17 +00:00
rcourtman
fbbefa4546 Improve tests for internal/alerts package
- Fix TestSaveHistoryWithRetry_WriteError to be robust on root
- Add TestOnAlert to history_test.go
- Add pmg_anomaly_test.go for PMG anomaly detection coverage
- Add cleanup_test.go for tracking map cleanup coverage
- Extend filter_evaluation_test.go to cover all guest threshold logic
2026-01-02 23:47:16 +00:00
rcourtman
3b48c4acbb Auto-update Helm chart version to 5.0.10 helm-chart-5.0.10 2026-01-02 21:30:25 +00:00
rcourtman
e19c202ff3 Auto-update Helm chart documentation 2026-01-02 21:30:23 +00:00
rcourtman
87ca7c92e0 docs: update example in dev-deploy-agent script 2026-01-02 21:08:42 +00:00