1056 Commits

Author SHA1 Message Date
rcourtman
c6bd8cb74c Improve internal package test coverage 2025-12-29 17:25:21 +00:00
rcourtman
d07b471e40 Refactor Docker agent: metrics collection, security checks, and batch updates
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
2025-12-29 17:20:18 +00:00
rcourtman
5ad1f5e847 feat: Merge linked host agent SMART temps into Physical Disks
When a host agent is running on a Proxmox node (linked host agent),
merge the agent's SMART disk temperature data into the Physical Disks
view for that node. This allows disk temps collected by pulse-agent
to populate the Physical Disks page without requiring Proxmox SMART
monitoring to be enabled.

Matching is done by WWN (most reliable), serial number, or device path.

Closes part of issue #909 (follow-up from MichiFr)
2025-12-29 15:39:20 +00:00
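Commit 5ad1f5e847 matches disks in a fixed order: WWN first, then serial number, then device path. The Go sketch below shows one way to express that order. The `PhysicalDisk` and `AgentDiskSMART` types and the `matchAgentDisk` helper are hypothetical stand-ins, not Pulse's actual code. Two further details are assumptions: identifiers are compared case-insensitively, and the `/dev/` prefix is ignored when comparing device paths.

```go
package main

import "strings"

// PhysicalDisk is a hypothetical, trimmed-down view of a disk reported by Proxmox.
type PhysicalDisk struct {
	DevPath string
	Serial  string
	WWN     string
}

// AgentDiskSMART is a hypothetical view of one disk's SMART data from the host agent.
type AgentDiskSMART struct {
	Device      string
	Serial      string
	WWN         string
	Temperature int
}

// normalize lower-cases and trims an identifier so "0x5000C500..." equals "0x5000c500...".
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// matchAgentDisk returns the agent SMART entry for disk. It tries WWN first,
// then serial number, then device path. ok is false when nothing matches.
func matchAgentDisk(disk PhysicalDisk, agentDisks []AgentDiskSMART) (AgentDiskSMART, bool) {
	type keyFn func(PhysicalDisk, AgentDiskSMART) (string, string)
	strategies := []keyFn{
		func(d PhysicalDisk, a AgentDiskSMART) (string, string) { return d.WWN, a.WWN },
		func(d PhysicalDisk, a AgentDiskSMART) (string, string) { return d.Serial, a.Serial },
		// Assumption: "/dev/sda" and "sda" refer to the same device.
		func(d PhysicalDisk, a AgentDiskSMART) (string, string) {
			return strings.TrimPrefix(d.DevPath, "/dev/"), strings.TrimPrefix(a.Device, "/dev/")
		},
	}
	for _, key := range strategies {
		for _, a := range agentDisks {
			dk, ak := key(disk, a)
			// Empty identifiers never match, so a missing WWN falls through to the next strategy.
			if normalize(dk) != "" && normalize(dk) == normalize(ak) {
				return a, true
			}
		}
	}
	return AgentDiskSMART{}, false
}
```

Every disk is tried against the most reliable key before any weaker key is considered. As a result, a stale device path (for example, after `/dev/sdX` names are reordered on reboot) cannot override a WWN match.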
rcourtman
d377a5c464 fix: Pass SMART disk temperatures to frontend. Related to #941
The SMART disk temperature data was being collected by the agent but not
passed through to the frontend. Fixed by:

1. Added SMART field to HostSensorSummaryFrontend and created
   HostDiskSMARTFrontend type in models_frontend.go
2. Updated hostSensorSummaryToFrontend() in converters.go to include
   SMART data conversion
3. Added HostDiskSMART interface to frontend TypeScript types
4. Updated HostTemperatureCell to display disk temperatures in tooltip
   with a 'Disk Temperatures' section and fallback to SMART temps when
   no CPU/sensor temps are available
2025-12-29 15:27:46 +00:00
rcourtman
277aca3e4e fix: Only log 'Migration complete' when inline allowed_nodes actually migrated. Related to Discussion #946
The sensor proxy self-heal script runs every 5 minutes and calls migrate-to-file.
Previously it would print 'Migration complete' every time, even when already in
file mode with nothing to migrate.

Now migrateInlineToFile returns a boolean indicating if migration actually
occurred, and the CLI only prints the message when work was done.
2025-12-29 14:15:57 +00:00
rcourtman
4ce1d551e4 fix: Deduplicate disks by device+total to fix Synology storage overcounting. Related to #953
Synology NAS creates multiple shared folders (e.g., /volume1/docker, /volume1/photos)
that are all mount points on the same underlying BTRFS volume. Each reported the same
16TB total, causing Pulse to show 64TB+ instead of 16TB.

The fix tracks device+total combinations and only counts each unique pair once.
When duplicates are found, the shallowest mountpoint (e.g., /volume1) is preferred.

Added a unit test to verify the deduplication works correctly.
2025-12-29 14:03:32 +00:00
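Commit 4ce1d551e4 deduplicates disks by device and total size, preferring the shallowest mountpoint. A minimal Go sketch of that rule follows. The `Disk` type and the `dedupeDisks` and `totalCapacity` names are hypothetical. Mountpoint depth is measured here by counting path components, which is an assumption about how "shallowest" is judged.

```go
package main

import "strings"

// Disk is a hypothetical mount entry as reported by the host agent.
type Disk struct {
	Device     string
	Mountpoint string
	TotalBytes uint64
}

// depth counts path components: "/" is 0, "/volume1" is 1, "/volume1/docker" is 2.
func depth(mount string) int {
	trimmed := strings.Trim(mount, "/")
	if trimmed == "" {
		return 0
	}
	return strings.Count(trimmed, "/") + 1
}

// dedupeDisks keeps one entry per (device, total) pair. When several mounts
// share a pair, the one with the shallowest mountpoint is kept.
func dedupeDisks(disks []Disk) []Disk {
	type key struct {
		device string
		total  uint64
	}
	index := make(map[key]int) // key -> position in result
	var result []Disk
	for _, d := range disks {
		k := key{d.Device, d.TotalBytes}
		if i, seen := index[k]; seen {
			if depth(d.Mountpoint) < depth(result[i].Mountpoint) {
				result[i] = d
			}
			continue
		}
		index[k] = len(result)
		result = append(result, d)
	}
	return result
}

// totalCapacity sums the capacity of the (already deduplicated) disks.
func totalCapacity(disks []Disk) uint64 {
	var sum uint64
	for _, d := range disks {
		sum += d.TotalBytes
	}
	return sum
}
```

With four Synology shared folders on one 16 TB volume, only one entry survives. The summed capacity then reports 16 TB instead of 64 TB.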
rcourtman
fd1f94babf fix: AI Commands toggle now updates immediately in UI. Related to #952
Previously, toggling AI Commands in the Agents view would show a pending state
and wait for the agent to confirm the change (up to 2 minutes). If the agent
was slow to report or the WebSocket update was missed, the toggle would appear
stuck.

Now, UpdateHostAgentConfig also updates the Host model in state immediately,
providing instant UI feedback. The agent will still receive the config on its
next report, but users see the change right away.

Added SetHostCommandsEnabled function to models.State for this purpose.
2025-12-29 13:56:29 +00:00
rcourtman
053a40d7df fix: Docker container update detection showing false positives
Fixed an issue where all Docker containers were showing 'click to update'
even when they were up to date. The root cause was comparing the wrong
digest types:

- Previously: Compared ImageID (local config hash) vs registry manifest digest
- Now: Uses RepoDigests from image inspect, which is the actual manifest
  digest that Docker received from the registry when pulling the image

For multi-arch images, the registry returns a manifest list digest, while
Docker stores the platform-specific image config digest locally. These
will never match, causing false positives for all multi-arch images.

Changes:
- Added ImageInspectWithRaw to dockerClient interface
- Added getImageRepoDigest method to extract RepoDigest from image
- Added matchesImageReference helper for Docker Hub naming conventions
- Added tests for matchesImageReference

Fixes #955
2025-12-29 13:49:04 +00:00
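Commit 053a40d7df adds a `matchesImageReference` helper for Docker Hub naming conventions. The Go sketch below shows one plausible shape for such a helper; its signature and behavior are assumptions, not Pulse's code. It relies on Docker's documented reference rules. A bare name such as `nginx` means `docker.io/library/nginx`. A first path component containing `.` or `:`, or equal to `localhost`, is treated as a registry host.

```go
package main

import "strings"

// normalizeRepo expands a Docker image reference to a fully qualified
// repository name and drops any tag or digest.
func normalizeRepo(ref string) string {
	// Drop a digest suffix ("repo@sha256:...").
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// A ":" after the last "/" starts a tag; a ":" before it is a registry port.
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		ref = ref[:i]
	}
	parts := strings.SplitN(ref, "/", 2)
	if len(parts) == 1 {
		return "docker.io/library/" + ref // official image, e.g. "nginx"
	}
	host := parts[0]
	if !strings.ContainsAny(host, ".:") && host != "localhost" {
		return "docker.io/" + ref // Docker Hub user image, e.g. "linuxserver/plex"
	}
	if host == "index.docker.io" {
		host = "docker.io"
	}
	// "docker.io/nginx" is shorthand for "docker.io/library/nginx".
	if host == "docker.io" && !strings.Contains(parts[1], "/") {
		return "docker.io/library/" + parts[1]
	}
	return host + "/" + parts[1]
}

// matchesImageReference reports whether a RepoDigests entry such as
// "nginx@sha256:..." refers to the same repository as the image reference ref.
func matchesImageReference(repoDigest, ref string) bool {
	return normalizeRepo(repoDigest) == normalizeRepo(ref)
}
```

A container may list several `RepoDigests`, one per repository the image was pulled from. A check like this one selects the entry for the repository actually being queried, and only that entry's digest is compared with the registry's manifest digest.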
rcourtman
a4611739a9 fix: Hosts page not updating in real-time (SolidJS reactivity bug)
Fixed a critical reactivity bug in HostsOverview.tsx where the HostRow
component was destructuring props.host in the function body. In SolidJS,
this breaks reactivity because the destructured value is a static snapshot
captured at component creation time.

Changes:
- Removed 'const { host } = props' destructuring in HostRow
- Changed all 'host.' references to 'props.host.' to maintain reactivity
- Converted cpuPercent and diskStats to reactive getters (functions)
- Added documentation comment explaining why destructuring breaks reactivity

This fixes issue #949, where CPU, memory, and disk values on the Hosts
page would stay stale until manual page refresh.

Related to #949
2025-12-29 11:45:45 +00:00
rcourtman
46a7b7d10a docs: Add prominent VERSION file warning to GEMINI.md
- Add critical release section at top of file
- Make VERSION file requirement impossible to miss
- Explain that version_guard job will fail if VERSION doesn't match
2025-12-29 10:11:38 +00:00
rcourtman
44fa50eed7 feat(dockeragent): improve test coverage and refactor registry dependencies
- Add comprehensive test coverage for agent report, flush buffer, and deps
- Expand flow, HTTP, CPU, and swarm test coverage
- Refactor registry access to use deps interface for better testability
- Add container update and self-update test scenarios
2025-12-29 09:57:45 +00:00
rcourtman
545990e48f feat: Allow reverting dismissed AI alerts from suppression rules
- Add Undismiss() method to FindingsStore to revert dismissed findings
- Include all dismissed findings in GetSuppressionRules() (not just suppressed)
- Add DismissedReason field to SuppressionRule struct
- Update DeleteSuppressionRule to handle dismissed (non-suppressed) findings
- Update frontend to show dismissal type badges (Suppressed/Expected/Noted)
- Change 'Delete' button to 'Reactivate' for dismissed findings

Related to #950
2025-12-29 09:44:36 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
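Commit 32111c7837 lets the reported IP be set with either `--report-ip` or `PULSE_REPORT_IP`. The Go sketch below shows one way to combine the two using the standard `flag` package. Several details are assumptions, not documented Pulse behavior: the flag takes precedence over the environment variable, an empty result means "auto-detect", and malformed addresses are rejected. The `resolveReportIP` name is also hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"net"
)

// resolveReportIP returns the IP the agent should report, or "" for auto-detect.
// getenv is injected (normally os.Getenv) so the lookup is easy to test.
func resolveReportIP(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("pulse-agent", flag.ContinueOnError)
	reportIP := fs.String("report-ip", "", "IP address to report to Pulse")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	value := *reportIP
	if value == "" {
		value = getenv("PULSE_REPORT_IP") // assumed fallback when the flag is unset
	}
	if value == "" {
		return "", nil
	}
	if net.ParseIP(value) == nil {
		return "", fmt.Errorf("invalid --report-ip value %q", value)
	}
	return value, nil
}
```

Go's `flag` package accepts both `-report-ip` and `--report-ip`, so the usage lines in the commit message parse unchanged.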
rcourtman
ae1c39960f fix: Remove duplicate AI chat response streaming (issue #947)
Content was being streamed twice:
1. During each iteration of the tool loop (intended for intermediate feedback)
2. Again after the loop ended with finalContent (redundant)

This caused duplicate responses when using Ollama and other providers.
2025-12-29 09:18:05 +00:00
rcourtman
2bf8e044df feat: Add Docker container update capability
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
2025-12-29 09:00:40 +00:00
rcourtman
3040800e7b fix: AI Patrol now respects exact user-configured thresholds
BREAKING CHANGE: AI Patrol now uses EXACT alert thresholds by default
instead of warning 5-15% before the threshold.

Changes:
- Default behavior: Patrol warns at your configured threshold (e.g., a 96% threshold produces a warning at 96%)
- New setting: 'use_proactive_thresholds' enables the old early-warning behavior
- API: Added use_proactive_thresholds to GET/PUT /api/settings/ai
- Backend: Added SetProactiveMode/GetProactiveMode to PatrolService
- Backend: Added GetThresholds to PatrolService for UI display
- Tests: Updated and added tests for both exact and proactive modes
- Also fixed unused imports in dockeragent/agent.go

When proactive mode is disabled (default):
- Watch: threshold - 5% (slight buffer)
- Warning: exact threshold

When proactive mode is enabled:
- Watch: threshold - 15%
- Warning: threshold - 5%

Related to #951
2025-12-29 08:40:34 +00:00
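Commit 3040800e7b describes two threshold modes. The Go sketch below computes the watch and warning levels for each mode. It assumes "threshold - 5%" means subtracting percentage points, so a 96% threshold gives 91%. Clamping at zero is an added assumption, and `patrolLevels` is a hypothetical name.

```go
package main

// patrolLevels returns the watch and warning levels, in percent, for a
// user-configured alert threshold.
func patrolLevels(threshold float64, proactive bool) (watch, warning float64) {
	if proactive {
		// Early-warning mode: watch 15 points below, warn 5 points below.
		watch, warning = threshold-15, threshold-5
	} else {
		// Default mode: watch 5 points below, warn exactly at the threshold.
		watch, warning = threshold-5, threshold
	}
	if watch < 0 {
		watch = 0
	}
	if warning < 0 {
		warning = 0
	}
	return watch, warning
}
```

For a 96% threshold, the default mode yields watch 91 and warning 96. Proactive mode yields watch 81 and warning 91.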
rcourtman
6f794753ee fix: Add Public URL setting for email notifications
Docker deployments with custom port mappings would show incorrect URLs
in email alerts because the auto-detection couldn't determine the
external port.

Added a "Public URL" setting in Settings > Network that allows users
to configure the dashboard URL used in email notifications.

- Added publicURL field to SystemSettings (persistence.go)
- Load/save publicURL in system settings handler
- Apply publicURL to notification manager on change
- Added UI input in NetworkSettingsPanel
- Shows env override warning if PULSE_PUBLIC_URL is set

Related to #944
2025-12-28 16:08:22 +00:00
rcourtman
76990a65a7 fix: Preserve user's configured hostname when agent registers with IP
When a node was manually added with a hostname (e.g., pve.example.com)
and then the agent registered using its IP address, the code would
correctly deduplicate but incorrectly overwrite the user's configured
hostname with the agent's IP.

Now when matching by IP resolution (hostname resolves to agent's IP),
we preserve the user's original hostname configuration instead of
replacing it with the IP.

Related to #940
2025-12-28 15:44:40 +00:00
rcourtman
9f3367da36 fix: Include GuestURL in NodeFrontend for cluster node navigation
The GuestURL field was missing from NodeFrontend and its converter,
causing configured Guest URLs to be ignored when clicking on cluster
node names. The frontend would fall back to the auto-detected IP
instead of using the user-configured Guest URL.

Related to #940
2025-12-28 14:49:49 +00:00
rcourtman
9063695cba fix: Preserve alert acknowledgement across transient clears
When a powered-off VM is backed up by Proxmox, the VM status briefly
changes (e.g., to "running" during backup). This caused the powered-off
alert to be cleared, deleting the ackState record. When the backup
completed and the alert was recreated, it appeared as a new unacknowledged
alert, generating a new notification.

The fix preserves ackState when alerts are removed, allowing
preserveAlertState to restore the acknowledgement when the same alert
reappears. Stale ackState entries (for alerts that don't exist) are
cleaned up after 1 hour.

Related to #937
2025-12-28 10:24:04 +00:00
rcourtman
7f8c9e37b1 fix: Preserve webhook headers when toggling enable/disable
When GetWebhooks returns webhooks, headers and customFields are masked
with ***REDACTED*** for security. However, when the frontend toggled
a webhook's enabled state, it sent back the redacted values, which
overwrote the actual header values (like Authorization tokens).

This broke webhooks after disabling and re-enabling them, as the auth
headers were replaced with "***REDACTED***".

Now UpdateWebhook detects redacted values and preserves the original
headers/customFields from the existing webhook.

Related to #938
2025-12-28 10:19:32 +00:00
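Commit 7f8c9e37b1 restores the stored headers when the UI sends back the redaction mask. The Go sketch below illustrates that merge on a plain header map. Two behaviors are assumptions, as is the `mergeRedacted` name: a masked value with no stored original is dropped rather than saved, and keys the client omitted are treated as deleted.

```go
package main

// redacted is the mask substituted for secret values when webhooks are listed.
const redacted = "***REDACTED***"

// mergeRedacted returns the headers to persist. Values the client sent back as
// the mask are replaced with the stored originals, so a round-trip through the
// UI cannot overwrite real secrets such as Authorization tokens.
func mergeRedacted(incoming, existing map[string]string) map[string]string {
	merged := make(map[string]string, len(incoming))
	for k, v := range incoming {
		if v == redacted {
			if orig, ok := existing[k]; ok {
				merged[k] = orig
			}
			// Never persist the mask itself.
			continue
		}
		merged[k] = v
	}
	return merged
}
```

The same function can be applied to `customFields`. Any value the user actually edited passes through unchanged.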
rcourtman
056f503516 test: Add comprehensive tests for update detection system
- Add registry checker tests (caching, enable/disable, parsing, concurrency)
- Add alert integration tests for update detection and Pro license gating
- Add API handler tests for /api/infra-updates endpoints
- Test cleanup of tracking maps when containers are removed
- Test threshold-based alerting behavior
2025-12-27 18:54:48 +00:00
rcourtman
d1a8383cd5 feat: Gate update alerts as Pro-only feature
- Add FeatureUpdateAlerts constant for Pro license gating
- Add feature to all Pro tier feature lists
- Add SetLicenseChecker method to alerts Manager
- Check Pro license in checkDockerContainerImageUpdate before alerting
- Wire license checker from router to alert manager

Free users still see update badges in the UI.
Pro users get proactive alerts after 24h of pending updates.
2025-12-27 18:28:09 +00:00
rcourtman
cf44b0cca6 polish: Improve update detection edge cases and UX
- Add GHCR (GitHub Container Registry) token support for public images
- Clean up dockerUpdateFirstSeen tracking when containers are removed
- Improve UpdateIcon tooltip to show digest info
- Add cursor-help to indicate hoverable tooltip
2025-12-27 18:14:27 +00:00
rcourtman
5148040ac4 feat: Wire up /api/infra-updates endpoints for infrastructure update detection
- Add routes for infrastructure update detection API:
  - GET /api/infra-updates - list all container updates with filtering
  - GET /api/infra-updates/summary - aggregated stats per host
  - GET /api/infra-updates/host/{hostId} - updates for specific host
  - GET /api/infra-updates/{resourceId} - specific resource update status
  - POST /api/infra-updates/check - trigger update check (placeholder)

- Update handlers to query Docker container updates from monitor state
- Protected by auth and monitoring_read scope
2025-12-27 18:07:10 +00:00
rcourtman
b50872b686 feat: Implement unified update detection system (Phase 1)
Docker container image update detection with full stack implementation:

Backend:
- Add internal/updatedetection package with types, store, registry checker, manager
- Add registry checking to Docker agent (internal/dockeragent/registry.go)
- Add ImageDigest and UpdateStatus fields to container reports
- Add /api/infra-updates API endpoints for querying updates
- Integrate with alert system - fires after 24h of pending updates

Frontend:
- Add UpdateBadge and UpdateIcon components for update indicators
- Add updateStatus to DockerContainer TypeScript interface
- Display blue update badges in Docker unified table image column
- Add 'has:update' search filter support

Features:
- Registry digest comparison for Docker Hub, GHCR, private registries
- Auth token handling for Docker Hub public images
- Caching with 6h TTL (15min for errors)
- Configurable alert delay via UpdateAlertDelayHours (default: 24h)
- Alert metadata includes digests, pending time, image info
2025-12-27 17:58:38 +00:00
rcourtman
39941a3927 fix(agent): use IP that can reach Pulse for registration
When a Proxmox host has multiple network interfaces (management, Ceph,
cluster ring), the agent would use heuristic scoring to pick an IP,
which could select an isolated network instead of the management network.

Now the agent first determines which local IP is actually used to connect
to the Pulse server, ensuring registration uses a reachable IP. Falls back
to the heuristic scoring if connection-based detection fails.

Related to #929
2025-12-27 17:06:20 +00:00
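Commit 39941a3927 picks the local IP that is actually used to reach the Pulse server. A common way to find that IP in Go is to "dial" the server over UDP and read the socket's local address. Connecting a UDP socket sends no packets: the kernel only performs a route lookup and binds a source address. The commit does not say which technique Pulse uses, so treat this as a swapped-in illustration. `outboundIP` is a hypothetical name.

```go
package main

import (
	"errors"
	"net"
	"time"
)

// outboundIP returns the local IP the kernel would use to reach serverAddr
// ("host:port"). An error means no route was found; a caller would then fall
// back to heuristic interface scoring.
func outboundIP(serverAddr string) (net.IP, error) {
	conn, err := net.DialTimeout("udp", serverAddr, 2*time.Second)
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	addr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return nil, errors.New("unexpected local address type")
	}
	return addr.IP, nil
}
```

On a host with separate management, Ceph, and corosync networks, this returns the address on whichever interface routes to the Pulse server. That is the behavior the commit wants, in place of the best-scoring interface.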
rcourtman
eff4adda49 fix: deduplicate Ceph clusters by FSID before sending to frontend
When the same Ceph cluster is reported from multiple sources (PVE API
and host agent), it showed up twice in the UI. Now we deduplicate by
FSID before converting to frontend format, keeping the cluster entry
with the most complete data (most monitors/managers/pools reported).

Related to #928
2025-12-27 17:03:17 +00:00
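Commit eff4adda49 deduplicates Ceph clusters by FSID and keeps the most complete entry. The Go sketch below shows one way to do this. Scoring completeness as the sum of monitors, managers, and pools is an assumption; so is keeping entries that have no FSID. The `CephCluster` type and the function names are hypothetical.

```go
package main

// CephCluster is a hypothetical summary of one reported Ceph cluster.
type CephCluster struct {
	FSID     string
	Source   string // e.g. "pve" or "agent"
	Monitors int
	Managers int
	Pools    int
}

// completeness scores how much data an entry carries; higher wins.
func completeness(c CephCluster) int {
	return c.Monitors + c.Managers + c.Pools
}

// dedupeCephByFSID keeps one entry per FSID, preferring the most complete one,
// and preserves first-seen order. Entries without an FSID are passed through.
func dedupeCephByFSID(clusters []CephCluster) []CephCluster {
	index := make(map[string]int) // FSID -> position in out
	var out []CephCluster
	for _, c := range clusters {
		if c.FSID == "" {
			out = append(out, c)
			continue
		}
		if i, ok := index[c.FSID]; ok {
			if completeness(c) > completeness(out[i]) {
				out[i] = c
			}
			continue
		}
		index[c.FSID] = len(out)
		out = append(out, c)
	}
	return out
}
```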
rcourtman
81718fcdaa fix(agent): use specific distro name instead of family for osName
Ubuntu was showing as "debian 24.04" because we used PlatformFamily
(which is "debian" for all Debian derivatives) instead of Platform
(which is "ubuntu" for Ubuntu).

Now uses Platform first, falling back to PlatformFamily only if empty.

Related to #927
2025-12-27 15:59:03 +00:00
rcourtman
1dff90817f fix: detect duplicate nodes by IP resolution during agent auto-register. Related to #924
When an agent registers using an IP address, check if any existing node's
hostname resolves to that same IP. This prevents duplicates when a node
was manually configured via hostname and later the agent is installed
which registers using the host's IP.

Changes:
- Add extractHostIP() to extract IP from URL if present
- Add resolveHostnameToIP() with 2s timeout for DNS resolution
- During agent auto-registration, check if existing hostname-based
  configs resolve to the new IP and update instead of creating duplicates
- Add test for extractHostIP helper function
2025-12-27 11:02:00 +00:00
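Commit 1dff90817f adds an `extractHostIP()` helper and hostname resolution with a 2-second timeout. The Go sketch below shows one way to build both from the standard library. The signatures are assumptions. `resolvesTo` is a hypothetical helper standing in for `resolveHostnameToIP()`; it compares parsed IPs so that IPv6 spellings match.

```go
package main

import (
	"context"
	"net"
	"net/url"
	"time"
)

// extractHostIP returns the IP in a URL such as "https://192.168.1.10:8006",
// or "" when the host is a name or the URL does not parse.
func extractHostIP(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	// Hostname() strips the port and any IPv6 brackets.
	if ip := net.ParseIP(u.Hostname()); ip != nil {
		return ip.String()
	}
	return ""
}

// resolvesTo reports whether hostname resolves to ip, giving DNS at most 2s.
func resolvesTo(hostname, ip string) bool {
	target := net.ParseIP(ip)
	if target == nil {
		return false
	}
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := net.DefaultResolver.LookupHost(ctx, hostname)
	if err != nil {
		return false // treat DNS failure or timeout as "no match"
	}
	for _, a := range addrs {
		if net.ParseIP(a).Equal(target) {
			return true
		}
	}
	return false
}
```

During auto-registration, a configured node with host `pve.example.com` would be updated instead of duplicated when `resolvesTo("pve.example.com", agentIP)` is true.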
rcourtman
861be84f8c fix(agent): improve backward compat for PBS-only hosts. Related to #925
The legacy state file could represent either PVE or PBS registration,
depending on what was installed at the time. Now we check what's
currently installed to determine the correct behavior:
- If PVE is installed: legacy file means PVE was registered
- If PBS-only (no PVE): legacy file means PBS was registered
2025-12-27 10:46:51 +00:00
rcourtman
0865ca3512 feat(agent): detect and register both PVE and PBS on same host. Related to #925
When PBS is installed directly on a PVE host (an officially supported
configuration), the agent now detects and registers BOTH products instead
of only detecting PVE.

Changes:
- Add detectProxmoxTypes() to detect all Proxmox products on a host
- Add RunAll() method to register each detected product separately
- Use per-type state files (proxmox-pve-registered, proxmox-pbs-registered)
  to track registration status for each product independently
- Maintain backward compatibility with legacy single state file
- Add tests for new state file path logic
2025-12-27 10:41:44 +00:00
rcourtman
b27b76ae46 feat: implement agent self-unregistration and UI improvements
- Add DELETE /api/agents/unregister endpoint for agent self-unregistration
- Agent now unregisters itself from Pulse server when uninstalled
- Add clarifying note in UnifiedAgents explaining linked agents behavior
- Linked agents are managed via their PVE node but this is now explained in UI
- Add LastSeen field to HostAgent model for better agent status tracking
2025-12-26 23:20:55 +00:00
rcourtman
8c440b6f54 feat: notify server during agent uninstallation
- Add /api/agents/host/uninstall endpoint for agent self-unregistration
- Update install.sh to notify server during --uninstall (reads agent ID from disk)
- Update install.ps1 with same logic for Windows
- Update frontend uninstall command to include URL/token flags

This ensures that when an agent is uninstalled, the host record is
immediately removed from Pulse and any linked PVE nodes have their
+Agent badge cleared.
2025-12-26 22:38:46 +00:00
rcourtman
22d6c1d8a5 fix: Redirect to GitHub releases for agent binary when not available locally
When the unified agent binary isn't found locally (which happens on LXC/barebone
installations updated via the web UI, since that only updates the pulse binary),
redirect to GitHub releases using HTTP 307.

This complements the install.sh GitHub proxy fallback from 7b6613bb.

Related to #909
2025-12-26 20:16:15 +00:00
rcourtman
80cc9b30a1 fix: Add GitHub fallback for install scripts on LXC/barebone updates
When install.sh or install.ps1 doesn't exist locally (which happens on LXC/barebone
installations updated via the web UI, since that only updates the binary),
fall back to fetching the scripts from GitHub raw content.

Related to #909
2025-12-26 19:49:38 +00:00
rcourtman
4a7306f6b8 fix: Auto-clear stale LinkedHostAgentID references during node updates
When nodes are updated, now validates that LinkedHostAgentID points to
an existing host agent. References to deleted host agents are automatically
cleared, fixing the 'Agent' tag persistence for users who removed agent
entries before commit c394d24.

Related to #920
2025-12-26 19:45:31 +00:00
rcourtman
cf577e715f fix: Clear node host agent link when agent is removed
When a host agent is deleted via the UI, the LinkedHostAgentID on any
PVE nodes that were linked to it was not being cleared. This caused
the "Agent" tag to persist in the UI after uninstalling the agent.

Related to #920
2025-12-26 17:52:32 +00:00
rcourtman
7f5ea636db fix: Skip webhook re-notifications for acknowledged alerts
Acknowledged alerts were still triggering repeated webhook notifications
because the re-notification logic only checked cooldown period, not
acknowledgment status. Now acknowledged alerts are skipped entirely.

Related to #921
2025-12-26 17:47:28 +00:00
rcourtman
4277aa753c feat(pbs): turnkey PBS setup with password auth
When adding a PBS node with username/password credentials, Pulse now
automatically:
1. Connects to PBS using the provided credentials
2. Creates a 'pulse-monitor@pbs' user with Audit permissions
3. Generates an API token
4. Stores the token instead of the password

This enables one-click PBS setup for Docker/containerized deployments
where you can't easily run the agent installer. Simply enter root@pam
credentials in the UI and Pulse handles the rest.

Falls back to password auth if token creation fails (e.g., old PBS
version or permission issues).
2025-12-26 10:12:04 +00:00
rcourtman
3d671c1824 feat(pbs): add API-based token creation for turnkey PBS setup
- Added PBS client methods: CreateUser, SetUserACL, CreateUserToken
- Added SetupMonitoringAccess() turnkey method that creates user + token
- Updated handleSecureAutoRegister to use PBS API for token creation
- Enables one-click PBS setup for Docker/containerized deployments

When users provide PBS root credentials, Pulse can now create the
monitoring user and API token remotely via the PBS API, eliminating
the need to SSH/exec into the container manually.
2025-12-26 10:08:41 +00:00
rcourtman
86e41effc0 feat: Display environment variables for Docker containers
- Add Env field to Container struct in pkg/agents/docker/report.go
- Extract env vars from inspect.Config.Env in Docker agent
- Mask sensitive values (password, secret, key, token, etc.) with ***
- Display env vars in container drawer with green badges (amber for masked)
- Add tests for maskSensitiveEnvVars function

Related to #916
2025-12-25 23:52:57 +00:00
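Commit 86e41effc0 masks sensitive environment variables. The Go sketch below shows what a `maskSensitiveEnvVars`-style filter over Docker's `Config.Env` entries (`NAME=value` strings) might look like. The marker list is an assumption: the commit names "password, secret, key, token, etc.", so the real list is likely different. The signature is also hypothetical.

```go
package main

import "strings"

// sensitiveMarkers is an assumed list of name fragments that trigger masking.
var sensitiveMarkers = []string{"PASSWORD", "PASSWD", "SECRET", "KEY", "TOKEN", "CREDENTIAL"}

// maskSensitiveEnvVars returns a copy of env in which the values of
// sensitive-looking names are replaced with "***". Matching is a
// case-insensitive substring test on the variable name only.
func maskSensitiveEnvVars(env []string) []string {
	out := make([]string, 0, len(env))
	for _, entry := range env {
		name, _, hasValue := strings.Cut(entry, "=")
		upper := strings.ToUpper(name)
		masked := false
		if hasValue {
			for _, m := range sensitiveMarkers {
				if strings.Contains(upper, m) {
					masked = true
					break
				}
			}
		}
		if masked {
			out = append(out, name+"=***")
		} else {
			out = append(out, entry)
		}
	}
	return out
}
```

Substring matching errs toward over-masking; for example, `KEY` also matches `MONKEY_MODE`. For a display-only feature, that trade-off is usually acceptable.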
rcourtman
fe3b4ed5b6 fix: require Pro license for auto-fix and autonomous mode
- patrol.go: Auto-fix now requires both config flag AND ai_autofix license
- service.go: IsAutonomous() checks for ai_autofix license before enabling
- ai_handlers.go: API returns 402 if enabling auto-fix/autonomous without license
2025-12-25 21:26:46 +00:00
rcourtman
08c04b78ae feat: add power consumption monitoring (Intel RAPL + AMD Energy)
- Add power.go with Intel RAPL and AMD energy driver support
- Read CPU package, core, and DRAM power consumption in watts
- Sample energy counters over 100ms interval to calculate power
- Add PowerWatts field to Sensors struct for API reporting
- Integrate power collection into host agent sensor gathering
- Add comprehensive tests for power collection module

Supports Intel CPUs (Sandy Bridge+) via RAPL and AMD Ryzen/EPYC
via the amd_energy kernel module.

Closes community-scripts/ProxmoxVE#9575
2025-12-25 21:14:12 +00:00
rcourtman
7dd6c0d57a feat: Collect and display all lm-sensors data (fans, DDR5, etc.)
Extended lm-sensors parsing to capture all sensor readings:
- Fan speeds (RPM) from SuperIO chips like NCT6687
- Additional temperatures (DDR5/RAM, motherboard, etc.)
- All sensors not already captured as CPU/NVMe/GPU

Updated frontend tooltip to display fans and additional sensors
in separate sections with formatted names.

Closes discussion #911
2025-12-25 19:08:03 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
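Commit c1422882bd supports three exclusion pattern forms. The Go sketch below implements the forms listed above: `*text*` for contains, `prefix*` for prefix, and anything else as an exact path. It also parses the comma-separated `PULSE_DISK_EXCLUDE` value. The function names are hypothetical, and full glob syntax is deliberately not handled.

```go
package main

import "strings"

// matchesExclude reports whether mountpoint matches one exclusion pattern.
func matchesExclude(mountpoint, pattern string) bool {
	switch {
	case len(pattern) >= 2 && strings.HasPrefix(pattern, "*") && strings.HasSuffix(pattern, "*"):
		return strings.Contains(mountpoint, strings.Trim(pattern, "*")) // "*pbs*"
	case strings.HasSuffix(pattern, "*"):
		return strings.HasPrefix(mountpoint, strings.TrimSuffix(pattern, "*")) // "/mnt/ext*"
	default:
		return mountpoint == pattern // "/mnt/backup"
	}
}

// parseExcludeEnv splits a comma-separated list such as PULSE_DISK_EXCLUDE,
// trimming spaces and dropping empty entries.
func parseExcludeEnv(value string) []string {
	var patterns []string
	for _, p := range strings.Split(value, ",") {
		if p = strings.TrimSpace(p); p != "" {
			patterns = append(patterns, p)
		}
	}
	return patterns
}

// isExcluded reports whether any pattern matches the mountpoint.
func isExcluded(mountpoint string, patterns []string) bool {
	for _, p := range patterns {
		if matchesExclude(mountpoint, p) {
			return true
		}
	}
	return false
}
```

Exact patterns do not match subdirectories or similarly named paths: `/mnt/backup` excludes neither `/mnt/backup2` nor `/mnt/backup/old`. To exclude a whole tree, use the prefix form.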
rcourtman
05aeab43ad test: Add unit tests for smartctl package
- Test disk type detection (NVMe, SAS, SATA)
- Test WWN formatting
- Test struct creation and JSON parsing
- Test NVMe temperature fallback location
2025-12-25 11:55:37 +00:00
rcourtman
8f9d5c1120 feat: Agent collects S.M.A.R.T. disk data via smartctl. Related to #907
- Add smartctl package to collect disk temperature and health data
- Add SMART field to agent Sensors struct
- Host agent now runs smartctl to collect disk temps when available
- Backend processes agent SMART data for temperature display
- Graceful fallback when smartctl not installed
2025-12-25 11:37:53 +00:00
rcourtman
800fab10c2 fix: Use LinkedNodeID for temperature matching to fix duplicate hostname bug
When two Proxmox nodes have the same hostname (e.g., 'px1' on different IPs),
the getHostAgentTemperature function was matching by hostname alone, causing
both nodes to show temperature from whichever host agent appeared first.

The fix:
- Added getHostAgentTemperatureByID that first tries matching by LinkedNodeID
  (the unique node ID) before falling back to hostname matching
- Updated the caller to pass modelNode.ID for precise matching
- Maintains backwards compatibility for setups where linking hasn't occurred

Related to #891
2025-12-25 10:00:19 +00:00
rcourtman
9a9c50f8b1 fix: Properly close command client WebSocket when disabling remotely
When the server disables command execution for an agent, we now properly
call Close() on the command client to tear down the WebSocket connection.
Previously we just set the pointer to nil which left the goroutine running
with an orphaned connection.
2025-12-25 08:09:42 +00:00