Commit Graph

2231 Commits

Author SHA1 Message Date
rcourtman
3029cce172 fix(patrol): address multiple issues in patrol service
- Add missing KubernetesChecked field to persistence (data was being lost)
- Fix Duration field to properly convert between milliseconds and nanoseconds
- Add automatic cleanup of stale stream subscribers (memory leak fix)
- Add error tracking for findings persistence with callback support
- Add GetPersistenceStatus() and SetOnSaveError() methods
- Add tests for new error tracking functionality
2026-01-02 12:45:00 +00:00
rcourtman
3e6ebd593c fix(alerts): resolve mapping and formatting issues for disk temperature thresholds (#1013) 2026-01-02 11:27:48 +00:00
rcourtman
773376fa5d docs: add deep dive summaries for notifications, discovery, and agent exec 2026-01-02 11:18:28 +00:00
rcourtman
d71754743c docs: Add PULSE_DISABLE_DOCKER_UPDATE_ACTIONS documentation
- Add to DOCKER.md configuration table and new 'Disabling Update Features' section
- Add to CONFIGURATION.md monitoring overrides table
- Clarify difference between disabling update detection vs hiding buttons
2026-01-02 10:35:04 +00:00
rcourtman
60220ee161 feat: Add server-wide control to disable Docker update actions
Implements PULSE_DISABLE_DOCKER_UPDATE_ACTIONS environment variable and
Settings UI toggle to hide Docker container update buttons while still
allowing update detection. This addresses requests for a 'read-only' mode
in production environments.

Backend:
- Add DisableDockerUpdateActions to SystemSettings and Config structs
- Add environment variable parsing with EnvOverrides tracking
- Expose setting in GET/POST /api/config/system endpoints
- Block update API with 403 when disabled (defense-in-depth)

Frontend:
- Add disableDockerUpdateActions to SystemConfig type
- Create systemSettings store for reactive access to server config
- Add Docker Settings card in Settings → Agents tab with toggle
- Show env lock badge when set via environment variable

UpdateButton improvements:
- Properly handle loading state (disabled + visual indicator)
- Use Solid.js Show components for proper reactivity
- Show read-only UpdateBadge when updates disabled
- Show interactive button when updates enabled

Closes discussion #982
2026-01-02 10:29:43 +00:00
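The backend item "Block update API with 403 when disabled" in commit 60220ee161 can be illustrated with a minimal sketch. The `Config` struct here is reduced to the one field named in the commit, and the handler name, the `/update` path, and the 202 success status are assumptions for illustration, not the project's actual routes.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Config is a reduced stand-in carrying only the flag named in the commit.
type Config struct {
	DisableDockerUpdateActions bool
}

// updateHandler (hypothetical name) rejects update requests with 403 when the
// server-wide flag is set, so hiding the button is not the only safeguard.
func updateHandler(cfg *Config) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if cfg.DisableDockerUpdateActions {
			http.Error(w, "Docker update actions are disabled", http.StatusForbidden)
			return
		}
		// Assumed success path: the update would be queued here.
		w.WriteHeader(http.StatusAccepted)
	}
}

// statusFor sends one POST through the handler and returns the status code.
func statusFor(cfg *Config) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/update", nil)
	updateHandler(cfg)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(&Config{DisableDockerUpdateActions: true}), statusFor(&Config{}))
}
```

Checking the flag in the handler as well as in the UI is what the commit calls defense-in-depth: a client that bypasses the frontend still gets a 403.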
rcourtman
0751e3ca94 Auto-update Helm chart version to 5.0.9 (tag: helm-chart-5.0.9) 2026-01-02 00:55:10 +00:00
rcourtman
06cd8c415f Auto-update Helm chart documentation 2026-01-02 00:55:10 +00:00
rcourtman
c654f1486d fix: Docker agent token conflict on reconnect. Related to #1008 (tag: v5.0.9) 2026-01-02 00:03:23 +00:00
rcourtman
6bb272d3dc fix: Ensure Env Var takes precedence over system settings for HideLocalLogin. Related to #857 2026-01-01 23:36:18 +00:00
rcourtman
1feff00cc5 chore: Bump version to 5.0.9. Related to #1009 2026-01-01 23:27:15 +00:00
rcourtman
4ed03f23c2 fix: use Instance field for backup/snapshot state sync instead of ID prefix
This resolves issues where snapshots/backups persist after deletion if the
Instance field didn't match the ID prefix (due to case changes, name changes, etc.).

Now consistent with how VMs, Containers, Storage, etc. are filtered.

Also adds Instance field to BackupTask model for completeness.

Addresses #1009 (refs #991)
2026-01-01 23:22:38 +00:00
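The change in commit 4ed03f23c2 (filter by the Instance field rather than by ID prefix) can be sketched as follows. The `Snapshot` type and `syncSnapshots` function are hypothetical simplifications; only the idea of matching on `Instance` comes from the commit message.

```go
package main

import "fmt"

// Snapshot is a simplified record; the real model has more fields.
type Snapshot struct {
	ID       string
	Instance string
}

// syncSnapshots replaces every stored snapshot that belongs to instance with
// the freshly polled set. Matching on the Instance field means an ID whose
// prefix differs in case or name from the instance is still replaced.
func syncSnapshots(existing, fresh []Snapshot, instance string) []Snapshot {
	out := make([]Snapshot, 0, len(existing)+len(fresh))
	for _, s := range existing {
		if s.Instance != instance {
			out = append(out, s)
		}
	}
	return append(out, fresh...)
}

func main() {
	existing := []Snapshot{
		{ID: "PVE1-node1-100-daily", Instance: "pve1"}, // ID prefix differs in case
		{ID: "pve2-node1-200-daily", Instance: "pve2"},
	}
	// The latest poll of pve1 returned no snapshots (it was deleted upstream).
	fmt.Println(syncSnapshots(existing, nil, "pve1"))
}
```

A prefix check such as "does the ID start with pve1" would have kept the first entry alive, which is the stale-snapshot symptom the commit describes.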
rcourtman
661645585a fix: cleanup completed docker commands to prevent re-execution. Address #1010 2026-01-01 23:14:54 +00:00
rcourtman
df1ff42280 fix: Add backup freshness thresholds to UI. Related to #839 2026-01-01 23:06:34 +00:00
rcourtman
83935fa871 feat(ai): enhance AI Patrol with baseline anomaly detection and correlation learning
This update integrates learned baselines into the heuristic analysis to detect abnormal behavior and records significant events (migrations, restarts, spikes) for correlation analysis. Also fixed syntax errors in Ollama integration tests.
2026-01-01 23:00:43 +00:00
rcourtman
8bab7c83ad feat(ai): enhance AI Patrol with baseline anomaly detection and correlation learning
This update integrates learned baselines into the heuristic analysis to detect abnormal behavior and records significant events (migrations, restarts, spikes) for correlation analysis.
2026-01-01 23:00:18 +00:00
rcourtman
002cf36ee0 fix(patrol): use title as fallback for finding key in LLM findings 2026-01-01 22:49:04 +00:00
rcourtman
b225d22395 fix(patrol): use normalizedKey in generateFindingID for stable finding IDs 2026-01-01 22:46:28 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
cb99673b7c Improve devcontainer configuration
- Simplify Dockerfile: use golang:1.24 base, install Node via features
- Add proper port forwarding for Pulse (7655 frontend, 7656 API)
- Add Vue Volar extension for frontend development
- Add start-pulse-dev.sh helper script for auto-starting dev server
- Add FRONTEND_DEV_HOST to containerEnv for proper binding
- Add .env.devcontainer to .gitignore (local override file)
- Update frontend dependencies

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 19:42:01 +00:00
rcourtman
3ea837c727 Add devcontainer configuration 2026-01-01 16:52:05 +00:00
rcourtman
9abe9c47a2 feat(alerts): add disk temperature alerts for host agents
- Add DiskTemperature threshold to ThresholdConfig (default: 55°C trigger, 50°C clear)
- Process host SMART sensor data in CheckHost to generate disk_temperature alerts
- Add 'Disk Temp °C' column to Host Agents thresholds table in UI
- Make temperature tooltip interactive and scrollable to fix overflow issues
- Update AlertThresholds type to include diskTemperature field

Closes: #941
2026-01-01 16:31:34 +00:00
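The default thresholds in commit 9abe9c47a2 (55°C trigger, 50°C clear) form a hysteresis band. A minimal sketch of that behaviour follows; the type and method names are hypothetical, and the exact comparison operators (trigger at or above 55, clear at or below 50) are an assumption.

```go
package main

import "fmt"

// Hysteresis holds a trigger level and a lower clear level.
type Hysteresis struct {
	Trigger float64
	Clear   float64
}

// Next returns the new alert state for a reading, given the previous state.
// An inactive alert fires at or above Trigger; an active alert stays active
// until the reading falls to Clear or below.
func (h Hysteresis) Next(active bool, value float64) bool {
	if !active {
		return value >= h.Trigger
	}
	return value > h.Clear
}

func main() {
	h := Hysteresis{Trigger: 55, Clear: 50}
	active := false
	for _, temp := range []float64{52, 56, 53, 50} {
		active = h.Next(active, temp)
		fmt.Printf("%.0f°C active=%v\n", temp, active)
	}
}
```

The gap between the two levels keeps a disk hovering around 55°C from raising and clearing the alert on every poll.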
rcourtman
5b7a68bcc0 fix: Add VERSION build arg to all Docker builds in CI workflows 2026-01-01 16:14:56 +00:00
rcourtman
034f086d9d fix: Ensure correct version injection in Docker builds (Related to #1005) 2026-01-01 16:11:47 +00:00
rcourtman
ee45323312 feat: Allow configuring physical disk polling interval in UI (Related to #1007) 2026-01-01 16:00:28 +00:00
rcourtman
7d12c4a23b fix: Host view temperature color now respects configured thresholds. Related to #984 2026-01-01 15:32:42 +00:00
rcourtman
a4c3295c1a fix: Ensure AI commands toggle remains stable vs agent reports. Related to #952 2026-01-01 15:30:27 +00:00
rcourtman
926c8e0ba5 fix: Improve OIDC GET login error handling with proper redirects
On errors, redirect back to login page with error params instead of
showing plain text error pages. This ensures users see friendly error
messages in the UI.

Related to #1006
2026-01-01 14:53:00 +00:00
rcourtman
491f6d13c7 fix: OIDC login now uses server-side redirect for same-window navigation
Changed OIDC login flow from fetch+JavaScript redirect to direct GET
navigation with server-side HTTP redirect. This guarantees same-window
navigation in all browsers, including Arc which was opening new windows
for JavaScript-driven navigations.

Backend: /api/oidc/login now supports both GET (redirect) and POST (JSON)
Frontend: Simplified to use window.location.href to GET endpoint

Related to #1006
2026-01-01 14:45:23 +00:00
rcourtman
3b201b4a88 fix: Data race in pollGuestSnapshots accessing state without proper lock
pollGuestSnapshots was reading m.state.VMs and m.state.Containers while
only holding the Monitor's mutex (m.mu), not the State's internal mutex.
This caused a data race where VMs/containers could be modified by another
goroutine while being read, leading to stale or missing snapshot data.

Symptoms: Deleted snapshots persisting in UI, new snapshots not appearing,
only fixable by service restart.

Fix: Use GetSnapshot() which properly acquires State's mutex and returns
a consistent copy of the data.

Related to #991
2026-01-01 14:39:10 +00:00
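The locking pattern behind commit 3b201b4a88 can be sketched as below. This is a simplified illustration: the commit names `GetSnapshot()` and the State's internal mutex, but the `VM` type, the `SetVMs` helper, and the slice-returning signature are assumptions made to keep the example small.

```go
package main

import (
	"fmt"
	"sync"
)

// VM is a simplified guest record.
type VM struct {
	ID   string
	Name string
}

// State guards its data with its own mutex, independent of any caller's lock.
type State struct {
	mu  sync.RWMutex
	vms []VM
}

// SetVMs replaces the VM list under the write lock.
func (s *State) SetVMs(vms []VM) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.vms = append([]VM(nil), vms...)
}

// GetSnapshot copies the VM list under the read lock, so the caller can
// iterate the result without racing against concurrent writers.
func (s *State) GetSnapshot() []VM {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]VM(nil), s.vms...)
}

func main() {
	s := &State{}
	s.SetVMs([]VM{{ID: "100", Name: "web"}})

	snap := s.GetSnapshot()
	s.SetVMs(nil) // another goroutine could do this at any moment

	fmt.Println(len(snap), len(s.GetSnapshot()))
}
```

Reading `s.vms` directly while holding only an outer lock would not exclude `SetVMs`, which is the race the commit describes; going through the copy-returning accessor does.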
rcourtman
18f0db89a9 legal: set governing law to England and Wales 2026-01-01 12:19:19 +00:00
rcourtman
f130d6fc60 legal: add terms of service and safety disclaimers for AI features 2026-01-01 12:15:58 +00:00
rcourtman
6e34c80c58 Auto-update Helm chart version to 5.0.8 (tag: helm-chart-5.0.8) 2026-01-01 11:12:05 +00:00
rcourtman
a569aeb1d4 Auto-update Helm chart documentation 2026-01-01 11:12:05 +00:00
rcourtman
0c87357fe4 Prepare v5.0.8 release (tag: v5.0.8) 2026-01-01 10:27:42 +00:00
rcourtman
9a4ab102e5 fix: Handle 'in_progress' status in command acknowledgements. Related to #988 2026-01-01 10:17:50 +00:00
rcourtman
94717ba867 feat(agent): add --docker-runtime flag for podman/docker selection
On systems where Docker compatibility layer obscures Podman (like CoreOS),
the auto-detection can fail. Users can now force the runtime:

  --docker-runtime podman
  PULSE_DOCKER_RUNTIME=podman

Valid values: auto (default), docker, podman

Related to Discussion #958
2026-01-01 00:24:37 +00:00
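Validation of the runtime value from commit 94717ba867 might look like the sketch below. Only the three valid values and the `auto` default come from the commit; the function name, the case-insensitive matching, and the error text are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseRuntime normalizes a --docker-runtime / PULSE_DOCKER_RUNTIME value.
// An empty value falls back to "auto"; anything outside the three valid
// values is rejected.
func parseRuntime(value string) (string, error) {
	v := strings.ToLower(strings.TrimSpace(value))
	switch v {
	case "", "auto":
		return "auto", nil
	case "docker", "podman":
		return v, nil
	default:
		return "", fmt.Errorf("invalid docker runtime %q (valid: auto, docker, podman)", value)
	}
}

func main() {
	for _, in := range []string{"", "podman", "containerd"} {
		rt, err := parseRuntime(in)
		fmt.Printf("%q -> %q, err=%v\n", in, rt, err)
	}
}
```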
rcourtman
e3b3785582 feat(agent): add option to disable Docker update checks
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Users who don't want update notifications in their dashboard

Related to Discussion #982
2026-01-01 00:20:49 +00:00
rcourtman
567a4ad147 fix(replication): fetch status from per-node endpoint
The /cluster/replication endpoint only returns job configuration (guest,
schedule, source, target), not status data (last_sync, next_sync,
duration, fail_count, state).

This fix enriches each replication job with status from the per-node
endpoint /nodes/{node}/replication/{id}/status to get timing and state
data needed for proper UI display.

Added integration tests to verify:
- Status endpoint is called and data is merged correctly
- Graceful handling when status endpoint fails

Fixes #992
2025-12-31 23:58:06 +00:00
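The enrichment step in commit 567a4ad147 (merge per-node status into each job, tolerate status failures) can be sketched with an injected fetch function in place of the real API call. All type, field, and function names here are hypothetical, and `demoFetch` merely stands in for the `/nodes/{node}/replication/{id}/status` request.

```go
package main

import (
	"errors"
	"fmt"
)

// Job holds replication config plus optional status fields.
type Job struct {
	ID        string
	Source    string // node assumed to own the job
	State     string
	FailCount int
	HasStatus bool
}

// Status is the subset of per-node status data merged into a Job.
type Status struct {
	State     string
	FailCount int
}

type statusFetcher func(node, id string) (Status, error)

// enrichJobs returns a copy of jobs with status merged in. A job whose status
// lookup fails is kept with its configuration only.
func enrichJobs(jobs []Job, fetch statusFetcher) []Job {
	out := make([]Job, len(jobs))
	copy(out, jobs)
	for i := range out {
		st, err := fetch(out[i].Source, out[i].ID)
		if err != nil {
			continue // graceful degradation: config-only job
		}
		out[i].State = st.State
		out[i].FailCount = st.FailCount
		out[i].HasStatus = true
	}
	return out
}

// demoFetch simulates the status endpoint and fails for node "pve2".
func demoFetch(node, id string) (Status, error) {
	if node == "pve2" {
		return Status{}, errors.New("node unreachable")
	}
	return Status{State: "OK"}, nil
}

func main() {
	jobs := []Job{{ID: "100-0", Source: "pve1"}, {ID: "101-0", Source: "pve2"}}
	for _, j := range enrichJobs(jobs, demoFetch) {
		fmt.Printf("%s hasStatus=%v state=%q\n", j.ID, j.HasStatus, j.State)
	}
}
```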
rcourtman
724362504e fix: Add SELinux context restoration for Fedora/RHEL systems. Related to #996
On SELinux-enforcing systems (Fedora, RHEL, CentOS), binaries installed to
non-standard locations need proper security contexts for systemd to execute
them. Without this, systemd fails with 'Permission denied' even when the
binary has correct Unix permissions.

Changes:
- Add restore_selinux_contexts() function to both install scripts
- Uses restorecon (preferred) or chcon (fallback) to set bin_t context
- Only runs when SELinux is detected and enforcing
- Called after binary installation, before systemd service start
2025-12-31 23:12:53 +00:00
rcourtman
c1f4b8f40b feat: PULSE_DISK_EXCLUDE now applies to SMART monitoring. Related to #983
Previously, the PULSE_DISK_EXCLUDE environment variable and --disk-exclude
flag only filtered mount points in the hostmetrics collector. This change
extends the exclusion to SMART data collection.

Changes:
- Updated smartctl.CollectLocal() to accept diskExclude patterns
- Added matchesDeviceExclude() for block device pattern matching
- Patterns support: exact match (sda), prefix (nvme*), contains (*cache*)
- Updated hostagent to pass DiskExclude to SMART collector
- Added comprehensive tests for pattern matching
- Updated documentation
2025-12-31 23:07:01 +00:00
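The three pattern forms listed in commit c1f4b8f40b (exact `sda`, prefix `nvme*`, contains `*cache*`) can be matched as in the sketch below. The commit names `matchesDeviceExclude()`, but this body is an independent illustration under a different name; stripping a leading `/dev/` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesExclude reports whether device matches any exclusion pattern.
// Supported forms: exact ("sda"), prefix ("nvme*"), contains ("*cache*").
func matchesExclude(device string, patterns []string) bool {
	name := strings.TrimPrefix(device, "/dev/") // assumption: accept full paths too
	for _, p := range patterns {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		switch {
		case len(p) > 2 && strings.HasPrefix(p, "*") && strings.HasSuffix(p, "*"):
			if strings.Contains(name, p[1:len(p)-1]) {
				return true
			}
		case strings.HasSuffix(p, "*"):
			if strings.HasPrefix(name, strings.TrimSuffix(p, "*")) {
				return true
			}
		default:
			if name == p {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := []string{"sda", "nvme*", "*cache*"}
	for _, dev := range []string{"sda", "sdb", "/dev/nvme0n1", "bcache0"} {
		fmt.Printf("%s excluded=%v\n", dev, matchesExclude(dev, patterns))
	}
}
```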
rcourtman
3a7e26f42f fix: Temperature text color now respects configured thresholds. Related to #984
Previously, the TemperatureGauge component used hardcoded thresholds
(critical: 80°C, warning: 70°C) for text coloring. Now it uses the
user-configured temperature threshold from alert settings.

Changes:
- Add getTemperatureThreshold() helper to alertsActivation store
- Pass critical/warning props to TemperatureGauge in NodeSummaryTable
- Warning is set to (threshold - 5°C) matching the hysteresis pattern
2025-12-31 23:00:36 +00:00
rcourtman
d804471889 fix: Unified agent Docker module now uses same agent ID as host module
When running as a unified agent (pulse-agent with --enable-docker), the
Docker module was using a different fallback chain for agent ID than the
host module. In unified mode with empty machineID, the Docker module fell
back to daemonID while the host module fell back to hostname. This caused
the server to reject Docker reports with 'token already in use by agent'
errors because the same API token was bound to different agent IDs.

The fix ensures that in unified mode, the Docker module uses the exact
same fallback chain as the host module: machineID -> hostname. The daemonID
fallback is only used in standalone mode for backward compatibility.

Fixes #985, #986
2025-12-31 10:35:00 +00:00
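The fallback chains described in commit d804471889 can be sketched as two small functions. Function names are hypothetical, and using hostname as the last resort in standalone mode is an assumption; the unified chain (machineID -> hostname) and the standalone daemonID fallback come from the commit message.

```go
package main

import "fmt"

// hostAgentID resolves the host module's ID: machineID, then hostname.
func hostAgentID(machineID, hostname string) string {
	if machineID != "" {
		return machineID
	}
	return hostname
}

// dockerAgentID resolves the Docker module's ID. In unified mode it defers to
// the host module's chain so both modules report the same ID for one token.
// Standalone mode keeps the daemonID fallback for backward compatibility.
func dockerAgentID(unified bool, machineID, daemonID, hostname string) string {
	if unified {
		return hostAgentID(machineID, hostname)
	}
	if machineID != "" {
		return machineID
	}
	if daemonID != "" {
		return daemonID
	}
	return hostname // assumed last resort
}

func main() {
	// Empty machineID: the case that previously produced two different IDs.
	fmt.Println(hostAgentID("", "host-a"))
	fmt.Println(dockerAgentID(true, "", "daemon-1", "host-a"))
	fmt.Println(dockerAgentID(false, "", "daemon-1", "host-a"))
}
```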
rcourtman
652854af00 fix: Reduce Docker image size by avoiding duplicate binary copies
The runtime stage was copying both amd64 and arm64 pulse binaries to /tmp/,
then selecting one based on TARGETARCH and deleting the rest. Due to Docker's
immutable layers, the deleted binaries were still counted toward image size.

Changed to copy directly using TARGETARCH variable substitution, which only
copies the needed binary for the target architecture.

This saves ~34MB per architecture in the final image.

Note: The agent_runtime stage and /opt/pulse/bin/ download binaries still have
room for optimization, but require more complex changes.

Related to #981
2025-12-31 10:29:38 +00:00
rcourtman
3796408f04 fix: Preserve alert acknowledgement for long-standing alerts during backup
When a powered-off VM is backed up by Proxmox, the alert briefly disappears
as the VM status changes. The previous fix (3830e701) preserved ackState when
alerts were removed, but the cleanup TTL was measured from the acknowledgement
time. For alerts acknowledged more than 1 hour ago (common for intentionally powered-off
VMs), the ackState was immediately considered stale and deleted when cleanup ran.

The fix adds an inactiveAt timestamp to track when an alert was removed, and
uses this time for the cleanup TTL instead of the acknowledgement time. This
ensures acknowledgement state is preserved for at least 1 hour after the alert
disappears, regardless of when it was originally acknowledged.

Related to #980
2025-12-31 09:49:11 +00:00
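The TTL change in commit 3796408f04 can be sketched as follows. The `inactiveAt` timestamp and the one-hour window come from the commit message; the record type, method name, and the fixed demo clock are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// ackTTL is how long an acknowledgement survives after its alert disappears.
const ackTTL = time.Hour

// ackRecord is a simplified acknowledgement record.
type ackRecord struct {
	AckedAt    time.Time
	InactiveAt time.Time // zero while the alert is still active
}

// expired reports whether cleanup may drop the record. The TTL is measured
// from InactiveAt, not AckedAt, so an old acknowledgement is not discarded
// the moment its alert briefly disappears.
func (r ackRecord) expired(now time.Time) bool {
	if r.InactiveAt.IsZero() {
		return false
	}
	return now.Sub(r.InactiveAt) > ackTTL
}

// demoNow is a fixed clock value so the example is deterministic.
var demoNow = time.Date(2025, 12, 31, 9, 0, 0, 0, time.UTC)

func main() {
	rec := ackRecord{
		AckedAt:    demoNow.Add(-48 * time.Hour),  // acknowledged two days ago
		InactiveAt: demoNow.Add(-5 * time.Minute), // alert vanished during backup
	}
	fmt.Println(rec.expired(demoNow))
	fmt.Println(rec.expired(demoNow.Add(2 * ackTTL)))
}
```

Had the check compared against `AckedAt`, the first call would already report the record as stale, which is the behaviour the commit fixes.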
rcourtman
efe3fca534 Auto-update Helm chart version to 5.0.7 (tag: helm-chart-5.0.7) 2025-12-31 00:27:48 +00:00
rcourtman
c46887f89b Auto-update Helm chart documentation 2025-12-31 00:27:47 +00:00
rcourtman
2b68bfe6ee fix: Update tests for RAID alerting md0/md1 skip and AI gating message change
- RAID tests now use /dev/md2 since md0/md1 are skipped for Synology compatibility
- AI handler tests now expect 'AI is not enabled' message after AI gating change
(tag: v5.0.7)
2025-12-30 23:39:55 +00:00
rcourtman
336714c610 chore: Bump version to 5.0.7 2025-12-30 23:26:59 +00:00
rcourtman
dc65b96e6d fix: Improve light theme contrast for low I/O values. Related to #976
Changed text-gray-300 to text-gray-500 for I/O values under 1 MB/s in light mode.
The previous color was barely visible against the white background.
2025-12-30 22:35:39 +00:00
rcourtman
ed6c3d9c93 fix: Prevent acknowledged alerts from retriggering notifications. Related to #975
dispatchAlert() now checks if an alert is already acknowledged before sending
notifications. Previously, acknowledged alerts (especially backup-age alerts)
would continue to dispatch notifications every poll cycle because the
acknowledgement check was missing from the dispatch path.

The fix adds an early return in dispatchAlert() when alert.Acknowledged is true,
matching the existing checks for flapping, activation state, and quiet hours.
2025-12-30 22:20:47 +00:00
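The early return added in commit ed6c3d9c93 can be sketched as below. `dispatchAlert()` and `alert.Acknowledged` are named in the commit; the `Dispatcher` type, the boolean return, and the in-memory `sent` list are assumptions, and the other gates the commit mentions (flapping, activation state, quiet hours) are omitted.

```go
package main

import "fmt"

// Alert is a simplified alert record.
type Alert struct {
	ID           string
	Acknowledged bool
}

// Dispatcher records which alerts produced a notification.
type Dispatcher struct {
	sent []string
}

// dispatchAlert sends a notification unless the alert is acknowledged.
// It returns true when a notification was sent.
func (d *Dispatcher) dispatchAlert(a *Alert) bool {
	if a == nil || a.Acknowledged {
		return false // acknowledged alerts must not notify again on each poll
	}
	d.sent = append(d.sent, a.ID)
	return true
}

func main() {
	d := &Dispatcher{}
	backupAge := &Alert{ID: "backup-age-vm100"}

	fmt.Println(d.dispatchAlert(backupAge)) // first poll: notification sent
	backupAge.Acknowledged = true
	fmt.Println(d.dispatchAlert(backupAge)) // later polls: suppressed
	fmt.Println(len(d.sent))
}
```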