171 Commits

Author SHA1 Message Date
rcourtman
7049f5b43c refactor: simplify temperature monitoring after sensor proxy removal
Remove proxy-related temperature code paths:
- temperature.go: remove proxy client integration and fallback logic
- config.go: remove SensorProxyEnabled and related config fields
- monitor.go: remove proxy client initialization and state

Temperature monitoring now relies solely on the unified agent approach.
2026-01-21 12:00:28 +00:00
rcourtman
d4a6c0d2e8 refactor: remove legacy pulse-sensor-proxy temperature monitoring
The sensor proxy approach for temperature monitoring has been superseded
by the unified agent architecture where host agents report temperature
data directly. This removes:

- cmd/pulse-sensor-proxy/ - standalone proxy daemon
- internal/tempproxy/ - client library
- internal/api/*temperature_proxy* - API handlers and tests
- internal/api/sensor_proxy_gate* - feature gate
- internal/monitoring/*proxy_test* - proxy-specific tests
- scripts/*sensor-proxy* - installation and management scripts
- security/apparmor/, security/seccomp/ - proxy security profiles

Temperature monitoring remains available via the unified agent approach.
2026-01-21 11:59:04 +00:00
rcourtman
9b8a79df93 fix(test): add graceful shutdown wait to TestRunServer_WebSocket 2026-01-20 18:20:37 +00:00
rcourtman
ee63d438cc docs: standardize markdown syntax and remove deprecated sensor-proxy docs 2026-01-20 09:43:49 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
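The locking pattern described in 035436ad6e can be sketched roughly as below. Only the names cpuMu and prevContainerCPU come from the commit message; the cpuSample type and the recordSample and pruneStale helpers are hypothetical stand-ins, not the agent's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// cpuSample is a hypothetical stand-in for whatever the agent stores per container.
type cpuSample struct {
	totalUsage uint64
}

// agent mirrors only the two fields named in the commit message.
type agent struct {
	cpuMu            sync.Mutex
	prevContainerCPU map[string]cpuSample
}

// recordSample stores the latest sample and returns the previous one,
// holding cpuMu for the whole read-modify-write.
func (a *agent) recordSample(id string, s cpuSample) (cpuSample, bool) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	prev, ok := a.prevContainerCPU[id]
	a.prevContainerCPU[id] = s
	return prev, ok
}

// pruneStale deletes samples for containers that are not in the active set.
func (a *agent) pruneStale(active map[string]struct{}) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	for id := range a.prevContainerCPU {
		if _, ok := active[id]; !ok {
			delete(a.prevContainerCPU, id)
		}
	}
}

func main() {
	a := &agent{prevContainerCPU: make(map[string]cpuSample)}
	var wg sync.WaitGroup
	// Several goroutines touch the map at once, as the two collection paths did.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			id := fmt.Sprintf("container-%d", n%4)
			a.recordSample(id, cpuSample{totalUsage: uint64(n)})
			a.pruneStale(map[string]struct{}{id: {}})
		}(i)
	}
	wg.Wait()
	fmt.Println("finished without concurrent map writes")
}
```

Without the mutex, the same program can abort with 'fatal error: concurrent map writes', which is the crash the commit reports.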
rcourtman
8c7581d32c feat(profiles): add AI-assisted profile suggestions
Add ability for users to describe what kind of agent profile they need
in natural language, and have AI generate a suggestion with name,
description, config values, and rationale.

- Add ProfileSuggestionHandler with schema-aware prompting
- Add SuggestProfileModal component with example prompts
- Update AgentProfilesPanel with suggest button and description field
- Streamline ValidConfigKeys to only agent-supported settings
- Update profile validation tests for simplified schema
2026-01-15 13:24:18 +00:00
rcourtman
9b49d3171d feat(pbs): add datastore exclusion to reduce PBS log noise
Users with removable/unmounted datastores (e.g., external HDDs for
offline backup) experienced excessive PBS log entries because Pulse
was querying all datastores including unavailable ones.

Added `excludeDatastores` field to PBS node configuration that accepts
patterns to exclude specific datastores from monitoring:
- Exact names: "exthdd1500gb"
- Prefix patterns: "ext*"
- Suffix patterns: "*hdd"
- Contains patterns: "*removable*"

Pattern matching is case-insensitive.

Fixes #1105
2026-01-14 12:26:18 +00:00
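The four pattern forms listed in 9b49d3171d could be matched along these lines. This is a minimal sketch under stated assumptions: the function names matchesPattern and isExcluded are hypothetical, and wildcards are only recognised at the start or end of a pattern, which is all the commit message describes.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesPattern reports whether name matches one exclusion pattern.
// Supported forms: "name" (exact), "pre*" (prefix), "*suf" (suffix),
// "*mid*" (contains). Matching is case-insensitive. A "*" in the middle
// of a pattern is not treated as a wildcard in this sketch.
func matchesPattern(name, pattern string) bool {
	name = strings.ToLower(name)
	pattern = strings.ToLower(strings.TrimSpace(pattern))
	if pattern == "" {
		return false
	}
	leading := strings.HasPrefix(pattern, "*")
	trailing := strings.HasSuffix(pattern, "*")
	core := strings.Trim(pattern, "*")
	switch {
	case leading && trailing:
		return strings.Contains(name, core)
	case trailing:
		return strings.HasPrefix(name, core)
	case leading:
		return strings.HasSuffix(name, core)
	default:
		return name == pattern
	}
}

// isExcluded reports whether any pattern in the list matches the datastore name.
func isExcluded(name string, patterns []string) bool {
	for _, p := range patterns {
		if matchesPattern(name, p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"exthdd1500gb", "ext*", "*hdd", "*removable*"}
	for _, ds := range []string{"ExtHDD1500GB", "external-2", "usbhdd", "my-removable-disk", "local"} {
		fmt.Printf("%-20s excluded=%v\n", ds, isExcluded(ds, patterns))
	}
}
```

In this sketch a bare "*" matches every name, because the empty core is contained in any string.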
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
5c4399d69f feat(agent): add DisableCeph toggle, report_ip remote config, and improved IP detection (#929) 2026-01-09 14:45:29 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
3f0808e9f9 docs: comprehensive core and Pro documentation overhaul
- Major updates to README.md and docs/README.md for Pulse v5
- Added technical deep-dives for Pulse Pro (docs/PULSE_PRO.md) and AI Patrol (docs/AI.md)
- Updated Prometheus metrics documentation and Helm schema for metrics separation
- Refreshed security, installation, and deployment documentation for unified agent models
- Cleaned up legacy summary files
2026-01-07 17:38:27 +00:00
rcourtman
dcdbee3c5c feat: Add in-app help system with HelpIcon component
Add contextual help icons throughout the UI to improve feature
discoverability. Users can click (?) icons to see explanations
with examples for settings they might not understand.

- HelpIcon component with click-to-open popover
- Centralized help content registry in /content/help/
- FeatureTip component for dismissible contextual tips
- Help added to: alert delay, AI endpoints, update channel
2026-01-07 09:22:23 +00:00
rcourtman
3b70e29b87 test: add PULSE_DATA_DIR to TestMainCmd
TestMainCmd was missing PULSE_DATA_DIR setup, causing it to try to
access /etc/pulse which fails in CI.
2026-01-04 19:15:38 +00:00
rcourtman
21a819f6dc test: use t.Setenv for safer test cleanup
t.Setenv ensures environment variables are restored after test
completion, preventing race conditions where background goroutines
(like config watchers) might access unset env vars during cleanup.
2026-01-04 19:08:45 +00:00
rcourtman
fdba559167 test: skip tests requiring /etc/pulse in CI
Tests that use the default /etc/pulse data directory fail in CI
where the directory doesn't exist and can't be created.
2026-01-04 18:59:48 +00:00
rcourtman
37f5e12dc2 test: add encryption keys to remaining cmd/pulse tests
TestConfigImportCmd and TestConfigAutoImportCmd need encryption keys
in CI where /etc/pulse/.encryption.key doesn't exist.
2026-01-04 18:43:40 +00:00
rcourtman
821783eef7 test: fix tests that create .enc files without encryption keys
Tests were failing in CI because they created nodes.enc files without
valid encryption keys, triggering the crypto safety check.

Added createTestEncryptionKey helper and fixed:
- TestLoad_MockEnv (config_load_test.go)
- Multiple tests in commands_test.go that create nodes.enc
2026-01-04 18:15:08 +00:00
rcourtman
7a1e3e9b4e Improve test coverage for cmd/pulse-sensor-proxy 2026-01-04 16:10:34 +00:00
rcourtman
f77025fb2f test: fix flaky tests with nonexistent path assertions
Tests using /nonexistent/... paths fail in sandboxed environments
where they return 'permission denied' instead of 'not exists'.
Use /tmp/... paths instead which reliably return 'not exists'.
2026-01-04 15:38:30 +00:00
rcourtman
45d4d68127 fix: Add debug logging and response format handling for replication status
- Add comprehensive debug logging to diagnose replication status fetch failures
- Handle both array and single-object response formats from Proxmox API
- Log raw response body for easier debugging
- Log success/failure for each enrichment step

This helps diagnose issue #992 where replication last/next sync times aren't
showing. The logging will reveal if the API call is failing, returning empty
data, or returning data in an unexpected format.

Related to #992
2026-01-04 15:01:32 +00:00
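One way to handle both response shapes mentioned in 45d4d68127 is to inspect the first non-whitespace byte before decoding. The sketch below is illustrative only: the replicationStatus type, its field names, and decodeOneOrMany are assumptions, not the Pulse implementation or the exact Proxmox schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// replicationStatus is a hypothetical, trimmed-down payload shape.
type replicationStatus struct {
	ID       string `json:"id"`
	LastSync int64  `json:"last_sync"`
}

// decodeOneOrMany accepts either a JSON array of objects or a single object
// and always returns a slice. Empty and null bodies yield an empty result.
func decodeOneOrMany(raw []byte) ([]replicationStatus, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 || bytes.Equal(trimmed, []byte("null")) {
		return nil, nil
	}
	if trimmed[0] == '[' {
		var many []replicationStatus
		if err := json.Unmarshal(trimmed, &many); err != nil {
			return nil, fmt.Errorf("decode array: %w", err)
		}
		return many, nil
	}
	var one replicationStatus
	if err := json.Unmarshal(trimmed, &one); err != nil {
		return nil, fmt.Errorf("decode object: %w", err)
	}
	return []replicationStatus{one}, nil
}

func main() {
	bodies := []string{
		`[{"id":"100-0","last_sync":1735999200}]`,
		`{"id":"100-0","last_sync":1735999200}`,
	}
	for _, body := range bodies {
		got, err := decodeOneOrMany([]byte(body))
		fmt.Println(len(got), err) // prints "1 <nil>" for both shapes
	}
}
```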
rcourtman
43b5fad12c fix: Add main host URL as fallback for remote cluster access
When a Proxmox cluster is discovered, Pulse now includes the user-provided
main host URL as a fallback endpoint. This handles scenarios where Proxmox
reports internal IPs that aren't reachable from Pulse's network (e.g.,
monitoring a remote cluster across different networks).

Previously, if all cluster endpoint IPs were unreachable, the connection
would fail with no fallback. Now the ClusterClient will fall back to the
main host URL, allowing Proxmox to route API calls internally.

Related to #1028
2026-01-04 14:54:03 +00:00
rcourtman
5d4e911298 feat: improve test coverage for pulse-sensor-proxy 2026-01-03 21:42:19 +00:00
rcourtman
22e1cc5613 test(agent): achieve 95% coverage for pulse-agent 2026-01-03 20:52:42 +00:00
rcourtman
fa43628cde fix: Alert acknowledge/unacknowledge fails with reverse proxies
Reverse proxies (Traefik, Caddy, nginx) often normalize or reject URLs
containing %2F (encoded slash). Alert IDs contain forward slashes
(e.g., "docker-container-state-docker:abc/def"), causing acknowledge
requests to fail with 400 errors when going through a reverse proxy.

Added new body-based endpoints that accept alert ID in JSON body:
- POST /api/alerts/acknowledge {"id": "..."}
- POST /api/alerts/unacknowledge {"id": "..."}
- POST /api/alerts/clear {"id": "..."}

Updated frontend to use the new endpoints. Legacy path-based endpoints
are preserved for backwards compatibility.

Related to #1026
2026-01-03 20:51:25 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
9e339957c6 fix: Update runtime config when toggling Docker update actions setting
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh since the API returned the stale runtime value.

Related to #1023
2026-01-03 11:14:17 +00:00
rcourtman
94717ba867 feat(agent): add --docker-runtime flag for podman/docker selection
On systems where Docker compatibility layer obscures Podman (like CoreOS),
the auto-detection can fail. Users can now force the runtime:

  --docker-runtime podman
  PULSE_DOCKER_RUNTIME=podman

Valid values: auto (default), docker, podman

Related to Discussion #958
2026-01-01 00:24:37 +00:00
rcourtman
e3b3785582 feat(agent): add option to disable Docker update checks
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Users who don't want update notifications in their dashboard

Related to Discussion #982
2026-01-01 00:20:49 +00:00
rcourtman
59eca65ff6 fix: Wire up LOG_FILE, LOG_MAX_SIZE, LOG_MAX_AGE, LOG_COMPRESS config options. Related to #979
The logging config options were defined but never passed to logging.Init(),
making the documented file-based log rotation non-functional.
2025-12-30 21:49:26 +00:00
rcourtman
df3ff171b9 fix: Honor DisableAutoUpdate config and disable Docker disk metrics by default 2025-12-29 23:37:30 +00:00
rcourtman
6ac5e3ebfe chore: Clean up build scripts and remove unused Docker agent entry point 2025-12-29 23:37:16 +00:00
rcourtman
d07b471e40 Refactor Docker agent: metrics collection, security checks, and batch updates
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
2025-12-29 17:20:18 +00:00
rcourtman
277aca3e4e fix: Only log 'Migration complete' when inline allowed_nodes actually migrated. Related to Discussion #946
The sensor proxy self-heal script runs every 5 minutes and calls migrate-to-file.
Previously it would print 'Migration complete' every time, even when already in
file mode with nothing to migrate.

Now migrateInlineToFile returns a boolean indicating if migration actually
occurred, and the CLI only prints the message when work was done.
2025-12-29 14:15:57 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
2bf8e044df feat: Add Docker container update capability
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
2025-12-29 09:00:40 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
rcourtman
2420c2affb feat: Commands disabled by default, require --enable-commands to opt-in
BREAKING CHANGE: AI command execution on agents is now disabled by default.
Users who want AI auto-fix must explicitly enable it with --enable-commands
flag or PULSE_ENABLE_COMMANDS=true environment variable.

Changes:
- Add --enable-commands flag (opt-in for command execution)
- Commands disabled by default for security (defense-in-depth)
- --disable-commands is now deprecated (logs warning, no longer needed)
- PULSE_DISABLE_COMMANDS deprecated in favor of PULSE_ENABLE_COMMANDS
- Update installer script to use --enable-commands
- Backwards compatibility: PULSE_DISABLE_COMMANDS=false still enables commands

This addresses community feedback about secure defaults for arbitrary
command execution on production infrastructure.

Related to #889
2025-12-24 17:36:44 +00:00
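The opt-in rules listed in 2420c2affb can be condensed into a single decision function. This is a sketch under assumptions: commandsEnabled is a hypothetical name, the order in which the sources are checked is a guess, and strconv.ParseBool accepts more spellings ("1", "t", "TRUE") than the agent necessarily does.

```go
package main

import (
	"fmt"
	"strconv"
)

// commandsEnabled decides whether command execution is on.
// Default is off; any explicit opt-in turns it on.
func commandsEnabled(enableFlag bool, getenv func(string) string) bool {
	// --enable-commands on the command line.
	if enableFlag {
		return true
	}
	// PULSE_ENABLE_COMMANDS=true in the environment.
	if v, err := strconv.ParseBool(getenv("PULSE_ENABLE_COMMANDS")); err == nil && v {
		return true
	}
	// Backwards compatibility: an explicit PULSE_DISABLE_COMMANDS=false
	// still enables commands. Unset or unparsable values are ignored.
	if v, err := strconv.ParseBool(getenv("PULSE_DISABLE_COMMANDS")); err == nil && !v {
		return true
	}
	return false
}

func main() {
	env := func(vars map[string]string) func(string) string {
		return func(k string) string { return vars[k] }
	}
	fmt.Println(commandsEnabled(false, env(nil)))                                                // false
	fmt.Println(commandsEnabled(true, env(nil)))                                                 // true
	fmt.Println(commandsEnabled(false, env(map[string]string{"PULSE_ENABLE_COMMANDS": "true"})))  // true
	fmt.Println(commandsEnabled(false, env(map[string]string{"PULSE_DISABLE_COMMANDS": "false"}))) // true
	fmt.Println(commandsEnabled(false, env(map[string]string{"PULSE_DISABLE_COMMANDS": "true"})))  // false
}
```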
rcourtman
92988ae0e6 fix: allow duplicate hostnames for different Proxmox hosts. Related to #891
PROBLEM:
When two Proxmox hosts have the same hostname (e.g., 'px1' on different networks),
the auto-registration was matching by name and overwriting the first with the second.
This has been a recurring issue (#104) with at least 3 prior fix attempts.

ROOT CAUSE:
The auto-register handler matched existing nodes by BOTH Host URL and Name.
Matching by name is incorrect - different physical hosts can share hostnames.

FIXES:
1. Remove name-based matching in auto-registration - match by Host URL only
2. Add disambiguateNodeName() to append IP when duplicate hostnames exist
3. Add regression tests to prevent this from breaking again

Now when registering two hosts named 'px1':
- First becomes: px1
- Second becomes: px1 (10.0.2.224)
Both are stored as separate nodes with their own credentials.
2025-12-24 16:05:07 +00:00
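The naming rule from 92988ae0e6 (first 'px1' keeps its name, the second becomes 'px1 (10.0.2.224)') can be illustrated with a small helper. Only the function name disambiguateNodeName comes from the commit message; the signature, the existing-names set, and the port in the example URL are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// disambiguateNodeName returns name unchanged when no existing node uses it.
// Otherwise it appends the host part of hostURL in parentheses.
func disambiguateNodeName(name, hostURL string, existing map[string]struct{}) string {
	if _, taken := existing[name]; !taken {
		return name
	}
	host := hostURL
	// Prefer the bare hostname or IP; fall back to the raw string if parsing fails.
	if u, err := url.Parse(hostURL); err == nil && u.Hostname() != "" {
		host = u.Hostname()
	}
	return fmt.Sprintf("%s (%s)", name, host)
}

func main() {
	existing := map[string]struct{}{}

	first := disambiguateNodeName("px1", "https://10.0.1.5:8006", existing)
	existing[first] = struct{}{}

	second := disambiguateNodeName("px1", "https://10.0.2.224:8006", existing)
	existing[second] = struct{}{}

	fmt.Println(first)  // prints: px1
	fmt.Println(second) // prints: px1 (10.0.2.224)
}
```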
rcourtman
16bd9970e9 feat: add CLI commands for mock mode management
New commands:
  pulse mock enable   - Enable mock mode
  pulse mock disable  - Disable mock mode
  pulse mock status   - Show current status

Makes it easy to toggle between mock and real data without
manually editing config files.
2025-12-22 17:26:57 +00:00
rcourtman
1d64b4c31a fix: show Removed Docker Hosts section in UI for re-enrollment
The 'Removed Docker Hosts' section was not appearing in Settings -> Agents
even when hosts were blocked from re-enrolling. This prevented users from
using the 'Allow re-enroll' button to unblock their Docker agents.

Root cause: The WebSocket store was missing:
1. The 'removedDockerHosts' property in its initial state
2. A handler to process removedDockerHosts data from WebSocket messages

This meant the backend was correctly sending the data, but the frontend
was completely ignoring it.

Changes:
- Add removedDockerHosts to WebSocket store initial state and message handler
- Add removedDockerHosts to App.tsx fallback state for consistency
- Add missing BroadcastState call after AllowDockerHostReenroll succeeds

Also includes previous fixes from this session:
- Add PULSE_AGENT_URL as alias for PULSE_AGENT_CONNECT_URL (config.go)
- Add runtime Docker/Podman auto-detection in pulse-agent (main.go)

Fixes issue reported by darthrater78 in discussion #845
2025-12-19 17:57:04 +00:00
rcourtman
4d1138793d feat(license): add initial license implementation structure to fix build 2025-12-19 17:01:57 +00:00
rcourtman
2b48b0a459 feat: add --kube-include-all-deployments flag for Kubernetes agent
Adds IncludeAllDeployments option to show all deployments, not just
problem ones (where replicas don't match desired). This provides parity
with the existing --kube-include-all-pods flag.

- Add IncludeAllDeployments to kubernetesagent.Config
- Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var
- Update collectDeployments to respect the new flag
- Add test for IncludeAllDeployments functionality
- Update UNIFIED_AGENT.md documentation

Addresses feedback from PR #855
2025-12-18 20:58:30 +00:00
rcourtman
30f01771ac Add meaningful tests for host agent and exec websocket 2025-12-17 17:02:01 +00:00
rcourtman
47dfa5d703 test: expand cmd and agent update coverage 2025-12-17 13:28:17 +00:00
rcourtman
67bde72c93 Improve test coverage 2025-12-17 12:00:59 +00:00
rcourtman
a259b67348 feat: add Kubernetes platform support 2025-12-12 21:31:11 +00:00
rcourtman
cbb89c4b6a feat: Docker agent retry, UI column improvements, and IP tooltip enhancements
- Add exponential backoff retry for Docker agent startup (main.go)
- Fix Docker resource/image column widths with proper truncation
- Unify IP tooltip styling across hosts and guests with detailed network info
- Improve column visibility defaults and sticky column handling
- Various component refinements for Dashboard, Storage, and Backups views
2025-12-12 08:26:36 +00:00
rcourtman
1e3fdb6f63 feat(ai): Enhanced AI patrol system with alert triggers and history persistence
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
2025-12-10 21:08:22 +00:00
rcourtman
5a15a1820b fix(sensor-proxy): Make nodeGate.acquire() context-aware to prevent goroutine leaks
The acquire() function blocked indefinitely without respecting context
cancellation. When clients disconnect while waiting for the per-node
lock, goroutines would remain blocked forever, connections accumulate
in CLOSE_WAIT state, and rate limiter semaphores are never released.

Added acquireContext() that respects context cancellation and updated
both HTTP and RPC handlers to use it. This prevents:
- Goroutine leaks from cancelled requests
- CLOSE_WAIT connection accumulation
- Cascading failures from filled semaphores

Related to #832
2025-12-10 20:14:28 +00:00
rcourtman
ae7b66ecff refactor(ai): Remove over-engineered URL discovery service
Keep only the simple AI-powered approach:
- set_resource_url tool lets AI save discovered URLs
- Users ask AI directly: 'Find URLs for my containers'
- AI uses its intelligence to discover and set URLs

Removed:
- URLDiscoveryService (rigid port scanning)
- Bulk discovery API endpoints
- Frontend discovery button

The AI itself is smart enough to iterate through resources
and discover URLs when asked.
2025-12-10 08:35:24 +00:00