182 Commits

Author SHA1 Message Date
rcourtman
df43f08cf2 fix: address linting issues and test adjustments
cmd/eval/main.go:
- Fix fmt.Errorf format string lint warning (use %s instead of bare string)

internal/logging/logging_test.go:
- Update tests to account for LogBroadcaster wrapper in baseWriter
- Use string representation checks instead of direct pointer comparison
- Verify both the underlying writer and broadcaster are present
2026-02-01 23:27:11 +00:00
rcourtman
4af5fc4246 refactor(config): rename BackendHost/BackendPort to BindAddress
Simplify server config by consolidating BackendHost and BackendPort into
a single BindAddress field. The port is now solely controlled by FrontendPort.

Changes:
- Replace BackendHost/BackendPort with BindAddress in Config struct
- Add deprecation warning for BACKEND_HOST env var (use BIND_ADDRESS)
- Update connection timeout default from 45s to 60s
- Remove backendPort from SystemSettings and frontend types
- Update server.go to use cfg.BindAddress
- Update all tests to use new config field names
2026-02-01 23:26:32 +00:00
rcourtman
82ddeac454 refactor(ai): refine agentic loop compaction and knowledge accumulation
- Inject wrap-up nudges/escalations after token/turn thresholds are met
- Update compaction logic to include key accumulated facts in summaries
- Refine knowledge extraction and accumulation tests
- Update main entry point for revised AI configuration
2026-01-31 19:33:43 +00:00
rcourtman
17208cbf9d docs: update AI evaluation matrix and approval workflow documentation 2026-01-30 19:00:40 +00:00
rcourtman
0e880f3c89 feat(eval): improve patrol eval with polling-based completion
Refactor patrol eval runner to use a dual approach:
1. Poll GET /api/ai/patrol/status until Running=false (primary signal)
2. Best-effort SSE stream connection for tool event visibility

Changes:
- Add status polling loop with configurable timeout
- Make SSE stream optional (may not connect in time)
- Add Completed flag to PatrolRunResult
- Improve assertion error messages
- Add new scenarios and assertions

This is more reliable than relying solely on SSE stream which
may timeout waiting for headers during slow patrol initialization.
2026-01-29 08:20:39 +00:00
rcourtman
c409e7a05e feat(eval): add patrol-specific eval scenarios and assertions
Add comprehensive patrol evaluation framework:

- patrol.go: Runner for patrol scenarios with streaming support
- patrol_assertions.go: Assertions for tool usage, findings, timing
- patrol_scenarios.go: Scenarios for basic, investigation, finding quality
- eval_test.go: Unit tests for patrol eval runner

Scenarios:
- patrol-basic: Verifies patrol completes with tools and findings
- patrol-investigation: Ensures investigation before reporting
- patrol-finding-quality: Validates finding structure and evidence

Run with: go run ./cmd/eval -scenario patrol
2026-01-28 23:19:11 +00:00
rcourtman
44fecc37c0 feat(eval): enhance AI eval harness with retries and reporting
- Add retry logic for transient failures (phantom, stream, empty response)
- Add environment variable overrides for infrastructure naming
- Add JSON report output per scenario
- Expand assertions with new validation types
- Add more comprehensive test scenarios
- Add docs/EVAL.md with usage documentation

The eval harness now better handles flaky AI responses and provides
detailed reports for debugging.
2026-01-28 21:24:12 +00:00
rcourtman
13a6f7750c Minor updates to main and proxmox client 2026-01-28 16:52:50 +00:00
rcourtman
a04d41ce2c Add end-to-end evaluation framework for AI assistant testing
Implement comprehensive eval framework for testing Pulse Assistant:

Core components:
- Runner: Executes scenarios against live API with SSE stream parsing
- Assertions: Reusable checks (tool usage, content, duration, errors)
- Scenarios: Multi-step test workflows with configurable assertions

Basic scenarios:
- QuickSmokeTest: Minimal functionality verification
- ReadOnlyInfrastructure: List, logs, status operations
- RoutingValidation: Command routing to correct targets
- LogTailing: Bounded log commands complete properly
- Discovery: Infrastructure discovery capabilities

Advanced scenarios:
- TroubleshootingScenario: Multi-step investigation workflow
- DeepDiveScenario: Thorough single-service investigation
- ConfigInspectionScenario: Reading configuration files
- ResourceAnalysisScenario: Cross-container resource comparison
- MultiNodeScenario: Operations across Proxmox nodes
- DockerInDockerScenario: Docker containers inside LXCs
- ContextChainScenario: Context retention across turns

Usage: go test ./internal/ai/eval -live -run TestQuickSmokeTest
2026-01-28 16:49:24 +00:00
rcourtman
54a3e7f4af feat: add host agent sysinfo and improve test coverage
New Features:
- Add sysinfo module for system information collection
- Enhance agent with improved metrics handling

Test Coverage:
- Add sysinfo tests
- Add commands coverage tests
- Add hostagent coverage tests
- Add mock collector for testing
- Improve agent, metrics, sensors, and proxmox setup tests
2026-01-24 22:42:46 +00:00
rcourtman
f378a36a5e Feat: Docker Unified Table and frontend UI improvements
- Implemented Docker Unified Table and Cluster Services view
- Improved date/time formatting utilities
- Updated agent tests for multi-tenancy support in Pulse Unified Agent
2026-01-22 16:43:41 +00:00
rcourtman
7049f5b43c refactor: simplify temperature monitoring after sensor proxy removal
Remove proxy-related temperature code paths:
- temperature.go: remove proxy client integration and fallback logic
- config.go: remove SensorProxyEnabled and related config fields
- monitor.go: remove proxy client initialization and state

Temperature monitoring now relies solely on the unified agent approach.
2026-01-21 12:00:28 +00:00
rcourtman
d4a6c0d2e8 refactor: remove legacy pulse-sensor-proxy temperature monitoring
The sensor proxy approach for temperature monitoring has been superseded
by the unified agent architecture where host agents report temperature
data directly. This removes:

- cmd/pulse-sensor-proxy/ - standalone proxy daemon
- internal/tempproxy/ - client library
- internal/api/*temperature_proxy* - API handlers and tests
- internal/api/sensor_proxy_gate* - feature gate
- internal/monitoring/*proxy_test* - proxy-specific tests
- scripts/*sensor-proxy* - installation and management scripts
- security/apparmor/, security/seccomp/ - proxy security profiles

Temperature monitoring remains available via the unified agent approach.
2026-01-21 11:59:04 +00:00
rcourtman
9b8a79df93 fix(test): add graceful shutdown wait to TestRunServer_WebSocket 2026-01-20 18:20:37 +00:00
rcourtman
ee63d438cc docs: standardize markdown syntax and remove deprecated sensor-proxy docs 2026-01-20 09:43:49 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
8c7581d32c feat(profiles): add AI-assisted profile suggestions
Add ability for users to describe what kind of agent profile they need
in natural language, and have AI generate a suggestion with name,
description, config values, and rationale.

- Add ProfileSuggestionHandler with schema-aware prompting
- Add SuggestProfileModal component with example prompts
- Update AgentProfilesPanel with suggest button and description field
- Streamline ValidConfigKeys to only agent-supported settings
- Update profile validation tests for simplified schema
2026-01-15 13:24:18 +00:00
rcourtman
9b49d3171d feat(pbs): add datastore exclusion to reduce PBS log noise
Users with removable/unmounted datastores (e.g., external HDDs for
offline backup) experienced excessive PBS log entries because Pulse
was querying all datastores including unavailable ones.

Added `excludeDatastores` field to PBS node configuration that accepts
patterns to exclude specific datastores from monitoring:
- Exact names: "exthdd1500gb"
- Prefix patterns: "ext*"
- Suffix patterns: "*hdd"
- Contains patterns: "*removable*"

Pattern matching is case-insensitive.

Fixes #1105
2026-01-14 12:26:18 +00:00
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
5c4399d69f feat(agent): add DisableCeph toggle, report_ip remote config, and improved IP detection (#929) 2026-01-09 14:45:29 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
3f0808e9f9 docs: comprehensive core and Pro documentation overhaul
- Major updates to README.md and docs/README.md for Pulse v5
- Added technical deep-dives for Pulse Pro (docs/PULSE_PRO.md) and AI Patrol (docs/AI.md)
- Updated Prometheus metrics documentation and Helm schema for metrics separation
- Refreshed security, installation, and deployment documentation for unified agent models
- Cleaned up legacy summary files
2026-01-07 17:38:27 +00:00
rcourtman
dcdbee3c5c feat: Add in-app help system with HelpIcon component
Add contextual help icons throughout the UI to improve feature
discoverability. Users can click (?) icons to see explanations
with examples for settings they might not understand.

- HelpIcon component with click-to-open popover
- Centralized help content registry in /content/help/
- FeatureTip component for dismissible contextual tips
- Help added to: alert delay, AI endpoints, update channel
2026-01-07 09:22:23 +00:00
rcourtman
3b70e29b87 test: add PULSE_DATA_DIR to TestMainCmd
TestMainCmd was missing PULSE_DATA_DIR setup, causing it to try to
access /etc/pulse which fails in CI.
2026-01-04 19:15:38 +00:00
rcourtman
21a819f6dc test: use t.Setenv for safer test cleanup
t.Setenv ensures environment variables are restored after test
completion, preventing race conditions where background goroutines
(like config watchers) might access unset env vars during cleanup.
2026-01-04 19:08:45 +00:00
rcourtman
fdba559167 test: skip tests requiring /etc/pulse in CI
Tests that use the default /etc/pulse data directory fail in CI
where the directory doesn't exist and can't be created.
2026-01-04 18:59:48 +00:00
rcourtman
37f5e12dc2 test: add encryption keys to remaining cmd/pulse tests
TestConfigImportCmd and TestConfigAutoImportCmd need encryption keys
in CI where /etc/pulse/.encryption.key doesn't exist.
2026-01-04 18:43:40 +00:00
rcourtman
821783eef7 test: fix tests that create .enc files without encryption keys
Tests were failing in CI because they created nodes.enc files without
valid encryption keys, triggering the crypto safety check.

Added createTestEncryptionKey helper and fixed:
- TestLoad_MockEnv (config_load_test.go)
- Multiple tests in commands_test.go that create nodes.enc
2026-01-04 18:15:08 +00:00
rcourtman
7a1e3e9b4e Improve test coverage for cmd/pulse-sensor-proxy 2026-01-04 16:10:34 +00:00
rcourtman
f77025fb2f test: fix flaky tests with nonexistent path assertions
Tests using /nonexistent/... paths fail in sandboxed environments
where they return 'permission denied' instead of 'not exists'.
Use /tmp/... paths instead which reliably return 'not exists'.
2026-01-04 15:38:30 +00:00
rcourtman
45d4d68127 fix: Add debug logging and response format handling for replication status
- Add comprehensive debug logging to diagnose replication status fetch failures
- Handle both array and single-object response formats from Proxmox API
- Log raw response body for easier debugging
- Log success/failure for each enrichment step

This helps diagnose issue #992 where replication last/next sync times aren't
showing. The logging will reveal if the API call is failing, returning empty
data, or returning data in an unexpected format.

Related to #992
2026-01-04 15:01:32 +00:00
rcourtman
43b5fad12c fix: Add main host URL as fallback for remote cluster access
When a Proxmox cluster is discovered, Pulse now includes the user-provided
main host URL as a fallback endpoint. This handles scenarios where Proxmox
reports internal IPs that aren't reachable from Pulse's network (e.g.,
monitoring a remote cluster across different networks).

Previously, if all cluster endpoint IPs were unreachable, the connection
would fail with no fallback. Now the ClusterClient will fall back to the
main host URL, allowing Proxmox to route API calls internally.

Related to #1028
2026-01-04 14:54:03 +00:00
rcourtman
5d4e911298 feat: improve test coverage for pulse-sensor-proxy 2026-01-03 21:42:19 +00:00
rcourtman
22e1cc5613 test(agent): achieve 95% coverage for pulse-agent 2026-01-03 20:52:42 +00:00
rcourtman
fa43628cde fix: Alert acknowledge/unacknowledge fails with reverse proxies
Reverse proxies (Traefik, Caddy, nginx) often normalize or reject URLs
containing %2F (encoded slash). Alert IDs contain forward slashes
(e.g., "docker-container-state-docker:abc/def"), causing acknowledge
requests to fail with 400 errors when going through a reverse proxy.

Added new body-based endpoints that accept alert ID in JSON body:
- POST /api/alerts/acknowledge {"id": "..."}
- POST /api/alerts/unacknowledge {"id": "..."}
- POST /api/alerts/clear {"id": "..."}

Updated frontend to use the new endpoints. Legacy path-based endpoints
are preserved for backwards compatibility.

Related to #1026
2026-01-03 20:51:25 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
9e339957c6 fix: Update runtime config when toggling Docker update actions setting
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh since the API returned the stale runtime value.

Related to #1023
2026-01-03 11:14:17 +00:00
rcourtman
94717ba867 feat(agent): add --docker-runtime flag for podman/docker selection
On systems where Docker compatibility layer obscures Podman (like CoreOS),
the auto-detection can fail. Users can now force the runtime:

  --docker-runtime podman
  PULSE_DOCKER_RUNTIME=podman

Valid values: auto (default), docker, podman

Related to Discussion #958
2026-01-01 00:24:37 +00:00
rcourtman
e3b3785582 feat(agent): add option to disable Docker update checks
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Users who don't want update notifications in their dashboard

Related to Discussion #982
2026-01-01 00:20:49 +00:00
rcourtman
59eca65ff6 fix: Wire up LOG_FILE, LOG_MAX_SIZE, LOG_MAX_AGE, LOG_COMPRESS config options. Related to #979
The logging config options were defined but never passed to logging.Init(),
making the documented file-based log rotation non-functional.
2025-12-30 21:49:26 +00:00
rcourtman
df3ff171b9 fix: Honor DisableAutoUpdate config and disable Docker disk metrics by default 2025-12-29 23:37:30 +00:00
rcourtman
6ac5e3ebfe chore: Clean up build scripts and remove unused Docker agent entry point 2025-12-29 23:37:16 +00:00
rcourtman
d07b471e40 Refactor Docker agent: metrics collection, security checks, and batch updates
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
2025-12-29 17:20:18 +00:00
rcourtman
277aca3e4e fix: Only log 'Migration complete' when inline allowed_nodes actually migrated. Related to Discussion #946
The sensor proxy self-heal script runs every 5 minutes and calls migrate-to-file.
Previously it would print 'Migration complete' every time, even when already in
file mode with nothing to migrate.

Now migrateInlineToFile returns a boolean indicating if migration actually
occurred, and the CLI only prints the message when work was done.
2025-12-29 14:15:57 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
2bf8e044df feat: Add Docker container update capability
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
2025-12-29 09:00:40 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
rcourtman
2420c2affb feat: Commands disabled by default, require --enable-commands to opt-in
BREAKING CHANGE: AI command execution on agents is now disabled by default.
Users who want AI auto-fix must explicitly enable it with --enable-commands
flag or PULSE_ENABLE_COMMANDS=true environment variable.

Changes:
- Add --enable-commands flag (opt-in for command execution)
- Commands disabled by default for security (defense-in-depth)
- --disable-commands is now deprecated (logs warning, no longer needed)
- PULSE_DISABLE_COMMANDS deprecated in favor of PULSE_ENABLE_COMMANDS
- Update installer script to use --enable-commands
- Backwards compatibility: PULSE_DISABLE_COMMANDS=false still enables commands

This addresses community feedback about secure defaults for arbitrary
command execution on production infrastructure.

Related to #889
2025-12-24 17:36:44 +00:00
rcourtman
92988ae0e6 fix: allow duplicate hostnames for different Proxmox hosts. Related to #891
PROBLEM:
When two Proxmox hosts have the same hostname (e.g., 'px1' on different networks),
the auto-registration was matching by name and overwriting the first with the second.
This has been a recurring issue (#104) with at least 3 prior fix attempts.

ROOT CAUSE:
The auto-register handler matched existing nodes by BOTH Host URL and Name.
Matching by name is incorrect - different physical hosts can share hostnames.

FIXES:
1. Remove name-based matching in auto-registration - match by Host URL only
2. Add disambiguateNodeName() to append IP when duplicate hostnames exist
3. Add regression tests to prevent this from breaking again

Now when registering two hosts named 'px1':
- First becomes: px1
- Second becomes: px1 (10.0.2.224)
Both are stored as separate nodes with their own credentials.
2025-12-24 16:05:07 +00:00
rcourtman
16bd9970e9 feat: add CLI commands for mock mode management
New commands:
  pulse mock enable   - Enable mock mode
  pulse mock disable  - Disable mock mode
  pulse mock status   - Show current status

Makes it easy to toggle between mock and real data without
manually editing config files.
2025-12-22 17:26:57 +00:00