Commit Graph

137 Commits

Author SHA1 Message Date
rcourtman
2bf8e044df feat: Add Docker container update capability
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
2025-12-29 09:00:40 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
rcourtman
2420c2affb feat: Commands disabled by default, require --enable-commands to opt-in
BREAKING CHANGE: AI command execution on agents is now disabled by default.
Users who want AI auto-fix must explicitly enable it with --enable-commands
flag or PULSE_ENABLE_COMMANDS=true environment variable.

Changes:
- Add --enable-commands flag (opt-in for command execution)
- Commands disabled by default for security (defense-in-depth)
- --disable-commands is now deprecated (logs warning, no longer needed)
- PULSE_DISABLE_COMMANDS deprecated in favor of PULSE_ENABLE_COMMANDS
- Update installer script to use --enable-commands
- Backwards compatibility: PULSE_DISABLE_COMMANDS=false still enables commands

This addresses community feedback about secure defaults for arbitrary
command execution on production infrastructure.

Related to #889
2025-12-24 17:36:44 +00:00
rcourtman
92988ae0e6 fix: allow duplicate hostnames for different Proxmox hosts. Related to #891
PROBLEM:
When two Proxmox hosts have the same hostname (e.g., 'px1' on different networks),
the auto-registration was matching by name and overwriting the first with the second.
This has been a recurring issue (#104) with at least 3 prior fix attempts.

ROOT CAUSE:
The auto-register handler matched existing nodes by BOTH Host URL and Name.
Matching by name is incorrect - different physical hosts can share hostnames.

FIXES:
1. Remove name-based matching in auto-registration - match by Host URL only
2. Add disambiguateNodeName() to append IP when duplicate hostnames exist
3. Add regression tests to prevent this from breaking again

Now when registering two hosts named 'px1':
- First becomes: px1
- Second becomes: px1 (10.0.2.224)
Both are stored as separate nodes with their own credentials.
2025-12-24 16:05:07 +00:00
rcourtman
16bd9970e9 feat: add CLI commands for mock mode management
New commands:
  pulse mock enable   - Enable mock mode
  pulse mock disable  - Disable mock mode
  pulse mock status   - Show current status

Makes it easy to toggle between mock and real data without
manually editing config files.
2025-12-22 17:26:57 +00:00
rcourtman
1d64b4c31a fix: show Removed Docker Hosts section in UI for re-enrollment
The 'Removed Docker Hosts' section was not appearing in Settings -> Agents
even when hosts were blocked from re-enrolling. This prevented users from
using the 'Allow re-enroll' button to unblock their Docker agents.

Root cause: The WebSocket store was missing:
1. The 'removedDockerHosts' property in its initial state
2. A handler to process removedDockerHosts data from WebSocket messages

This meant the backend was correctly sending the data, but the frontend
was completely ignoring it.

Changes:
- Add removedDockerHosts to WebSocket store initial state and message handler
- Add removedDockerHosts to App.tsx fallback state for consistency
- Add missing BroadcastState call after AllowDockerHostReenroll succeeds

Also includes previous fixes from this session:
- Add PULSE_AGENT_URL as alias for PULSE_AGENT_CONNECT_URL (config.go)
- Add runtime Docker/Podman auto-detection in pulse-agent (main.go)

Fixes issue reported by darthrater78 in discussion #845
2025-12-19 17:57:04 +00:00
rcourtman
4d1138793d feat(license): add initial license implementation structure to fix build 2025-12-19 17:01:57 +00:00
rcourtman
2b48b0a459 feat: add --kube-include-all-deployments flag for Kubernetes agent
Adds IncludeAllDeployments option to show all deployments, not just
problem ones (where replicas don't match desired). This provides parity
with the existing --kube-include-all-pods flag.

- Add IncludeAllDeployments to kubernetesagent.Config
- Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var
- Update collectDeployments to respect the new flag
- Add test for IncludeAllDeployments functionality
- Update UNIFIED_AGENT.md documentation

Addresses feedback from PR #855
2025-12-18 20:58:30 +00:00
rcourtman
30f01771ac Add meaningful tests for host agent and exec websocket 2025-12-17 17:02:01 +00:00
rcourtman
47dfa5d703 test: expand cmd and agent update coverage 2025-12-17 13:28:17 +00:00
rcourtman
67bde72c93 Improve test coverage 2025-12-17 12:00:59 +00:00
rcourtman
a259b67348 feat: add Kubernetes platform support 2025-12-12 21:31:11 +00:00
rcourtman
cbb89c4b6a feat: Docker agent retry, UI column improvements, and IP tooltip enhancements
- Add exponential backoff retry for Docker agent startup (main.go)
- Fix Docker resource/image column widths with proper truncation
- Unify IP tooltip styling across hosts and guests with detailed network info
- Improve column visibility defaults and sticky column handling
- Various component refinements for Dashboard, Storage, and Backups views
2025-12-12 08:26:36 +00:00
rcourtman
1e3fdb6f63 feat(ai): Enhanced AI patrol system with alert triggers and history persistence
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
2025-12-10 21:08:22 +00:00
rcourtman
5a15a1820b fix(sensor-proxy): Make nodeGate.acquire() context-aware to prevent goroutine leaks
The acquire() function blocked indefinitely without respecting context
cancellation. When clients disconnect while waiting for the per-node
lock, goroutines would remain blocked forever, connections accumulate
in CLOSE_WAIT state, and rate limiter semaphores are never released.

Added acquireContext() that respects context cancellation and updated
both HTTP and RPC handlers to use it. This prevents:
- Goroutine leaks from cancelled requests
- CLOSE_WAIT connection accumulation
- Cascading failures from filled semaphores

Related to #832
2025-12-10 20:14:28 +00:00
rcourtman
ae7b66ecff refactor(ai): Remove over-engineered URL discovery service
Keep only the simple AI-powered approach:
- set_resource_url tool lets AI save discovered URLs
- Users ask AI directly: 'Find URLs for my containers'
- AI uses its intelligence to discover and set URLs

Removed:
- URLDiscoveryService (rigid port scanning)
- Bulk discovery API endpoints
- Frontend discovery button

The AI itself is smart enough to iterate through resources
and discover URLs when asked.
2025-12-10 08:35:24 +00:00
rcourtman
927ac76bad feat: AI integration, Docker metrics, RAID display, and infrastructure improvements
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add CEPH cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
2025-12-09 09:29:27 +00:00
rcourtman
e1e83c8295 fix: complete unified resources WebSocket integration
Backend:
- Call SetMonitor after router creation to inject resource store
- Add debug logging for resource population and broadcast

Frontend:
- Add resources array to WebSocket store initial state
- Handle resources in WebSocket message processing
- Use reconcile for efficient state updates

The unified resources are now properly:
1. Populated from StateSnapshot on each broadcast cycle
2. Converted to frontend format (ResourceFrontend)
3. Included in WebSocket state messages
4. Received and stored in frontend state
5. Consumed by migrated route components

Console now shows '[DashboardView] Using unified resources: VMs: X'
confirming the migration is working end-to-end.
2025-12-07 23:52:00 +00:00
rcourtman
bcd7b550d4 AI Problem Solver implementation and various fixes
- Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters
- Add 'Investigate with AI' button to filter bar for problematic guests
- Fix dashboard column sizing inconsistencies between bars and sparklines view modes
- Fix PBS backups display and polling
- Refine AI prompt for general-purpose usage
- Fix frontend flickering and reload loops during initial load
- Integrate persistent SQLite metrics store with Monitor
- Fortify AI command routing with improved validation and logging
- Fix CSRF token handling for note deletion
- Debug and fix AI command execution issues
- Various AI reliability improvements and command safety enhancements
2025-12-06 23:46:08 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
rcourtman
53d7776d6b wip: AI chat integration with multi-provider support
- Add AI service with Anthropic, OpenAI, and Ollama providers
- Add AI chat UI component with streaming responses
- Add AI settings page for configuration
- Add agent exec framework for command execution
- Add API endpoints for AI chat and configuration
2025-12-04 20:16:53 +00:00
rcourtman
59b713d176 fix: Force tcp4 network for IPv4 addresses in sensor-proxy HTTP mode
On dual-stack systems with net.ipv6.bindv6only=1 (like some Proxmox 8
configurations), Go's net.Listen("tcp", "0.0.0.0:8443") may still bind
to IPv6-only. This caused IPv4 localhost connections to hang while
IPv6 worked.

Fix by detecting IPv4 addresses and explicitly using "tcp4" network
type when creating the listener. Related to #805
2025-12-04 20:09:37 +00:00
rcourtman
7d733db3a8 fix: Default sensor-proxy HTTP to 0.0.0.0:8443 for IPv4 binding
On systems with net.ipv6.bindv6only=1 (including some Proxmox 8
configurations), using ":8443" results in IPv6-only binding. Users
reported curl to 127.0.0.1:8443 hanging while [::1]:8443 worked.

Changed default from ":8443" to "0.0.0.0:8443" to explicitly bind IPv4.

Related to #805
2025-12-03 20:25:08 +00:00
rcourtman
7fc15417e4 Add health/metrics server and proper cleanup to unified agent
- Add /healthz (liveness) and /readyz (readiness) endpoints
- Add /metrics endpoint with Prometheus metrics (pulse_agent_info, pulse_agent_up)
- Properly call dockerAgent.Close() on shutdown
- New config: -health-addr flag and PULSE_HEALTH_ADDR env (default :9191)
- Set to empty string to disable health server
2025-12-02 22:42:05 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
97672b4701 Add unit tests for validation utility functions (pulse-sensor-proxy)
Add 42 test cases for security-critical validation utility functions:

- TestStripNodeDelimiters (10 cases): IPv6 bracket handling, edge cases
- TestParseNodeIP (10 cases): IPv4/IPv6 parsing with bracket support
- TestNormalizeAllowlistEntry (11 cases): case normalization, whitespace
  handling, IPv6 full form compression
- TestIPAllowed (11 cases): CIDR matching, hosts map lookup, nil handling

These functions are used for node allowlist validation to prevent SSRF
attacks in the sensor proxy.
2025-11-30 17:04:43 +00:00
rcourtman
786a78af85 Add unit tests for dedupeUint32 and dedupeStrings (pulse-sensor-proxy) 2025-11-30 15:26:42 +00:00
rcourtman
48af5615b9 Add unit tests for pulse-sensor-proxy utility functions
- hashIPToUID: 11 test cases covering IP hashing for rate limiting
  (determinism, range bounds, collision detection, boundary values)
- extractNodesFromYAML: 17 test cases covering YAML node list parsing
  (map format, list format, mixed types, edge cases)

First test files for config_cmd.go and http_server.go utilities.
2025-11-30 14:19:43 +00:00
rcourtman
9476de40a6 Add pagination to backup list for large backup counts
- Add pagination (100 items per page) to prevent UI lockup with 2500+ backups
- Show year in date labels for non-current year backups
- Reset to page 1 when filters change
- Add First/Previous/Next/Last navigation controls

Fixes #541
2025-11-30 09:55:01 +00:00
rcourtman
5c90fb102b Add unit tests for pulse-agent utility functions (unified agent)
53 test cases covering 4 functions:
- gatherTags: environment/flag tag merging (18 cases)
- parseLogLevel: log level parsing (30 cases)
- defaultLogLevel: default value resolution (10 cases)
- multiValue: flag.Value interface for repeatable flags (6 cases)

Key difference from pulse-host-agent: parseLogLevel in unified agent delegates
directly to zerolog.ParseLevel without range validation, so trace/fatal/panic
levels are accepted (unlike pulse-host-agent which restricts to debug-error).

First test file for cmd/pulse-agent package.
2025-11-30 08:34:09 +00:00
rcourtman
048708b2c7 Add unit tests for pulse-host-agent utility functions
60 test cases for gatherTags, parseLogLevel, defaultLogLevel, multiValue.
First test file for cmd/pulse-host-agent package.
2025-11-30 08:17:18 +00:00
rcourtman
a3b64d4c00 Add unit tests for parseLogLevel and splitStringList (dockeragent) 2025-11-30 06:50:09 +00:00
rcourtman
f9a4df2e5a Add unit tests for sanitizeNodeLabel (pulse-sensor-proxy/metrics.go) 2025-11-30 04:50:51 +00:00
rcourtman
deb5c3cd23 Add unit tests for pulse-sensor-proxy capability functions
Test Capability.Has, parseCapabilityList, capabilityNames, and
constant values. 54 test cases covering bitmask operations, parsing,
case insensitivity, whitespace handling, unknown values, and round-trip
consistency.
2025-11-30 03:32:53 +00:00
rcourtman
dbf32baaa0 Rebuild agent token bindings on API token config reload
When api_tokens.json is modified on disk, the ConfigWatcher reloads
the tokens into memory. However, the Monitor's dockerTokenBindings and
hostTokenBindings maps were not synchronized with the new token set,
causing orphaned bindings when agents reconnect after reinstall.

Add SetAPITokenReloadCallback to ConfigWatcher that triggers Monitor's
new RebuildTokenBindings method after token reload. This method
reconstructs the binding maps from current Docker host and host agent
state, keeping only bindings for tokens that still exist in config.

Related to #773
2025-11-29 14:09:30 +00:00
rcourtman
5f33a2fa78 feat: add self-update capability to standalone pulse-host-agent
The standalone pulse-host-agent was missing self-update functionality
that existed in pulse-docker-agent and the unified pulse-agent.

Changes:
- Add agentupdate integration to pulse-host-agent
- Add --no-auto-update flag and PULSE_NO_AUTO_UPDATE env var
- Update Windows service to use errgroup pattern with auto-updater
- Move version from internal/hostagent to main package for ldflags

Related to #737
2025-11-27 20:21:05 +00:00
rcourtman
94259a45da Add Windows service support to unified agent
Port Windows SCM integration from pulse-host-agent to pulse-agent,
enabling the unified agent to run as a Windows service with proper
start/stop handling and event logging.

Related to #766
2025-11-27 17:00:03 +00:00
rcourtman
6a8258be14 chore: remove dead code and unused exports
Remove ~900 lines of unused code identified by static analysis:

Go:
- internal/logging: Remove 10 unused functions (InitFromConfig, New,
  FromContext, WithLogger, etc.) that were built but never integrated
- cmd/pulse-sensor-proxy: Remove 7 dead validation functions for a
  removed command execution feature
- internal/metrics: Remove 8 unused notification metric functions and
  10 Prometheus metrics that were never wired up

Frontend:
- Delete ActivationBanner.tsx stub component
- Remove unused exports: stopMetricsSampler, getSamplerStatus,
  formatSpeedCompact, parseMetricKey, getResourceAlerts
2025-11-27 13:17:39 +00:00
rcourtman
2ff0a0988f refactor: remove unnecessary type conversions
Remove redundant type conversions identified by unconvert linter:
- Remove int() conversions for already-int VMID fields
- Remove int64() conversions for already-int64 arithmetic results
- Remove uint64() conversions for already-uint64 Disk/MaxDisk fields
- Remove int() on syscall.Stdin (already int constant)
2025-11-27 10:33:35 +00:00
rcourtman
fdb2f86d90 style: fix revive linter warnings
- Mark unused stub parameters with underscore
- Rename 'copy' variable to avoid shadowing builtin
- Remove unnecessary else blocks after return statements
2025-11-27 10:26:26 +00:00
rcourtman
bd2635cabe chore: remove deprecated build tags and use strings.ReplaceAll
- Remove redundant // +build directives (go:build is sufficient in Go 1.17+)
- Replace strings.Replace(..., -1) with strings.ReplaceAll
2025-11-27 10:16:08 +00:00
rcourtman
6ff345fb6b chore: fix staticcheck SA warnings
- Fix SA4006 unused value issues in ssh.go, validation.go, generator.go
- Replace deprecated ioutil with io/os in config.go
- Replace deprecated tar.TypeRegA with tar.TypeReg
- Remove deprecated rand.Seed calls (auto-seeded in Go 1.20+)
- Fix always-true nil check in main.go
- Fix impossible nil comparison in tempproxy/client.go
- Add nil check for config in monitor.New()
2025-11-27 09:16:53 +00:00
rcourtman
bc9e89696b chore: fix staticcheck U1000 unused code warnings
- Remove unused ipv6Regex from validation.go
- Suppress unused recordAlertFired/recordAlertResolved hooks (kept for future use)
- Remove unused apiLimiter rate limiter
- Remove unused stopOnce fields from csrf_store.go and session_store.go
- Remove unused lastBroadcast field from hub.go
- Remove unused lastUsedIndex field from cluster_client.go
2025-11-27 09:12:17 +00:00
rcourtman
3fce14469c chore: remove legacy proxy handlers and unused functions
Remove legacy V1 handlers replaced by V2 versions:
- sendError (replaced by sendErrorV2)
- handleGetStatus (replaced by handleGetStatusV2)
- handleEnsureClusterKeys (replaced by handleEnsureClusterKeysV2)
- handleRegisterNodes (replaced by handleRegisterNodesV2)
- handleGetTemperature (replaced by handleGetTemperatureV2)

Also remove related unused functions:
- getPublicKey wrapper (only getPublicKeyFrom is used)
- pushSSHKey wrapper (only pushSSHKeyFrom is used)
- nodeValidator.ipAllowed method (standalone ipAllowed is used)
- validateConfigFile (never called)
- runServiceDebug (Windows debug mode, never called)
2025-11-27 08:41:28 +00:00
rcourtman
01f7d81d38 style: fix gofmt formatting inconsistencies
Run gofmt -w to fix tab/space inconsistencies across 33 files.
2025-11-26 23:44:36 +00:00
rcourtman
6853a0ffd1 feat: serve install scripts from GitHub releases instead of main branch
Scripts like install.sh and install-sensor-proxy.sh are now attached
as release assets and downloaded from releases/latest/download/ URLs.
This ensures users always get scripts compatible with their installed
version, even while development continues on main.

Changes:
- build-release.sh: copy install scripts to release directory
- create-release.yml: upload scripts as release assets
- Updated all documentation and code references to use release URLs
- Scripts reference each other via release URLs for consistency
2025-11-26 08:59:59 +00:00
rcourtman
ae3b78d661 fix: propagate unified agent version and improve legacy cleanup
Issues found during scenario testing:

1. Version propagation: The hostagent and dockeragent packages were
   reporting their own Version (0.1.0-dev) instead of the unified
   agent's version. Added AgentVersion config field to pass the
   parent's version down.

2. macOS legacy cleanup: The install.sh script was missing cleanup
   for pulse-docker-agent on macOS.

3. Windows legacy cleanup: The install.ps1 script was missing cleanup
   for legacy PulseHostAgent and PulseDockerAgent services.

These fixes ensure:
- Unified agent reports consistent version across host/docker metrics
- Legacy agents are properly removed on all platforms during upgrade
- Users migrating from legacy agents get a clean transition
2025-11-25 23:39:10 +00:00
rcourtman
ea335546fc feat: improve legacy agent detection and migration UX
Add seamless migration path from legacy agents to unified agent:

- Add AgentType field to report payloads (unified vs legacy detection)
- Update server to detect legacy agents by type instead of version
- Add UI banner showing upgrade command when legacy agents are detected
- Add deprecation notice to install-host-agent.ps1
- Create install-docker-agent.sh stub that redirects to unified installer

Legacy agents (pulse-host-agent, pulse-docker-agent) now show a "Legacy"
badge in the UI with a one-click copy command to upgrade to the unified
agent.
2025-11-25 23:26:22 +00:00
rcourtman
0436101ee5 feat: add auto-update support for unified agent
Implement self-update capability for the unified pulse-agent binary:

- Add internal/agentupdate package with cross-platform update logic
- Hourly version checks against /api/agent/version endpoint
- SHA256 checksum verification for downloaded binaries
- Atomic binary replacement with backup/rollback on failure
- Support for Linux, macOS, and Windows (10 platform/arch combinations)

Build and release changes:
- Dockerfile builds unified agent for all platforms
- build-release.sh includes unified agent in release artifacts
- validate-release.sh validates unified agent binaries
- Install scripts (install.sh, install.ps1) use correct URL format

Related to #727, #737
2025-11-25 23:15:03 +00:00
courtmanr@gmail.com
930c086556 WIP: Save all pending changes including frontend updates and unified agent scaffolding 2025-11-25 11:27:07 +00:00