Commit Graph

63 Commits

Author SHA1 Message Date
rcourtman
8412cc7ddb fix: env overrides and OS-aware test improvements
- Add PBS/PMG polling interval environment variable overrides in config.go
- Fix temp path expectation in detect_root_test.go using filepath.Join
- Use EvalSymlinks for symlink target comparison in self_update_test.go
- Add Linux-only skip for MAC fallback test in agent_new_test.go
- Add OS-aware RAID/SMART assertions in agent_metrics_test.go
2026-01-22 13:49:05 +00:00
rcourtman
a6a8efaa65 test: Add comprehensive test coverage across packages
New test files with expanded coverage:

API tests:
- ai_handler_test.go: AI handler unit tests with mocking
- agent_profiles_tools_test.go: Profile management tests
- alerts_endpoints_test.go: Alert API endpoint tests
- alerts_test.go: Updated for interface changes
- audit_handlers_test.go: Audit handler tests
- frontend_embed_test.go: Frontend embedding tests
- metadata_handlers_test.go, metadata_provider_test.go: Metadata tests
- notifications_test.go: Updated for interface changes
- profile_suggestions_test.go: Profile suggestion tests
- saml_service_test.go: SAML authentication tests
- sensor_proxy_gate_test.go: Sensor proxy tests
- updates_test.go: Updated for interface changes

Agent tests:
- dockeragent/signature_test.go: Docker agent signature tests
- hostagent/agent_metrics_test.go: Host agent metrics tests
- hostagent/commands_test.go: Command execution tests
- hostagent/network_helpers_test.go: Network helper tests
- hostagent/proxmox_setup_test.go: Updated setup tests
- kubernetesagent/*_test.go: Kubernetes agent tests

Core package tests:
- monitoring/kubernetes_agents_test.go, reload_test.go
- remoteconfig/client_test.go, signature_test.go
- sensors/collector_test.go
- updates/adapter_installsh_*_test.go: Install adapter tests
- updates/manager_*_test.go: Update manager tests
- websocket/hub_*_test.go: WebSocket hub tests

Library tests:
- pkg/audit/export_test.go: Audit export tests
- pkg/metrics/store_test.go: Metrics store tests
- pkg/proxmox/*_test.go: Proxmox client tests
- pkg/reporting/reporting_test.go: Reporting tests
- pkg/server/*_test.go: Server tests
- pkg/tlsutil/extra_test.go: TLS utility tests

Total: ~8000 lines of new test code
2026-01-19 19:26:18 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
4ff9e58c23 fix(dockeragent): use fallback memory reading in Docker-on-LXC
When Docker daemon runs inside an LXC container, it may report 0 for
MemTotal because it can't read the cgroup memory limits correctly.
This caused the UI to show "0B / 7GB" and trigger false alerts with
overflow percentages (214748364799.6%).

The fix checks if Docker's info.MemTotal is 0 and falls back to
gopsutil's /proc/meminfo reading (snapshot.Memory.TotalBytes) which
works correctly in LXC environments.

Fixes #1075
2026-01-10 18:41:41 +00:00
rcourtman
95fb896a03 fix: Agent 405 errors when reverse proxy redirects HTTP to HTTPS
When a user's reverse proxy redirects HTTP to HTTPS, Go's default HTTP
client behavior converts POST requests to GET on 301/302 redirects
(per HTTP specification). This causes the Pulse server to return 405
"Only POST is allowed" errors.

Added CheckRedirect to all agent HTTP clients (host, docker, kubernetes)
that returns a clear error message guiding users to use the correct
protocol in their --url flag instead of silently following redirects.

Related to #1058
2026-01-07 17:56:07 +00:00
rcourtman
74ea90e4b3 fix: Podman sockets not prioritized when --docker-runtime=podman
When --docker-runtime=podman is explicitly set, the agent should try
Podman-specific sockets first before falling back to environment
defaults (which try /var/run/docker.sock).

Also adds /var/run/podman/podman.sock as a candidate socket path,
which is used by CoreOS and some Fedora configurations.

Related to #1045
2026-01-06 10:56:37 +00:00
rcourtman
fd7e80ae17 fix: Add clear warning when Docker token is already in use
When a Docker agent tries to register with a token that's already bound
to another agent, the error was logged generically as "Failed to send
docker report". Users had to dig into logs to understand the issue.

Now logs a prominent error message:
"DOCKER REGISTRATION FAILED: This API token is already used by another
Docker agent. Each Docker host requires its own unique token. Generate
a new token in Pulse Settings > Agents and reinstall with the new token."

Related to #1027
2026-01-03 20:56:04 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
a47c7803bb fix: Preserve configured runtime preference during report collection
When collecting reports, the runtime re-detection was passing RuntimeAuto
instead of the user's configured preference. This caused podman to switch
back to docker on systems like CoreOS where podman provides a docker-
compatible socket at /var/run/docker.sock.

Now the current runtime (set at init from user's --docker-runtime flag)
is passed as the preference, preventing spurious runtime switching.

Related to #1022
2026-01-03 11:30:25 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
e3b3785582 feat(agent): add option to disable Docker update checks
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Users who don't want update notifications in their dashboard

Related to Discussion #982
2026-01-01 00:20:49 +00:00
rcourtman
d804471889 fix: Unified agent Docker module now uses same agent ID as host module
When running as a unified agent (pulse-agent with --enable-docker), the
Docker module was using a different fallback chain for agent ID than the
host module. In unified mode with empty machineID, the Docker module fell
back to daemonID while the host module fell back to hostname. This caused
the server to reject Docker reports with 'token already in use by agent'
errors because the same API token was bound to different agent IDs.

The fix ensures that in unified mode, the Docker module uses the exact
same fallback chain as the host module: machineID -> hostname. The daemonID
fallback is only used in standalone mode for backward compatibility.

Fixes #985, #986
2025-12-31 10:35:00 +00:00
rcourtman
c22dac5d8d fix: Docker container memory now subtracts inactive_file on cgroup v2 systems. Fixes container memory reporting to match 'docker stats' output by excluding reclaimable filesystem cache. Related to #435 2025-12-30 12:31:57 +00:00
rcourtman
e42cbe38f0 test: Improve discovery and Docker agent test coverage 2025-12-29 23:37:10 +00:00
rcourtman
4225f905b0 feat: Add manual Docker update check button. Related to #955 2025-12-29 23:37:05 +00:00
rcourtman
e6477a6998 fix: Resolve manifest lists for correct update detection. Related to #955 2025-12-29 17:36:16 +00:00
rcourtman
d07b471e40 Refactor Docker agent: metrics collection, security checks, and batch updates
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
2025-12-29 17:20:18 +00:00
rcourtman
053a40d7df fix: Docker container update detection showing false positives
Fixed an issue where all Docker containers were showing 'click to update'
even when they were up to date. The root cause was comparing the wrong
digest types:

- Previously: Compared ImageID (local config hash) vs registry manifest digest
- Now: Uses RepoDigests from image inspect, which is the actual manifest
  digest that Docker received from the registry when pulling the image

For multi-arch images, the registry returns a manifest list digest, while
Docker stores the platform-specific image config digest locally. These
will never match, causing false positives for all multi-arch images.

Changes:
- Added ImageInspectWithRaw to dockerClient interface
- Added getImageRepoDigest method to extract RepoDigest from image
- Added matchesImageReference helper for Docker Hub naming conventions
- Added tests for matchesImageReference

Fixes #955
2025-12-29 13:49:04 +00:00
rcourtman
44fa50eed7 feat(dockeragent): improve test coverage and refactor registry dependencies
- Add comprehensive test coverage for agent report, flush buffer, and deps
- Expand flow, HTTP, CPU, and swarm test coverage
- Refactor registry access to use deps interface for better testability
- Add container update and self-update test scenarios
2025-12-29 09:57:45 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
ae1c39960f fix: Remove duplicate AI chat response streaming (issue #947)
Content was being streamed twice:
1. During each iteration of the tool loop (intended for intermediate feedback)
2. Again after the loop ended with finalContent (redundant)

This caused duplicate responses when using Ollama and other providers.
2025-12-29 09:18:05 +00:00
rcourtman
2bf8e044df feat: Add Docker container update capability
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
2025-12-29 09:00:40 +00:00
rcourtman
3040800e7b fix: AI Patrol now respects exact user-configured thresholds
BREAKING CHANGE: AI Patrol now uses EXACT alert thresholds by default
instead of warning 5-15% before the threshold.

Changes:
- Default behavior: Patrol warns at your configured threshold (e.g., 96% = warns at 96%)
- New setting: 'use_proactive_thresholds' enables the old early-warning behavior
- API: Added use_proactive_thresholds to GET/PUT /api/settings/ai
- Backend: Added SetProactiveMode/GetProactiveMode to PatrolService
- Backend: Added GetThresholds to PatrolService for UI display
- Tests: Updated and added tests for both exact and proactive modes
- Also fixed unused imports in dockeragent/agent.go

When proactive mode is disabled (default):
- Watch: threshold - 5% (slight buffer)
- Warning: exact threshold

When proactive mode is enabled:
- Watch: threshold - 15%
- Warning: threshold - 5%

Related to #951
2025-12-29 08:40:34 +00:00
rcourtman
9f3367da36 fix: Include GuestURL in NodeFrontend for cluster node navigation
The GuestURL field was missing from NodeFrontend and its converter,
causing configured Guest URLs to be ignored when clicking on cluster
node names. The frontend would fall back to the auto-detected IP
instead of using the user-configured Guest URL.

Related to #940
2025-12-28 14:49:49 +00:00
rcourtman
056f503516 test: Add comprehensive tests for update detection system
- Add registry checker tests (caching, enable/disable, parsing, concurrency)
- Add alert integration tests for update detection and Pro license gating
- Add API handler tests for /api/infra-updates endpoints
- Test cleanup of tracking maps when containers are removed
- Test threshold-based alerting behavior
2025-12-27 18:54:48 +00:00
rcourtman
cf44b0cca6 polish: Improve update detection edge cases and UX
- Add GHCR (GitHub Container Registry) token support for public images
- Clean up dockerUpdateFirstSeen tracking when containers are removed
- Improve UpdateIcon tooltip to show digest info
- Add cursor-help to indicate hoverable tooltip
2025-12-27 18:14:27 +00:00
rcourtman
b50872b686 feat: Implement unified update detection system (Phase 1)
Docker container image update detection with full stack implementation:

Backend:
- Add internal/updatedetection package with types, store, registry checker, manager
- Add registry checking to Docker agent (internal/dockeragent/registry.go)
- Add ImageDigest and UpdateStatus fields to container reports
- Add /api/infra-updates API endpoints for querying updates
- Integrate with alert system - fires after 24h of pending updates

Frontend:
- Add UpdateBadge and UpdateIcon components for update indicators
- Add updateStatus to DockerContainer TypeScript interface
- Display blue update badges in Docker unified table image column
- Add 'has:update' search filter support

Features:
- Registry digest comparison for Docker Hub, GHCR, private registries
- Auth token handling for Docker Hub public images
- Caching with 6h TTL (15min for errors)
- Configurable alert delay via UpdateAlertDelayHours (default: 24h)
- Alert metadata includes digests, pending time, image info
2025-12-27 17:58:38 +00:00
rcourtman
86e41effc0 feat: Display environment variables for Docker containers
- Add Env field to Container struct in pkg/agents/docker/report.go
- Extract env vars from inspect.Config.Env in Docker agent
- Mask sensitive values (password, secret, key, token, etc.) with ***
- Display env vars in container drawer with green badges (amber for masked)
- Add tests for maskSensitiveEnvVars function

Related to #916
2025-12-25 23:52:57 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
rcourtman
d12ab31703 feat(docker-agent): add payload size logging for debugging body-too-large errors
Related to #823

- Log payload size (in KB and bytes) at debug level
- Warn when payload approaches 400KB (512KB limit)
- Helps diagnose 'request body too large' errors
2025-12-14 21:10:06 +00:00
rcourtman
b4a33c4f2d Fix offline buffering: add tests, remove unused config, fix flaky test
- Add unit tests for internal/buffer package
- Fix misleading "ring buffer" comment (it's a bounded FIFO queue)
- Remove unused BufferCapacity config field from both agents
- Rewrite flaky integration test to use polling instead of fixed sleeps
2025-12-02 22:31:44 +00:00
courtmanr@gmail.com
caf0c10206 feat: Implement offline buffering for host and docker agents
- Add internal/buffer package with generic ring buffer
- Add buffering logic to host agent for failed reports
- Add buffering logic to docker agent for failed reports
- Add BufferCapacity configuration option
- Add integration tests for buffering logic
2025-12-02 22:12:47 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
8360ed8916 Add unit tests for dockeragent runtime detection functions
Test coverage for:
- detectRuntime: 11 test cases covering podman/docker detection via
  endpoint path, InitBinary, ServerVersion, DriverStatus, SecurityOptions
- buildRuntimeCandidates: 6 test cases verifying candidate ordering,
  deduplication, and preference-based filtering
- randomDuration: 5 test cases for boundary conditions and randomness
- determineSelfUpdateArch: validates architecture detection output

Coverage increased from 17.5% to 21.4%.
2025-11-30 08:47:46 +00:00
rcourtman
1fc4807a07 Add unit tests for Docker swarm utility functions (dockeragent)
Test coverage for serviceMode, buildContainerIndex, lookupContainer,
copyStringMap, and isTaskCompletedState. 52 test cases covering service
mode detection, container index building/lookup, and task state classification.
Coverage improved from 14.7% to 17.5%.
2025-11-30 05:32:52 +00:00
rcourtman
943c2b5082 Add unit tests for extractPodmanMetadata and detectHostRemovedError (dockeragent)
- TestExtractPodmanMetadata: 14 test cases covering pod metadata, infra
  containers, compose metadata, auto-update settings, user namespace
  handling, whitespace trimming, and precedence rules
- TestDetectHostRemovedError: 11 test cases covering JSON parsing,
  case-insensitive matching, error code validation, and edge cases

Coverage improved from 11.7% to 14.7% for internal/dockeragent.
2025-11-30 05:03:31 +00:00
rcourtman
b1bc704e3a Consolidate duplicate normalizeVersion functions into shared utility
- Move normalizeVersion to utils.NormalizeVersion for single source of truth
- Update agentupdate and dockeragent packages to use shared function
- Add 14 test cases for version normalization

This prevents bugs like issue #773 where a fix applied to one copy
but not the other caused an update loop.
2025-11-29 22:57:33 +00:00
rcourtman
97d3f30a7f Add unit tests for dockeragent utility functions
Tests for calculateCPUPercent, calculateMemoryUsage, safeFloat,
parseTime, trimLeadingSlash, and summarizeBlockIO. 28 test cases
covering edge cases like zero deltas, cache handling, NaN/Inf,
and case-insensitive op matching.

Coverage improved from 8.0% to 11.7%.
2025-11-29 22:18:10 +00:00
rcourtman
04d1e1bcf4 Fix standalone docker agent version comparison prefix mismatch
The unified agent got the version normalization fix (1b866598), but the
standalone docker agent's checkForUpdates() still used direct string
comparison. When server returns "4.34.0" and agent has "v4.34.0", this
caused an infinite self-update loop.

Apply the same normalizeVersion() function used in the unified agent.

Related to #773
2025-11-29 00:04:43 +00:00
rcourtman
b5798012fc Fix Docker CPU calculation on systemUsage counter reset
When systemUsage counter goes backward (common in unprivileged LXC
containers), the previous code used the absolute value as systemDelta.
This created an artificially small denominator, inflating CPU to ~100%.

Now leaves systemDelta as 0 on counter reset, falling through to the
time-based calculation which produces accurate results.

Related to #770
2025-11-28 15:07:49 +00:00
rcourtman
d425bc3df4 fix: multiple agent installation and update issues
- Default enableDocker to false in UI to prevent unintended Docker
  agent activation on host-only installs (Related to #766)
- Deploy agent scripts and binaries during web UI upgrades, not just
  the main binary (Related to #760)
- Apply symlink resolution fix to standalone docker agent self-update
  to prevent cross-device rename failures (Related to #737)
2025-11-27 15:49:03 +00:00
rcourtman
8152197207 fix: mark unused parameters to satisfy unparam linter
Mark intentionally unused parameters with underscore to:
- Silence unparam warnings for legitimate unused parameters
- Keep function signatures intact for API compatibility
- Remove unused req from serveChecksum helper
2025-11-27 10:12:48 +00:00
rcourtman
e1b9c133c3 fix: remove ineffectual assignments
- Fix loop variable reassignment in config_handlers.go
- Remove redundant boolean assignments in swarm.go
2025-11-27 09:48:29 +00:00
rcourtman
f76c1fb43b chore: update to non-deprecated Docker SDK types
- Use container.Summary instead of types.Container
- Use swarmtypes.ServiceListOptions instead of types.ServiceListOptions
- Use swarmtypes.TaskListOptions instead of types.TaskListOptions

These types were deprecated in favor of package-specific types.
2025-11-27 09:36:05 +00:00
rcourtman
dc4669f9f6 security: harden agent installers and auto-update mechanism
Install script (scripts/install.sh):
- Add multi-platform support: Unraid, OpenRC/Alpine, Synology DSM 6/7
- Add input validation for URL, token format, and interval
- Add binary magic verification (ELF/Mach-O/PE)
- Add cleanup trap for temp files
- Wrap script in main() for partial download protection
- Fix shellcheck compliance issues
- Add curl timeouts

Agent auto-update (agentupdate, dockeragent):
- Enforce TLS 1.2 minimum version
- Make SHA256 checksum verification mandatory
- Add 100MB binary size limit
- Add binary magic verification before replacement
- Add Unraid persistent binary update after self-update
- Add 5-minute download timeout

Frontend:
- Update Linux install description to note auto-detection of init systems
2025-11-26 13:14:58 +00:00
rcourtman
9daf1d5398 fix: cache daemon ID at init to prevent Podman token binding conflicts
Podman can return unstable or empty daemon IDs across API calls. When
the agent fetched info.ID on every report cycle, this could cause the
agent identity to change mid-session, triggering "token already in use"
errors on the server.

Cache the daemon ID at initialization and use it consistently for all
reports.

Related to #740
2025-11-26 10:23:22 +00:00
rcourtman
ae3b78d661 fix: propagate unified agent version and improve legacy cleanup
Issues found during scenario testing:

1. Version propagation: The hostagent and dockeragent packages were
   reporting their own Version (0.1.0-dev) instead of the unified
   agent's version. Added AgentVersion config field to pass the
   parent's version down.

2. macOS legacy cleanup: The install.sh script was missing cleanup
   for pulse-docker-agent on macOS.

3. Windows legacy cleanup: The install.ps1 script was missing cleanup
   for legacy PulseHostAgent and PulseDockerAgent services.

These fixes ensure:
- Unified agent reports consistent version across host/docker metrics
- Legacy agents are properly removed on all platforms during upgrade
- Users migrating from legacy agents get a clean transition
2025-11-25 23:39:10 +00:00
rcourtman
ea335546fc feat: improve legacy agent detection and migration UX
Add seamless migration path from legacy agents to unified agent:

- Add AgentType field to report payloads (unified vs legacy detection)
- Update server to detect legacy agents by type instead of version
- Add UI banner showing upgrade command when legacy agents are detected
- Add deprecation notice to install-host-agent.ps1
- Create install-docker-agent.sh stub that redirects to unified installer

Legacy agents (pulse-host-agent, pulse-docker-agent) now show a "Legacy"
badge in the UI with a one-click copy command to upgrade to the unified
agent.
2025-11-25 23:26:22 +00:00
courtmanr@gmail.com
4640633430 Improve agent update logging and installer warnings (related to #737) 2025-11-23 22:07:37 +00:00
rcourtman
6fb839cbdf Add log level control for docker agent
Related to #742
2025-11-22 07:43:48 +00:00