- Add comprehensive debug logging to diagnose replication status fetch failures
- Handle both array and single-object response formats from Proxmox API
- Log raw response body for easier debugging
- Log success/failure for each enrichment step
This helps diagnose issue #992 where replication last/next sync times aren't
showing. The logging will reveal if the API call is failing, returning empty
data, or returning data in an unexpected format.
Related to #992
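A minimal sketch of the dual-format handling described above, assuming a trimmed-down ReplicationStatus struct; the type, JSON tags, and function name are illustrative, not Pulse's actual code:

    package proxmox

    import (
        "encoding/json"
        "fmt"
    )

    // ReplicationStatus is an illustrative subset of the fields Pulse reads.
    type ReplicationStatus struct {
        LastSync int64 `json:"last_sync"`
        NextSync int64 `json:"next_sync"`
    }

    // decodeReplicationStatus accepts either a JSON array of status objects or
    // a single object and normalizes both into a slice.
    func decodeReplicationStatus(raw []byte) ([]ReplicationStatus, error) {
        var list []ReplicationStatus
        if err := json.Unmarshal(raw, &list); err == nil {
            return list, nil
        }
        var single ReplicationStatus
        if err := json.Unmarshal(raw, &single); err != nil {
            return nil, fmt.Errorf("unexpected replication status format: %w", err)
        }
        return []ReplicationStatus{single}, nil
    }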
When a Proxmox cluster is discovered, Pulse now includes the user-provided
main host URL as a fallback endpoint. This handles scenarios where Proxmox
reports internal IPs that aren't reachable from Pulse's network (e.g.,
monitoring a remote cluster across different networks).
Previously, if all cluster endpoint IPs were unreachable, the connection
would fail with no fallback. Now the ClusterClient will fall back to the
main host URL, allowing Proxmox to route API calls internally.
Related to #1028
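A minimal sketch of the fallback order, with the actual API request passed in as a callback; the names are illustrative, not the real ClusterClient API:

    package cluster

    import (
        "context"
        "fmt"
    )

    // tryEndpoints tries every discovered cluster endpoint first and the
    // user-provided main host URL last, returning on the first success.
    func tryEndpoints(ctx context.Context, endpoints []string, mainHostURL string,
        do func(ctx context.Context, baseURL string) error) error {

        candidates := append(append([]string{}, endpoints...), mainHostURL)
        var lastErr error
        for _, base := range candidates {
            if err := do(ctx, base); err != nil {
                lastErr = err
                continue
            }
            return nil
        }
        return fmt.Errorf("all endpoints failed, including main host fallback: %w", lastErr)
    }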
Reverse proxies (Traefik, Caddy, nginx) often normalize or reject URLs
containing %2F (encoded slash). Alert IDs contain forward slashes
(e.g., "docker-container-state-docker:abc/def"), causing acknowledge
requests to fail with 400 errors when going through a reverse proxy.
Added new body-based endpoints that accept the alert ID in the JSON body:
- POST /api/alerts/acknowledge {"id": "..."}
- POST /api/alerts/unacknowledge {"id": "..."}
- POST /api/alerts/clear {"id": "..."}
Updated frontend to use the new endpoints. Legacy path-based endpoints
are preserved for backwards compatibility.
Related to #1026
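A minimal sketch of one body-based handler; the handler name, store call, and status codes are assumptions rather than the exact Pulse implementation:

    package api

    import (
        "encoding/json"
        "net/http"
    )

    // acknowledgeAlert stands in for the real alert manager call.
    func acknowledgeAlert(id string) error { return nil }

    // handleAcknowledgeBody reads the alert ID from the JSON body instead of
    // the URL path, so IDs containing "/" never need %2F encoding and cannot
    // be mangled by a reverse proxy.
    func handleAcknowledgeBody(w http.ResponseWriter, r *http.Request) {
        var req struct {
            ID string `json:"id"`
        }
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.ID == "" {
            http.Error(w, "missing or invalid alert id", http.StatusBadRequest)
            return
        }
        if err := acknowledgeAlert(req.ID); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusNoContent)
    }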
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh because the API returned the stale runtime value.
Related to #1023
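A minimal sketch of the fix's shape (persist first, then refresh the in-memory copy the API serves); the handler and settings types are illustrative:

    package api

    type systemSettings struct {
        DisableDockerUpdateActions bool
    }

    type configHandler struct {
        config  *systemSettings
        persist func(systemSettings) error
    }

    // setDisableDockerUpdateActions writes the new value to disk and also
    // updates the runtime config, so a page refresh reflects what was saved.
    func (h *configHandler) setDisableDockerUpdateActions(v bool) error {
        updated := *h.config
        updated.DisableDockerUpdateActions = v
        if err := h.persist(updated); err != nil {
            return err
        }
        *h.config = updated
        return nil
    }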
On systems where the Docker compatibility layer obscures Podman (such as CoreOS),
auto-detection can fail. Users can now force the runtime:
--docker-runtime podman
PULSE_DOCKER_RUNTIME=podman
Valid values: auto (default), docker, podman
Related to Discussion #958
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Suppressing update notifications in the dashboard for users who don't want them
Related to Discussion #982
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
The sensor proxy self-heal script runs every 5 minutes and calls migrate-to-file.
Previously it would print 'Migration complete' every time, even when already in
file mode with nothing to migrate.
Now migrateInlineToFile returns a boolean indicating if migration actually
occurred, and the CLI only prints the message when work was done.
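A minimal sketch of the boolean-returning migration and the quieter CLI path; the config type and function bodies are illustrative:

    package sensorproxy

    import "fmt"

    type proxyConfig struct {
        Mode          string
        InlineSensors []string
    }

    // migrateInlineToFile reports whether any migration work was actually done.
    func migrateInlineToFile(cfg *proxyConfig) (bool, error) {
        if cfg.Mode == "file" && len(cfg.InlineSensors) == 0 {
            return false, nil // already file-backed, nothing to move
        }
        // ... move inline sensor entries into the file-backed store ...
        cfg.Mode = "file"
        cfg.InlineSensors = nil
        return true, nil
    }

    // runMigrate only announces success when something changed, keeping the
    // 5-minute self-heal loop quiet once migration is complete.
    func runMigrate(cfg *proxyConfig) error {
        migrated, err := migrateInlineToFile(cfg)
        if err != nil {
            return err
        }
        if migrated {
            fmt.Println("Migration complete")
        }
        return nil
    }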
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios
Usage:
pulse-agent --report-ip 192.168.1.100
PULSE_REPORT_IP=192.168.1.100 pulse-agent
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
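A control-flow sketch of the update sequence; the dockerClient interface is an assumed abstraction over the real Docker API calls, not the agent's actual client:

    package dockeragent

    import (
        "context"
        "fmt"
        "time"
    )

    // dockerClient stands in for the Docker operations the agent performs.
    type dockerClient interface {
        PullImage(ctx context.Context, image string) error
        StopContainer(ctx context.Context, name string) error
        RenameContainer(ctx context.Context, oldName, newName string) error
        StartFromImage(ctx context.Context, name, image string) error
        RemoveContainer(ctx context.Context, name string) error
    }

    // updateContainer sketches the pull / stop / backup / start / rollback flow.
    func updateContainer(ctx context.Context, d dockerClient, name, image string) error {
        if err := d.PullImage(ctx, image); err != nil {
            return fmt.Errorf("pull: %w", err)
        }
        backup := name + "-pulse-backup"
        if err := d.StopContainer(ctx, name); err != nil {
            return fmt.Errorf("stop: %w", err)
        }
        if err := d.RenameContainer(ctx, name, backup); err != nil {
            return fmt.Errorf("backup: %w", err)
        }
        if err := d.StartFromImage(ctx, name, image); err != nil {
            // Roll back: restore the backup under its original name.
            _ = d.RenameContainer(ctx, backup, name)
            return fmt.Errorf("start failed, rolled back: %w", err)
        }
        // Keep the backup briefly so a bad image can still be reverted.
        time.AfterFunc(5*time.Minute, func() {
            _ = d.RemoveContainer(context.Background(), backup)
        })
        return nil
    }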
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*
Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*
This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
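A minimal sketch of the three pattern styles; the function name is illustrative:

    package hostagent

    import "strings"

    // mountExcluded checks a mount point against exact, prefix, and contains
    // patterns.
    func mountExcluded(mount string, patterns []string) bool {
        for _, p := range patterns {
            switch {
            case strings.HasPrefix(p, "*") && strings.HasSuffix(p, "*"):
                // Contains pattern, e.g. *pbs*
                if strings.Contains(mount, strings.Trim(p, "*")) {
                    return true
                }
            case strings.HasSuffix(p, "*"):
                // Prefix pattern, e.g. /mnt/ext*
                if strings.HasPrefix(mount, strings.TrimSuffix(p, "*")) {
                    return true
                }
            default:
                // Exact path, e.g. /mnt/backup
                if mount == p {
                    return true
                }
            }
        }
        return false
    }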
BREAKING CHANGE: AI command execution on agents is now disabled by default.
Users who want AI auto-fix must explicitly enable it with the --enable-commands
flag or the PULSE_ENABLE_COMMANDS=true environment variable.
Changes:
- Add --enable-commands flag (opt-in for command execution)
- Commands disabled by default for security (defense-in-depth)
- --disable-commands is now deprecated (logs warning, no longer needed)
- PULSE_DISABLE_COMMANDS deprecated in favor of PULSE_ENABLE_COMMANDS
- Update installer script to use --enable-commands
- Backwards compatibility: PULSE_DISABLE_COMMANDS=false still enables commands
This addresses community feedback about secure defaults for arbitrary
command execution on production infrastructure.
Related to #889
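A rough sketch of the opt-in resolution with the deprecated variable still honored; the function name and exact precedence are assumptions, not the exact Pulse logic:

    package agent

    import (
        "log"
        "os"
    )

    // commandsEnabled resolves the secure-by-default opt-in.
    func commandsEnabled(enableFlag bool) bool {
        if enableFlag || os.Getenv("PULSE_ENABLE_COMMANDS") == "true" {
            return true
        }
        // Backwards compatibility: PULSE_DISABLE_COMMANDS=false used to mean
        // "commands on", so keep honoring it while warning about deprecation.
        if v, ok := os.LookupEnv("PULSE_DISABLE_COMMANDS"); ok {
            log.Println("PULSE_DISABLE_COMMANDS is deprecated; use PULSE_ENABLE_COMMANDS")
            return v == "false"
        }
        return false // secure default: command execution disabled
    }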
PROBLEM:
When two Proxmox hosts have the same hostname (e.g., 'px1' on different networks),
the auto-registration was matching by name and overwriting the first with the second.
This has been a recurring issue (#104) with at least 3 prior fix attempts.
ROOT CAUSE:
The auto-register handler matched existing nodes by name as well as by host URL.
Matching by name is incorrect: different physical hosts can share a hostname.
FIXES:
1. Remove name-based matching in auto-registration - match by Host URL only
2. Add disambiguateNodeName() to append IP when duplicate hostnames exist
3. Add regression tests to prevent this from breaking again
Now when registering two hosts named 'px1':
- First becomes: px1
- Second becomes: px1 (10.0.2.224)
Both are stored as separate nodes with their own credentials.
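A minimal sketch of the disambiguation step; the signature and lookup structure are inferred from the description, not copied from Pulse:

    package autoregister

    import (
        "fmt"
        "net/url"
    )

    // disambiguateNodeName appends the host's IP when the hostname is taken,
    // e.g. "px1" already exists so the new node becomes "px1 (10.0.2.224)".
    func disambiguateNodeName(name, hostURL string, existingNames map[string]bool) string {
        if !existingNames[name] {
            return name
        }
        host := hostURL
        if u, err := url.Parse(hostURL); err == nil && u.Hostname() != "" {
            host = u.Hostname()
        }
        return fmt.Sprintf("%s (%s)", name, host)
    }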
New commands:
pulse mock enable - Enable mock mode
pulse mock disable - Disable mock mode
pulse mock status - Show current status
Makes it easy to toggle between mock and real data without
manually editing config files.
The 'Removed Docker Hosts' section was not appearing in Settings -> Agents
even when hosts were blocked from re-enrolling. This prevented users from
using the 'Allow re-enroll' button to unblock their Docker agents.
Root cause: The WebSocket store was missing:
1. The 'removedDockerHosts' property in its initial state
2. A handler to process removedDockerHosts data from WebSocket messages
This meant the backend was correctly sending the data, but the frontend
was completely ignoring it.
Changes:
- Add removedDockerHosts to WebSocket store initial state and message handler
- Add removedDockerHosts to App.tsx fallback state for consistency
- Add missing BroadcastState call after AllowDockerHostReenroll succeeds
Also includes previous fixes from this session:
- Add PULSE_AGENT_URL as alias for PULSE_AGENT_CONNECT_URL (config.go)
- Add runtime Docker/Podman auto-detection in pulse-agent (main.go)
Fixes issue reported by darthrater78 in discussion #845
Adds IncludeAllDeployments option to show all deployments, not just
problem ones (where replicas don't match the desired count). This provides parity
with the existing --kube-include-all-pods flag.
- Add IncludeAllDeployments to kubernetesagent.Config
- Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var
- Update collectDeployments to respect the new flag
- Add test for IncludeAllDeployments functionality
- Update UNIFIED_AGENT.md documentation
Addresses feedback from PR #855
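A minimal sketch of the filter collectDeployments applies; the deployment struct here is an illustrative, slimmed-down stand-in for what the agent actually collects:

    package kubernetesagent

    // deployment is an illustrative view of the fields the agent reports.
    type deployment struct {
        Name            string
        DesiredReplicas int32
        ReadyReplicas   int32
    }

    // filterDeployments reports everything when includeAll is set, otherwise
    // only deployments whose ready replicas differ from the desired count.
    func filterDeployments(deps []deployment, includeAll bool) []deployment {
        if includeAll {
            return deps
        }
        var problems []deployment
        for _, d := range deps {
            if d.ReadyReplicas != d.DesiredReplicas {
                problems = append(problems, d)
            }
        }
        return problems
    }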
- Add exponential backoff retry for Docker agent startup (main.go); a sketch follows this list
- Fix Docker resource/image column widths with proper truncation
- Unify IP tooltip styling across hosts and guests with detailed network info
- Improve column visibility defaults and sticky column handling
- Various component refinements for Dashboard, Storage, and Backups views
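A minimal sketch of the startup backoff from the first item, assuming a doubling delay capped at 30 seconds; the limits and function name are illustrative:

    package agent

    import (
        "context"
        "time"
    )

    // startWithBackoff retries a startup step with exponentially growing
    // delays (1s, 2s, 4s, ...) until it succeeds or the context is cancelled.
    func startWithBackoff(ctx context.Context, start func() error) error {
        delay := time.Second
        for {
            if err := start(); err == nil {
                return nil
            }
            select {
            case <-time.After(delay):
            case <-ctx.Done():
                return ctx.Err()
            }
            delay *= 2
            if delay > 30*time.Second {
                delay = 30 * time.Second
            }
        }
    }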
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
The acquire() function blocked indefinitely without respecting context
cancellation. When clients disconnect while waiting for the per-node
lock, goroutines would remain blocked forever, connections would accumulate
in CLOSE_WAIT state, and rate limiter semaphores were never released.
Added acquireContext() that respects context cancellation and updated
both HTTP and RPC handlers to use it. This prevents:
- Goroutine leaks from cancelled requests
- CLOSE_WAIT connection accumulation
- Cascading failures from filled semaphores
Related to #832
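A minimal sketch of a context-aware acquire, modeling the per-node lock as a one-slot semaphore channel; Pulse's actual lock type may differ:

    package nodelimiter

    import "context"

    // nodeLock models the per-node lock as a 1-slot semaphore.
    type nodeLock struct {
        sem chan struct{}
    }

    func newNodeLock() *nodeLock {
        return &nodeLock{sem: make(chan struct{}, 1)}
    }

    // acquireContext blocks until the lock is free or the caller's context is
    // cancelled, so a disconnected client no longer parks a goroutine forever.
    func (l *nodeLock) acquireContext(ctx context.Context) error {
        select {
        case l.sem <- struct{}{}:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }

    func (l *nodeLock) release() {
        <-l.sem
    }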
Keep only the simple AI-powered approach:
- set_resource_url tool lets AI save discovered URLs
- Users ask AI directly: 'Find URLs for my containers'
- AI uses its intelligence to discover and set URLs
Removed:
- URLDiscoveryService (rigid port scanning)
- Bulk discovery API endpoints
- Frontend discovery button
The AI itself is smart enough to iterate through resources
and discover URLs when asked.
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add CEPH cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
Backend:
- Call SetMonitor after router creation to inject resource store
- Add debug logging for resource population and broadcast
Frontend:
- Add resources array to WebSocket store initial state
- Handle resources in WebSocket message processing
- Use reconcile for efficient state updates
The unified resources are now properly:
1. Populated from StateSnapshot on each broadcast cycle
2. Converted to frontend format (ResourceFrontend)
3. Included in WebSocket state messages
4. Received and stored in frontend state
5. Consumed by migrated route components
Console now shows '[DashboardView] Using unified resources: VMs: X'
confirming the migration is working end-to-end.
- Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters
- Add 'Investigate with AI' button to filter bar for problematic guests
- Fix dashboard column sizing inconsistencies between bars and sparklines view modes
- Fix PBS backups display and polling
- Refine AI prompt for general-purpose usage
- Fix frontend flickering and reload loops during initial load
- Integrate persistent SQLite metrics store with Monitor
- Fortify AI command routing with improved validation and logging
- Fix CSRF token handling for note deletion
- Debug and fix AI command execution issues
- Various AI reliability improvements and command safety enhancements
- Add AI service with Anthropic, OpenAI, and Ollama providers
- Add AI chat UI component with streaming responses
- Add AI settings page for configuration
- Add agent exec framework for command execution
- Add API endpoints for AI chat and configuration
On dual-stack systems with net.ipv6.bindv6only=1 (like some Proxmox 8
configurations), Go's net.Listen("tcp", "0.0.0.0:8443") may still end up
bound IPv6-only. This caused IPv4 localhost connections to hang while
IPv6 worked.
Fix by detecting IPv4 addresses and explicitly using "tcp4" network
type when creating the listener. Related to #805
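A minimal sketch of the listener selection; the helper name is illustrative:

    package server

    import "net"

    // listen forces an IPv4 socket when the configured address is an explicit
    // IPv4 address, so bindv6only=1 systems cannot end up IPv6-only.
    func listen(addr string) (net.Listener, error) {
        network := "tcp"
        if host, _, err := net.SplitHostPort(addr); err == nil {
            if ip := net.ParseIP(host); ip != nil && ip.To4() != nil {
                network = "tcp4"
            }
        }
        return net.Listen(network, addr)
    }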
On systems with net.ipv6.bindv6only=1 (including some Proxmox 8
configurations), using ":8443" results in IPv6-only binding. Users
reported curl to 127.0.0.1:8443 hanging while [::1]:8443 worked.
Changed default from ":8443" to "0.0.0.0:8443" to explicitly bind IPv4.
Related to #805
- Add /healthz (liveness) and /readyz (readiness) endpoints
- Add /metrics endpoint with Prometheus metrics (pulse_agent_info, pulse_agent_up)
- Properly call dockerAgent.Close() on shutdown
- New config: -health-addr flag and PULSE_HEALTH_ADDR env (default :9191)
- Set to empty string to disable health server
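A minimal sketch of the health listener; the paths and metric names follow the description, but the /metrics body here is hand-written rather than produced by the real Prometheus registry:

    package dockeragent

    import (
        "fmt"
        "net/http"
    )

    // startHealthServer serves liveness, readiness, and basic metrics.
    func startHealthServer(addr, version string, ready func() bool) *http.Server {
        mux := http.NewServeMux()
        mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
            w.WriteHeader(http.StatusOK) // liveness: the process is running
        })
        mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
            if ready() {
                w.WriteHeader(http.StatusOK)
                return
            }
            w.WriteHeader(http.StatusServiceUnavailable)
        })
        mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
            fmt.Fprintf(w, "pulse_agent_info{version=%q} 1\n", version)
            fmt.Fprintln(w, "pulse_agent_up 1")
        })
        srv := &http.Server{Addr: addr, Handler: mux}
        go srv.ListenAndServe()
        return srv
    }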
Add 42 test cases for security-critical validation utility functions:
- TestStripNodeDelimiters (10 cases): IPv6 bracket handling, edge cases
- TestParseNodeIP (10 cases): IPv4/IPv6 parsing with bracket support
- TestNormalizeAllowlistEntry (11 cases): case normalization, whitespace
handling, IPv6 full form compression
- TestIPAllowed (11 cases): CIDR matching, hosts map lookup, nil handling
These functions are used for node allowlist validation to prevent SSRF
attacks in the sensor proxy.
- hashIPToUID: 11 test cases covering IP hashing for rate limiting
(determinism, range bounds, collision detection, boundary values)
- extractNodesFromYAML: 17 test cases covering YAML node list parsing
(map format, list format, mixed types, edge cases)
First test files for config_cmd.go and http_server.go utilities.
- Add pagination (100 items per page) to prevent UI lockup with 2500+ backups
- Show year in date labels for non-current year backups
- Reset to page 1 when filters change
- Add First/Previous/Next/Last navigation controls
Fixes #541
53 test cases covering 4 functions:
- gatherTags: environment/flag tag merging (18 cases)
- parseLogLevel: log level parsing (30 cases)
- defaultLogLevel: default value resolution (10 cases)
- multiValue: flag.Value interface for repeatable flags (6 cases)
Key difference from pulse-host-agent: parseLogLevel in the unified agent delegates
directly to zerolog.ParseLevel without range validation, so trace/fatal/panic
levels are accepted (unlike pulse-host-agent, which restricts levels to debug through error).
First test file for cmd/pulse-agent package.
Test Capability.Has, parseCapabilityList, capabilityNames, and
constant values. 54 test cases covering bitmask operations, parsing,
case insensitivity, whitespace handling, unknown values, and round-trip
consistency.
When api_tokens.json is modified on disk, the ConfigWatcher reloads
the tokens into memory. However, the Monitor's dockerTokenBindings and
hostTokenBindings maps were not synchronized with the new token set,
causing orphaned bindings when agents reconnect after a reinstall.
Add SetAPITokenReloadCallback to ConfigWatcher that triggers Monitor's
new RebuildTokenBindings method after token reload. This method
reconstructs the binding maps from current Docker host and host agent
state, keeping only bindings for tokens that still exist in config.
Related to #773
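A minimal sketch of the callback wiring; the type and method names follow the description, but the bodies are illustrative:

    package config

    // ConfigWatcher here is a trimmed illustration of the callback plumbing.
    type ConfigWatcher struct {
        onAPITokenReload func()
    }

    // SetAPITokenReloadCallback registers a function to run after
    // api_tokens.json has been re-read from disk.
    func (w *ConfigWatcher) SetAPITokenReloadCallback(fn func()) {
        w.onAPITokenReload = fn
    }

    func (w *ConfigWatcher) afterTokenReload() {
        if w.onAPITokenReload != nil {
            w.onAPITokenReload()
        }
    }

    // Wiring at startup (illustrative):
    //   watcher.SetAPITokenReloadCallback(monitor.RebuildTokenBindings)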