- Add comprehensive debug logging to diagnose replication status fetch failures
- Handle both array and single-object response formats from Proxmox API
- Log raw response body for easier debugging
- Log success/failure for each enrichment step
This helps diagnose issue #992 where replication last/next sync times aren't
showing. The logging will reveal if the API call is failing, returning empty
data, or returning data in an unexpected format.
Related to #992
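A minimal sketch of the dual-format handling described above, assuming a trimmed-down ReplicationStatus struct; the type, JSON tags, and function name are illustrative, not Pulse's actual code:

    package proxmox

    import (
        "encoding/json"
        "fmt"
    )

    // ReplicationStatus is an illustrative subset of the fields Pulse reads.
    type ReplicationStatus struct {
        LastSync int64 `json:"last_sync"`
        NextSync int64 `json:"next_sync"`
    }

    // decodeReplicationStatus accepts either a JSON array of status objects or
    // a single object and normalizes both into a slice.
    func decodeReplicationStatus(raw []byte) ([]ReplicationStatus, error) {
        var list []ReplicationStatus
        if err := json.Unmarshal(raw, &list); err == nil {
            return list, nil
        }
        var single ReplicationStatus
        if err := json.Unmarshal(raw, &single); err != nil {
            return nil, fmt.Errorf("unexpected replication status format: %w", err)
        }
        return []ReplicationStatus{single}, nil
    }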
When a Proxmox cluster is discovered, Pulse now includes the user-provided
main host URL as a fallback endpoint. This handles scenarios where Proxmox
reports internal IPs that aren't reachable from Pulse's network (e.g.,
monitoring a remote cluster across different networks).
Previously, if all cluster endpoint IPs were unreachable, the connection
would fail with no fallback. Now the ClusterClient will fall back to the
main host URL, allowing Proxmox to route API calls internally.
Related to #1028
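A minimal sketch of the fallback order, with the actual API request passed in as a callback; the names are illustrative, not the real ClusterClient API:

    package cluster

    import (
        "context"
        "fmt"
    )

    // tryEndpoints tries every discovered cluster endpoint first and the
    // user-provided main host URL last, returning on the first success.
    func tryEndpoints(ctx context.Context, endpoints []string, mainHostURL string,
        do func(ctx context.Context, baseURL string) error) error {

        candidates := append(append([]string{}, endpoints...), mainHostURL)
        var lastErr error
        for _, base := range candidates {
            if err := do(ctx, base); err != nil {
                lastErr = err
                continue
            }
            return nil
        }
        return fmt.Errorf("all endpoints failed, including main host fallback: %w", lastErr)
    }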
Reverse proxies (Traefik, Caddy, nginx) often normalize or reject URLs
containing %2F (encoded slash). Alert IDs contain forward slashes
(e.g., "docker-container-state-docker:abc/def"), causing acknowledge
requests to fail with 400 errors when going through a reverse proxy.
Added new body-based endpoints that accept the alert ID in the JSON body:
- POST /api/alerts/acknowledge {"id": "..."}
- POST /api/alerts/unacknowledge {"id": "..."}
- POST /api/alerts/clear {"id": "..."}
Updated frontend to use the new endpoints. Legacy path-based endpoints
are preserved for backwards compatibility.
Related to #1026
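A minimal sketch of one body-based handler; the handler name, store call, and status codes are assumptions rather than the exact Pulse implementation:

    package api

    import (
        "encoding/json"
        "net/http"
    )

    // acknowledgeAlert stands in for the real alert manager call.
    func acknowledgeAlert(id string) error { return nil }

    // handleAcknowledgeBody reads the alert ID from the JSON body instead of
    // the URL path, so IDs containing "/" never need %2F encoding and cannot
    // be mangled by a reverse proxy.
    func handleAcknowledgeBody(w http.ResponseWriter, r *http.Request) {
        var req struct {
            ID string `json:"id"`
        }
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.ID == "" {
            http.Error(w, "missing or invalid alert id", http.StatusBadRequest)
            return
        }
        if err := acknowledgeAlert(req.ID); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusNoContent)
    }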
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh because the API returned the stale runtime value.
Related to #1023
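A minimal sketch of the fix's shape (persist first, then refresh the in-memory copy the API serves); the handler and settings types are illustrative:

    package api

    type systemSettings struct {
        DisableDockerUpdateActions bool
    }

    type configHandler struct {
        config  *systemSettings
        persist func(systemSettings) error
    }

    // setDisableDockerUpdateActions writes the new value to disk and also
    // updates the runtime config, so a page refresh reflects what was saved.
    func (h *configHandler) setDisableDockerUpdateActions(v bool) error {
        updated := *h.config
        updated.DisableDockerUpdateActions = v
        if err := h.persist(updated); err != nil {
            return err
        }
        *h.config = updated
        return nil
    }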
On systems where the Docker compatibility layer obscures Podman (such as CoreOS),
auto-detection can fail. Users can now force the runtime:
--docker-runtime podman
PULSE_DOCKER_RUNTIME=podman
Valid values: auto (default), docker, podman
Related to Discussion #958
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Suppressing update notifications in the dashboard for users who don't want them
Related to Discussion #982
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
The sensor proxy self-heal script runs every 5 minutes and calls migrate-to-file.
Previously it would print 'Migration complete' every time, even when already in
file mode with nothing to migrate.
Now migrateInlineToFile returns a boolean indicating if migration actually
occurred, and the CLI only prints the message when work was done.
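A minimal sketch of the boolean-returning migration and the quieter CLI path; the config type and function bodies are illustrative:

    package sensorproxy

    import "fmt"

    type proxyConfig struct {
        Mode          string
        InlineSensors []string
    }

    // migrateInlineToFile reports whether any migration work was actually done.
    func migrateInlineToFile(cfg *proxyConfig) (bool, error) {
        if cfg.Mode == "file" && len(cfg.InlineSensors) == 0 {
            return false, nil // already file-backed, nothing to move
        }
        // ... move inline sensor entries into the file-backed store ...
        cfg.Mode = "file"
        cfg.InlineSensors = nil
        return true, nil
    }

    // runMigrate only announces success when something changed, keeping the
    // 5-minute self-heal loop quiet once migration is complete.
    func runMigrate(cfg *proxyConfig) error {
        migrated, err := migrateInlineToFile(cfg)
        if err != nil {
            return err
        }
        if migrated {
            fmt.Println("Migration complete")
        }
        return nil
    }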
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios
Usage:
pulse-agent --report-ip 192.168.1.100
PULSE_REPORT_IP=192.168.1.100 pulse-agent
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
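A control-flow sketch of the update sequence; the dockerClient interface is an assumed abstraction over the real Docker API calls, not the agent's actual client:

    package dockeragent

    import (
        "context"
        "fmt"
        "time"
    )

    // dockerClient stands in for the Docker operations the agent performs.
    type dockerClient interface {
        PullImage(ctx context.Context, image string) error
        StopContainer(ctx context.Context, name string) error
        RenameContainer(ctx context.Context, oldName, newName string) error
        StartFromImage(ctx context.Context, name, image string) error
        RemoveContainer(ctx context.Context, name string) error
    }

    // updateContainer sketches the pull / stop / backup / start / rollback flow.
    func updateContainer(ctx context.Context, d dockerClient, name, image string) error {
        if err := d.PullImage(ctx, image); err != nil {
            return fmt.Errorf("pull: %w", err)
        }
        backup := name + "-pulse-backup"
        if err := d.StopContainer(ctx, name); err != nil {
            return fmt.Errorf("stop: %w", err)
        }
        if err := d.RenameContainer(ctx, name, backup); err != nil {
            return fmt.Errorf("backup: %w", err)
        }
        if err := d.StartFromImage(ctx, name, image); err != nil {
            // Roll back: restore the backup under its original name.
            _ = d.RenameContainer(ctx, backup, name)
            return fmt.Errorf("start failed, rolled back: %w", err)
        }
        // Keep the backup briefly so a bad image can still be reverted.
        time.AfterFunc(5*time.Minute, func() {
            _ = d.RemoveContainer(context.Background(), backup)
        })
        return nil
    }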
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*
Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*
This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
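A minimal sketch of the three pattern styles; the function name is illustrative:

    package hostagent

    import "strings"

    // mountExcluded checks a mount point against exact, prefix, and contains
    // patterns.
    func mountExcluded(mount string, patterns []string) bool {
        for _, p := range patterns {
            switch {
            case strings.HasPrefix(p, "*") && strings.HasSuffix(p, "*"):
                // Contains pattern, e.g. *pbs*
                if strings.Contains(mount, strings.Trim(p, "*")) {
                    return true
                }
            case strings.HasSuffix(p, "*"):
                // Prefix pattern, e.g. /mnt/ext*
                if strings.HasPrefix(mount, strings.TrimSuffix(p, "*")) {
                    return true
                }
            default:
                // Exact path, e.g. /mnt/backup
                if mount == p {
                    return true
                }
            }
        }
        return false
    }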
BREAKING CHANGE: AI command execution on agents is now disabled by default.
Users who want AI auto-fix must explicitly enable it with the --enable-commands
flag or the PULSE_ENABLE_COMMANDS=true environment variable.
Changes:
- Add --enable-commands flag (opt-in for command execution)
- Commands disabled by default for security (defense-in-depth)
- --disable-commands is now deprecated (logs warning, no longer needed)
- PULSE_DISABLE_COMMANDS deprecated in favor of PULSE_ENABLE_COMMANDS
- Update installer script to use --enable-commands
- Backwards compatibility: PULSE_DISABLE_COMMANDS=false still enables commands
This addresses community feedback about secure defaults for arbitrary
command execution on production infrastructure.
Related to #889
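A rough sketch of the opt-in resolution with the deprecated variable still honored; the function name and exact precedence are assumptions, not the exact Pulse logic:

    package agent

    import (
        "log"
        "os"
    )

    // commandsEnabled resolves the secure-by-default opt-in.
    func commandsEnabled(enableFlag bool) bool {
        if enableFlag || os.Getenv("PULSE_ENABLE_COMMANDS") == "true" {
            return true
        }
        // Backwards compatibility: PULSE_DISABLE_COMMANDS=false used to mean
        // "commands on", so keep honoring it while warning about deprecation.
        if v, ok := os.LookupEnv("PULSE_DISABLE_COMMANDS"); ok {
            log.Println("PULSE_DISABLE_COMMANDS is deprecated; use PULSE_ENABLE_COMMANDS")
            return v == "false"
        }
        return false // secure default: command execution disabled
    }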
PROBLEM:
When two Proxmox hosts have the same hostname (e.g., 'px1' on different networks),
the auto-registration was matching by name and overwriting the first with the second.
This has been a recurring issue (#104) with at least 3 prior fix attempts.
ROOT CAUSE:
The auto-register handler matched existing nodes by name as well as by host URL.
Matching by name is incorrect: different physical hosts can share a hostname.
FIXES:
1. Remove name-based matching in auto-registration - match by Host URL only
2. Add disambiguateNodeName() to append IP when duplicate hostnames exist
3. Add regression tests to prevent this from breaking again
Now when registering two hosts named 'px1':
- First becomes: px1
- Second becomes: px1 (10.0.2.224)
Both are stored as separate nodes with their own credentials.
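A minimal sketch of the disambiguation step; the signature and lookup structure are inferred from the description, not copied from Pulse:

    package autoregister

    import (
        "fmt"
        "net/url"
    )

    // disambiguateNodeName appends the host's IP when the hostname is taken,
    // e.g. "px1" already exists so the new node becomes "px1 (10.0.2.224)".
    func disambiguateNodeName(name, hostURL string, existingNames map[string]bool) string {
        if !existingNames[name] {
            return name
        }
        host := hostURL
        if u, err := url.Parse(hostURL); err == nil && u.Hostname() != "" {
            host = u.Hostname()
        }
        return fmt.Sprintf("%s (%s)", name, host)
    }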
New commands:
pulse mock enable - Enable mock mode
pulse mock disable - Disable mock mode
pulse mock status - Show current status
Makes it easy to toggle between mock and real data without
manually editing config files.
The 'Removed Docker Hosts' section was not appearing in Settings -> Agents
even when hosts were blocked from re-enrolling. This prevented users from
using the 'Allow re-enroll' button to unblock their Docker agents.
Root cause: The WebSocket store was missing:
1. The 'removedDockerHosts' property in its initial state
2. A handler to process removedDockerHosts data from WebSocket messages
This meant the backend was correctly sending the data, but the frontend
was completely ignoring it.
Changes:
- Add removedDockerHosts to WebSocket store initial state and message handler
- Add removedDockerHosts to App.tsx fallback state for consistency
- Add missing BroadcastState call after AllowDockerHostReenroll succeeds
Also includes previous fixes from this session:
- Add PULSE_AGENT_URL as alias for PULSE_AGENT_CONNECT_URL (config.go)
- Add runtime Docker/Podman auto-detection in pulse-agent (main.go)
Fixes issue reported by darthrater78 in discussion #845
Adds IncludeAllDeployments option to show all deployments, not just
problem ones (where replicas don't match the desired count). This provides parity
with the existing --kube-include-all-pods flag.
- Add IncludeAllDeployments to kubernetesagent.Config
- Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var
- Update collectDeployments to respect the new flag
- Add test for IncludeAllDeployments functionality
- Update UNIFIED_AGENT.md documentation
Addresses feedback from PR #855
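A minimal sketch of the filter collectDeployments applies; the deployment struct here is an illustrative, slimmed-down stand-in for what the agent actually collects:

    package kubernetesagent

    // deployment is an illustrative view of the fields the agent reports.
    type deployment struct {
        Name            string
        DesiredReplicas int32
        ReadyReplicas   int32
    }

    // filterDeployments reports everything when includeAll is set, otherwise
    // only deployments whose ready replicas differ from the desired count.
    func filterDeployments(deps []deployment, includeAll bool) []deployment {
        if includeAll {
            return deps
        }
        var problems []deployment
        for _, d := range deps {
            if d.ReadyReplicas != d.DesiredReplicas {
                problems = append(problems, d)
            }
        }
        return problems
    }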
- Add exponential backoff retry for Docker agent startup (main.go); a sketch follows this list
- Fix Docker resource/image column widths with proper truncation
- Unify IP tooltip styling across hosts and guests with detailed network info
- Improve column visibility defaults and sticky column handling
- Various component refinements for Dashboard, Storage, and Backups views
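A minimal sketch of the startup backoff from the first item, assuming a doubling delay capped at 30 seconds; the limits and function name are illustrative:

    package agent

    import (
        "context"
        "time"
    )

    // startWithBackoff retries a startup step with exponentially growing
    // delays (1s, 2s, 4s, ...) until it succeeds or the context is cancelled.
    func startWithBackoff(ctx context.Context, start func() error) error {
        delay := time.Second
        for {
            if err := start(); err == nil {
                return nil
            }
            select {
            case <-time.After(delay):
            case <-ctx.Done():
                return ctx.Err()
            }
            delay *= 2
            if delay > 30*time.Second {
                delay = 30 * time.Second
            }
        }
    }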
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
The acquire() function blocked indefinitely without respecting context
cancellation. When clients disconnect while waiting for the per-node
lock, goroutines would remain blocked forever, connections would accumulate
in CLOSE_WAIT state, and rate limiter semaphores were never released.
Added acquireContext() that respects context cancellation and updated
both HTTP and RPC handlers to use it. This prevents:
- Goroutine leaks from cancelled requests
- CLOSE_WAIT connection accumulation
- Cascading failures from filled semaphores
Related to #832
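A minimal sketch of a context-aware acquire, modeling the per-node lock as a one-slot semaphore channel; Pulse's actual lock type may differ:

    package nodelimiter

    import "context"

    // nodeLock models the per-node lock as a 1-slot semaphore.
    type nodeLock struct {
        sem chan struct{}
    }

    func newNodeLock() *nodeLock {
        return &nodeLock{sem: make(chan struct{}, 1)}
    }

    // acquireContext blocks until the lock is free or the caller's context is
    // cancelled, so a disconnected client no longer parks a goroutine forever.
    func (l *nodeLock) acquireContext(ctx context.Context) error {
        select {
        case l.sem <- struct{}{}:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }

    func (l *nodeLock) release() {
        <-l.sem
    }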
Keep only the simple AI-powered approach:
- set_resource_url tool lets AI save discovered URLs
- Users ask AI directly: 'Find URLs for my containers'
- AI uses its intelligence to discover and set URLs
Removed:
- URLDiscoveryService (rigid port scanning)
- Bulk discovery API endpoints
- Frontend discovery button
The AI itself is smart enough to iterate through resources
and discover URLs when asked.
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add CEPH cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
Backend:
- Call SetMonitor after router creation to inject resource store
- Add debug logging for resource population and broadcast
Frontend:
- Add resources array to WebSocket store initial state
- Handle resources in WebSocket message processing
- Use reconcile for efficient state updates
The unified resources are now properly:
1. Populated from StateSnapshot on each broadcast cycle
2. Converted to frontend format (ResourceFrontend)
3. Included in WebSocket state messages
4. Received and stored in frontend state
5. Consumed by migrated route components
Console now shows '[DashboardView] Using unified resources: VMs: X'
confirming the migration is working end-to-end.
- Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters
- Add 'Investigate with AI' button to filter bar for problematic guests
- Fix dashboard column sizing inconsistencies between bars and sparklines view modes
- Fix PBS backups display and polling
- Refine AI prompt for general-purpose usage
- Fix frontend flickering and reload loops during initial load
- Integrate persistent SQLite metrics store with Monitor
- Fortify AI command routing with improved validation and logging
- Fix CSRF token handling for note deletion
- Debug and fix AI command execution issues
- Various AI reliability improvements and command safety enhancements
- Add AI service with Anthropic, OpenAI, and Ollama providers
- Add AI chat UI component with streaming responses
- Add AI settings page for configuration
- Add agent exec framework for command execution
- Add API endpoints for AI chat and configuration
On dual-stack systems with net.ipv6.bindv6only=1 (like some Proxmox 8
configurations), Go's net.Listen("tcp", "0.0.0.0:8443") may still end up
bound IPv6-only. This caused IPv4 localhost connections to hang while
IPv6 worked.
Fix by detecting IPv4 addresses and explicitly using "tcp4" network
type when creating the listener. Related to #805
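A minimal sketch of the listener selection; the helper name is illustrative:

    package server

    import "net"

    // listen forces an IPv4 socket when the configured address is an explicit
    // IPv4 address, so bindv6only=1 systems cannot end up IPv6-only.
    func listen(addr string) (net.Listener, error) {
        network := "tcp"
        if host, _, err := net.SplitHostPort(addr); err == nil {
            if ip := net.ParseIP(host); ip != nil && ip.To4() != nil {
                network = "tcp4"
            }
        }
        return net.Listen(network, addr)
    }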
On systems with net.ipv6.bindv6only=1 (including some Proxmox 8
configurations), using ":8443" results in IPv6-only binding. Users
reported curl to 127.0.0.1:8443 hanging while [::1]:8443 worked.
Changed default from ":8443" to "0.0.0.0:8443" to explicitly bind IPv4.
Related to #805
- Add /healthz (liveness) and /readyz (readiness) endpoints
- Add /metrics endpoint with Prometheus metrics (pulse_agent_info, pulse_agent_up)
- Properly call dockerAgent.Close() on shutdown
- New config: -health-addr flag and PULSE_HEALTH_ADDR env (default :9191)
- Set to empty string to disable health server
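A minimal sketch of the health listener; the paths and metric names follow the description, but the /metrics body here is hand-written rather than produced by the real Prometheus registry:

    package dockeragent

    import (
        "fmt"
        "net/http"
    )

    // startHealthServer serves liveness, readiness, and basic metrics.
    func startHealthServer(addr, version string, ready func() bool) *http.Server {
        mux := http.NewServeMux()
        mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
            w.WriteHeader(http.StatusOK) // liveness: the process is running
        })
        mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
            if ready() {
                w.WriteHeader(http.StatusOK)
                return
            }
            w.WriteHeader(http.StatusServiceUnavailable)
        })
        mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
            fmt.Fprintf(w, "pulse_agent_info{version=%q} 1\n", version)
            fmt.Fprintln(w, "pulse_agent_up 1")
        })
        srv := &http.Server{Addr: addr, Handler: mux}
        go srv.ListenAndServe()
        return srv
    }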
Add 42 test cases for security-critical validation utility functions:
- TestStripNodeDelimiters (10 cases): IPv6 bracket handling, edge cases
- TestParseNodeIP (10 cases): IPv4/IPv6 parsing with bracket support
- TestNormalizeAllowlistEntry (11 cases): case normalization, whitespace
handling, IPv6 full form compression
- TestIPAllowed (11 cases): CIDR matching, hosts map lookup, nil handling
These functions are used for node allowlist validation to prevent SSRF
attacks in the sensor proxy.
- hashIPToUID: 11 test cases covering IP hashing for rate limiting
(determinism, range bounds, collision detection, boundary values)
- extractNodesFromYAML: 17 test cases covering YAML node list parsing
(map format, list format, mixed types, edge cases)
First test files for config_cmd.go and http_server.go utilities.
- Add pagination (100 items per page) to prevent UI lockup with 2500+ backups
- Show year in date labels for non-current year backups
- Reset to page 1 when filters change
- Add First/Previous/Next/Last navigation controls
Fixes #541
53 test cases covering 4 functions:
- gatherTags: environment/flag tag merging (18 cases)
- parseLogLevel: log level parsing (30 cases)
- defaultLogLevel: default value resolution (10 cases)
- multiValue: flag.Value interface for repeatable flags (6 cases)
Key difference from pulse-host-agent: parseLogLevel in the unified agent delegates
directly to zerolog.ParseLevel without range validation, so trace/fatal/panic
levels are accepted (unlike pulse-host-agent, which restricts levels to debug through error).
First test file for cmd/pulse-agent package.
Test Capability.Has, parseCapabilityList, capabilityNames, and
constant values. 54 test cases covering bitmask operations, parsing,
case insensitivity, whitespace handling, unknown values, and round-trip
consistency.
When api_tokens.json is modified on disk, the ConfigWatcher reloads
the tokens into memory. However, the Monitor's dockerTokenBindings and
hostTokenBindings maps were not synchronized with the new token set,
causing orphaned bindings when agents reconnect after a reinstall.
Add SetAPITokenReloadCallback to ConfigWatcher that triggers Monitor's
new RebuildTokenBindings method after token reload. This method
reconstructs the binding maps from current Docker host and host agent
state, keeping only bindings for tokens that still exist in config.
Related to #773
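A minimal sketch of the callback wiring; the type and method names follow the description, but the bodies are illustrative:

    package config

    // ConfigWatcher here is a trimmed illustration of the callback plumbing.
    type ConfigWatcher struct {
        onAPITokenReload func()
    }

    // SetAPITokenReloadCallback registers a function to run after
    // api_tokens.json has been re-read from disk.
    func (w *ConfigWatcher) SetAPITokenReloadCallback(fn func()) {
        w.onAPITokenReload = fn
    }

    func (w *ConfigWatcher) afterTokenReload() {
        if w.onAPITokenReload != nil {
            w.onAPITokenReload()
        }
    }

    // Wiring at startup (illustrative):
    //   watcher.SetAPITokenReloadCallback(monitor.RebuildTokenBindings)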