Commit Graph

142 Commits

Author SHA1 Message Date
rcourtman
0ddbf37c59 feat(auth): add policy evaluator and SQLite auth manager for RBAC
- Add policy evaluator for fine-grained access control
- Implement SQLite-backed auth manager for user/role persistence
- Support role-based permissions evaluation
2026-01-12 15:20:49 +00:00
rcourtman
d0ba203203 feat(audit): add comprehensive audit logging system
- Add SQLite-backed audit logger for persistent audit trails
- Implement cryptographic signing for tamper detection
- Add audit log export functionality
- Add webhook notifications for audit events
2026-01-12 15:20:33 +00:00
rcourtman
b2a6cd0fa3 fix(agent): add FreeBSD platform support to agent download and UI (#1051)
- Add freebsd-amd64 and freebsd-arm64 to normalizeUnifiedAgentArch()
  so the download endpoint serves FreeBSD binaries when requested
- Add FreeBSD/pfSense/OPNsense platform option to agent setup UI
  with note about bash installation requirement
- Add FreeBSD test cases to unified_agent_test.go

Fixes installation on pfSense/OPNsense where users were getting 404
errors because the backend didn't recognize the freebsd-amd64 arch
parameter from install.sh.
2026-01-11 23:51:12 +00:00
rcourtman
f527e6ebd0 docs: fix Kubernetes DaemonSet deployment guide
Fixes #1091 - addresses all three documentation issues reported:

1. Binary path: Changed from /usr/local/bin/pulse-agent (which doesn't
   exist in the main image) to /opt/pulse/bin/pulse-agent-linux-amd64

2. PULSE_AGENT_ID: Added to example and documented why it's required
   for DaemonSets (prevents token conflicts when all pods share one
   API token)

3. Resource visibility flags: Added PULSE_KUBE_INCLUDE_ALL_PODS and
   PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS to example, with explanation
   of the default behavior (show only problematic resources)

Also added tolerations, resource requests/limits, and ARM64 note.
2026-01-11 21:43:23 +00:00
rcourtman
80444a9022 fix(monitor): use cluster quorum status instead of endpoint count for health
Previously, when some cluster endpoints were unreachable (e.g., backup
nodes intentionally offline), the cluster was marked as "degraded" even
though the Proxmox cluster itself was healthy and had quorum.

Now the connection health check queries the Proxmox cluster's actual
quorum status. A cluster is only marked "degraded" if it has lost
quorum (not enough votes for consensus), which is the actual indicator
of cluster instability.

This means:
- Cluster with quorum + some nodes offline = "healthy"
- Cluster without quorum = "degraded" (warning)
- All endpoints down = "error"

Fixes #1085
2026-01-11 11:54:02 +00:00
rcourtman
55f5f071ed fix: replace hallucinated upgrade URLs with correct pulserelay.pro
Previous LLM sessions incorrectly inserted fake URLs (pulse.sh/pro and
yourpulse.io/pro) for the Pro upgrade links. Neither domain exists.

Replaced all 34 instances with the correct URL: https://pulserelay.pro/

Fixes #1077
2026-01-10 22:45:40 +00:00
rcourtman
d6554a0d87 fix(frontend): expand agent tables to full width in full-width mode
Tables in Settings → Agents were not expanding to fill the container
width when full-width mode was enabled. Added `w-full` class to all
tables (Managed Agents, Kubernetes Clusters, and removed host tables)
so they properly expand in full-width layouts.

Fixes #1080
2026-01-10 22:45:40 +00:00
rcourtman
668cdf3393 feat(license): add audit_logging, advanced_sso, advanced_reporting to Pro tier
Major changes:
- Add audit_logging, advanced_sso, advanced_reporting features to Pro tier
- Persist session username for RBAC authorization after restart
- Add hot-dev auto-detection for pulse-pro binary (enables SQLite audit logging)

Frontend improvements:
- Replace isEnterprise() with hasFeature() for granular feature gating
- Update AuditLogPanel, OIDCPanel, RolesPanel, UserAssignmentsPanel, AISettings
- Update AuditWebhookPanel to use hasFeature('audit_logging')

Backend changes:
- Session store now persists and restores username field
- Update CreateSession/CreateOIDCSession to accept username parameter
- GetSessionUsername falls back to persisted username after restart

Testing:
- Update license_test.go to reflect Pro tier feature changes
- Update session tests for new username parameter
2026-01-10 12:55:02 +00:00
rcourtman
2a8f55d719 feat(enterprise): add Advanced Reporting and Audit Webhooks integration
This commit adds enterprise-grade reporting and audit capabilities:

Reporting:
- Refactored metrics store from internal/ to pkg/ for enterprise access
- Added pkg/reporting with shared interfaces for report generation
- Created API endpoint: GET /api/admin/reports/generate
- New ReportingPanel.tsx for PDF/CSV report configuration

Audit Webhooks:
- Extended pkg/audit with webhook URL management interface
- Added API endpoint: GET/POST /api/admin/webhooks/audit
- New AuditWebhookPanel.tsx for webhook configuration
- Updated Settings.tsx with Reporting and Webhooks tabs

Server Hardening:
- Enterprise hooks now execute outside mutex with panic recovery
- Removed dbPath from metrics Stats API to prevent path disclosure
- Added storage metrics persistence to polling loop

Documentation:
- Updated README.md feature table
- Updated docs/API.md with new endpoints
- Updated docs/PULSE_PRO.md with feature descriptions
- Updated docs/WEBHOOKS.md with audit webhooks section
2026-01-09 21:31:49 +00:00
rcourtman
92c150e979 feat(rbac): add OIDC group mapping tests and audit logging for RBAC actions 2026-01-09 19:25:33 +00:00
rcourtman
6ed1fdf806 feat(rbac): implement RBAC UI, OIDC group mapping, and API standard auth
- Added Roles and Users settings panels
- Implemented OIDC group-to-role mappings in config and auth flow
- Standardized API token context handling via pkg/auth
- Added Pulse Pro branding and upgrade banners to RBAC features
- Cleanup: Removed empty code blocks and fixed lint errors
2026-01-09 19:16:34 +00:00
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
bd1df9f942 feat: automatic subnet preference for cluster node discovery
When discovering cluster nodes, Pulse now automatically prefers IPs
on the same subnet as the initial connection. This fixes the common
issue where Pulse used internal cluster network IPs (e.g., 172.x.x.x)
instead of management network IPs (e.g., 10.x.x.x).

How it works:
1. Extract subnet from initial connection URL (assumes /24 for IPv4)
2. For each discovered node, query /nodes/{node}/network for all IPs
3. If cluster-reported IP is on a different subnet, find an IP on
   the preferred subnet and set it as IPOverride
4. Manual IPOverride settings are preserved and take precedence

This eliminates the need for manual IPOverride configuration in most
multi-network Proxmox setups.

Refs #929, #1066
2026-01-08 23:12:30 +00:00
rcourtman
73c5128a87 feat(audit): Add audit log API endpoints and UI with signature verification
- Add GET /api/audit endpoint for listing events with filters
- Add GET /api/audit/:id/verify endpoint for signature verification
- Add AuditLogPanel UI component with filtering and verification
- Update docs with audit API documentation
- Add localStorage utils for persisting UI state
- Update gitignore patterns
2026-01-08 19:19:57 +00:00
rcourtman
7342191075 docs: fix Helm chart install commands to use GitHub Pages repo
The GHCR OCI registry (ghcr.io/rcourtman/pulse-chart) is returning 403/404
errors for unauthenticated users. Updated all Helm references to use the
working GitHub Pages Helm repository at https://rcourtman.github.io/Pulse

Fixes install issues reported by customers trying to deploy via Helm.

Files updated:
- docs/KUBERNETES.md
- docs/INSTALL.md
- docs/DEPLOYMENT_MODELS.md
- docs/UPGRADE_v5.md
2026-01-08 14:27:45 +00:00
rcourtman
d0191d136f fix: Add configurable poll timeout and handle external Ceph storage
Changes:
1. Add MAX_POLL_TIMEOUT env var for large Proxmox clusters that need
   more than 3 minutes for polling (default: 3m, minimum: 30s)
2. Handle external Ceph storage gracefully - don't mark nodes unhealthy
   when Proxmox returns 'binary not installed' (e.g., for Ceph not
   managed by Proxmox)

Related to #965
2026-01-05 23:34:33 +00:00
rcourtman
45d4d68127 fix: Add debug logging and response format handling for replication status
- Add comprehensive debug logging to diagnose replication status fetch failures
- Handle both array and single-object response formats from Proxmox API
- Log raw response body for easier debugging
- Log success/failure for each enrichment step

This helps diagnose issue #992 where replication last/next sync times aren't
showing. The logging will reveal if the API call is failing, returning empty
data, or returning data in an unexpected format.

Related to #992
2026-01-04 15:01:32 +00:00
rcourtman
90cce6d51b test(monitoring): fix failing snapshot tests and improve coverage
- Fix TestMonitor_PollGuestSnapshots_Coverage by correctly initializing State ID fields
- Improve PBS client to handle alternative datastore metric fields (total-space, etc.)
- Add comprehensive test coverage for PBS polling, auth failures, and datastore metrics
- Add various coverage tests for monitoring, alerts, and metadata handling
- Refactor Monitor to support better testing of client creation and auth handling
2026-01-04 10:29:40 +00:00
rcourtman
adba448419 fix(pbs): correct API paths and achieve >95% test coverage 2026-01-03 20:45:36 +00:00
rcourtman
4cd3e53c3e test: add regression tests for missing frontend fields
Ensures that LinkedHostAgentId, CommandsEnabled, IsLegacy, and LinkedNodeId
are correctly propagated to the frontend. This prevents regressions of the
bugs fixed for #952 and #971.
2026-01-02 20:45:35 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
567a4ad147 fix(replication): fetch status from per-node endpoint
The /cluster/replication endpoint only returns job configuration (guest,
schedule, source, target), not status data (last_sync, next_sync,
duration, fail_count, state).

This fix enriches each replication job with status from the per-node
endpoint /nodes/{node}/replication/{id}/status to get timing and state
data needed for proper UI display.

Added integration tests to verify:
- Status endpoint is called and data is merged correctly
- Graceful handling when status endpoint fails

Fixes #992
2025-12-31 23:58:06 +00:00
rcourtman
4225f905b0 feat: Add manual Docker update check button. Related to #955 2025-12-29 23:37:05 +00:00
rcourtman
03e9f98ab6 fix: Exclude autofs mount type from disk counts. Related to #942 2025-12-29 23:36:58 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
35a83afbcb fix: Filter overlay filesystems from disk metrics
Docker overlay filesystems were being counted as separate disks when
monitoring hosts running Docker. These are virtual layers, not actual
storage.

Added overlay and overlayfs to the virtualFSTypes list so they are
always excluded from disk usage calculations, regardless of their
reported usage percentage.

NFS and CIFS mounts were already being filtered correctly.

Related to #942
2025-12-28 16:12:18 +00:00
rcourtman
9f3367da36 fix: Include GuestURL in NodeFrontend for cluster node navigation
The GuestURL field was missing from NodeFrontend and its converter,
causing configured Guest URLs to be ignored when clicking on cluster
node names. The frontend would fall back to the auto-detected IP
instead of using the user-configured Guest URL.

Related to #940
2025-12-28 14:49:49 +00:00
rcourtman
b50872b686 feat: Implement unified update detection system (Phase 1)
Docker container image update detection with full stack implementation:

Backend:
- Add internal/updatedetection package with types, store, registry checker, manager
- Add registry checking to Docker agent (internal/dockeragent/registry.go)
- Add ImageDigest and UpdateStatus fields to container reports
- Add /api/infra-updates API endpoints for querying updates
- Integrate with alert system - fires after 24h of pending updates

Frontend:
- Add UpdateBadge and UpdateIcon components for update indicators
- Add updateStatus to DockerContainer TypeScript interface
- Display blue update badges in Docker unified table image column
- Add 'has:update' search filter support

Features:
- Registry digest comparison for Docker Hub, GHCR, private registries
- Auth token handling for Docker Hub public images
- Caching with 6h TTL (15min for errors)
- Configurable alert delay via UpdateAlertDelayHours (default: 24h)
- Alert metadata includes digests, pending time, image info
2025-12-27 17:58:38 +00:00
rcourtman
3d671c1824 feat(pbs): add API-based token creation for turnkey PBS setup
- Added PBS client methods: CreateUser, SetUserACL, CreateUserToken
- Added SetupMonitoringAccess() turnkey method that creates user + token
- Updated handleSecureAutoRegister to use PBS API for token creation
- Enables one-click PBS setup for Docker/containerized deployments

When users provide PBS root credentials, Pulse can now create the
monitoring user and API token remotely via the PBS API, eliminating
the need to SSH/exec into the container manually.
2025-12-26 10:08:41 +00:00
rcourtman
3fd20340d1 fix: increase PBS storage content timeout to 60s
PBS storage content queries with encrypted backups can take 10-20+ seconds
to enumerate. The previous 30s timeout was causing intermittent failures
when polling backup data from PBS storage configured in PVE.

This increases the timeout to 60s to accommodate slow PBS backends while
still preventing indefinite hangs on unavailable NFS/network storage.
2025-12-26 00:21:17 +00:00
rcourtman
86e41effc0 feat: Display environment variables for Docker containers
- Add Env field to Container struct in pkg/agents/docker/report.go
- Extract env vars from inspect.Config.Env in Docker agent
- Mask sensitive values (password, secret, key, token, etc.) with ***
- Display env vars in container drawer with green badges (amber for masked)
- Add tests for maskSensitiveEnvVars function

Related to #916
2025-12-25 23:52:57 +00:00
rcourtman
08c04b78ae feat: add power consumption monitoring (Intel RAPL + AMD Energy)
- Add power.go with Intel RAPL and AMD energy driver support
- Read CPU package, core, and DRAM power consumption in watts
- Sample energy counters over 100ms interval to calculate power
- Add PowerWatts field to Sensors struct for API reporting
- Integrate power collection into host agent sensor gathering
- Add comprehensive tests for power collection module

Supports Intel CPUs (Sandy Bridge+) via RAPL and AMD Ryzen/EPYC
via the amd_energy kernel module.

Closes community-scripts/ProxmoxVE#9575
2025-12-25 21:14:12 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
rcourtman
8f9d5c1120 feat: Agent collects S.M.A.R.T. disk data via smartctl. Related to #907
- Add smartctl package to collect disk temperature and health data
- Add SMART field to agent Sensors struct
- Host agent now runs smartctl to collect disk temps when available
- Backend processes agent SMART data for temperature display
- Graceful fallback when smartctl not installed
2025-12-25 11:37:53 +00:00
rcourtman
598285d3d2 feat: Agent reports CommandsEnabled status to server. Related to #903
- Add CommandsEnabled field to AgentInfo in pkg/agents/host/report.go
- Agent now reports whether AI command execution is enabled
- Server stores and exposes this via Host model
- Frontend can now show which agents have commands enabled
- This provides visibility before implementing remote configuration
2025-12-25 07:55:22 +00:00
rcourtman
e0dc6695fc fix: Per-node TLS fingerprints for cluster peers (TOFU)
When a PVE cluster has unique self-signed certificates on each node, Pulse
would mark secondary nodes as unhealthy because only the primary node's
fingerprint was used for all connections.

Now, during cluster discovery, Pulse captures each node's TLS fingerprint
and uses it when connecting to that specific node. This enables
"Trust On First Use" (TOFU) for clusters with unique per-node certs.

Changes:
- Add Fingerprint field to ClusterEndpoint config
- Add FetchFingerprint() to tlsutil for capturing node certs
- validateNodeAPI() now captures and returns fingerprints during discovery
- NewClusterClient() accepts endpointFingerprints map for per-node certs
- All client creation paths use per-endpoint fingerprints when available

Related to #879
2025-12-24 10:05:03 +00:00
rcourtman
d663ba4342 hostagent: avoid host ID collisions and prefer LAN IP 2025-12-17 16:29:59 +00:00
rcourtman
e44a6fdadd test(envdetect): cover environment detection decisions 2025-12-17 16:08:10 +00:00
rcourtman
969fa0e509 test: add unit tests for AI, Kubernetes agent, and clients 2025-12-17 12:47:36 +00:00
rcourtman
a115af6906 feat: Improve cluster endpoint error messages for users
- Add sanitizeEndpointError() to transform raw Go errors into user-friendly messages
- Transform 'context deadline exceeded' into helpful messages mentioning possible causes
- Storage timeout errors now suggest checking PBS/NFS/Ceph backend connectivity
- Connection refused, certificate errors, and auth errors get actionable hints
- Apply sanitization everywhere cluster endpoint lastError is stored
- Add comprehensive tests for all error transformations
2025-12-16 21:50:02 +00:00
rcourtman
3a2a73f9d6 Merge main into ai-features: incorporate latest bugfixes
Resolved conflicts:
- pkg/fsfilters/filters.go: Keep both TrueNAS and EnhanceCP filter fixes
- DockerUnifiedTable.tsx: Use main's resource column overlap fix
2025-12-13 15:18:51 +00:00
rcourtman
a259b67348 feat: add Kubernetes platform support 2025-12-12 21:31:11 +00:00
rcourtman
88d419dd5b feat(ai): Add enriched context with historical trends and predictions
Phase 1 of Pulse AI differentiation:

- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions

This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights

All tests passing. Falls back to basic summary if metrics history unavailable.
2025-12-12 09:45:57 +00:00
rcourtman
fa13919987 fix(ai-chat): Display messages chronologically in AI chatbot
- Add 'content' type to StreamDisplayEvent for tracking text chunks
- Track content events in streamEvents array for chronological display
- Update render to use Switch/Match for cleaner conditional rendering
- Interleave thinking, tool calls, and content as they stream in
- Add fallback for old messages without streamEvents for backwards compat

Previously, tool/command outputs stayed at top while AI text responses
accumulated at the bottom. Now all events appear in order like a
normal chatbot.
2025-12-11 23:02:59 +00:00
rcourtman
927ac76bad feat: AI integration, Docker metrics, RAID display, and infrastructure improvements
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add CEPH cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
2025-12-09 09:29:27 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
rcourtman
63038b5f30 fix: Filter EnhanceCP /var/container_tmp overlay mounts from disk stats
EnhanceCP uses /var/container_tmp/{uuid}/merged for container overlays.
These are ephemeral container layers, not user storage, and should be
filtered from disk usage display. Related to #790
2025-12-04 20:11:10 +00:00
rcourtman
da51449392 fix: Exclude TrueNAS Docker overlay mounts from disk stats
Host agent was including Docker overlay2 mounts from TrueNAS SCALE's
.ix-apps directory in disk totals. These mounts inherit the ZFS pool's
AVAIL space, causing massively inflated storage numbers (e.g., 173 TB
per container overlay instead of actual usage).

Changes:
- Add /mnt/.ix-apps/docker/ to container overlay path exclusions
- Use ShouldSkipFilesystem() in host agent disk collection (was only
  using ShouldIgnoreReadOnlyFilesystem() which missed container paths)
- Add test cases for TrueNAS overlay paths

Related to #718
2025-12-04 03:03:04 +00:00
rcourtman
4c98933175 fix: Filter container overlay mounts in non-standard locations
Detect container overlay filesystem paths from various container runtimes
(Docker, Podman, LXC, EnhanceCP, etc.) that may not be in standard
/var/lib/docker or /var/lib/containers locations.

Paths containing /containers/ with overlay patterns (/overlay2/, /overlay/,
/diff/, /merged) are now filtered from disk usage aggregation.

Related to #790
2025-12-03 14:06:15 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00