2335 Commits

Author SHA1 Message Date
rcourtman
80729408c1 docs: add RBAC endpoints, OIDC group mapping, and update Pro terminology
- Add RBAC/role management endpoints to API.md
- Document OIDC group-to-role mapping feature in OIDC.md
- Add missing config files to CONFIGURATION.md (audit.db, AI files)
- Add OIDC_GROUP_ROLE_MAPPINGS env var documentation
- Fix "enterprise" -> "Pro" terminology in TROUBLESHOOTING.md
- Refocus TEMPERATURE_MONITORING.md on agent method, collapse legacy proxy docs
2026-01-10 13:59:50 +00:00
rcourtman
a970a6e5ee fix(lint): prefix unused err variable with underscore 2026-01-10 12:55:02 +00:00
rcourtman
4e11022425 refactor(ui): remove user-facing 'enterprise' terminology
- Replace 'enterprise authentication' with 'team authentication'
- Replace 'Enterprise Insights' with 'Advanced Insights'
- Deprecate isEnterprise() in favor of isPro() and hasFeature()
- Update Settings.tsx to use isPro() for badge visibility
2026-01-10 12:55:02 +00:00
rcourtman
2246aee35f chore: replace 'enterprise' terminology with 'Pro' in hot-dev docs 2026-01-10 12:55:02 +00:00
rcourtman
668cdf3393 feat(license): add audit_logging, advanced_sso, advanced_reporting to Pro tier
Major changes:
- Add audit_logging, advanced_sso, advanced_reporting features to Pro tier
- Persist session username for RBAC authorization after restart
- Add hot-dev auto-detection for pulse-pro binary (enables SQLite audit logging)

Frontend improvements:
- Replace isEnterprise() with hasFeature() for granular feature gating
- Update AuditLogPanel, OIDCPanel, RolesPanel, UserAssignmentsPanel, AISettings
- Update AuditWebhookPanel to use hasFeature('audit_logging')

Backend changes:
- Session store now persists and restores username field
- Update CreateSession/CreateOIDCSession to accept username parameter
- GetSessionUsername falls back to persisted username after restart

Testing:
- Update license_test.go to reflect Pro tier feature changes
- Update session tests for new username parameter
2026-01-10 12:55:02 +00:00
rcourtman
cba2e8609d Auto-update Helm chart version to 5.0.13 (tag: helm-chart-5.0.13) 2026-01-10 08:47:24 +00:00
rcourtman
773efb4b19 Auto-update Helm chart documentation 2026-01-10 08:47:22 +00:00
rcourtman
9a59c4459b fix(workflow): build frontend before building backend in demo deployment (tag: v5.0.13) 2026-01-10 00:41:00 +00:00
rcourtman
486ee29bc8 chore: bump version to 5.0.13 and fix test mocks 2026-01-10 00:27:11 +00:00
rcourtman
07b4765b8d fix: respect quiet hours for recovery notifications (#1068)
Recovery notifications were bypassing the quiet hours check, causing
users to receive recovery alerts during their configured quiet hours
window even though the original "down" alerts were suppressed.

- Add ShouldSuppressResolvedNotification() to alert manager
- Check quiet hours before sending recovery notifications in monitor
- Recovery notifications now follow same suppression rules as alerts
2026-01-09 21:47:36 +00:00
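The quiet-hours check from 07b4765b8d can be sketched roughly as below. This is an illustration only: `inQuietHours` and `shouldSuppressResolved` are hypothetical names, the real `ShouldSuppressResolvedNotification()` signature is not shown in the commit message, and times are simplified to minutes since midnight.

```go
package main

import "fmt"

// inQuietHours reports whether nowMin falls inside [startMin, endMin),
// all expressed as minutes since midnight. A window whose start is later
// than its end (for example 22:00 to 07:00) wraps past midnight.
func inQuietHours(nowMin, startMin, endMin int) bool {
	if startMin == endMin {
		return false // assumption: an empty window means quiet hours are off
	}
	if startMin < endMin {
		return nowMin >= startMin && nowMin < endMin
	}
	return nowMin >= startMin || nowMin < endMin
}

// shouldSuppressResolved is a hypothetical stand-in for the check the commit
// adds: recovery notices follow the same quiet-hours rule as the alerts.
func shouldSuppressResolved(quietEnabled bool, nowMin, startMin, endMin int) bool {
	return quietEnabled && inQuietHours(nowMin, startMin, endMin)
}

func main() {
	// 23:30 against a 22:00-07:00 window.
	fmt.Println(shouldSuppressResolved(true, 23*60+30, 22*60, 7*60))
	// 12:00 against the same window.
	fmt.Println(shouldSuppressResolved(true, 12*60, 22*60, 7*60))
}
```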
rcourtman
2a8f55d719 feat(enterprise): add Advanced Reporting and Audit Webhooks integration
This commit adds enterprise-grade reporting and audit capabilities:

Reporting:
- Refactored metrics store from internal/ to pkg/ for enterprise access
- Added pkg/reporting with shared interfaces for report generation
- Created API endpoint: GET /api/admin/reports/generate
- New ReportingPanel.tsx for PDF/CSV report configuration

Audit Webhooks:
- Extended pkg/audit with webhook URL management interface
- Added API endpoint: GET/POST /api/admin/webhooks/audit
- New AuditWebhookPanel.tsx for webhook configuration
- Updated Settings.tsx with Reporting and Webhooks tabs

Server Hardening:
- Enterprise hooks now execute outside mutex with panic recovery
- Removed dbPath from metrics Stats API to prevent path disclosure
- Added storage metrics persistence to polling loop

Documentation:
- Updated README.md feature table
- Updated docs/API.md with new endpoints
- Updated docs/PULSE_PRO.md with feature descriptions
- Updated docs/WEBHOOKS.md with audit webhooks section
2026-01-09 21:31:49 +00:00
rcourtman
92c150e979 feat(rbac): add OIDC group mapping tests and audit logging for RBAC actions 2026-01-09 19:25:33 +00:00
rcourtman
6ed1fdf806 feat(rbac): implement RBAC UI, OIDC group mapping, and API standard auth
- Added Roles and Users settings panels
- Implemented OIDC group-to-role mappings in config and auth flow
- Standardized API token context handling via pkg/auth
- Added Pulse Pro branding and upgrade banners to RBAC features
- Cleanup: Removed empty code blocks and fixed lint errors
2026-01-09 19:16:34 +00:00
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
22059210f7 fix(frontend): remove unused import and variable to satisfy hooks 2026-01-09 14:46:15 +00:00
rcourtman
5c4399d69f feat(agent): add DisableCeph toggle, report_ip remote config, and improved IP detection (#929) 2026-01-09 14:45:29 +00:00
rcourtman
6019e3e77e fix: normalize custom OpenAI-compatible API URLs (#1067)
Users providing base URLs like "https://openrouter.ai/api/v1" were
getting HTML error responses because the client used the URL directly
without appending "/chat/completions".

- Normalize baseURL in NewOpenAIClient to ensure it ends with /chat/completions
- Fix modelsEndpoint() to derive /models from the normalized baseURL
- Add tests for URL normalization with various endpoint formats
2026-01-09 09:13:36 +00:00
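A minimal sketch of the URL normalization described in 6019e3e77e. The helper names `normalizeChatURL` and `modelsURL` are hypothetical; the actual logic lives in `NewOpenAIClient` and `modelsEndpoint()` and may handle more cases than shown here.

```go
package main

import (
	"fmt"
	"strings"
)

const chatSuffix = "/chat/completions"

// normalizeChatURL makes sure a user-supplied base URL ends with
// /chat/completions, tolerating trailing slashes and already-complete URLs.
func normalizeChatURL(baseURL string) string {
	u := strings.TrimRight(baseURL, "/")
	if strings.HasSuffix(u, chatSuffix) {
		return u
	}
	return u + chatSuffix
}

// modelsURL derives the /models endpoint from a normalized chat URL.
func modelsURL(chatURL string) string {
	return strings.TrimSuffix(chatURL, chatSuffix) + "/models"
}

func main() {
	chat := normalizeChatURL("https://openrouter.ai/api/v1")
	fmt.Println(chat)
	fmt.Println(modelsURL(chat))
}
```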
rcourtman
020553a12d fix: use flexible subnet matching instead of fixed /24
The previous implementation assumed /24 subnets, which failed for
larger networks (e.g., /16 or /20). Now uses progressive subnet
matching that tries /24, /20, and /16 to handle various network sizes.

Example: If connection IP is 10.1.1.5 and a node has 10.1.2.6,
it now correctly identifies them as being on the same network.
2026-01-08 23:24:50 +00:00
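The progressive matching in 020553a12d could look something like the following. `matchedPrefix` is a hypothetical helper built on the standard `net/netip` package; it tries /24, then /20, then /16 and returns the first prefix length under which both IPv4 addresses share a network, or 0 if none match.

```go
package main

import (
	"fmt"
	"net/netip"
)

// matchedPrefix returns the narrowest of /24, /20, /16 that contains both
// addresses, or 0 when they share none of them. Non-IPv4 input yields 0
// (an assumption of this sketch, not documented behaviour).
func matchedPrefix(a, b string) (int, error) {
	ipA, err := netip.ParseAddr(a)
	if err != nil {
		return 0, err
	}
	ipB, err := netip.ParseAddr(b)
	if err != nil {
		return 0, err
	}
	if !ipA.Is4() || !ipB.Is4() {
		return 0, nil
	}
	for _, bits := range []int{24, 20, 16} {
		prefix, err := ipA.Prefix(bits)
		if err != nil {
			return 0, err
		}
		if prefix.Contains(ipB) {
			return bits, nil
		}
	}
	return 0, nil
}

func main() {
	// The example from the commit message: 10.1.1.5 and 10.1.2.6 differ in
	// the third octet, so /24 fails but 10.1.0.0/20 covers both.
	bits, err := matchedPrefix("10.1.1.5", "10.1.2.6")
	fmt.Println(bits, err)
}
```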
rcourtman
bd1df9f942 feat: automatic subnet preference for cluster node discovery
When discovering cluster nodes, Pulse now automatically prefers IPs
on the same subnet as the initial connection. This fixes the common
issue where Pulse used internal cluster network IPs (e.g., 172.x.x.x)
instead of management network IPs (e.g., 10.x.x.x).

How it works:
1. Extract subnet from initial connection URL (assumes /24 for IPv4)
2. For each discovered node, query /nodes/{node}/network for all IPs
3. If cluster-reported IP is on a different subnet, find an IP on
   the preferred subnet and set it as IPOverride
4. Manual IPOverride settings are preserved and take precedence

This eliminates the need for manual IPOverride configuration in most
multi-network Proxmox setups.

Refs #929, #1066
2026-01-08 23:12:30 +00:00
rcourtman
d5c93fd226 fix: add cluster endpoint IP override and Windows agent download support
1. Add IPOverride field to ClusterEndpoint struct
   - Allows users to specify a custom IP that takes precedence over auto-discovered IPs
   - Fixes #929 and #1066 where Pulse used internal cluster IPs instead of management IPs
   - Added EffectiveIP() method to cleanly handle the override logic

2. Update connection code to use EffectiveIP()
   - monitor.go: Use override when building endpoint URLs
   - temperature_proxy.go: Use override for proxy connections

3. Add bare Windows EXE files to GitHub releases
   - Fixes #1064 where LXC/barebone installs couldn't download Windows agents
   - Modified build-release.sh to copy EXEs alongside ZIPs
   - Added EXEs to checksum generation
2026-01-08 23:04:25 +00:00
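For reference, the override logic from d5c93fd226 amounts to something like this. `IPOverride` and `EffectiveIP()` are named in the commit; the `IP` field and the exact struct layout are assumptions made for the sketch.

```go
package main

import "fmt"

// ClusterEndpoint is a trimmed-down stand-in for the real struct.
type ClusterEndpoint struct {
	IP         string // auto-discovered address (field name assumed)
	IPOverride string // user-supplied address; wins when set
}

// EffectiveIP returns the override when present, else the discovered IP.
func (e ClusterEndpoint) EffectiveIP() string {
	if e.IPOverride != "" {
		return e.IPOverride
	}
	return e.IP
}

func main() {
	auto := ClusterEndpoint{IP: "172.16.0.11"}
	pinned := ClusterEndpoint{IP: "172.16.0.11", IPOverride: "10.0.0.11"}
	fmt.Println(auto.EffectiveIP(), pinned.EffectiveIP())
}
```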
rcourtman
568aac6bd0 fix: multiple triage fixes for stability and correctness
1. Use correct mutex (diagMu) in cleanupDiagnosticSnapshots to prevent
   "concurrent map iteration and map write" panics (Fixes #1063)

2. Use cluster name for storage instance comparison in UpdateStorageForInstance
   to prevent storage duplication in clustered Proxmox setups (Fixes #1062)

3. Fix KUBECONFIG unbound variable error in install.sh by using ${KUBECONFIG:-}
   default parameter expansion (Fixes #1065)
2026-01-08 22:54:33 +00:00
rcourtman
06ebaf50b2 fix: use consistent ID for shared storage to prevent duplication (#1049)
Shared storage was duplicating across polling cycles because the ID
included the node name of whichever node reported it first. When a
different node reported first on the next cycle, a new ID was created.

This fix updates the shared storage aggregation to use a consistent ID
format (instance-cluster-storageName) that doesn't include the node name.

Closes #1049. Thanks to @siccous for the report and initial investigation.
2026-01-08 21:29:24 +00:00
rcourtman
5f0214b949 fix: support ReportIP override in Proxmox auto-registration (#1061) 2026-01-08 21:20:51 +00:00
rcourtman
33bb0a95bb docs: Fix formatting in API reference 2026-01-08 20:15:25 +00:00
rcourtman
6de1c660b1 chore: Improve pre-commit data validation and ignore patterns 2026-01-08 20:04:02 +00:00
rcourtman
3801b7ad7a chore: Ignore husky internal directory 2026-01-08 19:37:04 +00:00
rcourtman
73c5128a87 feat(audit): Add audit log API endpoints and UI with signature verification
- Add GET /api/audit endpoint for listing events with filters
- Add GET /api/audit/:id/verify endpoint for signature verification
- Add AuditLogPanel UI component with filtering and verification
- Update docs with audit API documentation
- Add localStorage utils for persisting UI state
- Update gitignore patterns
2026-01-08 19:19:57 +00:00
rcourtman
7342191075 docs: fix Helm chart install commands to use GitHub Pages repo
The GHCR OCI registry (ghcr.io/rcourtman/pulse-chart) is returning 403/404
errors for unauthenticated users. Updated all Helm references to use the
working GitHub Pages Helm repository at https://rcourtman.github.io/Pulse

Fixes install issues reported by customers trying to deploy via Helm.

Files updated:
- docs/KUBERNETES.md
- docs/INSTALL.md
- docs/DEPLOYMENT_MODELS.md
- docs/UPGRADE_v5.md
2026-01-08 14:27:45 +00:00
rcourtman
22e01e2244 feat: Add centralized agent configuration management (Pro)
Allows administrators to create configuration profiles and assign them
to agents for centralized fleet management.

- Configuration profiles with customizable settings (Docker, K8s,
  Proxmox monitoring, log level, reporting interval)
- Profile assignment to agents by ID
- Agent-side remote config client to fetch settings on startup
- Full CRUD API at /api/admin/profiles
- Settings UI panel in Settings → Agents → Agent Profiles
- Automatic cleanup of assignments when profiles are deleted
2026-01-08 12:06:36 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
695ced6273 docs: Add API token scopes and kiosk mode documentation
Documents all available token scopes, UI presets, and step-by-step
instructions for setting up kiosk mode with read-only dashboard tokens.

Related to #1055
2026-01-08 10:27:15 +00:00
rcourtman
f29badbd1f feat: Add kiosk mode support with read-only dashboard tokens
- Add "Kiosk / Dashboard" preset in API token manager for easy token creation
- Backend returns token scopes in /api/security/status when authenticated via token
- Frontend hides Settings tab when token lacks settings:read scope
- URL-based token auth via ?token=xxx now properly reports scopes

Users can now create a monitoring:read token and use it in kiosk displays
without exposing settings or requiring cookie persistence.

Related to #1055
2026-01-08 10:18:27 +00:00
rcourtman
49272bd48c fix: Show usable RAIDZ capacity instead of raw pool size
For RAIDZ/mirror pools, zpool list SIZE reports raw capacity (sum of
all disks), but users expect usable capacity (accounting for parity).
The dataset stats from statfs give the correct usable capacity.

Now uses dataset Total when it's smaller than zpool Size, indicating
RAIDZ/mirror overhead.

Related to #1052
2026-01-08 09:38:18 +00:00
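The capacity rule in 49272bd48c reduces to a small comparison. The sketch below uses a hypothetical `usableTotal` helper; the zero-value guard is an assumption added so a missing dataset reading does not override the pool size.

```go
package main

import "fmt"

// usableTotal prefers the dataset total (from statfs) when it is smaller
// than the zpool SIZE, which the commit treats as a sign of parity or
// mirror overhead. Otherwise the pool size is kept.
func usableTotal(zpoolSize, datasetTotal uint64) uint64 {
	if datasetTotal > 0 && datasetTotal < zpoolSize {
		return datasetTotal
	}
	return zpoolSize
}

func main() {
	const tib = uint64(1) << 40
	// Illustrative figures only: a pool reporting 4 TiB raw with a dataset
	// total of 3 TiB is shown as 3 TiB.
	fmt.Println(usableTotal(4*tib, 3*tib) / tib)
}
```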
rcourtman
8c4bef27f0 docs: improve reverse proxy HTTPS detection and Swarm troubleshooting
- Add detailed HTTPS detection troubleshooting to REVERSE_PROXY.md
- Explain X-Forwarded-Proto header requirement for nginx/Caddy/Apache
- Add Docker Swarm troubleshooting section to UNIFIED_AGENT.md
- Document how to force Docker runtime if auto-detection fails

Based on customer feedback.
2026-01-07 18:23:48 +00:00
rcourtman
e4c17777d0 feat: Add deployment strategy configuration to Helm chart
Added strategy.type option to values.yaml (default: RollingUpdate) to allow
users to configure the deployment strategy. Users with ReadWriteOnce (RWO)
persistent volumes should set this to "Recreate" to avoid Multi-Attach errors
during upgrades.

Related to #1057
2026-01-07 17:57:41 +00:00
rcourtman
95fb896a03 fix: Agent 405 errors when reverse proxy redirects HTTP to HTTPS
When a user's reverse proxy redirects HTTP to HTTPS, Go's default HTTP
client behavior converts POST requests to GET on 301/302 redirects
(per HTTP specification). This causes the Pulse server to return 405
"Only POST is allowed" errors.

Added CheckRedirect to all agent HTTP clients (host, docker, kubernetes)
that returns a clear error message guiding users to use the correct
protocol in their --url flag instead of silently following redirects.

Related to #1058
2026-01-07 17:56:07 +00:00
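A sketch of the `CheckRedirect` approach from 95fb896a03, using only the standard `net/http` API. The names `newAgentClient`, `errRedirect`, and `redirectRejected`, and the error wording, are hypothetical; the agent's real message may differ. `redirectRejected` spins up two local test servers to show that a POST hitting a 301 now fails loudly instead of being replayed as a GET.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

var errRedirect = errors.New("redirect refused")

// newAgentClient returns a client that refuses to follow any redirect.
func newAgentClient() *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return fmt.Errorf("%w: server redirected to %s; use that scheme and host in --url",
				errRedirect, req.URL)
		},
	}
}

// redirectRejected reports whether a POST through a redirecting server
// fails with errRedirect.
func redirectRejected() bool {
	target := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer target.Close()

	redirector := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, target.URL, http.StatusMovedPermanently)
	}))
	defer redirector.Close()

	resp, err := newAgentClient().Post(redirector.URL, "application/json", strings.NewReader("{}"))
	if err == nil {
		resp.Body.Close()
		return false
	}
	return errors.Is(err, errRedirect)
}

func main() {
	fmt.Println(redirectRejected())
}
```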
rcourtman
3f0808e9f9 docs: comprehensive core and Pro documentation overhaul
- Major updates to README.md and docs/README.md for Pulse v5
- Added technical deep-dives for Pulse Pro (docs/PULSE_PRO.md) and AI Patrol (docs/AI.md)
- Updated Prometheus metrics documentation and Helm schema for metrics separation
- Refreshed security, installation, and deployment documentation for unified agent models
- Cleaned up legacy summary files
2026-01-07 17:38:27 +00:00
rcourtman
9cfcdbb247 fix: Use per-node shared flag for storage deduplication
The storage deduplication logic only checked cluster config's Shared
flag, but this required the cluster config API call to succeed. When
the per-node storage API already returns shared=1 (as the user
verified), we should use that directly.

Now we check three sources for shared storage detection:
1. Per-node API shared flag (storage.Shared)
2. Cluster config shared flag (if available)
3. Storage type heuristics (NFS, RBD, PBS, etc.)

Related to #1049
2026-01-07 10:16:23 +00:00
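The three detection sources listed in 9cfcdbb247 might combine as follows. `isShared` is a hypothetical helper; the type list is taken from the related commit 96d06da0d7, written with the storage type identifiers Proxmox uses, and the nil pointer models a cluster config call that did not succeed.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedTypes lists storage types treated as inherently cluster-wide.
var sharedTypes = map[string]bool{
	"rbd": true, "cephfs": true, "pbs": true, "glusterfs": true,
	"nfs": true, "cifs": true, "iscsi": true,
}

// isShared checks, in order: the per-node API flag, the cluster config flag
// (nil when unavailable), and finally the storage type heuristic.
func isShared(nodeShared bool, clusterShared *bool, storageType string) bool {
	if nodeShared {
		return true
	}
	if clusterShared != nil && *clusterShared {
		return true
	}
	return sharedTypes[strings.ToLower(storageType)]
}

func main() {
	fmt.Println(isShared(true, nil, "dir"))  // per-node flag alone is enough
	fmt.Println(isShared(false, nil, "nfs")) // falls through to the heuristic
	fmt.Println(isShared(false, nil, "dir")) // local directory storage
}
```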
rcourtman
dcdbee3c5c feat: Add in-app help system with HelpIcon component
Add contextual help icons throughout the UI to improve feature
discoverability. Users can click (?) icons to see explanations
with examples for settings they might not understand.

- HelpIcon component with click-to-open popover
- Centralized help content registry in /content/help/
- FeatureTip component for dismissible contextual tips
- Help added to: alert delay, AI endpoints, update channel
2026-01-07 09:22:23 +00:00
rcourtman
b75b33b9fe fix: Read form values from DOM for password manager compatibility
Password managers may fill form fields programmatically without
triggering input events, causing SolidJS signals to remain empty.
This fix reads values directly from the DOM on submit, ensuring
credentials filled by password managers are properly captured.

Related to #1036
2026-01-06 22:25:11 +00:00
rcourtman
73e6a8edc5 fix: Add missing UI for physical disk polling interval setting
The previous commit (06261627) added backend support for configurable
physical disk polling intervals but didn't include the UI to configure it.

Adds a dropdown selector (5/15/30/60 minutes) that appears when physical
disk monitoring is enabled.

Related to #1007
2026-01-06 20:32:24 +00:00
rcourtman
96d06da0d7 fix: Deduplicate shared storages (NFS, RBD, PBS, etc) in cluster view
Shared storages were appearing multiple times (once per node) because
the deduplication logic only checked the Proxmox `Shared` flag. Many
storage types are inherently cluster-wide but don't set this flag:

- RBD (Ceph block storage)
- CephFS
- PBS (Proxmox Backup Server)
- GlusterFS
- NFS
- CIFS/SMB
- iSCSI

Now we detect shared storage based on both the Shared flag AND the
storage type. Inherently shared storage types are deduplicated and
shown once with a "cluster" node designation.

Related to #1049
2026-01-06 17:44:52 +00:00
rcourtman
d3116defe3 fix: Prevent panic from send on closed websocket channel
Add atomic `closed` flag to Client struct and `safeSend()` helper method
to prevent race condition when sending to client channels. The race
occurred when a client disconnected while a goroutine was trying to send
initial state - the channel could be closed between the registration
check and the actual send.

All sends to client.send now go through safeSend() which checks the
closed flag first. The flag is set atomically before closing the channel
in all code paths (unregister, dispatchToClients, broadcast, shutdown).

Related to #1048
2026-01-06 17:41:25 +00:00
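The `closed` flag and `safeSend()` helper from d3116defe3 can be approximated like this. The struct is cut down to two fields, and two details are assumptions of the sketch rather than things the commit states: the non-blocking `select` that drops messages when the buffer is full, and the `recover` guard, which is included because a flag check followed by a send is not atomic with respect to a concurrent `close`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Client is a minimal stand-in for the websocket client.
type Client struct {
	send   chan []byte
	closed atomic.Bool
}

// safeSend refuses to send once the client is marked closed. It reports
// whether the message was queued.
func (c *Client) safeSend(msg []byte) (ok bool) {
	if c.closed.Load() {
		return false
	}
	// Guard against a close that lands between the check and the send.
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	select {
	case c.send <- msg:
		return true
	default:
		return false // buffer full; drop rather than block
	}
}

// close sets the flag before closing the channel and is safe to call twice.
func (c *Client) close() {
	if c.closed.CompareAndSwap(false, true) {
		close(c.send)
	}
}

func main() {
	c := &Client{send: make(chan []byte, 1)}
	fmt.Println(c.safeSend([]byte("state")))
	c.close()
	fmt.Println(c.safeSend([]byte("late")))
}
```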
rcourtman
48fdff3efb fix: Preserve ackState for old acknowledged alerts during restore
When LoadActiveAlerts skipped acknowledged alerts older than 1 hour,
it was also not populating ackState. This meant that when the same
alert (e.g., backup-age) was recreated on the next poll cycle,
preserveAlertState couldn't find any acknowledgement record and
the alert would retrigger notifications.

Now ackState is populated even for skipped old acknowledged alerts,
so if they reappear, the acknowledgement will be restored.

Related to #1043
2026-01-06 11:00:36 +00:00
rcourtman
74ea90e4b3 fix: Podman sockets not prioritized when --docker-runtime=podman
When --docker-runtime=podman is explicitly set, the agent should try
Podman-specific sockets first before falling back to environment
defaults (which try /var/run/docker.sock).

Also adds /var/run/podman/podman.sock as a candidate socket path,
which is used by CoreOS and some Fedora configurations.

Related to #1045
2026-01-06 10:56:37 +00:00
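The ordering change in 74ea90e4b3 could be sketched as below. `socketCandidates` is a hypothetical helper; `/var/run/podman/podman.sock` comes from the commit message, while `/run/podman/podman.sock` (the usual rootful Podman socket) and the shape of the fallback list are assumptions.

```go
package main

import "fmt"

// socketCandidates returns socket paths in the order they should be tried.
// With runtime "podman" the Podman sockets come first and the environment
// defaults are kept as a fallback; otherwise the defaults are used as-is.
func socketCandidates(runtime string, envDefaults []string) []string {
	if runtime != "podman" {
		return envDefaults
	}
	podman := []string{
		"/run/podman/podman.sock",
		"/var/run/podman/podman.sock",
	}
	return append(podman, envDefaults...)
}

func main() {
	defaults := []string{"/var/run/docker.sock"}
	fmt.Println(socketCandidates("podman", defaults))
	fmt.Println(socketCandidates("docker", defaults))
}
```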
rcourtman
d7000fafb6 fix: Empty array expansion fails on macOS bash 3.2 with set -u
macOS ships with bash 3.2 (GPLv2) which has a bug where expanding
an empty array like ${array[@]} with set -u enabled throws an
"unbound variable" error, even when the array is initialized.

Use ${arr[@]+"${arr[@]}"} pattern to safely handle empty arrays.

Related to #1046
2026-01-06 10:52:44 +00:00
rcourtman
cfcba70b2b chore: Bump version to 5.0.12 2026-01-05 23:48:57 +00:00
rcourtman
d0191d136f fix: Add configurable poll timeout and handle external Ceph storage
Changes:
1. Add MAX_POLL_TIMEOUT env var for large Proxmox clusters that need
   more than 3 minutes for polling (default: 3m, minimum: 30s)
2. Handle external Ceph storage gracefully - don't mark nodes unhealthy
   when Proxmox returns 'binary not installed' (e.g., for Ceph not
   managed by Proxmox)

Related to #965
2026-01-05 23:34:33 +00:00
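One way the `MAX_POLL_TIMEOUT` handling from d0191d136f might be parsed. The default (3m) and minimum (30s) come from the commit message; the Go duration format for the value, the fallback on unparsable input, and clamping rather than rejecting values under the minimum are all assumptions. `pollTimeout` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

const (
	defaultPollTimeout = 3 * time.Minute
	minPollTimeout     = 30 * time.Second
)

// pollTimeout turns the raw environment value into a duration, falling back
// to the default when unset or invalid and clamping to the minimum.
func pollTimeout(raw string) time.Duration {
	if raw == "" {
		return defaultPollTimeout
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return defaultPollTimeout
	}
	if d < minPollTimeout {
		return minPollTimeout
	}
	return d
}

func main() {
	fmt.Println(pollTimeout(os.Getenv("MAX_POLL_TIMEOUT")))
}
```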
rcourtman
c6182b2ed3 feat: Add FreeBSD/OPNsense support for the Pulse agent
Added FreeBSD amd64 and arm64 build targets to the release process:
- Build host-agent and unified agent binaries for FreeBSD
- Package FreeBSD tarballs in releases
- Include FreeBSD binaries in universal tarball for download endpoint

Updated agent install script with FreeBSD support:
- Fixed architecture detection (FreeBSD reports 'amd64' not 'x86_64')
- Added FreeBSD rc.d service handler with proper daemon management
- Automatic service enabling via rc.conf

This enables users to run the Pulse agent on FreeBSD-based systems
like OPNsense, pfSense, and vanilla FreeBSD.

Fixes #1041
2026-01-05 18:18:06 +00:00
rcourtman
0826c4ddb2 fix: Show linked agents in Managed Agents table with badge
Previously, agents linked to Proxmox nodes were hidden from the
Settings > Agents > Managed Agents table, which confused users who
couldn't find their installed agents.

Now all agents are shown in the table, with linked agents displaying
an indigo 'Linked' badge that explains they're also merged with
Proxmox nodes in the Dashboard.

Fixes #1038
2026-01-05 17:57:11 +00:00