Commit Graph

216 Commits

Author SHA1 Message Date
rcourtman
17208cbf9d docs: update AI evaluation matrix and approval workflow documentation 2026-01-30 19:00:40 +00:00
rcourtman
0e880f3c89 feat(eval): improve patrol eval with polling-based completion
Refactor patrol eval runner to use a dual approach:
1. Poll GET /api/ai/patrol/status until Running=false (primary signal)
2. Best-effort SSE stream connection for tool event visibility

Changes:
- Add status polling loop with configurable timeout
- Make SSE stream optional (may not connect in time)
- Add Completed flag to PatrolRunResult
- Improve assertion error messages
- Add new scenarios and assertions

This is more reliable than relying solely on SSE stream which
may timeout waiting for headers during slow patrol initialization.
2026-01-29 08:20:39 +00:00
rcourtman
e227314d76 docs: update pulse-assistant architecture with current structure
- Remove hardcoded line numbers from enforcement references
- Update tool classification table with all current tools
- Reflect consolidated tool structure
2026-01-28 21:24:45 +00:00
rcourtman
44fecc37c0 feat(eval): enhance AI eval harness with retries and reporting
- Add retry logic for transient failures (phantom, stream, empty response)
- Add environment variable overrides for infrastructure naming
- Add JSON report output per scenario
- Expand assertions with new validation types
- Add more comprehensive test scenarios
- Add docs/EVAL.md with usage documentation

The eval harness now better handles flaky AI responses and provides
detailed reports for debugging.
2026-01-28 21:24:12 +00:00
rcourtman
94863a6750 Add comprehensive architecture documentation for Pulse Assistant
Document the complete safety architecture:

1. High-Level Architecture
   - LLM as untrusted proposer pattern
   - FSM gating and tool execution flow
   - ResolvedContext for session truth

2. Safety Invariants (9 total)
   - Session-scoped tool registration
   - FSM state enforcement
   - Strict resolution requirements
   - ExecutionIntent classification
   - NonInteractiveOnly constraint
   - Read/Write tool separation
   - Phantom execution detection
   - Recovery loop protection
   - Telemetry for all safety blocks

3. Implementation Details
   - FSM states and transitions
   - Tool classification rules
   - Intent detection patterns
   - Error handling and recovery

4. Extension Guide
   - Adding new tools safely
   - Required validations
   - Testing requirements

This serves as authoritative reference for contributors
and security auditors.
2026-01-28 16:49:51 +00:00
rcourtman
6873913e64 fix: install script and docs improvements
- Fixed --disable-docker not being passed to systemd service file. Related to #1151
- Added init: true requirement to HTTPS/TLS docs for Docker. Related to #1166
2026-01-26 20:48:57 +00:00
rcourtman
4a8f9827fe feat: add config migration system and multi-tenant support
Migration System:
- Add migration framework for config schema updates
- Add migration tests

Config Enhancements:
- Add multi-tenant configuration support
- Add DeepCopy for tenant isolation
- Enhance AI config options
- Improve API token handling
- Update persistence layer

Documentation:
- Update multi-tenant documentation
2026-01-24 22:43:10 +00:00
rcourtman
c4ca169e2b feat: add multi-tenant isolation foundation (disabled by default)
Implements multi-tenant infrastructure for organization-based data isolation.
Feature is gated behind PULSE_MULTI_TENANT_ENABLED env var and requires
Enterprise license - no impact on existing users.

Core components:
- TenantMiddleware: extracts org ID, validates access, 501/402 responses
- AuthorizationChecker: token/user access validation for organizations
- MultiTenantChecker: WebSocket upgrade gating with license check
- Per-tenant audit logging via LogAuditEventForTenant
- Organization model with membership support

Gating behavior:
- Feature flag disabled: 501 Not Implemented for non-default orgs
- Flag enabled, no license: 402 Payment Required
- Default org always works regardless of flag/license

Documentation added: docs/MULTI_TENANT.md
2026-01-23 21:42:27 +00:00
rcourtman
5efd1591ca docs: update AI documentation 2026-01-22 22:32:42 +00:00
rcourtman
ad4acf1222 chore: add frontend utilities and metrics documentation
- Add useResizeObserver and useTooltip React hooks
- Add utility functions for anomaly colors, error extraction, text width, and threshold colors
- Add METRICS_DATA_FLOW.md documentation
- Ignore SQLite temp files (*.db-shm, *.db-wal)
2026-01-22 13:48:41 +00:00
rcourtman
f1c2d7c12c docs: add logging overrides to configuration reference
Document LOG_FILE, LOG_MAX_SIZE, LOG_MAX_AGE, and LOG_COMPRESS
environment variables for log file configuration.
2026-01-22 00:44:33 +00:00
rcourtman
c8b6cbfc6d feat(pro): long-term metrics history (30d/90d)
- Add FeatureLongTermMetrics license feature for Pro tier
- Implement tiered storage in metrics store (raw, minute, hourly, daily)
- Add covering index for unified history query performance
- Seed mock data for 90 days with appropriate aggregation tiers
- Update PULSE_PRO.md to document the feature
- 7-day history remains free, 30d/90d requires Pro license
2026-01-22 00:42:41 +00:00
rcourtman
0ca6001bad docs: update documentation after sensor proxy deprecation
Update docs to reflect the simplified temperature monitoring architecture:
- Remove references to pulse-sensor-proxy throughout
- Update TEMPERATURE_MONITORING.md to focus on unified agent approach
- Update CONFIGURATION.md, DEPLOYMENT_MODELS.md, FAQ.md
- Remove SECURITY_CHANGELOG.md (proxy-specific security notes)
- Clarify current recommended setup in various guides
2026-01-21 12:00:59 +00:00
rcourtman
ee63d438cc docs: standardize markdown syntax and remove deprecated sensor-proxy docs 2026-01-20 09:43:49 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
a7de907c35 chore: remove internal planning doc, add gitignore patterns
- Remove docs/AGENTS_AI_SCOPE_PLAN.md (internal dev doc)
- Add gitignore patterns for *_PLAN.md, *_ROADMAP.md, *IMPLEMENTATION*.md in docs/
2026-01-15 13:53:42 +00:00
rcourtman
8c7581d32c feat(profiles): add AI-assisted profile suggestions
Add ability for users to describe what kind of agent profile they need
in natural language, and have AI generate a suggestion with name,
description, config values, and rationale.

- Add ProfileSuggestionHandler with schema-aware prompting
- Add SuggestProfileModal component with example prompts
- Update AgentProfilesPanel with suggest button and description field
- Streamline ValidConfigKeys to only agent-supported settings
- Update profile validation tests for simplified schema
2026-01-15 13:24:18 +00:00
rcourtman
95b849e213 chore: remove internal dev roadmap doc 2026-01-12 15:28:05 +00:00
rcourtman
b5233466a3 docs: add Pro features roadmap and implementation status
Documents implementation status for:
- Advanced SSO (SAML & Multi-Provider) - 100%
- Advanced Reporting - 100%
- Audit Logging - 100%
- Agent Profiles - 100%
- RBAC - 100%
- AI Auto-Fix - 100%
- Kubernetes AI - 55%
- AI Alert Analysis - 70%
- AI Patrol - 85%
2026-01-12 15:21:46 +00:00
rcourtman
f527e6ebd0 docs: fix Kubernetes DaemonSet deployment guide
Fixes #1091 - addresses all three documentation issues reported:

1. Binary path: Changed from /usr/local/bin/pulse-agent (which doesn't
   exist in the main image) to /opt/pulse/bin/pulse-agent-linux-amd64

2. PULSE_AGENT_ID: Added to example and documented why it's required
   for DaemonSets (prevents token conflicts when all pods share one
   API token)

3. Resource visibility flags: Added PULSE_KUBE_INCLUDE_ALL_PODS and
   PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS to example, with explanation
   of the default behavior (show only problematic resources)

Also added tolerations, resource requests/limits, and ARM64 note.
2026-01-11 21:43:23 +00:00
rcourtman
80729408c1 docs: add RBAC endpoints, OIDC group mapping, and update Pro terminology
- Add RBAC/role management endpoints to API.md
- Document OIDC group-to-role mapping feature in OIDC.md
- Add missing config files to CONFIGURATION.md (audit.db, AI files)
- Add OIDC_GROUP_ROLE_MAPPINGS env var documentation
- Fix "enterprise" -> "Pro" terminology in TROUBLESHOOTING.md
- Refocus TEMPERATURE_MONITORING.md on agent method, collapse legacy proxy docs
2026-01-10 13:59:50 +00:00
rcourtman
2a8f55d719 feat(enterprise): add Advanced Reporting and Audit Webhooks integration
This commit adds enterprise-grade reporting and audit capabilities:

Reporting:
- Refactored metrics store from internal/ to pkg/ for enterprise access
- Added pkg/reporting with shared interfaces for report generation
- Created API endpoint: GET /api/admin/reports/generate
- New ReportingPanel.tsx for PDF/CSV report configuration

Audit Webhooks:
- Extended pkg/audit with webhook URL management interface
- Added API endpoint: GET/POST /api/admin/webhooks/audit
- New AuditWebhookPanel.tsx for webhook configuration
- Updated Settings.tsx with Reporting and Webhooks tabs

Server Hardening:
- Enterprise hooks now execute outside mutex with panic recovery
- Removed dbPath from metrics Stats API to prevent path disclosure
- Added storage metrics persistence to polling loop

Documentation:
- Updated README.md feature table
- Updated docs/API.md with new endpoints
- Updated docs/PULSE_PRO.md with feature descriptions
- Updated docs/WEBHOOKS.md with audit webhooks section
2026-01-09 21:31:49 +00:00
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
33bb0a95bb docs: Fix formatting in API reference 2026-01-08 20:15:25 +00:00
rcourtman
73c5128a87 feat(audit): Add audit log API endpoints and UI with signature verification
- Add GET /api/audit endpoint for listing events with filters
- Add GET /api/audit/:id/verify endpoint for signature verification
- Add AuditLogPanel UI component with filtering and verification
- Update docs with audit API documentation
- Add localStorage utils for persisting UI state
- Update gitignore patterns
2026-01-08 19:19:57 +00:00
rcourtman
7342191075 docs: fix Helm chart install commands to use GitHub Pages repo
The GHCR OCI registry (ghcr.io/rcourtman/pulse-chart) is returning 403/404
errors for unauthenticated users. Updated all Helm references to use the
working GitHub Pages Helm repository at https://rcourtman.github.io/Pulse

Fixes install issues reported by customers trying to deploy via Helm.

Files updated:
- docs/KUBERNETES.md
- docs/INSTALL.md
- docs/DEPLOYMENT_MODELS.md
- docs/UPGRADE_v5.md
2026-01-08 14:27:45 +00:00
rcourtman
22e01e2244 feat: Add centralized agent configuration management (Pro)
Allows administrators to create configuration profiles and assign them
to agents for centralized fleet management.

- Configuration profiles with customizable settings (Docker, K8s,
  Proxmox monitoring, log level, reporting interval)
- Profile assignment to agents by ID
- Agent-side remote config client to fetch settings on startup
- Full CRUD API at /api/admin/profiles
- Settings UI panel in Settings → Agents → Agent Profiles
- Automatic cleanup of assignments when profiles are deleted
2026-01-08 12:06:36 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
695ced6273 docs: Add API token scopes and kiosk mode documentation
Documents all available token scopes, UI presets, and step-by-step
instructions for setting up kiosk mode with read-only dashboard tokens.

Related to #1055
2026-01-08 10:27:15 +00:00
rcourtman
8c4bef27f0 docs: improve reverse proxy HTTPS detection and Swarm troubleshooting
- Add detailed HTTPS detection troubleshooting to REVERSE_PROXY.md
- Explain X-Forwarded-Proto header requirement for nginx/Caddy/Apache
- Add Docker Swarm troubleshooting section to UNIFIED_AGENT.md
- Document how to force Docker runtime if auto-detection fails

Based on customer feedback.
2026-01-07 18:23:48 +00:00
rcourtman
3f0808e9f9 docs: comprehensive core and Pro documentation overhaul
- Major updates to README.md and docs/README.md for Pulse v5
- Added technical deep-dives for Pulse Pro (docs/PULSE_PRO.md) and AI Patrol (docs/AI.md)
- Updated Prometheus metrics documentation and Helm schema for metrics separation
- Refreshed security, installation, and deployment documentation for unified agent models
- Cleaned up legacy summary files
2026-01-07 17:38:27 +00:00
rcourtman
9cfcdbb247 fix: Use per-node shared flag for storage deduplication
The storage deduplication logic only checked cluster config's Shared
flag, but this required the cluster config API call to succeed. When
the per-node storage API already returns shared=1 (as the user
verified), we should use that directly.

Now we check three sources for shared storage detection:
1. Per-node API shared flag (storage.Shared)
2. Cluster config shared flag (if available)
3. Storage type heuristics (NFS, RBD, PBS, etc.)

Related to #1049
2026-01-07 10:16:23 +00:00
rcourtman
dcdbee3c5c feat: Add in-app help system with HelpIcon component
Add contextual help icons throughout the UI to improve feature
discoverability. Users can click (?) icons to see explanations
with examples for settings they might not understand.

- HelpIcon component with click-to-open popover
- Centralized help content registry in /content/help/
- FeatureTip component for dismissible contextual tips
- Help added to: alert delay, AI endpoints, update channel
2026-01-07 09:22:23 +00:00
rcourtman
773376fa5d docs: add deep dive summaries for notifications, discovery, and agent exec 2026-01-02 11:18:28 +00:00
rcourtman
d71754743c docs: Add PULSE_DISABLE_DOCKER_UPDATE_ACTIONS documentation
- Add to DOCKER.md configuration table and new 'Disabling Update Features' section
- Add to CONFIGURATION.md monitoring overrides table
- Clarify difference between disabling update detection vs hiding buttons
2026-01-02 10:35:04 +00:00
rcourtman
94717ba867 feat(agent): add --docker-runtime flag for podman/docker selection
On systems where Docker compatibility layer obscures Podman (like CoreOS),
the auto-detection can fail. Users can now force the runtime:

  --docker-runtime podman
  PULSE_DOCKER_RUNTIME=podman

Valid values: auto (default), docker, podman

Related to Discussion #958
2026-01-01 00:24:37 +00:00
rcourtman
e3b3785582 feat(agent): add option to disable Docker update checks
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Users who don't want update notifications in their dashboard

Related to Discussion #982
2026-01-01 00:20:49 +00:00
rcourtman
c1f4b8f40b feat: PULSE_DISK_EXCLUDE now applies to SMART monitoring. Related to #983
Previously, the PULSE_DISK_EXCLUDE environment variable and --disk-exclude
flag only filtered mount points in the hostmetrics collector. This change
extends the exclusion to SMART data collection.

Changes:
- Updated smartctl.CollectLocal() to accept diskExclude patterns
- Added matchesDeviceExclude() for block device pattern matching
- Patterns support: exact match (sda), prefix (nvme*), contains (*cache*)
- Updated hostagent to pass DiskExclude to SMART collector
- Added comprehensive tests for pattern matching
- Updated documentation
2025-12-31 23:07:01 +00:00
rcourtman
d07b471e40 Refactor Docker agent: metrics collection, security checks, and batch updates
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
2025-12-29 17:20:18 +00:00
rcourtman
3b506e3ecb docs: Add Docker container update documentation
- Document the new one-click container update feature
- Explain update detection, safety features, and requirements
- Bump VERSION to 5.0.6 for release
2025-12-29 10:06:34 +00:00
rcourtman
ca4c2383b6 docs(pbs): add Password Setup method for Docker PBS users 2025-12-26 10:12:49 +00:00
rcourtman
f891b6217e docs: add comprehensive PBS integration guide
- Created new PBS.md with setup instructions for direct PBS connection
- Explains difference between direct PBS API vs PVE passthrough
- Documents three setup methods: agent install, one-click script, manual
- Includes permissions reference and troubleshooting section
- Added link in docs/README.md under Monitoring & Agents
2025-12-26 09:45:42 +00:00
rcourtman
c3a112a2ec docs: add social preview image for GitHub 2025-12-25 20:24:45 +00:00
rcourtman
a9205e7d51 docs: update dashboard screenshot to match landing page 2025-12-25 20:17:39 +00:00
rcourtman
b638420b72 docs: Add disk exclusion and S.M.A.R.T. documentation
- Document --disk-exclude flag and PULSE_DISK_EXCLUDE env var
- Document pattern types (exact, prefix, contains)
- Document S.M.A.R.T. requirements (smartmontools package)
- Add --enable-commands to configuration table
2025-12-25 12:20:07 +00:00
rcourtman
2c76cdaa43 docs: add backup permissions fix for v4→v5 upgrades
Users upgrading from v4 may have tokens that lack PVEDatastoreAdmin
permission on /storage, causing backups to not appear.

Added section to UPGRADE_v5.md with quick fix command and alternative
approach (re-run agent setup).

Related to #883
2025-12-24 16:33:01 +00:00
rcourtman
9bbe0d6203 docs: expand AI Patrol documentation with full context explanation
- Add comprehensive explanation of what data Patrol receives
- Document the enriched context (trends, baselines, predictions)
- Explain operational memory (notes, dismissed alerts, incidents)
- Clarify why Patrol catches issues that static alerts miss
- Mark Patrol as Pro feature with link to pulserelay.pro
2025-12-23 23:21:36 +00:00
rcourtman
77db408114 feat(ai): add finding validation layer to reduce patrol noise
- Add validateAIFindings() that cross-checks AI findings against actual metrics
- Filter out low-confidence findings (CPU <50%, memory <60%, disk <70%)
- Always allow critical findings, backup issues, and reliability findings through
- Update AI system prompt with stricter thresholds and explicit noise examples
- Add 'before creating a finding' checklist for AI (the 3am test)
- Update AI.md docs with clear value proposition and expectations
- Add comprehensive tests for the validation layer

This ensures paying users get immediate value without noise.
2025-12-22 23:28:09 +00:00
rcourtman
8d0ca16124 docs: update AI.md with v5 patrol and finding management features
- Added finding management section (resolve, dismiss, suppress)
- Documented patrol service and severity levels
- Added AI-assisted remediation capabilities
- Added Ollama tool/function calling support note
- Added new troubleshooting tips for findings persistence
2025-12-22 14:43:56 +00:00
rcourtman
b6140cd6e8 feat(oidc): Add refresh token support for long-lived sessions
When offline_access scope is configured, Pulse now stores and uses
OIDC refresh tokens to automatically extend sessions. Sessions remain
valid as long as the IdP allows token refresh (typically 30-90 days).

Changes:
- Store OIDC tokens (refresh token, expiry, issuer) alongside sessions
- Automatically refresh tokens when access token nears expiry
- Invalidate session if IdP revokes access (forces re-login)
- Add background token refresh with concurrency protection
- Persist OIDC tokens across restarts

Related to #854
2025-12-20 10:45:46 +00:00