230 Commits

Author SHA1 Message Date
T. Gossen
4730da1898 Added LXC row to the bootstrap token table (first row) (#1242)
Added explicit command and clarification for getting first-time bootstrap token on install
2026-02-10 23:17:12 +00:00
T. Gossen
580206c14f docs: add LXC console access instructions (#1241)
Community contribution: FAQ entry for LXC console access
2026-02-10 23:16:33 +00:00
rcourtman
1edfa4311e feat: Unified Resource Model and Navigation Redesign
## Summary
Complete implementation of the Unified Resource Model with new navigation.

## Features
- v2 resources API with identity matching across sources (Proxmox, Agent, Docker)
- Infrastructure page with merged host view
- Workloads page for all VMs/LXC/Docker containers
- Global search (Cmd/Ctrl+K) with keyboard navigation
- Mobile navigation with bottom tabs and drawer
- Keyboard shortcuts (g+key navigation, ? for help)
- What's New modal for user onboarding
- Report Incorrect Merge feature for false positive fixes
- Debug tab in resource drawer (enable via localStorage)

## Technical
- Async audit logging for improved performance
- WebSocket-driven real-time updates for unified resources
- Session-based auth achieves <2ms API response times

## Tests
- Backend: 78 tests passed
- Frontend: 397 tests passed
2026-02-05 17:57:59 +00:00
rcourtman
ee0e89871d fix: reduce metrics memory 86x by reverting buffer and adding LTTB downsampling
The in-memory metrics buffer was changed from 1000 to 86400 points per
metric to support 30-day sparklines, but this pre-allocated ~18 MB per
guest (7 slices × 86400 × 32 bytes). With 50 guests that's 920 MB —
explaining why users needed to double their LXC memory after upgrading
to 5.1.0.

- Revert in-memory buffer to 1000 points / 24h retention
- Remove eager slice pre-allocation (use append growth instead)
- Add LTTB (Largest Triangle Three Buckets) downsampling algorithm
- Chart endpoints now use a two-tier strategy: in-memory for ranges
  ≤ 2h, SQLite persistent store + LTTB for longer ranges
- Reduce frontend ring buffer from 86400 to 2000 points

Related to #1190
2026-02-04 19:49:52 +00:00
rcourtman
4bebd2f576 docs: fix incomplete sensor-proxy cleanup commands and add upgrade warning
The legacy cleanup section in TEMPERATURE_MONITORING.md only covered 1 of the
5 systemd units and referenced an outdated binary path. Users following these
docs still had the selfheal timer running, generating recurring TASK ERROR
entries in the Proxmox task log.

Updated with the complete set of units, correct file paths, and a note that
upgrading the Pulse container does not remove the sensor proxy from the host.
Added a sensor proxy removal section to UPGRADE_v5.md so users see the warning
during upgrade.

Related to #817
2026-02-04 10:27:03 +00:00
rcourtman
3237a4d7dd docs: clarify PVE backup permission requirements
- Update UPGRADE_v5.md to clarify the backup permission issue affects
  agent-based setups (not just v4→v5 upgrades), and note the fix version
- Add troubleshooting section to UNIFIED_AGENT.md for PVE backups

Related to #1139
2026-02-03 19:14:44 +00:00
rcourtman
744eeb0270 Chore: clean up staged changes for release
- Remove standalone pulse-assistant architecture doc (content lives in CLAUDE.md)
- Add CountdownTimer component for patrol schedule display
- Rewrite patrol handler test to focus on interval persistence
- Extract MockStateProvider to shared test file
2026-02-02 23:17:40 +00:00
rcourtman
fa1b74792e docs: add comprehensive deep-dive documentation for AI subsystems
Adds detailed architecture documentation for Pulse Patrol and Pulse Assistant. Updates AI.md and PULSE_PRO.md. Also includes additional tests.
2026-02-02 10:29:07 +00:00
rcourtman
6753727a04 docs: update API documentation and config file references
Comprehensive documentation updates:

API.md:
- Add /api/security/change-password endpoint
- Add AI provider test endpoints
- Add assistant chat & session management endpoints
- Add legacy chat sessions endpoints
- Add alert investigation and patrol autonomy endpoints
- Add findings & investigations endpoints
- Add approvals & command execution endpoints
- Add remediation plans endpoints
- Add intelligence & forecasting endpoints
- Add knowledge base endpoints
- Add debug endpoint
- Add Socket.IO compatibility endpoint

Config files:
- Document sso.enc, ai_chat_sessions.json
- Document profile-versions.json, profile-changelog.json, profile-deployments.json
2026-02-01 23:26:42 +00:00
rcourtman
017073a065 Document WebSocket endpoints and mock-mode PUT method
- Add /ws and /api/agent/ws WebSocket endpoint documentation
- Add PUT method to mock-mode endpoint
2026-02-01 22:26:17 +00:00
rcourtman
80cdfab536 Update metrics docs with canonical resourceType values
- Use canonical types (vm, container, dockerContainer) instead of
  aliases (guest, docker) in examples
- Document that guest/docker aliases are accepted by the API
- Clarify persistent store type mapping in data flow doc
2026-02-01 22:26:04 +00:00
rcourtman
487fcf76d4 Expand API documentation with additional endpoints
Document previously undocumented endpoints:

- Resource metadata endpoints (hosts, guests, docker containers)
- Public config endpoint
- Node test/update/delete/refresh endpoints
- Service discovery endpoints
- Alert management endpoints (config, history, bulk actions)
- Security apply-restart endpoint
- System settings and SSH config endpoints
- Logs streaming and download endpoints
- Server info endpoint

Also clarify resourceType aliases for metrics history.
2026-02-01 22:25:48 +00:00
rcourtman
ec802c4864 Update documentation with configuration and deployment details
- CONFIGURATION.md: Add comprehensive system.json keys table with
  descriptions for all polling, discovery, and UI settings
- DEPLOYMENT_MODELS.md: Document audit signing key, agent profile files,
  org metadata, and multi-tenant storage layout
- METRICS_HISTORY.md: Update resourceType values, add maxPoints param,
  document Pro license requirement for ranges beyond 7d
- MULTI_TENANT.md: Add storage layout and migration section, remove
  completed TODO items from backlog
- CENTRALIZED_MANAGEMENT.md: Update links and clarify architecture
- API.md: Update endpoint documentation
- UNIFIED_AGENT.md: Document --version and --self-test flags
2026-02-01 22:24:48 +00:00
rcourtman
724dee0b36 Update docs for BYOK Patrol and Pro auto-fix 2026-02-01 14:47:02 +00:00
rcourtman
17208cbf9d docs: update AI evaluation matrix and approval workflow documentation 2026-01-30 19:00:40 +00:00
rcourtman
0e880f3c89 feat(eval): improve patrol eval with polling-based completion
Refactor patrol eval runner to use a dual approach:
1. Poll GET /api/ai/patrol/status until Running=false (primary signal)
2. Best-effort SSE stream connection for tool event visibility

Changes:
- Add status polling loop with configurable timeout
- Make SSE stream optional (may not connect in time)
- Add Completed flag to PatrolRunResult
- Improve assertion error messages
- Add new scenarios and assertions

This is more reliable than relying solely on SSE stream which
may timeout waiting for headers during slow patrol initialization.
2026-01-29 08:20:39 +00:00
rcourtman
e227314d76 docs: update pulse-assistant architecture with current structure
- Remove hardcoded line numbers from enforcement references
- Update tool classification table with all current tools
- Reflect consolidated tool structure
2026-01-28 21:24:45 +00:00
rcourtman
44fecc37c0 feat(eval): enhance AI eval harness with retries and reporting
- Add retry logic for transient failures (phantom, stream, empty response)
- Add environment variable overrides for infrastructure naming
- Add JSON report output per scenario
- Expand assertions with new validation types
- Add more comprehensive test scenarios
- Add docs/EVAL.md with usage documentation

The eval harness now better handles flaky AI responses and provides
detailed reports for debugging.
2026-01-28 21:24:12 +00:00
rcourtman
94863a6750 Add comprehensive architecture documentation for Pulse Assistant
Document the complete safety architecture:

1. High-Level Architecture
   - LLM as untrusted proposer pattern
   - FSM gating and tool execution flow
   - ResolvedContext for session truth

2. Safety Invariants (9 total)
   - Session-scoped tool registration
   - FSM state enforcement
   - Strict resolution requirements
   - ExecutionIntent classification
   - NonInteractiveOnly constraint
   - Read/Write tool separation
   - Phantom execution detection
   - Recovery loop protection
   - Telemetry for all safety blocks

3. Implementation Details
   - FSM states and transitions
   - Tool classification rules
   - Intent detection patterns
   - Error handling and recovery

4. Extension Guide
   - Adding new tools safely
   - Required validations
   - Testing requirements

This serves as authoritative reference for contributors
and security auditors.
2026-01-28 16:49:51 +00:00
rcourtman
6873913e64 fix: install script and docs improvements
- Fixed --disable-docker not being passed to systemd service file. Related to #1151
- Added init: true requirement to HTTPS/TLS docs for Docker. Related to #1166
2026-01-26 20:48:57 +00:00
rcourtman
4a8f9827fe feat: add config migration system and multi-tenant support
Migration System:
- Add migration framework for config schema updates
- Add migration tests

Config Enhancements:
- Add multi-tenant configuration support
- Add DeepCopy for tenant isolation
- Enhance AI config options
- Improve API token handling
- Update persistence layer

Documentation:
- Update multi-tenant documentation
2026-01-24 22:43:10 +00:00
rcourtman
c4ca169e2b feat: add multi-tenant isolation foundation (disabled by default)
Implements multi-tenant infrastructure for organization-based data isolation.
Feature is gated behind PULSE_MULTI_TENANT_ENABLED env var and requires
Enterprise license - no impact on existing users.

Core components:
- TenantMiddleware: extracts org ID, validates access, 501/402 responses
- AuthorizationChecker: token/user access validation for organizations
- MultiTenantChecker: WebSocket upgrade gating with license check
- Per-tenant audit logging via LogAuditEventForTenant
- Organization model with membership support

Gating behavior:
- Feature flag disabled: 501 Not Implemented for non-default orgs
- Flag enabled, no license: 402 Payment Required
- Default org always works regardless of flag/license

Documentation added: docs/MULTI_TENANT.md
2026-01-23 21:42:27 +00:00
rcourtman
5efd1591ca docs: update AI documentation 2026-01-22 22:32:42 +00:00
rcourtman
ad4acf1222 chore: add frontend utilities and metrics documentation
- Add useResizeObserver and useTooltip React hooks
- Add utility functions for anomaly colors, error extraction, text width, and threshold colors
- Add METRICS_DATA_FLOW.md documentation
- Ignore SQLite temp files (*.db-shm, *.db-wal)
2026-01-22 13:48:41 +00:00
rcourtman
f1c2d7c12c docs: add logging overrides to configuration reference
Document LOG_FILE, LOG_MAX_SIZE, LOG_MAX_AGE, and LOG_COMPRESS
environment variables for log file configuration.
2026-01-22 00:44:33 +00:00
rcourtman
c8b6cbfc6d feat(pro): long-term metrics history (30d/90d)
- Add FeatureLongTermMetrics license feature for Pro tier
- Implement tiered storage in metrics store (raw, minute, hourly, daily)
- Add covering index for unified history query performance
- Seed mock data for 90 days with appropriate aggregation tiers
- Update PULSE_PRO.md to document the feature
- 7-day history remains free, 30d/90d requires Pro license
2026-01-22 00:42:41 +00:00
rcourtman
0ca6001bad docs: update documentation after sensor proxy deprecation
Update docs to reflect the simplified temperature monitoring architecture:
- Remove references to pulse-sensor-proxy throughout
- Update TEMPERATURE_MONITORING.md to focus on unified agent approach
- Update CONFIGURATION.md, DEPLOYMENT_MODELS.md, FAQ.md
- Remove SECURITY_CHANGELOG.md (proxy-specific security notes)
- Clarify current recommended setup in various guides
2026-01-21 12:00:59 +00:00
rcourtman
ee63d438cc docs: standardize markdown syntax and remove deprecated sensor-proxy docs 2026-01-20 09:43:49 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
a7de907c35 chore: remove internal planning doc, add gitignore patterns
- Remove docs/AGENTS_AI_SCOPE_PLAN.md (internal dev doc)
- Add gitignore patterns for *_PLAN.md, *_ROADMAP.md, *IMPLEMENTATION*.md in docs/
2026-01-15 13:53:42 +00:00
rcourtman
8c7581d32c feat(profiles): add AI-assisted profile suggestions
Add ability for users to describe what kind of agent profile they need
in natural language, and have AI generate a suggestion with name,
description, config values, and rationale.

- Add ProfileSuggestionHandler with schema-aware prompting
- Add SuggestProfileModal component with example prompts
- Update AgentProfilesPanel with suggest button and description field
- Streamline ValidConfigKeys to only agent-supported settings
- Update profile validation tests for simplified schema
2026-01-15 13:24:18 +00:00
rcourtman
95b849e213 chore: remove internal dev roadmap doc 2026-01-12 15:28:05 +00:00
rcourtman
b5233466a3 docs: add Pro features roadmap and implementation status
Documents implementation status for:
- Advanced SSO (SAML & Multi-Provider) - 100%
- Advanced Reporting - 100%
- Audit Logging - 100%
- Agent Profiles - 100%
- RBAC - 100%
- AI Auto-Fix - 100%
- Kubernetes AI - 55%
- AI Alert Analysis - 70%
- AI Patrol - 85%
2026-01-12 15:21:46 +00:00
rcourtman
f527e6ebd0 docs: fix Kubernetes DaemonSet deployment guide
Fixes #1091 - addresses all three documentation issues reported:

1. Binary path: Changed from /usr/local/bin/pulse-agent (which doesn't
   exist in the main image) to /opt/pulse/bin/pulse-agent-linux-amd64

2. PULSE_AGENT_ID: Added to example and documented why it's required
   for DaemonSets (prevents token conflicts when all pods share one
   API token)

3. Resource visibility flags: Added PULSE_KUBE_INCLUDE_ALL_PODS and
   PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS to example, with explanation
   of the default behavior (show only problematic resources)

Also added tolerations, resource requests/limits, and ARM64 note.
2026-01-11 21:43:23 +00:00
rcourtman
80729408c1 docs: add RBAC endpoints, OIDC group mapping, and update Pro terminology
- Add RBAC/role management endpoints to API.md
- Document OIDC group-to-role mapping feature in OIDC.md
- Add missing config files to CONFIGURATION.md (audit.db, AI files)
- Add OIDC_GROUP_ROLE_MAPPINGS env var documentation
- Fix "enterprise" -> "Pro" terminology in TROUBLESHOOTING.md
- Refocus TEMPERATURE_MONITORING.md on agent method, collapse legacy proxy docs
2026-01-10 13:59:50 +00:00
rcourtman
2a8f55d719 feat(enterprise): add Advanced Reporting and Audit Webhooks integration
This commit adds enterprise-grade reporting and audit capabilities:

Reporting:
- Refactored metrics store from internal/ to pkg/ for enterprise access
- Added pkg/reporting with shared interfaces for report generation
- Created API endpoint: GET /api/admin/reports/generate
- New ReportingPanel.tsx for PDF/CSV report configuration

Audit Webhooks:
- Extended pkg/audit with webhook URL management interface
- Added API endpoint: GET/POST /api/admin/webhooks/audit
- New AuditWebhookPanel.tsx for webhook configuration
- Updated Settings.tsx with Reporting and Webhooks tabs

Server Hardening:
- Enterprise hooks now execute outside mutex with panic recovery
- Removed dbPath from metrics Stats API to prevent path disclosure
- Added storage metrics persistence to polling loop

Documentation:
- Updated README.md feature table
- Updated docs/API.md with new endpoints
- Updated docs/PULSE_PRO.md with feature descriptions
- Updated docs/WEBHOOKS.md with audit webhooks section
2026-01-09 21:31:49 +00:00
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
33bb0a95bb docs: Fix formatting in API reference 2026-01-08 20:15:25 +00:00
rcourtman
73c5128a87 feat(audit): Add audit log API endpoints and UI with signature verification
- Add GET /api/audit endpoint for listing events with filters
- Add GET /api/audit/:id/verify endpoint for signature verification
- Add AuditLogPanel UI component with filtering and verification
- Update docs with audit API documentation
- Add localStorage utils for persisting UI state
- Update gitignore patterns
2026-01-08 19:19:57 +00:00
rcourtman
7342191075 docs: fix Helm chart install commands to use GitHub Pages repo
The GHCR OCI registry (ghcr.io/rcourtman/pulse-chart) is returning 403/404
errors for unauthenticated users. Updated all Helm references to use the
working GitHub Pages Helm repository at https://rcourtman.github.io/Pulse

Fixes install issues reported by customers trying to deploy via Helm.

Files updated:
- docs/KUBERNETES.md
- docs/INSTALL.md
- docs/DEPLOYMENT_MODELS.md
- docs/UPGRADE_v5.md
2026-01-08 14:27:45 +00:00
rcourtman
22e01e2244 feat: Add centralized agent configuration management (Pro)
Allows administrators to create configuration profiles and assign them
to agents for centralized fleet management.

- Configuration profiles with customizable settings (Docker, K8s,
  Proxmox monitoring, log level, reporting interval)
- Profile assignment to agents by ID
- Agent-side remote config client to fetch settings on startup
- Full CRUD API at /api/admin/profiles
- Settings UI panel in Settings → Agents → Agent Profiles
- Automatic cleanup of assignments when profiles are deleted
2026-01-08 12:06:36 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
695ced6273 docs: Add API token scopes and kiosk mode documentation
Documents all available token scopes, UI presets, and step-by-step
instructions for setting up kiosk mode with read-only dashboard tokens.

Related to #1055
2026-01-08 10:27:15 +00:00
rcourtman
8c4bef27f0 docs: improve reverse proxy HTTPS detection and Swarm troubleshooting
- Add detailed HTTPS detection troubleshooting to REVERSE_PROXY.md
- Explain X-Forwarded-Proto header requirement for nginx/Caddy/Apache
- Add Docker Swarm troubleshooting section to UNIFIED_AGENT.md
- Document how to force Docker runtime if auto-detection fails

Based on customer feedback.
2026-01-07 18:23:48 +00:00
rcourtman
3f0808e9f9 docs: comprehensive core and Pro documentation overhaul
- Major updates to README.md and docs/README.md for Pulse v5
- Added technical deep-dives for Pulse Pro (docs/PULSE_PRO.md) and AI Patrol (docs/AI.md)
- Updated Prometheus metrics documentation and Helm schema for metrics separation
- Refreshed security, installation, and deployment documentation for unified agent models
- Cleaned up legacy summary files
2026-01-07 17:38:27 +00:00
rcourtman
9cfcdbb247 fix: Use per-node shared flag for storage deduplication
The storage deduplication logic only checked cluster config's Shared
flag, but this required the cluster config API call to succeed. When
the per-node storage API already returns shared=1 (as the user
verified), we should use that directly.

Now we check three sources for shared storage detection:
1. Per-node API shared flag (storage.Shared)
2. Cluster config shared flag (if available)
3. Storage type heuristics (NFS, RBD, PBS, etc.)

Related to #1049
2026-01-07 10:16:23 +00:00
rcourtman
dcdbee3c5c feat: Add in-app help system with HelpIcon component
Add contextual help icons throughout the UI to improve feature
discoverability. Users can click (?) icons to see explanations
with examples for settings they might not understand.

- HelpIcon component with click-to-open popover
- Centralized help content registry in /content/help/
- FeatureTip component for dismissible contextual tips
- Help added to: alert delay, AI endpoints, update channel
2026-01-07 09:22:23 +00:00
rcourtman
773376fa5d docs: add deep dive summaries for notifications, discovery, and agent exec 2026-01-02 11:18:28 +00:00
rcourtman
d71754743c docs: Add PULSE_DISABLE_DOCKER_UPDATE_ACTIONS documentation
- Add to DOCKER.md configuration table and new 'Disabling Update Features' section
- Add to CONFIGURATION.md monitoring overrides table
- Clarify difference between disabling update detection vs hiding buttons
2026-01-02 10:35:04 +00:00
rcourtman
94717ba867 feat(agent): add --docker-runtime flag for podman/docker selection
On systems where Docker compatibility layer obscures Podman (like CoreOS),
the auto-detection can fail. Users can now force the runtime:

  --docker-runtime podman
  PULSE_DOCKER_RUNTIME=podman

Valid values: auto (default), docker, podman

Related to Discussion #958
2026-01-01 00:24:37 +00:00