Commit Graph

2761 Commits

Author SHA1 Message Date
rcourtman
11aa3e05af fix(reporting): correct SSD life documentation and logic
- Fix misleading comment in DiskInfo struct that said "percentage of
  life used" when it's actually "percentage of life REMAINING"
- Document that 100 = healthy, 0 = end of life, -1 = unknown
- This matches the Proxmox API behavior where wearout "100 is best"
2026-02-03 18:24:09 +00:00
rcourtman
9f412e69b3 fix(reporting): correct SSD life interpretation (100% = healthy, not worn)
The WearLevel field represents SSD life REMAINING, not wear used:
- 100% = fully healthy (new drive)
- 0% = end of life

Fixed logic to:
- Show critical warning when life <= 10% (not >= 90%)
- Show warning when life <= 30% (not >= 70%)
- Display values in green when healthy (>30% life remaining)
- Rename column from "Wear" to "Life" for clarity
2026-02-03 18:20:54 +00:00
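The corrected thresholds from 9f412e69b3 can be sketched as below. This is only an illustration under stated assumptions: `LifeStatus` and `ClassifyLife` are hypothetical names, not the project's identifiers. The only details taken from the commit messages are the cutoffs (critical at <= 10, warning at <= 30), the -1 = unknown convention, and the fact that the value is life remaining rather than wear used.

```go
package main

// LifeStatus is a hypothetical label for an SSD's remaining-life bucket.
type LifeStatus string

const (
	LifeUnknown  LifeStatus = "unknown"
	LifeCritical LifeStatus = "critical"
	LifeWarning  LifeStatus = "warning"
	LifeHealthy  LifeStatus = "healthy"
)

// ClassifyLife maps a "percentage of life REMAINING" value to a status.
// 100 means a healthy drive, 0 means end of life, and a negative value
// (such as -1) means the drive did not report wear data.
func ClassifyLife(remaining int) LifeStatus {
	switch {
	case remaining < 0:
		return LifeUnknown
	case remaining <= 10:
		return LifeCritical
	case remaining <= 30:
		return LifeWarning
	default:
		return LifeHealthy
	}
}
```

With this reading, `ClassifyLife(100)` is healthy and `ClassifyLife(5)` is critical, which is the inverse of what the old ">= 90%" check would have reported.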
rcourtman
442d29e9b9 feat(reporting): enhance PDF reports with Executive Summary and actionable insights
- Add professional cover page with branding and report period
- Add Executive Summary page with health status banner (HEALTHY/WARNING/CRITICAL)
- Add Quick Stats section with color-coded metrics and trend indicators
- Add Key Observations with automated analysis of CPU, memory, disk, and disk wear
- Add Recommended Actions section with prioritized, actionable items
- Add Resource Details page with hardware info, storage pools, physical disks
- Add color-coded tables for alerts, storage, and disk health
- Add performance charts with area fills and proper scaling
- Improve overall visual design with consistent color scheme
- Fix SAML session invalidation to use correct SessionStore method
2026-02-03 18:17:31 +00:00
rcourtman
2e4e7b06a8 fix(tests): update reporting handlers tests to match new signature
NewReportingHandlers now requires a MultiTenantMonitor parameter.
Pass nil since the tests don't need the monitor functionality.
2026-02-03 17:48:44 +00:00
rcourtman
d716bbfdeb fix(security): add proper authorization to sensitive endpoints
- /api/agent-install-command: require admin + settings:write scope
  Previously only RequireAuth, allowing any authenticated user to mint
  high-privilege API tokens (host-agent:manage)

- /api/system/ssh-config: require settings:write scope
  Previously any authenticated token could modify ~/.ssh/config

- /api/system/verify-temperature-ssh: require settings:write scope
  Previously any authenticated token could trigger SSH connection
  attempts to arbitrary nodes (network scanning risk)

- /api/diagnostics: require admin privileges
  Previously exposed API token metadata (IDs, hints, usage mapping)
  to any authenticated token, enabling enumeration attacks
2026-02-03 17:47:40 +00:00
rcourtman
12a5a98117 fix: SSE race conditions, alert user spoofing, and security status oracle
SSE Broadcaster:
- Add per-client mutex to prevent concurrent writes to ResponseWriter
- Fix data race in cleanupLoop reading LastActive without synchronization
- Update LastActive in SendHeartbeat so clients aren't incorrectly pruned
  after 5 minutes of idle heartbeat traffic

Alert Acknowledgements:
- Extract authenticated user from X-Authenticated-User header instead of
  hardcoding 'admin' or trusting request body's User field
- Prevents audit log spoofing and ensures accurate user attribution

Security Status Endpoint:
- Remove ?token= query param validation from public /api/security/status
- Prevents endpoint from acting as a token validity oracle for attackers
- Authentication still works via session cookies and X-API-Token header
2026-02-03 17:40:58 +00:00
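The SSE part of 12a5a98117 describes a per-client mutex that serializes writes and protects LastActive. A minimal sketch of that pattern follows. All names here (`sseClient`, `memWriter`, `send`, `sendHeartbeat`, `idleFor`) are hypothetical, and a plain `io.Writer` stands in for the `http.ResponseWriter` a real handler would hold.

```go
package main

import (
	"io"
	"sync"
	"time"
)

// memWriter is a toy, unsynchronized writer standing in for
// http.ResponseWriter. It is only safe because sseClient locks around it.
type memWriter struct{ buf []byte }

func (m *memWriter) Write(p []byte) (int, error) {
	m.buf = append(m.buf, p...)
	return len(p), nil
}

// sseClient is a hypothetical per-connection record. One mutex guards both
// the writer and lastActive, so senders and the cleanup loop never race.
type sseClient struct {
	mu         sync.Mutex
	w          io.Writer
	lastActive time.Time
}

// send writes one payload under the client's lock and refreshes lastActive.
func (c *sseClient) send(payload string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, err := io.WriteString(c.w, payload); err != nil {
		return err
	}
	c.lastActive = time.Now()
	return nil
}

// sendHeartbeat emits an SSE comment line. Because it goes through send,
// heartbeat traffic also counts as activity.
func (c *sseClient) sendHeartbeat() error {
	return c.send(": heartbeat\n\n")
}

// idleFor reads lastActive under the same lock, which is what a cleanup
// loop would call before deciding to prune the client.
func (c *sseClient) idleFor(now time.Time) time.Duration {
	c.mu.Lock()
	defer c.mu.Unlock()
	return now.Sub(c.lastActive)
}
```

Routing heartbeats through the same locked path is one way to get both fixes at once: concurrent writers cannot interleave, and an idle-but-connected client keeps a fresh timestamp.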
rcourtman
beae4c860c fix: address 6 security and reliability issues
Security fixes:
- Auto-register now requires settings:write scope for API tokens
- X-Forwarded-For in auto-register only trusted from verified proxies
- Public URL capture requires authentication (no loopback bypass)
- Lockout reset now uses RequireAdmin for session users

Reliability fixes:
- Docker stop command expiration clears PendingUninstall flag
- Cancelled notifications get completed_at set and are cleaned up
2026-02-03 17:32:44 +00:00
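The "X-Forwarded-For only trusted from verified proxies" rule in beae4c860c can be illustrated with the sketch below. It is an assumption about the general technique, not the project's code: `parseCIDRs`, `isTrusted`, and `clientIP` are hypothetical helpers, and the right-to-left walk over the header is one common way to apply the rule.

```go
package main

import (
	"net"
	"strings"
)

// parseCIDRs converts CIDR strings (the configured trusted proxies) to networks.
func parseCIDRs(cidrs []string) ([]*net.IPNet, error) {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, err
		}
		nets = append(nets, n)
	}
	return nets, nil
}

// isTrusted reports whether ip falls inside any trusted network.
func isTrusted(ip net.IP, trusted []*net.IPNet) bool {
	for _, n := range trusted {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// clientIP returns the address a request should be attributed to.
// remoteAddr is the TCP peer ("host:port") and xff is the raw
// X-Forwarded-For value. The header is honored only when the peer itself
// is a trusted proxy; otherwise any client could spoof its address.
func clientIP(remoteAddr, xff string, trusted []*net.IPNet) string {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr
	}
	peer := net.ParseIP(host)
	if peer == nil || !isTrusted(peer, trusted) {
		return host // untrusted peer: ignore the header entirely
	}
	// Walk right to left, skipping hops that are themselves trusted proxies.
	parts := strings.Split(xff, ",")
	for i := len(parts) - 1; i >= 0; i-- {
		ip := net.ParseIP(strings.TrimSpace(parts[i]))
		if ip == nil {
			continue // malformed or empty entry
		}
		if !isTrusted(ip, trusted) {
			return ip.String()
		}
	}
	return host
}
```

A request arriving directly from an untrusted address keeps that address no matter what the header claims, which is the property a rate limiter or audit log needs.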
rcourtman
4b0d6a0538 fix(ui): prevent text overflow in drawer cards
Add overflow-hidden to card containers and truncation with tooltips
to long text fields (hostnames, IPs, MACs, version strings) so content
no longer spills into adjacent cards.
2026-02-03 17:32:14 +00:00
rcourtman
b2639ed5a5 Fix security vulnerabilities and critical bugs
- Fix WebSocket CORS bypass by strictly verifying origin
- Fix OIDC refresh token persistence by encrypting at rest
- Fix grouped webhook data mutation by cloning alerts
- Fix host agent uninstall authorization and config fetch logic
- Fix notification queue recovery for stuck sending items
- Fix ignored update history limit parameter
- Fix ineffective break statement in WebSocket write pump
2026-02-03 17:16:27 +00:00
rcourtman
c7f4030c29 fix(monitoring): prevent memory leak from stale metrics history and rate tracker entries
MetricsHistory.Cleanup() was defined but never called, and even if called,
it only removed old data points without deleting map entries for deleted
containers/VMs. Each stale entry leaked ~224KB (7 pre-allocated slices).

Changes:
- Call metricsHistory.Cleanup() and rateTracker.Cleanup() in maintenance loop
- Delete map entries entirely when all data points have expired
- Return nil instead of empty slice in cleanupMetrics() to release backing arrays
- Add Cleanup() method to RateTracker with 24-hour stale threshold
- Add debug logging to track cleanup activity

Related to #1153
2026-02-03 17:16:06 +00:00
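The cleanup behavior described in c7f4030c29 (drop expired points, return nil rather than an empty slice, and delete map entries whose series is fully expired) might look roughly like this. The `point` type and both function names are hypothetical, and a single series per ID is assumed for brevity where the commit mentions seven slices per entry.

```go
package main

import "time"

// point is a hypothetical timestamped sample.
type point struct {
	at    time.Time
	value float64
}

// cleanupSeries keeps only points at or after cutoff. When nothing
// survives it returns nil rather than an empty slice, so no reference to
// the old backing array is retained.
func cleanupSeries(pts []point, cutoff time.Time) []point {
	var kept []point
	for _, p := range pts {
		if !p.at.Before(cutoff) {
			kept = append(kept, p)
		}
	}
	return kept
}

// cleanupHistory prunes every series and deletes map entries that end up
// empty, so IDs of deleted containers/VMs do not accumulate forever.
// It returns the number of entries removed, e.g. for debug logging.
func cleanupHistory(history map[string][]point, cutoff time.Time) int {
	removed := 0
	for id, pts := range history {
		kept := cleanupSeries(pts, cutoff)
		if kept == nil {
			delete(history, id) // deleting during range is safe in Go
			removed++
			continue
		}
		history[id] = kept
	}
	return removed
}
```

A maintenance loop would call `cleanupHistory` periodically with a cutoff such as `time.Now().Add(-24 * time.Hour)`; the exact retention window here is an assumption.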
rcourtman
f8bb14977d fix(discovery): include IPAddresses in state adapter for URL suggestion
The discovery state adapter was not copying IPAddresses from the models
when converting VM/Container state. This caused getResourceExternalIP()
to return empty strings, preventing URL suggestion from working.
2026-02-03 17:05:01 +00:00
rcourtman
bd030c7c87 security: fix webhook SSRF, rate limit spoofing, metrics retention, and url poisoning
- Fix SSRF and rate limit bypass in SendEnhancedWebhook by validating the rendered URL.
- Fix rate limit spoofing in updates API by using secure IP extraction (trusted proxies).
- Fix memory leak in metrics history by correctly clearing fully stale data series.
- Fix public URL poisoning by preventing overwrites when explicitly configured.
2026-02-03 16:58:13 +00:00
rcourtman
7d0d7bb523 chore: bump version to 5.1.0-rc.2 2026-02-03 16:54:18 +00:00
rcourtman
c6aeb9429b fix: initialize reporting engine in standard binary
Pro license holders running the standard Docker image/binary were
getting "Reporting engine not initialized" errors because the
reporting engine was only wired up in the enterprise build.

Now the core server initializes the reporting engine automatically
when the metrics store is ready, ensuring PDF/CSV report generation
works for all Pro license holders regardless of which binary they use.

The enterprise hooks are still honored if set, allowing the enterprise
build to override with its own implementation if needed.
2026-02-03 16:53:20 +00:00
rcourtman
3ea3f0f827 feat(discovery): auto-suggest web interface URLs for discovered services
Add deterministic URL suggestion based on service type and external IP:

- Add SuggestedURL field to ResourceDiscovery type (Go + TypeScript)
- Create url_suggestion.go with 60+ service defaults (Jellyfin, Plex,
  Home Assistant, Grafana, Proxmox, etc.)
- Support HTTPS services, custom paths (/web, /dashboard/, /admin)
- Fall back to discovered ports for unknown services
- Add UI in DiscoveryTab with "Use this" button to populate URL input
- Add comprehensive unit tests for URL suggestion logic

Suggestion only appears when no custom URL is saved. User clicks
"Use this" to populate the input, then "Save" to confirm.
2026-02-03 16:49:57 +00:00
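The deterministic suggestion in 3ea3f0f827 (service defaults first, discovered ports as a fallback) can be sketched as follows. Everything here is illustrative: `serviceDefault`, `defaults`, and `suggestURL` are hypothetical names, the table is a five-entry stand-in for the 60+ defaults the commit mentions, and the ports are the services' commonly documented defaults rather than values read from url_suggestion.go.

```go
package main

import (
	"net"
	"strconv"
	"strings"
)

// serviceDefault describes how to build a web UI URL for a known service.
type serviceDefault struct {
	scheme string
	port   int
	path   string
}

// defaults is a small illustrative table keyed by lower-case service type.
var defaults = map[string]serviceDefault{
	"jellyfin":       {scheme: "http", port: 8096},
	"plex":           {scheme: "http", port: 32400, path: "/web"},
	"home-assistant": {scheme: "http", port: 8123},
	"grafana":        {scheme: "http", port: 3000},
	"proxmox":        {scheme: "https", port: 8006},
}

// buildURL joins the parts; JoinHostPort brackets IPv6 addresses correctly.
func buildURL(scheme, ip string, port int, path string) string {
	return scheme + "://" + net.JoinHostPort(ip, strconv.Itoa(port)) + path
}

// suggestURL returns a suggested URL, or "" when there is nothing to
// suggest (no external IP, or an unknown service with no discovered ports).
func suggestURL(serviceType, externalIP string, discoveredPorts []int) string {
	if externalIP == "" {
		return ""
	}
	if d, ok := defaults[strings.ToLower(serviceType)]; ok {
		return buildURL(d.scheme, externalIP, d.port, d.path)
	}
	if len(discoveredPorts) > 0 {
		// Unknown service: fall back to the first discovered port over HTTP.
		return buildURL("http", externalIP, discoveredPorts[0], "")
	}
	return ""
}
```

The empty-IP branch also shows why the f8bb14977d fix mattered: without IP addresses in the state adapter, a function like this has nothing to build a URL from.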
rcourtman
4f40c3d751 fix: resolve critical stability and auth issues
- Fix data race in webhook notifications by removing shared state
- Fix duplicate monitors on config reload by stopping old instances
- Prevent metrics ID deletion on transient startup errors
- Support Bearer auth header for config export/import endpoints
2026-02-03 16:46:27 +00:00
rcourtman
935326ebb7 fix(api/ai): resolve critical auth, agent download, and lifecycle issues
- Fix API-only mode to accept Bearer tokens and query params
- Fix data race in API token validation using fine-grained locking
- Fix unified agent download serving wrong binary for invalid arch
- Fix AI infra discovery running when AI disabled and missing stop mechanism
2026-02-03 16:35:12 +00:00
rcourtman
300b5592da Fix node drawer colspan causing table layout shift
Compute colspan dynamically based on visible columns (tab type and
temperature data) instead of hardcoded value of 11.
2026-02-03 16:30:50 +00:00
rcourtman
a1b9de8f10 Enhance discovery UI and table consistency
- Fix visual flash in discovery tab
- Standardize table column widths and UI across Docker, Hosts, Storage, etc.
- Add support for new K8s and Host charts
- Fix Service Discovery tests
2026-02-03 16:25:09 +00:00
rcourtman
3d8374e527 Fix AI investigation context and UI settings
- Ensure correct org context is used for AI chat service resolution
- Fix AI adapter tests
- Update AI Intelligence page UI for advanced settings
2026-02-03 16:24:56 +00:00
rcourtman
aeca5e39fa Fix multi-tenant persistence and backend stability
- Initialize Alert and Notification managers with tenant-specific data directories
- Add panic recovery to WebSocket safeSend for stability
- Record host metrics to history for sparkline support
2026-02-03 16:24:42 +00:00
rcourtman
bea3bbe5f6 Fix API token authentication and multi-tenancy logic
- Fix AuthContextMiddleware to use tenant-specific config for token validation
- Resolve data race in token LastUsedAt update
- Fix invalid org IDs returning 501/402 instead of 400
- Prevent unauthenticated organization directory creation (DoS protection)
2026-02-03 16:24:28 +00:00
rcourtman
88d95f40be feat: add Discovery Transparency & Trust features
- Add AI provider indicator showing local (Ollama) vs cloud (Anthropic/OpenAI) analysis
- Add "What Discovery Does" explanation section before first scan
- Show commands preview before scan so users know what will run
- Add scan details section showing raw command outputs for admins
- Filter sensitive Docker labels (passwords, secrets, tokens) before AI analysis
- Add comprehensive tests for label filtering

This improves sysadmin confidence by making discovery transparent about
what it does, what data it collects, and where that data goes.
2026-02-03 14:59:27 +00:00
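The label filtering mentioned in 88d95f40be could be approximated as below. This is a sketch under assumptions: the function names are hypothetical, and matching on the substrings "password", "secret", and "token" in the label key is inferred from the commit's wording rather than taken from the real filter list.

```go
package main

import "strings"

// sensitiveMarkers are substrings that mark a label key as sensitive.
// The three entries mirror the commit text; a real list may be longer.
var sensitiveMarkers = []string{"password", "secret", "token"}

// isSensitive reports whether a label key looks like it holds a credential.
func isSensitive(key string) bool {
	lower := strings.ToLower(key)
	for _, m := range sensitiveMarkers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

// filterLabels returns a copy of labels without sensitive keys. The input
// map is left untouched so other code still sees the full label set.
func filterLabels(labels map[string]string) map[string]string {
	safe := make(map[string]string, len(labels))
	for k, v := range labels {
		if isSensitive(k) {
			continue
		}
		safe[k] = v
	}
	return safe
}
```

Returning a copy rather than mutating in place keeps the filter usable on shared state, at the cost of one extra map per scan.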
rcourtman
8720708e70 fix: address AI patrol concurrency and streaming issues
- HIGH: Create per-request AgenticLoop instead of sharing one across
  concurrent sessions. This prevents race conditions where ExecuteStream
  calls would overwrite each other's FSM, knowledge accumulator, and
  other session-specific state.

- MEDIUM: TriggerManager.GetStatus now recomputes adaptive interval after
  pruning old events. Previously, currentInterval could remain stuck in
  busy/quiet mode after events aged out of the window.

- MEDIUM: Patrol stream phases are now broadcast to subscribers. Fixed
  setStreamPhase() to emit phase events and SubscribeToStream() to send
  phase events to late joiners. UI was stuck on 'Starting patrol...'
  because phase events were never emitted.

- LOW: Fixed TriggerStatus.CurrentInterval JSON serialization. Changed
  from time.Duration (serializes as nanoseconds) to int64 milliseconds
  to match the 'current_interval_ms' tag.
2026-02-03 14:39:00 +00:00
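The LOW item in 8720708e70 rests on a standard-library behavior: `time.Duration` is an integer count of nanoseconds, so `encoding/json` writes it as a nanosecond number regardless of the field's tag. The sketch below contrasts the two shapes. The struct and function names are hypothetical; only the `current_interval_ms` tag comes from the commit message.

```go
package main

import (
	"encoding/json"
	"time"
)

// statusBefore mirrors the bug: the tag promises milliseconds, but the
// value is encoded as nanoseconds.
type statusBefore struct {
	CurrentInterval time.Duration `json:"current_interval_ms"`
}

// statusAfter stores milliseconds explicitly so the tag and value agree.
type statusAfter struct {
	CurrentIntervalMs int64 `json:"current_interval_ms"`
}

// newStatusAfter converts a Duration to the millisecond representation.
func newStatusAfter(d time.Duration) statusAfter {
	return statusAfter{CurrentIntervalMs: d.Milliseconds()}
}

// encode marshals v to a JSON string, returning "" on error.
func encode(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}
```

For a 15-minute interval, `encode(statusBefore{CurrentInterval: 15 * time.Minute})` yields `{"current_interval_ms":900000000000}`, while `encode(newStatusAfter(15 * time.Minute))` yields `{"current_interval_ms":900000}`.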
rcourtman
565b2ef51d Fix: prevent discovery button from flashing during scan/refetch cycle
- Keep isScanning=true until after refetch completes
- Show "Loading results..." at 100% during refetch phase
- Add !isScanning() guard to both "Run Discovery" button sections
2026-02-03 14:29:13 +00:00
rcourtman
4caa30534d Fix: PBS instance selection now filters backup list correctly
When selecting a PBS instance from the node summary table, the backup
list now correctly filters to show only backups from that specific
PBS instance. Previously, the nodeType parameter was ignored and the
filter logic only handled PVE nodes, causing PBS selection to have
no effect.

Related to #1182
2026-02-03 14:27:43 +00:00
rcourtman
c2ed6067f1 Fix: discovery routing, host identification, and UX feedback
- Fix routing for POST/PUT/DELETE on /api/discovery/host/ endpoints
  (Go's http.ServeMux was matching the longer prefix before method-specific routes)
- Add HOST-specific AI prompt that focuses on identifying the host OS
  rather than services/containers running on it
- Add success message UI after discovery completes
- Fix timing so success appears after data is visible (not during refetch)
- Add error handling and display for failed discoveries
2026-02-03 14:10:54 +00:00
rcourtman
896b5bfc89 Fix: enable backup monitoring for PVE instances via config migration
Adds a config migration that ensures MonitorBackups is enabled for PVE
instances, matching the existing PBS migration from issue #411. This fixes
issue #1139 where local PVE backups weren't appearing in the backup overview
because the MonitorBackups field defaulted to false when not explicitly set.

Fixes #1139
2026-02-03 13:38:41 +00:00
rcourtman
86a7c2283c Revert "Detect incompatible models that don't support function calling"
This reverts commit 11a72ee263.
2026-02-03 13:36:30 +00:00
rcourtman
c6318a8484 Revert "Simplify incompatible model error message"
This reverts commit c58fe81700.
2026-02-03 13:36:30 +00:00
rcourtman
c58fe81700 Simplify incompatible model error message 2026-02-03 13:30:54 +00:00
rcourtman
11a72ee263 Detect incompatible models that don't support function calling
When local LLM servers (LM Studio, llama.cpp) receive tool definitions
but the model doesn't support function calling, they output internal
control tokens like <|channel|>, <|im_start|>, etc. instead of proper
responses.

This change detects these control tokens during streaming and returns
a clear error message explaining that the model doesn't support function
calling and recommending compatible models (Llama 3.1+, Mistral, Qwen).

This is better than the previous approach of offering a "disable tools"
option, which would have crippled Pulse Assistant/Patrol functionality.
Users need to use compatible models for the AI features to work properly.

Related to #1154
2026-02-03 13:28:37 +00:00
rcourtman
a55ae78715 Revert "Add config option to disable tools for OpenAI-compatible endpoints"
This reverts commit 81229f206f.
2026-02-03 13:26:26 +00:00
rcourtman
81229f206f Add config option to disable tools for OpenAI-compatible endpoints
Some local LLM servers (LM Studio, llama.cpp) expose OpenAI-compatible
APIs but don't support function calling. When tools are sent to these
models, they output raw control tokens instead of proper responses.

This change adds:
- openai_tools_disabled config field in AIConfig
- AreToolsDisabledForProvider() method to check at runtime
- API support to get/set the new setting
- Tests for the new functionality

When enabled and using a custom OpenAI base URL, the chat service will
skip sending tools to the model, allowing basic chat functionality to
work even with models that don't support function calling.

Fixes #1154
2026-02-03 13:21:44 +00:00
rcourtman
e3556455c6 Revert "Sanitize LLM control tokens from OpenAI-compatible responses"
This reverts commit e5eb15918e.
2026-02-03 13:14:33 +00:00
rcourtman
e5eb15918e Sanitize LLM control tokens from OpenAI-compatible responses
Some local models (llama.cpp, LM Studio) output internal control tokens
like <|channel|>, <|constrain|>, <|message|> instead of using proper
function calling. These tokens leak into the UI creating a poor UX.

This adds sanitization to strip these control tokens from both streaming
and non-streaming responses before they reach the user.
2026-02-03 13:12:17 +00:00
rcourtman
71f80c8a99 Fix: alert resolution now records incident timeline during quiet hours
- Fixed early return in handleAlertResolved that skipped incident recording
  when quiet hours suppressed recovery notifications
- Added Host Agent alert delay configuration (backend + UI)
- Host Agents now have dedicated time threshold settings like other resource types

Related to #1179
2026-02-03 12:49:41 +00:00
rcourtman
174ac481c8 Add Windows uninstall command to UI
Update the Uninstall agent section to display both Linux/macOS and
Windows uninstall commands with clear platform labels.

Related to #1176
2026-02-03 12:04:46 +00:00
rcourtman
c2de5f7f4c Fix: add Windows uninstall command support for unified agent
The UI only showed a bash uninstall command which doesn't work on Windows.
Added PULSE_UNINSTALL env var support to install.ps1 and updated the UI
to display platform-specific uninstall commands for both Linux/macOS and
Windows.

Related to #1176
2026-02-03 12:03:06 +00:00
rcourtman
900e05025a Fix OpenAI-compatible endpoint support for chat
Two issues fixed:

1. Custom base URL wasn't being passed to the OpenAI client in
   createProviderForModel() - requests went to api.openai.com instead
   of the configured endpoint (e.g., LM Studio, llama.cpp)

2. Tool schemas were missing the "properties" field when tools had no
   parameters. OpenAI API requires "properties" to always be present
   as an object, even if empty.

Fixes #1154
2026-02-03 12:03:06 +00:00
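The second issue in 900e05025a, a missing "properties" field for parameterless tools, is easy to reproduce with Go's JSON encoding: an `omitempty` map disappears when empty, and a nil map encodes as `null`. The sketch below shows one way to always emit an empty object. `toolParameters` and its helpers are hypothetical; whether the original code used `omitempty` or a nil map is an assumption.

```go
package main

import "encoding/json"

// toolParameters is a hypothetical JSON-Schema-style parameter block.
// Properties deliberately has no omitempty so the key is always emitted.
type toolParameters struct {
	Type       string                 `json:"type"`
	Properties map[string]interface{} `json:"properties"`
}

// newToolParameters normalizes a nil map to an empty one, so the field
// encodes as {} instead of null.
func newToolParameters(props map[string]interface{}) toolParameters {
	if props == nil {
		props = map[string]interface{}{}
	}
	return toolParameters{Type: "object", Properties: props}
}

// encodeParameters marshals the block to a JSON string.
func encodeParameters(p toolParameters) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

Encoding `newToolParameters(nil)` gives `{"type":"object","properties":{}}`, whereas a zero-value `toolParameters{Type: "object"}` would give `"properties":null`.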
rcourtman
1be9e6a024 Enhance Kiosk Mode: auto-enable logic and magic link generation
This improves the UX for setting up unattended displays by:
1. Automatically enabling visual Kiosk mode when a token has monitoring-only scope (unless explicitly disabled).
2. Providing a ready-to-use 'Magic Link' (with ?token=...&kiosk=1) upon token creation.
2026-02-03 12:03:06 +00:00
rcourtman
35eedcb5ac Fix: metrics store tier fallback for mock mode sparklines
When querying short time ranges (1h, 6h), the metrics store only looked
in TierRaw and TierMinute which were empty in mock mode. The seeded data
was stored in TierHourly and TierDaily.

Updated tierFallbacks to include coarser tiers as fallbacks:
- TierRaw now falls back to TierMinute, then TierHourly
- TierMinute now falls back to TierRaw, then TierHourly

This ensures sparkline data is available in mock/demo mode where
historical data is seeded into coarser tiers.
2026-02-03 12:03:06 +00:00
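The fallback order described in 35eedcb5ac can be expressed as a small lookup table plus a selection loop. The tier names and the `tierFallbacks` identifier come from the commit message; the `Tier` string type, its constant values, and `pickTier` with its `hasData` callback are assumptions made to keep the sketch self-contained.

```go
package main

// Tier is a hypothetical identifier for a metrics resolution tier.
type Tier string

const (
	TierRaw    Tier = "raw"
	TierMinute Tier = "minute"
	TierHourly Tier = "hourly"
	TierDaily  Tier = "daily"
)

// tierFallbacks lists, for each preferred tier, the tiers to try next in
// order. The two entries follow the commit message.
var tierFallbacks = map[Tier][]Tier{
	TierRaw:    {TierMinute, TierHourly},
	TierMinute: {TierRaw, TierHourly},
}

// pickTier returns the first tier, starting with preferred and then its
// fallbacks, for which hasData reports true. The bool is false when no
// candidate tier has data.
func pickTier(preferred Tier, hasData func(Tier) bool) (Tier, bool) {
	candidates := append([]Tier{preferred}, tierFallbacks[preferred]...)
	for _, t := range candidates {
		if hasData(t) {
			return t, true
		}
	}
	return "", false
}
```

In mock mode, where only the hourly and daily tiers are seeded, a 1h query that prefers `TierRaw` now resolves to `TierHourly` instead of returning nothing.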
rcourtman
8495878553 Fix: improve mock metrics sampler startup performance
- Reduce minimum seed duration from 7 days to 1 hour for faster startup
  on resource-constrained systems (like demo server 1GB droplet)
- Reduce sleep times from 200ms to 50ms between resource processing
- Add diagnostic logging throughout mock metrics seeding to help debug
  issues where sparklines show no data
- Add progress logging for nodes, VMs, containers, storage, docker hosts
2026-02-03 12:03:06 +00:00
rcourtman
7cc3f77097 Auto-update Helm chart version to 5.1.0-rc.1 helm-chart-5.1.0-rc.1 2026-02-03 00:55:38 +00:00
rcourtman
a61f1b387a Fix: data race in Docker detection test mock — add mutex for concurrent calls v5.1.0-rc.1 2026-02-03 00:12:16 +00:00
rcourtman
445c5c0587 Fix: remove install-sensor-proxy.sh from release workflow (script was removed) 2026-02-03 00:08:19 +00:00
rcourtman
ed5ab5eebf Fix: flaky metrics fallback test — use WriteBatchSync for deterministic writes 2026-02-02 23:32:28 +00:00
rcourtman
df0d90fb69 Fix: regenerate package-lock.json for ESLint v9 upgrade 2026-02-02 23:25:21 +00:00
rcourtman
6ff5ca94c3 Bump version to 5.1.0-rc.1 2026-02-02 23:22:04 +00:00
rcourtman
744eeb0270 Chore: clean up staged changes for release
- Remove standalone pulse-assistant architecture doc (content lives in CLAUDE.md)
- Add CountdownTimer component for patrol schedule display
- Rewrite patrol handler test to focus on interval persistence
- Extract MockStateProvider to shared test file
2026-02-02 23:17:40 +00:00