Commit Graph

187 Commits

Author SHA1 Message Date
rcourtman
1a518321c2 Add csv formatValue coverage 2026-02-05 11:45:22 +00:00
rcourtman
f1fece52ed Add PDF backup and duration coverage 2026-02-05 11:44:28 +00:00
rcourtman
4ec2df5071 Add cluster membership and not-implemented coverage 2026-02-05 11:42:35 +00:00
rcourtman
ed6f15d49e Cover multi-report defaults and metric filtering 2026-02-05 11:41:30 +00:00
rcourtman
2f4bcc83f9 Add cluster client fallback coverage 2026-02-05 11:40:08 +00:00
rcourtman
2e62754efc Extend report engine and cluster client coverage 2026-02-05 11:38:55 +00:00
rcourtman
5fe467c4e5 Add report engine and cluster client coverage 2026-02-05 11:36:54 +00:00
rcourtman
5c18748742 Add SMART disk lifecycle monitoring with historical charts
Expand the smartctl collector to capture detailed SMART attributes (SATA
and NVMe), propagate them through the full data pipeline, persist them
as time-series metrics, and display them in an interactive disk detail
drawer with historical sparkline charts.

Backend: add SMARTAttributes struct, writeSMARTMetrics for persistent
storage, "disk" resource type in metrics API with live fallback.
Frontend: enhanced DiskList with Power-On column and SMART warnings,
new DiskDetail drawer matching NodeDrawer styling patterns, generic
HistoryChart metric support with proper tooltip formatting.
2026-02-04 13:35:40 +00:00
rcourtman
6427f28a08 Fix stale metrics store reference in reporting engine after monitor reload
The reporting engine held a direct pointer to the metrics store, which
becomes invalid after a monitor reload (settings change, node config
save, etc.) closes and recreates the store. Use a dynamic getter closure
that always resolves to the current monitor's active store.

Also adds diagnostic logging when report queries return zero metrics,
and integration tests covering the full metrics-to-report pipeline
including reload scenarios.

Fixes #1186
2026-02-04 12:34:40 +00:00
rcourtman
5c1487e406 feat: add resource picker and multi-resource report generation
Replace manual resource ID entry with a searchable, filterable resource
picker that uses live WebSocket state. Support selecting multiple
resources (up to 50) for combined fleet reports.

Multi-resource PDFs include a cover page, fleet summary table with
aggregate health status, and condensed per-resource detail pages with
overlaid CPU/memory charts. Multi-resource CSVs include a summary
section followed by interleaved time-series data with resource columns.

New POST /api/admin/reports/generate-multi endpoint handles multi-resource
requests while the existing single-resource GET endpoint remains unchanged.

Also fixes resource ID validation regex to allow colons used in
VM/container IDs (e.g., "instance:node:vmid").
2026-02-04 10:24:23 +00:00
rcourtman
2ebe65bbc5 security: add scope checks to AI Patrol and agent profile endpoints
- AI Patrol mutation endpoints (acknowledge, dismiss, suppress, snooze, resolve,
  findings/note, suppressions/*) now require ai:execute scope to prevent
  low-privilege tokens from blinding patrol by hiding/suppressing findings

- Agent profile admin endpoints (/api/admin/profiles/*) now require
  settings:write scope to prevent low-privilege tokens from modifying
  fleet-wide agent behavior
2026-02-03 19:29:56 +00:00
rcourtman
4fdc0cae64 feat(reporting): enrich metric reports with detailed resource info 2026-02-03 18:51:27 +00:00
rcourtman
11aa3e05af fix(reporting): correct SSD life documentation and logic
- Fix misleading comment in DiskInfo struct that said "percentage of
  life used" when it's actually "percentage of life REMAINING"
- Document that 100 = healthy, 0 = end of life, -1 = unknown
- This matches the Proxmox API behavior where wearout "100 is best"
2026-02-03 18:24:09 +00:00
rcourtman
9f412e69b3 fix(reporting): correct SSD life interpretation (100% = healthy, not worn)
The WearLevel field represents SSD life REMAINING, not wear used:
- 100% = fully healthy (new drive)
- 0% = end of life

Fixed logic to:
- Show critical warning when life <= 10% (not >= 90%)
- Show warning when life <= 30% (not >= 70%)
- Display values in green when healthy (>30% life remaining)
- Rename column from "Wear" to "Life" for clarity
2026-02-03 18:20:54 +00:00
rcourtman
442d29e9b9 feat(reporting): enhance PDF reports with Executive Summary and actionable insights
- Add professional cover page with branding and report period
- Add Executive Summary page with health status banner (HEALTHY/WARNING/CRITICAL)
- Add Quick Stats section with color-coded metrics and trend indicators
- Add Key Observations with automated analysis of CPU, memory, disk, and disk wear
- Add Recommended Actions section with prioritized, actionable items
- Add Resource Details page with hardware info, storage pools, physical disks
- Add color-coded tables for alerts, storage, and disk health
- Add performance charts with area fills and proper scaling
- Improve overall visual design with consistent color scheme
- Fix SAML session invalidation to use correct SessionStore method
2026-02-03 18:17:31 +00:00
rcourtman
c6aeb9429b fix: initialize reporting engine in standard binary
Pro license holders running the standard Docker image/binary were
getting "Reporting engine not initialized" errors because the
reporting engine was only wired up in the enterprise build.

Now the core server initializes the reporting engine automatically
when the metrics store is ready, ensuring PDF/CSV report generation
works for all Pro license holders regardless of which binary they use.

The enterprise hooks are still honored if set, allowing the enterprise
build to override with its own implementation if needed.
2026-02-03 16:53:20 +00:00
rcourtman
35eedcb5ac Fix: metrics store tier fallback for mock mode sparklines
When querying short time ranges (1h, 6h), the metrics store only looked
in TierRaw and TierMinute which were empty in mock mode. The seeded data
was stored in TierHourly and TierDaily.

Updated tierFallbacks to include coarser tiers as fallbacks:
- TierRaw now falls back to TierMinute, then TierHourly
- TierMinute now falls back to TierRaw, then TierHourly

This ensures sparkline data is available in mock/demo mode where
historical data is seeded into coarser tiers.
2026-02-03 12:03:06 +00:00
rcourtman
ed5ab5eebf Fix: flaky metrics fallback test — use WriteBatchSync for deterministic writes 2026-02-02 23:32:28 +00:00
rcourtman
eb2d07e48f Chore: enhance core api and metrics testability
Refactor Router to allow HTTP client injection for install script proxying. Add tests for unified agent install mechanism and additional metrics store coverage.
2026-02-02 22:01:36 +00:00
rcourtman
3b347b6548 fix: harden SQLite against I/O contention causing persistent lock errors
- Move all SQLite pragmas from db.Exec() to DSN parameters so every
  connection the pool creates gets busy_timeout and other settings.
  Previously only the first connection had these applied.
- Set MaxOpenConns(1) on audit, RBAC, and notification databases
  (metrics already had this). Fixes potential for multiple connections
  where new ones lack busy_timeout.
- Increase busy_timeout from 5s to 30s across all databases to
  tolerate disk I/O pressure during backup windows.
- Fix nested query deadlocks in GetRoles(), GetUserAssignments(), and
  CancelByAlertIDs() that would deadlock with MaxOpenConns(1).
- Fix circuit breaker retryInterval not resetting on recovery, which
  caused the next trip to start at 5-minute backoff instead of 5s.

Related to #1156
2026-02-02 17:29:14 +00:00
rcourtman
4af5fc4246 refactor(config): rename BackendHost/BackendPort to BindAddress
Simplify server config by consolidating BackendHost and BackendPort into
a single BindAddress field. The port is now solely controlled by FrontendPort.

Changes:
- Replace BackendHost/BackendPort with BindAddress in Config struct
- Add deprecation warning for BACKEND_HOST env var (use BIND_ADDRESS)
- Update connection timeout default from 45s to 60s
- Remove backendPort from SystemSettings and frontend types
- Update server.go to use cfg.BindAddress
- Update all tests to use new config field names
2026-02-01 23:26:32 +00:00
rcourtman
0c802e7083 fix(patrol): improve service lifecycle, graceful shutdown, and concurrency 2026-02-01 16:27:25 +00:00
rcourtman
13a6f7750c Minor updates to main and proxmox client 2026-01-28 16:52:50 +00:00
rcourtman
7f7edfceb4 test: expand backend coverage 2026-01-25 21:08:44 +00:00
rcourtman
fd2c53a84e Gate multi-tenant migration by flag 2026-01-24 15:41:02 +00:00
rcourtman
c4ca169e2b feat: add multi-tenant isolation foundation (disabled by default)
Implements multi-tenant infrastructure for organization-based data isolation.
Feature is gated behind PULSE_MULTI_TENANT_ENABLED env var and requires
Enterprise license - no impact on existing users.

Core components:
- TenantMiddleware: extracts org ID, validates access, 501/402 responses
- AuthorizationChecker: token/user access validation for organizations
- MultiTenantChecker: WebSocket upgrade gating with license check
- Per-tenant audit logging via LogAuditEventForTenant
- Organization model with membership support

Gating behavior:
- Feature flag disabled: 501 Not Implemented for non-default orgs
- Flag enabled, no license: 402 Payment Required
- Default org always works regardless of flag/license

Documentation added: docs/MULTI_TENANT.md
2026-01-23 21:42:27 +00:00
rcourtman
8963d69764 feat: add metrics store point limiting and mock improvements
- Add point limiting to metrics queries
- Improve mock metrics history for testing
- Add monitor enhancements
2026-01-22 22:29:56 +00:00
rcourtman
5f56efa88a Refactor: Core monitoring and update managers multi-tenancy
- Updated monitoring reload and metrics history to be tenant-aware
- Refactored update manager and checksum validation for multi-tenancy
- Enhanced test coverage for agent updates and metrics storage
2026-01-22 16:43:24 +00:00
rcourtman
289d95374f feat: add multi-tenancy foundation (directory-per-tenant)
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.

Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
  context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components

All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
2026-01-22 13:39:06 +00:00
rcourtman
c75972d57c Fix mock metrics history and guest drawer controls 2026-01-22 09:39:53 +00:00
rcourtman
2e0da42a81 chore: reliability and maintenance improvements
Host agent:
- Add SHA256 checksum verification for downloaded binaries
- Verify checksum file matches expected bundle filename

WebSocket:
- Add write failure tracking with graceful disconnection
- Increase write deadline to 30s for large state payloads
- Better handling for slow clients (Raspberry Pi, slow networks)

Monitoring:
- Remove unused temperature proxy imports
- Add monitor polling improvements
- Expand test coverage

Other:
- Update package.json dependencies
- Fix generate-release-notes.sh path handling
- Minor reporting engine cleanup
2026-01-22 00:45:04 +00:00
rcourtman
c8b6cbfc6d feat(pro): long-term metrics history (30d/90d)
- Add FeatureLongTermMetrics license feature for Pro tier
- Implement tiered storage in metrics store (raw, minute, hourly, daily)
- Add covering index for unified history query performance
- Seed mock data for 90 days with appropriate aggregation tiers
- Update PULSE_PRO.md to document the feature
- 7-day history remains free, 30d/90d requires Pro license
2026-01-22 00:42:41 +00:00
rcourtman
bb47e1831c security: SSRF protection for webhook URLs
- Add DNS resolution validation to block webhooks to internal IPs
- Validate hostname resolves before accepting webhook URL
- Block metadata endpoints (AWS, GCP, Azure)
- Block localhost, private IPs, and reserved ranges
- Add context timeout for DNS lookups (2s)
2026-01-22 00:42:23 +00:00
rcourtman
61bb582d82 fix: disk-exclude now works with device paths and disk I/O
- Add MatchesDiskExclude() to check both device path and mountpoint
- Add MatchesDeviceExclude() for device-only matching
- Update collectDisks to check device in addition to mountpoint
- Update collectDiskIO to respect disk exclusions
- Patterns like /dev/sda, sda, or /mnt/backup all work now

Related to #1142
2026-01-21 19:03:05 +00:00
rcourtman
c44cb5af5b fix: use pure Go SQLite driver for arm64 compatibility
Switch from mattn/go-sqlite3 (CGO) to modernc.org/sqlite (pure Go)
for auth, audit, and notification queue storage. This enables SQLite
functionality on arm64 Docker images which are built with CGO_ENABLED=0.

Related to #1140
2026-01-21 18:58:23 +00:00
rcourtman
ebc29b4fdb feat: show pending apt updates for Proxmox nodes (#1083)
- Add PendingUpdates and PendingUpdatesCheckedAt fields to Node model
- Add GetNodePendingUpdates method to Proxmox client (calls /nodes/{node}/apt/update)
- Add 30-minute polling cache to avoid excessive API calls
- Add pendingUpdates to frontend Node type
- Add color-coded badge in NodeSummaryTable (yellow: 1-9, orange: 10+)
- Update test stubs for interface compliance

Requires Sys.Audit permission on Proxmox API token to read apt updates.
2026-01-21 10:53:36 +00:00
rcourtman
ecc31730f6 Remove OpenCode references 2026-01-20 16:56:41 +00:00
rcourtman
96b7370f7b test: improve coverage for API, AI, Alerts, and Frontend Utils
- Add comprehensive tests for internal/api/config_handlers.go (Phases 1-3)
- Improve test coverage for AI tools, chat service, and session management
- Enhance alert and notification tests (ResolvedAlert, Webhook)
- Add frontend unit tests for utils (searchHistory, tagColors, temperature, url)
- Add proximity client API tests
2026-01-20 15:52:39 +00:00
rcourtman
a6a8efaa65 test: Add comprehensive test coverage across packages
New test files with expanded coverage:

API tests:
- ai_handler_test.go: AI handler unit tests with mocking
- agent_profiles_tools_test.go: Profile management tests
- alerts_endpoints_test.go: Alert API endpoint tests
- alerts_test.go: Updated for interface changes
- audit_handlers_test.go: Audit handler tests
- frontend_embed_test.go: Frontend embedding tests
- metadata_handlers_test.go, metadata_provider_test.go: Metadata tests
- notifications_test.go: Updated for interface changes
- profile_suggestions_test.go: Profile suggestion tests
- saml_service_test.go: SAML authentication tests
- sensor_proxy_gate_test.go: Sensor proxy tests
- updates_test.go: Updated for interface changes

Agent tests:
- dockeragent/signature_test.go: Docker agent signature tests
- hostagent/agent_metrics_test.go: Host agent metrics tests
- hostagent/commands_test.go: Command execution tests
- hostagent/network_helpers_test.go: Network helper tests
- hostagent/proxmox_setup_test.go: Updated setup tests
- kubernetesagent/*_test.go: Kubernetes agent tests

Core package tests:
- monitoring/kubernetes_agents_test.go, reload_test.go
- remoteconfig/client_test.go, signature_test.go
- sensors/collector_test.go
- updates/adapter_installsh_*_test.go: Install adapter tests
- updates/manager_*_test.go: Update manager tests
- websocket/hub_*_test.go: WebSocket hub tests

Library tests:
- pkg/audit/export_test.go: Audit export tests
- pkg/metrics/store_test.go: Metrics store tests
- pkg/proxmox/*_test.go: Proxmox client tests
- pkg/reporting/reporting_test.go: Reporting tests
- pkg/server/*_test.go: Server tests
- pkg/tlsutil/extra_test.go: TLS utility tests

Total: ~8000 lines of new test code
2026-01-19 19:26:18 +00:00
rcourtman
d06ed2edb3 refactor: Add testability improvements to core packages
hostagent/commands.go:
- Extract execCommandContext as mockable variable

hostagent/proxmox_setup.go:
- Convert stateFilePath constants to variables (testable)
- Extract runCommand and lookPath as mockable functions
- Add duplicate comment (minor cleanup needed)

notifications/notifications.go:
- Add GetQueueStats() method for interface compliance
- Used by NotificationMonitor interface

updates/manager.go:
- Add AddSSEClient, RemoveSSEClient, GetSSECachedStatus methods
- Enables interface-based SSE client management

pkg/audit/export.go:
- Minor testability improvements

go.mod/go.sum:
- Add stretchr/objx v0.5.2 (test mocking dependency)
2026-01-19 19:25:38 +00:00
rcourtman
dc16c94766 fix: Add robustness improvements to approval, auth, and server
approval/store.go:
- Make Approve() idempotent - return success if already approved
- Handles double-clicks and race conditions gracefully

auth.go:
- Add dev mode admin bypass (disabled by default)
- When ALLOW_ADMIN_BYPASS=1, sets X-Authenticated-User header

server.go:
- Call router.StopOpenCodeAI() during shutdown
- Ensures AI service stops cleanly on server termination
2026-01-19 19:24:45 +00:00
rcourtman
d73e57af86 Initialize SQLite audit logger for Pro license with audit_logging feature
The audit logging feature was showing the UI for Pro users but the
SQLiteLogger was never actually initialized - it fell back to the
ConsoleLogger which only writes to console and returns empty arrays
for queries.

This fix:
- Adds initAuditLoggerIfLicensed() helper to license_handlers.go
- Calls it when loading a persisted license at startup
- Calls it when activating a new license via API
- Creates SQLiteLogger with 90-day default retention when audit_logging
  feature is enabled

The audit.db will be created in {dataDir}/audit/ when Pro is licensed.
2026-01-13 10:06:48 +00:00
rcourtman
b177812fd3 revert: remove accidentally committed WIP OpenCode changes
Reverts unintended changes from 4e064aa0 that broke the frontend build.
The workflow fix for cmd/pulse package build remains intact.
2026-01-13 09:15:42 +00:00
rcourtman
4e064aa0cc fix: build entire cmd/pulse package, not just main.go
The static binary build was only compiling main.go, missing bootstrap.go
and config.go which define osExit, bootstrapTokenCmd, and configCmd.
2026-01-13 09:06:21 +00:00
rcourtman
3199c258d2 feat(reporting): add advanced reporting engine with CSV/PDF export
- Add reporting engine for scheduled and on-demand reports
- Implement CSV export functionality
- Implement PDF report generation
2026-01-12 15:21:28 +00:00
rcourtman
0ddbf37c59 feat(auth): add policy evaluator and SQLite auth manager for RBAC
- Add policy evaluator for fine-grained access control
- Implement SQLite-backed auth manager for user/role persistence
- Support role-based permissions evaluation
2026-01-12 15:20:49 +00:00
rcourtman
d0ba203203 feat(audit): add comprehensive audit logging system
- Add SQLite-backed audit logger for persistent audit trails
- Implement cryptographic signing for tamper detection
- Add audit log export functionality
- Add webhook notifications for audit events
2026-01-12 15:20:33 +00:00
rcourtman
b2a6cd0fa3 fix(agent): add FreeBSD platform support to agent download and UI (#1051)
- Add freebsd-amd64 and freebsd-arm64 to normalizeUnifiedAgentArch()
  so the download endpoint serves FreeBSD binaries when requested
- Add FreeBSD/pfSense/OPNsense platform option to agent setup UI
  with note about bash installation requirement
- Add FreeBSD test cases to unified_agent_test.go

Fixes installation on pfSense/OPNsense where users were getting 404
errors because the backend didn't recognize the freebsd-amd64 arch
parameter from install.sh.
2026-01-11 23:51:12 +00:00
rcourtman
f527e6ebd0 docs: fix Kubernetes DaemonSet deployment guide
Fixes #1091 - addresses all three documentation issues reported:

1. Binary path: Changed from /usr/local/bin/pulse-agent (which doesn't
   exist in the main image) to /opt/pulse/bin/pulse-agent-linux-amd64

2. PULSE_AGENT_ID: Added to example and documented why it's required
   for DaemonSets (prevents token conflicts when all pods share one
   API token)

3. Resource visibility flags: Added PULSE_KUBE_INCLUDE_ALL_PODS and
   PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS to example, with explanation
   of the default behavior (show only problematic resources)

Also added tolerations, resource requests/limits, and ARM64 note.
2026-01-11 21:43:23 +00:00
rcourtman
80444a9022 fix(monitor): use cluster quorum status instead of endpoint count for health
Previously, when some cluster endpoints were unreachable (e.g., backup
nodes intentionally offline), the cluster was marked as "degraded" even
though the Proxmox cluster itself was healthy and had quorum.

Now the connection health check queries the Proxmox cluster's actual
quorum status. A cluster is only marked "degraded" if it has lost
quorum (not enough votes for consensus), which is the actual indicator
of cluster instability.

This means:
- Cluster with quorum + some nodes offline = "healthy"
- Cluster without quorum = "degraded" (warning)
- All endpoints down = "error"

Fixes #1085
2026-01-11 11:54:02 +00:00