285 Commits

Author SHA1 Message Date
rcourtman
4343910fc3 feat(api): add upgrade-metrics routes, system settings, and disable flag 2026-02-17 11:36:30 +00:00
rcourtman
941b7fac6c fix: resolve all test failures after 44-branch v6 merge
After merging 44 parallel feature branches into pulse/v6, ~33 test
packages broke due to interface changes, new validation requirements,
and cross-branch conflicts. This commit fixes all failures to restore
82/82 packages passing.

Production fixes:
- notifications: call normalizeEmailConfig in SetEmailConfig, fix
  CancelByAlertIDs missing completed_at/next_retry_at, fix
  performCleanup FK constraint (delete audit rows first), fix Stop()
  deadlock (release mutex before waiting on cleanupDone)
- relay: store startupErr in Client struct, preserve nil
  LicenseTokenFunc/TokenValidator instead of replacing with stubs,
  trim license token in register
- agentexec: add RequestID length validation before agent lookup,
  add shutdown case to ReadFile select
- discovery: skip invalid subnet tokens instead of rejecting all
- sensors: return error on invalid thermal fallback value
- monitoring: wire in sanitizeSSHCommandError for SSH errors
- hostagent: use readProcMDStat in listArrayDevices/getRebuildSpeed
- mock: fix disableMockMode double-unlock (use stopUpdateLoop)
- models: preserve non-nil empty slices in state snapshots

Test fixes:
- Add ?org_id=default to websocket test URLs (tenant isolation)
- Update test expectations for tightened org token access
- Fix SAML route inventory (prefix-based bypass, not exact match)
- Update error message expectations for wrapped errors
- Fix nil pointer in NVMe SMART test (value→pointer type change)
- Fix PulseURL in hostagent tests (loopback required for HTTP)
- Remove duplicate/conflicting test files from parallel branches
- Reconcile contradictory buffer capacity test expectations
2026-02-14 20:07:42 +00:00
rcourtman
0dce6f38df fix: port 19 unreleased hotfixes from 5.1.x into v6
Ports all unreleased fixes from pulse/hotfix-5.1.3 branch, adapted for
the v6 unified resource model where file paths and patterns differ.

Proxmox cluster health:
- Endpoint health: blocklist→allowlist (only connectivity errors mark unhealthy)
- Guest agent error wrapping with context prefixes
- TOFU fingerprint auto-refresh after TLS certificate renewal
- Test coverage for endpoint health classification

Storage & disk:
- metrics.db bloat prevention (vacuum, auto-vacuum, retention-on-startup)
- Deduplicate bind-mounted volumes in disk totals (K8s + btrfs/zfs)
- Storage aggregation scoped per-instance (prevents cross-cluster merging)

SMART monitoring:
- Parse raw.string instead of raw.value for SATA (fixes Seagate drives)
- Temperature fallback to ATA attributes 194/190
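
For the two SMART items, here is a sketch assuming a simplified view of smartctl's JSON attribute table (`ataAttr` is a hypothetical struct). On some drives the raw value of attribute 194 packs min/max readings into its upper bytes, so the leading integer of the raw string is the safer source. The 0 to 150 °C plausibility bound is also an assumption:

```go
package main

import (
	"strconv"
	"strings"
)

// ataAttr is a hypothetical subset of one smartctl JSON attribute entry.
type ataAttr struct {
	ID        int
	RawValue  int64  // may encode extra data in upper bytes; ignored here
	RawString string // e.g. "34 (Min/Max 20/45)"
}

// leadingInt parses the first whitespace-delimited token of a raw string.
func leadingInt(s string) (int, bool) {
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return 0, false
	}
	n, err := strconv.Atoi(fields[0])
	return n, err == nil
}

// temperatureFromATA prefers attribute 194 (Temperature_Celsius) and falls
// back to 190 (Airflow_Temperature_Cel), reading the raw string only.
func temperatureFromATA(attrs []ataAttr) (int, bool) {
	for _, want := range []int{194, 190} {
		for _, a := range attrs {
			if a.ID != want {
				continue
			}
			// Assumed sanity bound to reject garbage readings.
			if t, ok := leadingInt(a.RawString); ok && t > 0 && t < 150 {
				return t, true
			}
		}
	}
	return 0, false
}
```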

Docker & agents:
- Docker CPU: always use manual delta tracking (PreCPUStats unreliable)
- Agent auto-update: exit(1) on exec failure for systemd restart
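
Manual delta tracking means keeping the previous cumulative counters per container and computing usage from consecutive samples, instead of trusting the daemon-supplied PreCPUStats. A sketch using the same formula as `docker stats` (container CPU delta over host system CPU delta, scaled by online CPUs); the `cpuSample` and `cpuTracker` types are hypothetical:

```go
package main

import "sync"

// cpuSample holds cumulative counters from one stats read: total container
// CPU time and total host system CPU time (same units), plus online CPUs.
type cpuSample struct {
	ContainerTotal uint64
	SystemTotal    uint64
	OnlineCPUs     int
}

// cpuTracker remembers the previous sample per container ID.
type cpuTracker struct {
	mu   sync.Mutex
	prev map[string]cpuSample
}

func newCPUTracker() *cpuTracker {
	return &cpuTracker{prev: make(map[string]cpuSample)}
}

// Percent returns CPU usage as (containerDelta / systemDelta) * CPUs * 100.
// ok is false on the first sample for a container or when counters reset.
func (t *cpuTracker) Percent(id string, cur cpuSample) (pct float64, ok bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	prev, seen := t.prev[id]
	t.prev[id] = cur // always store, so the next call has a baseline
	if !seen || cur.SystemTotal <= prev.SystemTotal || cur.ContainerTotal < prev.ContainerTotal {
		return 0, false
	}
	cDelta := float64(cur.ContainerTotal - prev.ContainerTotal)
	sDelta := float64(cur.SystemTotal - prev.SystemTotal)
	return cDelta / sDelta * float64(cur.OnlineCPUs) * 100, true
}
```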

Auth & SSO:
- SSO config persistence via ensureSSOConfig() on first access
- SAML route registration (/api/saml/* dispatcher)

AI & patrol:
- Probe all guest IPs for reachability (not just first)
- Discovery interval auto-defaults to 24h when enabled

Misc:
- Profile API trailing slash fix
- Doc comment fix (UpdateAlertDelayHours: -1 = disabled)
2026-02-14 15:17:31 +00:00
rcourtman
fc6ecbc2fb fix: post-merge compile fixes for parallel branch integration
Resolve build errors caused by merge conflict resolution that
accidentally dropped struct fields, variable names, and constants
while keeping code that referenced them:

- updates/manager.go: Add missing lifecycleMu, updateMu, updateInFlight
  fields (from parallel-16-goroutine-safety)
- updates/manager.go: Remove dead code block (duplicate HTTP client/req)
- updates/manager.go: Fix heartbeatStopC -> shutdownCh field name
- updates/sse.go: Fix closeCh -> stopCh references (3 sites)
- monitoring/temperature.go: Add missing defaultSSHCommandTimeout const
  (from parallel-30-ssh-management, 15s)
- ai/patrol_run.go: Fix initialDelayTimer -> initialTimer variable name
- updates/host_agent_binaries_test.go: Update test for new
  normalizeHostAgentSymlinkTarget(target, allowedNames) signature

Also includes hostagent package fixes from previous session resolving
duplicate declarations, type mismatches, and undefined fields across
38 files.
2026-02-14 10:53:42 +00:00
rcourtman
821d655be6 merge: parallel-37-tenant-isolation into v6 2026-02-13 22:46:52 +00:00
rcourtman
feac7ba0d7 merge: parallel-33-data-model-integrity into v6 2026-02-13 22:39:15 +00:00
rcourtman
2ecaf5389d merge: parallel-19-monitoring-observability into v6 2026-02-13 22:38:52 +00:00
rcourtman
95124bde16 merge: parallel-18-graceful-shutdown into v6 2026-02-13 22:37:29 +00:00
rcourtman
88d6448b54 merge: parallel-16-goroutine-safety into v6 2026-02-13 22:36:23 +00:00
rcourtman
c8719186a5 merge: parallel-12-security-hardening into v6
Resolved 53 conflicts, preferring the branch's security improvements:
- Response body size limiting (readResponseBodyLimited)
- Command output capping (cappedBuffer)
- Request ID validation (validateRequestID)
- Scoped pending request keys (pendingReqs)
- Input sanitization (sanitizeCapacity, normalizeTarget)
HEAD's error-handling helpers and logging context were preserved.
2026-02-13 22:29:50 +00:00
rcourtman
9ce6907d30 merge: parallel-05-error-handling into v6 2026-02-13 22:11:51 +00:00
rcourtman
cb511e2266 merge: parallel-44-circuit-breakers into v6 2026-02-13 22:10:59 +00:00
rcourtman
1c3b37c9ff merge: parallel-39-audit-logging into v6 2026-02-13 22:07:30 +00:00
rcourtman
6ef3f63331 merge: parallel-36-database-persistence into v6 2026-02-13 22:03:35 +00:00
rcourtman
893b9fe6dc merge: parallel-42-background-queues into v6 2026-02-13 22:03:08 +00:00
rcourtman
0662914989 merge: parallel-40-proxmox-client into v6 2026-02-13 22:02:07 +00:00
rcourtman
d223bb5c47 merge: parallel-20-docker-agent-hardening into v6 2026-02-13 21:56:54 +00:00
rcourtman
43bd4ac54a merge: parallel-08-component-consolidation into v6
Resolves conflicts:
- monitor_agents.go: Use models.IOMetrics + clampToInt64 (consolidating
  types.IOMetrics from HEAD and bare IOMetrics from parallel-08)
- hostagent/agent.go: Remove stale internal/buffer import (consolidated
  into internal/utils by parallel-08)
- hostagent/ceph.go: Take parallel-08 consolidated Ceph types (prefixed
  with Ceph, plain strings instead of typed enums); fix NumOSD→NumOSDs
  typo in parallel branch; remove dead normalizeHealth* functions
- hostagent/ceph_test.go: Update assertions to use plain strings

Component moves: ceph→hostagent, smartctl→hostagent, mdadm→hostagent,
types/metrics→models/metrics_types, errors→monitoring/errors,
infradiscovery→ai/infradiscovery, agentbinaries→updates,
cloudcp/health→cloudcp (flattened)
2026-02-13 21:55:40 +00:00
rcourtman
47fe904500 merge: parallel-06-type-safety into v6
Conflicts resolved by combining both sides' improvements:
- agentexec/server.go: kept sendRequestAndWait DRY refactor, integrated
  NewMessage() type-safe constructor from parallel-06
- agentupdate/update.go: kept setAuthHeaders DRY helper, applied goOS
  typed constants from parallel-06
- ceph/collector.go: applied ServiceType typed constants, fixed NumOSD
  typo from parallel-06 to NumOSDs (matching struct field)
- config/persistence.go: adopted alertSchedulePresence typed struct
  from parallel-06 over raw map[string]interface{} approach
- logging/broadcast.go: adopted ringHistorySnapshot standalone function
  (type-safe), kept subscriberID naming from parallel-04
2026-02-13 21:41:35 +00:00
rcourtman
66f7caef73 refactor(naming-consistency): fix VMIpAddress → VMIPAddress in pkg/proxmox 2026-02-12 18:20:16 +00:00
rcourtman
9a8a1de6f4 refactor(08-component-consolidation): consolidate agentbinaries into updates 2026-02-12 15:22:17 +00:00
rcourtman
1d416c96ae refactor(error-handling): wrap csv write errors in pkg/reporting 2026-02-12 12:20:25 +00:00
rcourtman
b3fa65a00f refactor(44-circuit-breakers): clear stale rate-limit cooldown in pkg/proxmox 2026-02-12 06:45:53 +00:00
rcourtman
25fd0ee273 refactor(12-security-hardening): cap response-body reads in pkg/ 2026-02-12 04:45:56 +00:00
rcourtman
4062592315 refactor(39-audit-logging): audit auto-import outcomes in pkg/server 2026-02-12 04:25:29 +00:00
rcourtman
66c1efefb3 refactor(42-background-queues): guard shutdown writes in pkg/metrics 2026-02-12 04:16:08 +00:00
rcourtman
2e0948339e refactor(37-tenant-isolation): validate tenant audit org IDs in pkg/audit 2026-02-12 04:10:47 +00:00
rcourtman
4617868b7b refactor(40-proxmox-client): retry password auth requests after 401 in pkg/proxmox 2026-02-12 04:05:00 +00:00
rcourtman
2050a131b0 refactor(40-proxmox-client): fix PMG reauth deadlock in pkg/pmg 2026-02-12 02:01:44 +00:00
rcourtman
4ac69479dd refactor(18-graceful-shutdown): harden audit shutdown paths in pkg/ 2026-02-12 01:37:02 +00:00
rcourtman
7cc896424d refactor(33-data-model-integrity): harden shutdown lifecycle in pkg/audit 2026-02-12 01:34:50 +00:00
rcourtman
c5f5b8c22d refactor(monitoring-observability): enrich metrics store write-failure telemetry in pkg/metrics 2026-02-12 01:30:55 +00:00
rcourtman
f1e53d3934 refactor(36-database-persistence): fix offset-only audit queries in pkg 2026-02-12 01:26:59 +00:00
rcourtman
62d0afa623 refactor(18-graceful-shutdown): harden mock shutdown lifecycle in internal/mock 2026-02-12 00:22:31 +00:00
rcourtman
0a5c274e9f refactor(16-goroutine-safety): stop websocket hub goroutines on server exit in pkg/server 2026-02-12 00:17:34 +00:00
rcourtman
35fab694c9 refactor(18-graceful-shutdown): close logging resources on shutdown in internal/logging 2026-02-12 00:06:45 +00:00
rcourtman
425241d353 refactor(type-safety): type API token context key in pkg/auth 2026-02-11 16:57:32 +00:00
rcourtman
9e979916cd feat(cloud): add TierCloud, hosted subscription gating, trial seeding, SAML license check
- Add TierCloud to license features with full Cloud capability set
- Hosted mode: tenant middleware gates on subscription lifecycle
  (active/grace/bounded-trial) instead of FeatureMultiTenant
- Seed trial billing state on hosted signup so tenants are usable
  before Stripe checkout completes (14-day bounded trial)
- SAML SSO creation/update now requires AdvancedSSO license (OIDC
  remains free on all tiers)
- Stripe webhook handlers use TierCloud instead of TierPro for
  hosted checkout/subscription capability derivation
- MultiTenantChecker accepts hostedMode flag for correct WebSocket
  gating in Cloud deployments
- Comprehensive tests for hosted subscription gating (active, trial
  with/without end date, expired, canceled, grace period)
2026-02-10 22:56:50 +00:00
rcourtman
463e4eff50 feat(cloud): implement signup + magic link flow (C-6)
Complete the post-checkout signup flow: Stripe checkout → container
starts → magic link generated → user clicks → logged into tenant
dashboard.

- Add pkg/cloudauth for shared HMAC-SHA256 handoff token sign/verify
- Add internal/cloudcp/auth for control plane magic link service with
  SQLite-backed token store (standalone, no internal/api dependency)
- Add magic link verify handler on control plane that generates a
  short-lived handoff token and redirects to tenant container
- Add /auth/cloud-handoff endpoint on tenant side that validates
  handoff token and creates a session using existing auth machinery
- Expand provisioner to write per-tenant handoff key, poll container
  health (2s interval, 60s timeout), and generate magic link on success
- Wire magic link service into control plane server and routes
2026-02-10 21:54:23 +00:00
rcourtman
99c7b42d20 fix(proxmox): avoid 403 on apt update checks and harden PVE permission setup
Port from 5.1.x hotfix line (815c990e). Adds privilege probing so
the host agent only requests PVE permissions that exist on the target
version (VM.Monitor on PVE 8, VM.GuestAgent.Audit on PVE 9+). Demotes
apt/update 403 to Debug. Setup script uses comma-separated privs and
modify-before-add for the PulseMonitor role.
2026-02-10 18:12:09 +00:00
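
Privilege probing, as described in 99c7b42d20, amounts to filtering the desired privilege list down to what the target node actually supports before building the comma-separated list for the PulseMonitor role. A sketch in which `available` stands in for the probe result and the desired list is illustrative:

```go
package main

import "strings"

// supportedPrivs keeps only privileges the target PVE version knows about
// (per a hypothetical probe result) and joins them comma-separated,
// preserving the order of the desired list.
func supportedPrivs(desired []string, available map[string]bool) string {
	kept := make([]string, 0, len(desired))
	for _, p := range desired {
		if available[p] {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, ",")
}
```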
rcourtman
8a3aabe21d Merge branch 'main' into pulse/rc-00-scope-freeze 2026-02-10 17:17:32 +00:00
rcourtman
ca01fdf56c feat(audit): real per-tenant SQLite audit logging with license gating (W1-B)
- Add SQLiteLoggerFactory implementing LoggerFactory interface, bridging
  dbPath to SQLiteLoggerConfig.DataDir with per-tenant crypto support
- Wire factory into server.go TenantLoggerManager initialization
- Remove stub initAuditLoggerIfLicensed from license_handlers.go
- Make all /api/audit/* handlers tenant-aware via GetOrgID(ctx)
- Register /api/audit/export and /api/audit/summary with audit_logging
  license gate
- Add factory persistence + HMAC signing test
- Add tenant isolation test (org-a events invisible to org-b)

Decision: always capture audit events to SQLite; gate query/export
endpoints behind audit_logging license feature.
2026-02-10 14:52:02 +00:00
rcourtman
298b957222 feat(hosted): wire reaper + cleanup cascade into server lifecycle
- Add OnBeforeDelete hook to Reaper for pre-deletion cleanup
- Store rbacProvider on Router struct for cross-subsystem access
- Add Router.CleanupTenant() cascading RBAC, AI, and license cleanup
- Add LicenseHandlers.RemoveTenantService() for cache eviction
- Wire reaper startup in server.go behind PULSE_HOSTED_MODE=true
2026-02-10 12:47:27 +00:00
rcourtman
e2194f868e feat(relay-docker): improve relay proxy and Docker agent collection
- Enhance relay client with better connection handling
- Improve relay proxy with additional functionality and tests
- Update Docker agent collect with improved metrics gathering
- Add test coverage for Docker agent collection
2026-02-07 16:15:43 +00:00
rcourtman
ffe6c88c8b feat(kubernetes): improve agent metrics and unified resource integration
- Enhance Kubernetes agent with comprehensive usage metrics collection
- Add monitoring improvements for Kubernetes agents
- Integrate Kubernetes resources into unified resource registry
- Add report format improvements for Kubernetes agent reports
- Include new test coverage for usage metrics and registry integration
2026-02-07 16:12:23 +00:00
rcourtman
f6f792c4d4 feat(backend): Implement Unified Resources backend 2026-02-06 16:04:18 +00:00
rcourtman
555de24563 feat(api): update API handlers and service integrations
Refactors API handlers, updates notification logic, and enhances service discovery and configuration management. Includes extensive test coverage updates.
2026-02-06 12:28:55 +00:00
rcourtman
2cad478774 feat(monitoring): enhance metrics collection and history
Updates monitoring logic for better coverage, adds metrics history support, and improves host agent command handling.
2026-02-06 12:28:29 +00:00
rcourtman
1edfa4311e feat: Unified Resource Model and Navigation Redesign
## Summary
Complete implementation of the Unified Resource Model with new navigation.

## Features
- v2 resources API with identity matching across sources (Proxmox, Agent, Docker)
- Infrastructure page with merged host view
- Workloads page for all VMs/LXC/Docker containers
- Global search (Cmd/Ctrl+K) with keyboard navigation
- Mobile navigation with bottom tabs and drawer
- Keyboard shortcuts (g+key navigation, ? for help)
- What's New modal for user onboarding
- Report Incorrect Merge feature for false positive fixes
- Debug tab in resource drawer (enable via localStorage)

## Technical
- Async audit logging for improved performance
- WebSocket-driven real-time updates for unified resources
- Session-based auth achieves <2ms API response times

## Tests
- Backend: 78 tests passed
- Frontend: 397 tests passed
2026-02-05 17:57:59 +00:00
rcourtman
6c170165a5 Cover PMG cluster status without params 2026-02-05 13:24:50 +00:00