1556 Commits

Author SHA1 Message Date
rcourtman
5d165fc055 docs: Fix CONFIGURATION.md - logFormat not in system.json
The logFormat setting is only available via LOG_FORMAT environment
variable, not in system.json. Updated the example and added a note
clarifying this. Also added LOG_FORMAT to the environment variables
table.
2025-12-02 23:43:45 +00:00
rcourtman
9de0c1cdb1 docs: Fix rollback instructions in INSTALL.md
The doc claimed a "Restore previous version" button exists in Settings UI,
but this doesn't exist. The rollback API endpoint exists in backend code
but has no UI. Updated to reflect actual behavior: backups are created
during systemd updates and can be restored manually.
2025-12-02 23:42:05 +00:00
rcourtman
8ad02ce048 docs: Remove non-existent LXC installation references
- FAQ.md: Replace LXC installer one-liner with Docker quick start
- MIGRATION.md: Replace LXC mention with Kubernetes
- README.md: Remove "Proxmox LXC" from installation methods list

The install.sh script is a unified agent installer, not an LXC
container creator. Pulse server installation is via Docker,
Kubernetes helm, or manual systemd setup.
2025-12-02 23:40:00 +00:00
rcourtman
96da426a63 docs: Fix DOCKER.md - Alpine-based image with shell access
2025-12-02 23:38:37 +00:00
rcourtman
bf619b9628 docs: Fix /api/storage endpoint path in API.md
2025-12-02 23:37:59 +00:00
rcourtman
8a54156632 docs: Remove LXC references from CONFIGURATION.md
2025-12-02 23:37:11 +00:00
rcourtman
aa2023c533 Fix INSTALL.md inaccuracies
- Remove non-existent Proxmox LXC installer section (install.sh is actually
  the unified agent installer, not an LXC container creator)
- Fix Helm install command to use GitHub Pages repo instead of non-existent
  OCI registry
- Add proper systemd installation instructions with actual commands
- Remove non-existent CLI commands (pulse config rollback, pulse-update.timer)
- Add Kubernetes update/uninstall commands
- Add sudo where needed for systemd commands
2025-12-02 23:36:32 +00:00
rcourtman
a40d5e0f5e Fix inaccurate architecture documentation
- Correct connection methods: Pulse uses REST APIs for PVE/PBS (not SSH)
- Update diagram to show HTTPS API connections on ports 8006/8007
- Add agent push model for Docker/Host metrics collection
- Remove incorrect SSH connection pooling references
- Update data flow to reflect API polling and agent push
2025-12-02 23:32:14 +00:00
rcourtman
d0d989289a Refactor alert system: fix race conditions, memory leaks, and improve code quality
- Rename checkFlapping to checkFlappingLocked to clarify lock contract
- Replace goto statements with structured control flow
- Wire up unused recordAlertFired/recordAlertResolved metric hooks
- Add trackingMapCleanup goroutine to prevent memory leaks from stale entries
- Tighten alert ID validation to alphanumeric + safe punctuation
- Fix history save error handling to properly manage backup lifecycle
- Add auto-migration for deprecated GroupingWindow field
- Refactor 300+ line UpdateConfig into focused helper functions
- Unify duplicate evaluateVMCondition/evaluateContainerCondition
- Add constants for magic numbers (thresholds, timing, flapping)
- Update tests to match new backup behavior
2025-12-02 23:31:36 +00:00
rcourtman
da43588189 Update docs and helm chart for agent health endpoints
- Add health-addr config option to UNIFIED_AGENT.md
- Document /healthz, /readyz, /metrics endpoints
- Add Kubernetes probe examples to docs
- Add liveness/readiness probes to helm chart agent template
- Add healthPort, livenessProbe, readinessProbe to values.yaml
- Update values.schema.json with new agent probe options
2025-12-02 22:45:24 +00:00
rcourtman
7fc15417e4 Add health/metrics server and proper cleanup to unified agent
- Add /healthz (liveness) and /readyz (readiness) endpoints
- Add /metrics endpoint with Prometheus metrics (pulse_agent_info, pulse_agent_up)
- Properly call dockerAgent.Close() on shutdown
- New config: -health-addr flag and PULSE_HEALTH_ADDR env (default :9191)
- Set to empty string to disable health server
2025-12-02 22:42:05 +00:00
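
For illustration, the liveness/readiness split described in 7fc15417e4 can be sketched with the Go standard library alone. This is a minimal sketch, not the agent's actual code: `newHealthMux`, `startHealthServer`, `statusFor`, and the `ready` callback are hypothetical names, and the `/metrics` endpoint is left out. The only details taken from the commit are the two paths and the rule that an empty address disables the server.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// newHealthMux registers /healthz (liveness) and /readyz (readiness).
// ready is a hypothetical callback reporting whether the agent can serve.
func newHealthMux(ready func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is up and able to answer HTTP at all.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: answer 503 until the callback reports ready.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !ready() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// startHealthServer starts the server in the background and returns it.
// An empty addr disables the health server, in which case nil is returned.
func startHealthServer(addr string, ready func() bool) *http.Server {
	if addr == "" {
		return nil
	}
	srv := &http.Server{Addr: addr, Handler: newHealthMux(ready)}
	go func() { _ = srv.ListenAndServe() }()
	return srv
}

// statusFor is a demo helper: it sends a GET to path and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

Keeping the two probes separate lets an orchestrator restart a hung process (liveness) without routing traffic to one that is still starting up (readiness).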
rcourtman
b4a33c4f2d Fix offline buffering: add tests, remove unused config, fix flaky test
- Add unit tests for internal/buffer package
- Fix misleading "ring buffer" comment (it's a bounded FIFO queue)
- Remove unused BufferCapacity config field from both agents
- Rewrite flaky integration test to use polling instead of fixed sleeps
2025-12-02 22:31:44 +00:00
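
The distinction b4a33c4f2d draws between a "ring buffer" and a bounded FIFO queue can be made concrete with a short sketch. It is illustrative only and is not the `internal/buffer` API: the `Queue` type, its method names, and the choice to drop the oldest entry when full are all assumptions, and the sketch is not safe for concurrent use.

```go
package main

// Queue is a bounded FIFO queue of at most capacity items.
// Hypothetical sketch; the eviction policy (drop oldest) is an assumption.
type Queue[T any] struct {
	items    []T
	capacity int
}

// NewQueue returns an empty queue; capacities below 1 are raised to 1.
func NewQueue[T any](capacity int) *Queue[T] {
	if capacity < 1 {
		capacity = 1
	}
	return &Queue[T]{items: make([]T, 0, capacity), capacity: capacity}
}

// Push appends item. If the queue is full, the oldest entry is evicted
// first. It reports whether an entry was dropped.
func (q *Queue[T]) Push(item T) bool {
	dropped := false
	if len(q.items) == q.capacity {
		copy(q.items, q.items[1:]) // shift everything left by one
		q.items = q.items[:len(q.items)-1]
		dropped = true
	}
	q.items = append(q.items, item)
	return dropped
}

// Pop removes and returns the oldest entry, or false if the queue is empty.
func (q *Queue[T]) Pop() (T, bool) {
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	item := q.items[0]
	copy(q.items, q.items[1:])
	q.items[len(q.items)-1] = zero // clear the vacated slot
	q.items = q.items[:len(q.items)-1]
	return item, true
}

// Len reports the number of buffered entries.
func (q *Queue[T]) Len() int { return len(q.items) }
```

A true ring buffer would track head and tail indices and avoid the shifting copy; the slice-based version above trades that for simplicity, which is reasonable for small capacities.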
courtmanr@gmail.com
caf0c10206 feat: Implement offline buffering for host and docker agents
- Add internal/buffer package with generic ring buffer
- Add buffering logic to host agent for failed reports
- Add buffering logic to docker agent for failed reports
- Add BufferCapacity configuration option
- Add integration tests for buffering logic
2025-12-02 22:12:47 +00:00
rcourtman
bda8056e48 Add refresh-cluster button to detect new Proxmox cluster members
When new nodes are added to a Proxmox cluster after Pulse was
initially configured, they weren't showing up in Settings. The
existing "Refresh" button only triggered network discovery, not
cluster membership re-detection.

Changes:
- Add POST /api/config/nodes/{id}/refresh-cluster endpoint
- Add "Refresh" button in cluster node panel in Settings
- Re-detect cluster membership and update stored endpoints

Related to #799
2025-12-02 22:01:00 +00:00
courtmanr@gmail.com
19c00feced Link ARCHITECTURE.md in SECURITY and DEV-QUICK-START guides
2025-12-02 20:51:37 +00:00
courtmanr@gmail.com
0d2f035292 Update docs: Unified Agent, Migration checklist, and cleanup
2025-12-02 20:49:34 +00:00
courtmanr@gmail.com
3c92c38b27 Update docs with missing config, API endpoints, and Docker Compose
2025-12-02 20:46:21 +00:00
courtmanr@gmail.com
4e0d971fa9 Link ARCHITECTURE.md in documentation
2025-12-02 20:41:39 +00:00
courtmanr@gmail.com
afcc1267bb Add ARCHITECTURE.md system design documentation
2025-12-02 20:40:31 +00:00
rcourtman
d9833cf6b0 fix: Resolve TypeScript errors in StackedMemoryBar and Settings
StackedMemoryBar.tsx:
- Fixed 'props.balloon' possibly undefined error by adding fallback
  to second comparison in Show condition

Settings.tsx:
- Fixed 'systemSettings' scope error by using updateChannel() signal
  instead of referencing out-of-scope variable from previous try block

Both files now pass strict TypeScript checks.
2025-12-02 20:37:44 +00:00
rcourtman
4c5b515cba fix: Update Mail Gateway disconnect state for consistency
Changed from warning (amber) to danger (red) tone and added:
- Dynamic description based on reconnecting status
- Manual "Reconnect now" button when not auto-reconnecting
- Consistent "Connection lost" title

All 7 major pages now have unified connection lost UX:
Dashboard, Storage, Backups, Replication, Hosts, Docker, Mail Gateway
2025-12-02 20:34:32 +00:00
rcourtman
1af0740de2 fix: Add connection lost indicator to Docker page
Docker/Containers page now shows a clear error state when WebSocket
connection is lost, with a manual "Reconnect now" button. This
matches the pattern established across all other major pages.

Connection lost UX is now consistent across: Dashboard, Storage,
Backups, Replication, Hosts, and Docker.
2025-12-02 20:31:59 +00:00
rcourtman
f0ff21ca1b fix: Add connection lost indicator to Hosts page
Hosts page now shows a clear error state when WebSocket connection
is lost, with a manual "Reconnect now" button. Also improved loading
state logic to differentiate between initial loading and connection
loss after having received data.

This completes the connection lost UX consistency across all major
pages: Dashboard, Storage, Backups, Replication, and now Hosts.
2025-12-02 20:29:15 +00:00
rcourtman
272d582262 fix: Add reconnect button to Backups and Replication pages
Both pages now show a consistent disconnect state with:
- Dynamic description based on reconnecting status
- Manual "Reconnect now" button when not auto-reconnecting

This matches the Dashboard and Storage page behavior, providing a
consistent UX across all main pages when connection is lost.
2025-12-02 20:24:34 +00:00
rcourtman
39f8a9f42c fix: Add connection lost indicator to Storage page
Storage page now shows a clear error state when WebSocket connection
is lost, matching the Dashboard's behavior. Users see the issue and
can manually reconnect instead of wondering why data isn't updating.
2025-12-02 20:20:39 +00:00
rcourtman
7ad4ccba49 fix: Correct lastBackup TypeScript type from string to number
The backend sends lastBackup as Unix milliseconds (int64), not as an
ISO string. Update VM and Container interfaces to match the actual
JSON payload.

The getBackupInfo() function already handles both string and number
types, so this is a type-safety fix that aligns types with reality.
2025-12-02 20:15:35 +00:00
rcourtman
a3e60cdd85 chore: Remove unused usePersistentSignal import
Cleanup from Settings sidebar change - the import was left behind
when switching from usePersistentSignal to createSignal.
2025-12-02 20:09:54 +00:00
rcourtman
d620de147a fix: Settings sidebar always starts expanded for discoverability
The sidebar no longer persists its collapsed state to localStorage.
Each visit to Settings starts with the sidebar expanded, showing
all menu labels for better discoverability by new users.

Users can still collapse the sidebar during their session if they
want more space, but it will reset to expanded on page reload.

Related to #764
2025-12-02 20:03:54 +00:00
rcourtman
9e38e4a6f0 feat: Add 'Needs Backup' filter to Dashboard
Add a new filter button that shows only guests with stale, critical,
or missing backups. This makes it easy to identify which VMs and
containers need attention for backup scheduling.

- Adds backupMode state with 'all' and 'needs-backup' options
- Filters out templates (they don't need backups)
- Uses existing getBackupInfo() thresholds (>24h stale, >72h critical)
- Integrates with Reset button and Escape key handling
- Persists filter state in localStorage

Related to #762
2025-12-02 20:00:33 +00:00
rcourtman
abf64c4ed3 fix: Constrain Docker drawer width to force card wrapping
The previous fix (2078421d) added overflow-hidden but didn't address
the root cause: the drawer div inside overflow-x-auto context had no
width constraint, so flex-wrap saw infinite space and didn't wrap.

Adding w-0 min-w-full forces the div to take exactly 100% of parent
width, which properly constrains flex-wrap to wrap cards within the
visible viewport.

Related to #789
2025-12-02 18:02:28 +00:00
rcourtman
032d12db8f deps: Update systemd, grpc-gateway, and pflag packages
- github.com/coreos/go-systemd/v22 v22.5.0 => v22.6.0
- github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.2 => v2.27.3
- github.com/spf13/pflag v1.0.9 => v1.0.10
2025-12-02 17:44:07 +00:00
rcourtman
95bcda21da deps: Update prometheus, compress, and protobuf packages
- github.com/klauspost/compress v1.18.0 => v1.18.2
- github.com/prometheus/common v0.66.1 => v0.67.4
- github.com/prometheus/procfs v0.16.1 => v0.19.2
- go.yaml.in/yaml/v2 v2.4.2 => v2.4.3
- google.golang.org/protobuf v1.36.8 => v1.36.10
2025-12-02 17:42:03 +00:00
rcourtman
e74b09557d fix: trigger Docker publish workflow in release pipeline
The release workflow publishes via GitHub API (patching draft to
published), which doesn't fire the release webhook. This meant the
Docker publish workflow was never triggered automatically.

Added explicit workflow dispatch for publish-docker.yml after release
publish, similar to how update-demo-server.yml was already dispatched.

Related to #797
2025-12-02 17:32:30 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
1a5acc2542 refactor: Remove duplicate IsPasswordHashed from auth package
The config package has a more robust IsPasswordHashed function that
handles truncated hashes. The auth package had a simpler duplicate
that was only used in tests. Removed the duplicate and its test
(already covered by config/config_utils_test.go).

Reduces deadcode findings from 7 to 6.
2025-12-02 17:19:07 +00:00
rcourtman
b66eb5cc83 deps: Update golang.org/x packages to latest versions
Updated direct dependencies:
- golang.org/x/oauth2: v0.31.0 -> v0.33.0
- golang.org/x/sync: v0.13.0 -> v0.18.0
- golang.org/x/time: v0.13.0 -> v0.14.0

All tests pass.
2025-12-02 17:03:16 +00:00
rcourtman
7075c8118e deps: Update key dependencies to latest versions
Updated direct dependencies:
- github.com/coreos/go-oidc/v3: v3.15.0 -> v3.17.0 (OIDC authentication)
- github.com/docker/docker: v28.5.1 -> v28.5.2 (Docker client)
- github.com/shirou/gopsutil/v4: v4.25.9 -> v4.25.11 (system monitoring)
- github.com/spf13/cobra: v1.9.1 -> v1.10.1 (CLI framework)

Transitive dependency updates:
- github.com/go-jose/go-jose/v4: v4.0.5 -> v4.1.3 (JWT library)
- github.com/spf13/pflag: v1.0.7 -> v1.0.9
- github.com/ebitengine/purego: v0.9.0 -> v0.9.1
- github.com/tklauser/go-sysconf: v0.3.15 -> v0.3.16
- github.com/tklauser/numcpus: v0.10.0 -> v0.11.0

All tests pass with updated dependencies.
2025-12-02 16:59:13 +00:00
rcourtman
e5f1289239 refactor: Remove duplicate isLoopback function from hostagent
The isLoopback function in internal/hostagent/agent.go was unused in
production code - it was a duplicate of the same function in
internal/hostmetrics/collector.go which is actively used.

Removed the dead code along with its associated tests to reduce
maintenance burden and improve code clarity.
2025-12-02 16:52:55 +00:00
rcourtman
cf26ed7f12 security: Add request body size limits to remaining API handlers
Add http.MaxBytesReader to 8 additional handlers to complete API
hardening against memory exhaustion attacks:

- guest_metadata.go: HandleUpdateMetadata (16KB)
- notification_queue.go: RetryDLQItem, DeleteDLQItem (8KB each)
- temperature_proxy.go: HandleRegister (8KB)
- host_agents.go: HandleReport (256KB)
- updates.go: HandleApplyUpdate (8KB)
- docker_metadata.go: HandleUpdateMetadata (16KB)
- system_settings.go: UpdateSystemSettings (64KB)

All API handlers that decode JSON request bodies now have size limits.
2025-12-02 16:47:13 +00:00
rcourtman
b4d497ce3b security: Add request body size limits to API handlers
Add http.MaxBytesReader to 16 additional handlers to prevent memory
exhaustion attacks via oversized request bodies:

- docker_agents.go: HandleReport (512KB), HandleCommandAck (8KB),
  HandleSetCustomDisplayName (8KB)
- alerts.go: UpdateAlertConfig (64KB), BulkAcknowledgeAlerts (32KB),
  BulkClearAlerts (32KB)
- config_handlers.go: HandleAddNode, HandleTestConnection,
  HandleUpdateNode, HandleTestNodeConfig (32KB each),
  HandleVerifyTemperatureSSH, HandleExportConfig, HandleDiscoverServers,
  HandleSetupScriptURL (8KB each), HandleImportConfig (1MB),
  HandleUpdateMockMode (16KB)
2025-12-02 16:43:13 +00:00
rcourtman
6eb7f06df1 security: Add request body size limits to notification handlers
Add http.MaxBytesReader limits to prevent memory exhaustion attacks:
- UpdateEmailConfig: 32KB limit
- UpdateAppriseConfig: 64KB limit
- CreateWebhook: 64KB limit
- UpdateWebhook: 64KB limit

This follows the pattern already used in system_settings.go for
SSH config validation.
2025-12-02 16:37:30 +00:00
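
The body-size limits in 6eb7f06df1 and the two security commits listed above it rely on `http.MaxBytesReader` from the Go standard library. The sketch below shows the general pattern under stated assumptions: `decodeLimited` and `statusForNameLen` are hypothetical helpers, not functions from this repository, and the status codes chosen for each failure are illustrative.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"
	"strings"
)

// decodeLimited caps the request body at limit bytes, then decodes JSON
// into dst. On failure it writes an error response and returns false.
func decodeLimited(w http.ResponseWriter, r *http.Request, limit int64, dst any) bool {
	r.Body = http.MaxBytesReader(w, r.Body, limit)
	if err := json.NewDecoder(r.Body).Decode(dst); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			// The body exceeded the cap before a full value was read.
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		} else {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
		}
		return false
	}
	return true
}

// statusForNameLen is a demo helper: it posts {"name":"aaa..."} with n
// characters in the name and returns the resulting status code.
func statusForNameLen(n int, limit int64) int {
	body := `{"name":"` + strings.Repeat("a", n) + `"}`
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
	rec := httptest.NewRecorder()
	var payload struct {
		Name string `json:"name"`
	}
	if decodeLimited(rec, req, limit, &payload) {
		rec.WriteHeader(http.StatusOK)
	}
	return rec.Code
}
```

With a 1 KB limit, a short name decodes normally while a 4 KB name is rejected once the reader hits the cap, so an oversized payload is never fully buffered in memory.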
rcourtman
c4fef5e560 test: Fix unreachable code warning in panic recovery test
Refactor the test to avoid unreachable code after panic by
checking a flag set before the panic instead of after. This
resolves the go vet warning while maintaining test coverage.
2025-12-02 16:30:31 +00:00
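
The refactor in c4fef5e560 follows a common pattern: `go vet` flags statements placed after a call to `panic` as unreachable, so the marker that proves the code path ran is set before the panic instead. A minimal sketch, with `runRecovered` and `checkPanicPath` as hypothetical names rather than the project's test helpers:

```go
package main

// runRecovered calls fn and returns the value it panicked with,
// or nil if fn returned normally.
func runRecovered(fn func()) (recovered any) {
	defer func() { recovered = recover() }()
	fn()
	return nil
}

// checkPanicPath runs a function that panics and reports whether the
// body was reached along with the recovered value.
func checkPanicPath() (reached bool, recovered any) {
	recovered = runRecovered(func() {
		reached = true // set before panic so nothing follows the panic call
		panic("boom")
	})
	return reached, recovered
}
```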
rcourtman
322573157e refactor: Use zerolog instead of fmt.Printf in config export
Replace raw fmt.Printf calls with structured zerolog logging for
consistency with the rest of the codebase. This improves log
formatting and enables proper log level filtering.
2025-12-02 16:27:54 +00:00
rcourtman
ed9907accb test: Add tests for SyncGuestBackupTimes and UpdateStorageBackupsForInstance
Add tests for the remaining uncovered State methods:
- SyncGuestBackupTimes: Tests backup time sync from storage and PBS backups
- UpdateStorageBackupsForInstance: Tests storage backup updates with instance isolation

Improves internal/models coverage from 89.6% to 95.6%.
2025-12-02 16:22:38 +00:00
rcourtman
1df5897369 test: Add tests for Ceph, backup, replication, and snapshot methods
Add tests for previously uncovered State methods:
- UpdateCephClustersForInstance
- UpdateBackupTasksForInstance
- UpdateReplicationJobsForInstance
- UpdateGuestSnapshotsForInstance

Improves internal/models coverage from 80.4% to 89.6%.
2025-12-02 16:20:11 +00:00
rcourtman
f044b42e39 test: Add more State method tests for coverage
Add tests for previously uncovered State methods:
- UpdatePhysicalDisks
- UpdateStorageForInstance
- UpdatePBSInstances / UpdatePBSInstance
- UpdatePMGInstances / UpdatePMGInstance

Improves internal/models coverage from 71.9% to 80.4%.
2025-12-02 16:17:04 +00:00
rcourtman
9d57025a88 test: Add tests for state connection health and backup methods
Add tests for previously uncovered State methods:
- SetConnectionHealth / RemoveConnectionHealth
- UpdatePBSBackups
- UpdatePMGBackups

Improves internal/models coverage from 67.0% to 71.9%.
2025-12-02 16:14:40 +00:00
rcourtman
b9db9c140b docs: Add godoc comments to more exported functions
Add missing godoc comments to:
- BuildGuestKey in alerts/alerts.go
- GenerateMockData in mock/generator.go
- NewDockerUpdater, NewAURUpdater in updates/adapter_installsh.go
- NewMockUpdater in updates/mock_updater.go
2025-12-02 16:03:57 +00:00
rcourtman
c05817f9de docs: Add godoc comments to exported functions
Add missing godoc comments to:
- NewRateLimiter and Allow in ratelimit.go
- SnapshotSyncStatus in temperature_proxy.go
- NewClient and GetVersion in pkg/pmg/client.go
2025-12-02 15:58:59 +00:00
rcourtman
097976321b perf: Cache hostname lowercase in temperature proxy lookups
Pre-compute strings.ToLower(hostname) before loops that search for
matching PVE instances. Avoids repeated lowercasing in two functions.
2025-12-02 15:43:41 +00:00
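
The optimisation in 097976321b amounts to hoisting a loop-invariant `strings.ToLower` call out of the loop. A minimal sketch of the idea, assuming a hypothetical `instance` type and `findInstance` function (the real lookup code in the temperature proxy is not shown in this log):

```go
package main

import "strings"

// instance is a hypothetical stand-in for a configured PVE instance.
type instance struct {
	Name string
	Host string
}

// findInstance returns the first instance whose Host matches hostname,
// ignoring case.
func findInstance(instances []instance, hostname string) (instance, bool) {
	needle := strings.ToLower(hostname) // lowercased once, not per iteration
	for _, inst := range instances {
		if strings.ToLower(inst.Host) == needle {
			return inst, true
		}
	}
	return instance{}, false
}
```

Only the lookup key is hoisted here; each candidate host is still lowercased inside the loop because it differs per iteration.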