Commit Graph

42 Commits

Author SHA1 Message Date
rcourtman
95c85f6e01 fix: use correct service name (pulse.service) for proxy environment override
The installer was configuring pulse-backend.service.d but the actual
service is pulse.service, so the PULSE_SENSOR_PROXY_SOCKET environment
variable wasn't being set.

Changed: pulse-backend.service → pulse.service

This ensures Pulse actually uses the proxy socket for temperature
monitoring instead of attempting SSH connections.
2025-10-20 22:28:33 +00:00
rcourtman
8faa9040fb fix: show curl errors in installer download failures
Changed curl flags from -fsSL to -fSL to enable error output.
The -s flag was silencing all curl errors including SSL/TLS issues,
making it impossible to diagnose download failures.

With -S (show errors), stderr now captures meaningful error messages
like certificate problems, connection failures, etc.
2025-10-20 21:31:54 +00:00
rcourtman
90d51a2b1b feat: add rollback mechanism for container config changes
- Back up container config before making mount modifications
- Restore original config if socket verification fails
- Clean up backup file on success or when verification is skipped
- Leave host-level resources (user, binary, service) in place for idempotency

This ensures failed installations don't leave containers in an
inconsistent state while keeping successfully installed host services
for faster re-runs.
2025-10-20 21:16:06 +00:00
rcourtman
d421f101ba feat: harden temperature proxy installation with better validation and error handling
Setup script improvements (config_handlers.go):
- Remove redundant mount configuration and container restart logic
- Let installer handle all mount/restart operations (single source of truth)
- Eliminate hard-coded mp0 assumption

Installer improvements (install-sensor-proxy.sh):
- Add mount configuration persistence validation via pct config check
- Surface pct set errors instead of silencing with 2>/dev/null
- Capture and display curl download errors with temp files
- Check systemd daemon-reload/enable/restart exit codes
- Show journalctl output when service fails to start
- Make socket verification fatal (was warning)
- Provide clear manual steps when hot-plug fails on running container

This makes the installation fail fast with actionable error messages
instead of silently proceeding with broken configuration.
2025-10-20 21:14:00 +00:00
rcourtman
001d7f5f1c fix: comprehensive temperature proxy setup improvements
Addresses multiple issues that prevented successful temperature monitoring setup:

1. **Missing log directory (install-sensor-proxy.sh)**
   - Added LogsDirectory=pulse/sensor-proxy to both systemd service templates
   - Fixes crash: "open /var/log/pulse/sensor-proxy/audit.log: read-only file system"
   - Uses systemd's LogsDirectory directive for proper permissions

2. **Invalid pct restart command (install-sensor-proxy.sh:822)**
   - Changed from `pct restart` (doesn't exist) to `pct stop && sleep 2 && pct start`
   - Fixes container restart failures during proxy setup

3. **Version compatibility check (config_handlers.go)**
   - Added const minProxyReadyVersion = "4.24.0"
   - Setup script now queries /api/version endpoint
   - Blocks proxy setup on Pulse < v4.24.0 with clear upgrade message
   - Prevents users from attempting proxy setup on incompatible versions

4. **Proxy service health validation (config_handlers.go)**
   - Verifies pulse-sensor-proxy service is actually running
   - Checks socket exists at /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
   - Shows journalctl command for troubleshooting on failure
   - Sets TEMP_MONITORING_AVAILABLE=false to skip remaining steps

5. **Interactive LXC restart prompt (config_handlers.go)**
   - Replaced passive "please restart" message with interactive prompt
   - Default action is "yes" for easy acceptance
   - Actually executes pct stop/start on confirmation
   - Handles non-interactive environments gracefully

6. **Post-restart socket verification (config_handlers.go)**
   - Validates socket is accessible inside container after restart
   - Provides clear error if mount didn't work
   - Prevents claiming success when setup is incomplete

All changes tested with fresh LXC installation. Temperature monitoring now
works end-to-end with proper error handling and user guidance.

Fixes temperature proxy setup flow for v4.24.0+
2025-10-20 18:00:21 +00:00
rcourtman
c91b7874ac docs: comprehensive v4.24.0 documentation audit and updates
Complete documentation overhaul for Pulse v4.24.0 release covering all new
features and operational procedures.

Documentation Updates (19 files):

P0 Release-Critical:
- Operations: Rewrote ADAPTIVE_POLLING_ROLLOUT.md as GA operations runbook
- Operations: Updated ADAPTIVE_POLLING_MANAGEMENT_ENDPOINTS.md with DEFERRED status
- Operations: Enhanced audit-log-rotation.md with scheduler health checks
- Security: Updated proxy hardening docs with rate limit defaults
- Docker: Added runtime logging and rollback procedures

P1 Deployment & Integration:
- KUBERNETES.md: Runtime logging config, adaptive polling, post-upgrade verification
- PORT_CONFIGURATION.md: Service naming, change tracking via update history
- REVERSE_PROXY.md: Rate limit headers, error pass-through, v4.24.0 verification
- PROXY_AUTH.md, OIDC.md, WEBHOOKS.md: Runtime logging integration
- TROUBLESHOOTING.md, VM_DISK_MONITORING.md, zfs-monitoring.md: Updated workflows

Features Documented:
- X-RateLimit-* headers for all API responses
- Updates rollback workflow (UI & CLI)
- Scheduler health API with rich metadata
- Runtime logging configuration (no restart required)
- Adaptive polling (GA, enabled by default)
- Enhanced audit logging
- Circuit breakers and dead-letter queue

Supporting Changes:
- Discovery service enhancements
- Config handlers updates
- Sensor proxy installer improvements

Total Changes: 1,626 insertions(+), 622 deletions(-)
Files Modified: 24 (19 docs, 5 code)

All documentation is production-ready for v4.24.0 release.
2025-10-20 17:20:13 +00:00
rcourtman
0fcfad3dc5 feat: add shared script library system and refactor docker-agent installer
Implements a comprehensive script improvement infrastructure to reduce code
duplication, improve maintainability, and enable easier testing of installer
scripts.

## New Infrastructure

### Shared Library System (scripts/lib/)
- common.sh: Core utilities (logging, sudo, dry-run, cleanup management)
- systemd.sh: Service management helpers with container-safe systemctl
- http.sh: HTTP/download helpers with curl/wget fallback and retry logic
- README.md: Complete API documentation for all library functions

### Bundler System
- scripts/bundle.sh: Concatenates library modules into single-file installers
- scripts/bundle.manifest: Defines bundling configuration for distributables
- Enables both modular development and curl|bash distribution

### Test Infrastructure
- scripts/tests/run.sh: Test harness for running all smoke tests
- scripts/tests/test-common-lib.sh: Common library validation (5 tests)
- scripts/tests/test-docker-agent-v2.sh: Installer smoke tests (4 tests)
- scripts/tests/integration/: Container-based integration tests (5 scenarios)
- All tests passing ✓

## Refactored Installer

### install-docker-agent-v2.sh
- Reduced from 1098 to 563 lines (48% code reduction)
- Uses shared libraries for all common operations
- NEW: --dry-run flag support
- Maintains 100% backward compatibility with original
- Fully tested with smoke and integration tests

### Key Improvements
- Sudo escalation: 100+ lines → 1 function call
- Download logic: 51 lines → 1 function call
- Service creation: 33 lines → 2 function calls
- Logging: Standardized across all operations
- Error handling: Improved with common library

## Documentation

### Rollout Strategy (docs/installer-v2-rollout.md)
- 3-phase rollout plan (Alpha → Beta → GA)
- Feature flag mechanism for gradual deployment
- Testing checklist and success metrics
- Rollback procedures and communication plan

### Developer Guides
- docs/script-library-guide.md: Complete library usage guide
- docs/CONTRIBUTING-SCRIPTS.md: Contribution workflow
- docs/installer-v2-quickref.md: Quick reference for operators

## Metrics

- Code reduction: 48% (1098 → 563 lines)
- Reusable functions: 0 → 30+
- Test coverage: 0 → 8 test scenarios
- Documentation: 0 → 5 comprehensive guides

## Testing

All tests passing:
- Smoke tests: 2/2 passed (8 test cases)
- Integration tests: 5/5 scenarios passed
- Bundled output: Syntax validated, dry-run tested

## Next Steps

This lays the foundation for migrating other installers (install.sh,
install-sensor-proxy.sh) to use the same pattern, reducing overall
maintenance burden and improving code quality across the project.
2025-10-20 15:13:38 +00:00
rcourtman
aa5c08ad4a feat: implement priority queue-based task execution (Phase 2 Task 6)
Replaces immediate polling with queue-based scheduling:
- TaskQueue with min-heap (container/heap) for NextRun-ordered execution
- Worker goroutines that block on WaitNext() until tasks are due
- Tasks only execute when NextRun <= now, respecting adaptive intervals
- Automatic rescheduling after execution via scheduler.BuildPlan
- Queue depth tracking for backpressure-aware interval adjustments
- Upsert semantics for updating scheduled tasks without duplicates

Task 6 of 10 complete (60%). Ready for error/backoff policies.
2025-10-20 15:13:37 +00:00
rcourtman
c554380cb5 feat: verify adaptive interval logic implementation (Phase 2 Task 5)
Confirms adaptive scheduling logic is fully operational:
- EMA smoothing (alpha=0.6) to prevent interval oscillations
- Staleness-based interpolation between min/max intervals
- Error penalty (0.6x per error) for faster recovery detection
- Queue depth stretch (0.1x per task) for backpressure handling
- ±5% jitter to prevent thundering herd effects
- Per-instance state tracking for smooth transitions

Task 5 of 10 complete. Scheduler foundation ready for queue-based execution.
2025-10-20 15:13:37 +00:00
rcourtman
524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
2025-10-20 15:13:37 +00:00
rcourtman
67862e6f11 feat: add user-friendly explanation for socket bind mount
Added clear messaging to explain why the socket bind mount is configured,
focusing on the security benefits rather than technical implementation.

Changes:
- Add explanatory header "Secure Container Communication Setup"
- Explain the three key benefits:
  • Container communicates via Unix socket (not SSH)
  • No SSH keys exposed inside container (enhanced security)
  • Proxy on host manages all temperature collection
- Update technical messages to be more user-friendly:
  • "Configuring socket bind mount" instead of "Ensuring..."
  • "Restarting container to activate secure communication"
  • "Verifying secure communication channel"
  • "✓ Secure socket communication ready"
  • "Configuring Pulse to use proxy"

This helps users understand WHY the bind mount exists (security) rather
than just seeing technical implementation details.
2025-10-19 16:22:03 +00:00
rcourtman
171723a7d3 fix: automatically restart container when proxy mount is configured
Instead of warning the user to restart the container manually, the script
now automatically restarts it when the socket mount configuration is
updated. This ensures the mount is immediately active and temperature
monitoring works right away without user intervention.

Uses 'pct restart' if running, 'pct start' if stopped.
2025-10-19 15:56:31 +00:00
rcourtman
f81d77bb98 fix: fall back to Pulse server when GitHub download fails for pulse-sensor-proxy
The install-sensor-proxy.sh script now tries GitHub releases first, then falls
back to downloading from the Pulse server if GitHub fails or doesn't have the
binary (common when building from main).

The LXC installer sets PULSE_SENSOR_PROXY_FALLBACK_URL to point to the Pulse
server running inside the newly created LXC, ensuring the proxy binary can be
downloaded from /api/install/pulse-sensor-proxy.

This fixes the issue where installing with --main would fail to install
pulse-sensor-proxy on the host because GitHub releases don't include it yet.
2025-10-19 15:17:59 +00:00
rcourtman
049f79987f feat: add turnkey Docker installer with automatic proxy setup
Adds a one-command Docker deployment flow that:
- Detects if running in LXC and installs Docker if needed
- Automatically installs pulse-sensor-proxy on the Proxmox host
- Configures bind mount for proxy socket into LXC
- Generates optimized docker-compose.yml with proxy socket
- Enables temperature monitoring via host-side proxy

The install-docker.sh script handles the complete setup including:
- Docker installation (if needed)
- ACL configuration for container UIDs
- Bind mount setup
- Automatic apparmor=unconfined for socket access

Accessible via: curl -sSL http://pulse:7655/api/install/install-docker.sh | bash
2025-10-19 15:03:24 +00:00
Pulse Automation Bot
d15ad1d0b4 Add Helm chart tooling, CI, and release packaging 2025-10-18 11:50:57 +00:00
Richard Courtman
02701ca22b fix: gracefully handle standalone node cleanup limitation
- Cleanup script now detects forced command restriction on standalone nodes
- Logs helpful message explaining limitation (security by design)
- Does not fail when standalone nodes cannot be cleaned up
- Documents that standalone node cleanup is limited by forced command security
- Automatic cleanup works fully for cluster nodes
- Manual cleanup command provided for standalone nodes if needed
2025-10-18 07:34:18 +00:00
Richard Courtman
c9bbb5e6fb fix: use proxy SSH key for cleanup of standalone nodes
- Cleanup script now tries proxy's SSH key first for standalone nodes
- Falls back to default SSH if proxy key not available
- Fixes cleanup failure when Proxmox host doesn't have direct SSH to standalone nodes
2025-10-18 07:27:15 +00:00
Richard Courtman
7a7158d9bd feat: add automatic SSH key cleanup when nodes are removed
- Create cleanup script that removes Pulse SSH keys from nodes
- Add systemd path unit to watch for cleanup requests
- Add systemd service to execute cleanup script
- Update install-sensor-proxy.sh to install cleanup system
- Handles both cluster nodes (pulse-managed-key) and standalone nodes (pulse-proxy-key)
- Cleanup is triggered automatically when nodes are deleted from Pulse
- All cleanup actions are logged via syslog for auditability
2025-10-18 07:03:05 +00:00
Richard Courtman
669d7dc05c feat: add turnkey temperature monitoring for standalone nodes
Implements automatic temperature monitoring setup for standalone
Proxmox/Pimox nodes without manual SSH key configuration.

Changes:
- Add /api/system/proxy-public-key endpoint to expose proxy's SSH public key
- Setup script now detects standalone nodes (non-cluster)
- Auto-fetches and installs proxy SSH key with forced commands
- Add Raspberry Pi temperature support via cpu_thermal and /sys/class/thermal
- Enhance setup script with better error handling for lm-sensors installation
- Add RPi detection to skip lm-sensors and use native thermal interface

Security:
- Public key endpoint is safe (public keys are meant to be public)
- All installed keys use forced command="sensors -j" with full restrictions
- No shell access, port forwarding, or other SSH features enabled
2025-10-17 22:15:50 +00:00
rcourtman
5886b920ba fix: improve sensor proxy install script reliability
Fixes two issues with the sensor proxy installation:
1. Local node IP detection now uses exact matching instead of substring matching to avoid false negatives
2. Removes duplicate output filtering in the setup script wrapper

These changes ensure that the proxy SSH key is correctly configured on the local node during cluster installations.
2025-10-17 19:09:54 +00:00
rcourtman
123e0f04ca feat: add comprehensive node cleanup system
Implements automated cleanup workflow when nodes are deleted from Pulse, removing all monitoring footprint from the host. Changes include a new RPC handler in the sensor proxy for cleanup requests, enhanced node deletion modal with detailed cleanup explanations, and improved SSH key management with proper tagging for atomic updates.
2025-10-17 18:53:45 +00:00
rcourtman
864a90e58a fix: remove reference to deleted 'Ensure cluster keys' button in installer
The button was removed in previous commit, update error message to suggest
re-running the script instead.
2025-10-17 14:11:50 +00:00
rcourtman
f141f7db33 feat: enhance sensor proxy with improved cluster discovery and SSH management
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access

UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client

The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
2025-10-17 11:43:26 +00:00
rcourtman
3a4fc044ea Add guest agent caching and update doc hints (refs #560) 2025-10-16 08:15:49 +00:00
rcourtman
91fecacfef feat: add docker agent command handling 2025-10-15 19:27:19 +00:00
rcourtman
46320015cd Improve docker agent installer path handling 2025-10-14 16:39:30 +00:00
rcourtman
261bd7ac74 Adopt multi-token auth across docs, UI, and tooling 2025-10-14 15:47:49 +00:00
rcourtman
e4c3b06f14 Automate sensor proxy container mount and auth 2025-10-14 12:41:48 +00:00
rcourtman
156fd34c50 Update Proxmox guest agent permissions docs and tooling (refs #548) 2025-10-14 10:21:52 +00:00
rcourtman
5c79d2516d feat: streamline docker agent onboarding 2025-10-14 09:45:32 +00:00
rcourtman
6c7314b86b polish: Clean up setup script output for professional presentation
Made the setup and installation output more concise and reassuring for users. Less verbosity, clearer messaging.

**Setup script improvements:**
- Changed "Container Detection" → "Enhanced Security"
- Simplified prompts: "Enable secure proxy? [Y/n]"
- Cleaned up success messages: "✓ Secure proxy architecture enabled"
- Removed verbose status messages (node-by-node cleanup output)
- Only show essential information users need to see

**install-sensor-proxy.sh improvements:**
- Added --quiet flag to suppress verbose output
- In quiet mode, only shows: "✓ pulse-sensor-proxy installed and running"
- Full output still available when run manually
- Removed redundant "Installation complete!" banners
- Cleaner legacy key cleanup messaging

**Result:**
Users see a clean, professional installation flow that builds confidence. Technical details are hidden unless needed. Messages are clear and reassuring rather than verbose.
2025-10-13 13:51:17 +00:00
rcourtman
fd09af6eee feat: Auto-cleanup legacy SSH keys when migrating to proxy
When pulse-sensor-proxy is installed, automatically remove old SSH keys that were stored in the container for security.

Changes:

**install-sensor-proxy.sh:**
- Checks container for SSH private keys (id_rsa, id_ed25519, etc.)
- Removes any found keys from container
- Warns user that legacy keys were cleaned up
- Explains proxy now handles SSH

**Setup script (config_handlers.go):**
- After successful proxy install, removes old SSH keys from all cluster nodes
- Cleans up authorized_keys entries that match the old container-based key
- Keeps only proxy-managed keys (pulse-sensor-proxy comment)

This provides a clean migration path from the old direct-SSH method to the secure proxy architecture. Users upgrading from pre-v4.24 versions get automatic cleanup of insecure container-stored keys.
2025-10-13 13:47:19 +00:00
rcourtman
fcd8b62705 refactor: Rename install-temp-proxy.sh to install-sensor-proxy.sh
Complete the pulse-sensor-proxy rename by updating the installer script name and all references to it.

Updated:
- Renamed scripts/install-temp-proxy.sh → scripts/install-sensor-proxy.sh
- Updated all documentation references
- Updated install.sh references
- Updated build-release.sh comments
2025-10-13 13:23:53 +00:00
rcourtman
b952444837 refactor: Rename pulse-temp-proxy to pulse-sensor-proxy
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.

Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades
2025-10-13 13:17:05 +00:00
rcourtman
2244ff0314 fix: Change RuntimeDirectoryMode to 0775 for container access
The pulse user in the container (UID 1001) needs to access the
/run/pulse-temp-proxy directory owned by root:root. Changed from
0770 (owner+group only) to 0775 (add world read+execute) so the
pulse user can access the socket.

Related to #528
2025-10-12 22:39:18 +00:00
rcourtman
c7bb76c12e fix: Switch proxy socket to directory-level bind mount for stability
Fixes LXC bind mount issue where socket-level mounts break when the
socket is recreated by systemd. Following Codex's recommendation to
bind mount the directory instead of the file.

Changes:
- Socket path: /run/pulse-temp-proxy/pulse-temp-proxy.sock
- Systemd: RuntimeDirectory=pulse-temp-proxy (auto-creates /run/pulse-temp-proxy)
- Systemd: RuntimeDirectoryMode=0770 for group access
- LXC mount: Bind entire /run/pulse-temp-proxy directory
- Install script: Upgrades old socket-level mounts to directory-level
- Install script: Detects and handles bind mount changes

This survives socket recreations and container restarts. The directory
mount persists even when systemd unlinks/recreates the socket file.

Related to #528
2025-10-12 22:33:53 +00:00
rcourtman
58971aa982 feat: Add --local-binary flag to install-temp-proxy.sh for pre-release testing
Allows testing the proxy installation with a locally-built binary instead
of requiring a GitHub release. This enables proper E2E testing before
shipping a release.

Usage:
  ./install-temp-proxy.sh --ctid 112 --local-binary /path/to/pulse-temp-proxy

The script will:
- Use the provided binary instead of downloading from GitHub
- Still handle all setup (systemd service, SSH keys, bind mounts)
- Allow full integration testing without a release

This solves the chicken-and-egg problem of needing a release to test
the installation process.

Related to #528
2025-10-12 22:25:49 +00:00
rcourtman
1b65e792e2 build: Add pulse-temp-proxy to release build process
Updates build script and release checklist to include pulse-temp-proxy binaries:
- Build pulse-temp-proxy for all architectures (amd64, arm64, armv7)
- Include in tarballs alongside pulse and pulse-docker-agent
- Copy standalone binaries to release/ for install-temp-proxy.sh
- Update release checklist to upload standalone binaries as assets

This ensures install-temp-proxy.sh can download binaries from GitHub releases.
2025-10-12 21:46:48 +00:00
rcourtman
e7bc338891 feat: Implement secure temperature proxy for containerized deployments
Addresses #528

Introduces pulse-temp-proxy architecture to eliminate SSH key exposure in containers:

**Architecture:**
- pulse-temp-proxy runs on Proxmox host (outside LXC/Docker)
- SSH keys stored on host filesystem (/var/lib/pulse-temp-proxy/ssh/)
- Pulse communicates via unix socket (bind-mounted into container)
- Proxy handles cluster discovery, key rollout, and temperature fetching

**Components:**
- cmd/pulse-temp-proxy: Standalone Go binary with unix socket RPC server
- internal/tempproxy: Client library for Pulse backend
- scripts/install-temp-proxy.sh: Idempotent installer for existing deployments
- scripts/pulse-temp-proxy.service: Systemd service for proxy

**Integration:**
- Pulse automatically detects and uses proxy when socket exists
- Falls back to direct SSH for native installations
- Installer automatically configures proxy for new LXC deployments
- Existing LXC users can upgrade by running install-temp-proxy.sh

**Security improvements:**
- Container compromise no longer exposes SSH keys
- SSH keys never enter container filesystem
- Maintains forced command restrictions
- Transparent to users - no workflow changes

**Documentation:**
- Updated TEMPERATURE_MONITORING.md with new architecture
- Added verification steps and upgrade instructions
- Preserved legacy documentation for native installs
2025-10-12 21:35:35 +00:00
rcourtman
7ba89da12c Fix docker agent installer for BusyBox wget (#255) 2025-10-12 10:57:19 +00:00
rcourtman
a74baed121 feat: capture Proxmox memory snapshots in diagnostics 2025-10-12 10:25:43 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00