Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names (see the sketch after this list)
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access
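A minimal sketch of the cluster-discovery change, with illustrative function names and no claim to match the real installer code: shell out to pvecm status and take member IP addresses from the membership table instead of trusting node-name resolution.

```go
package main

import (
    "fmt"
    "os/exec"
    "regexp"
)

// discoverClusterIPs shells out to `pvecm status` and pulls member addresses
// out of the membership table. Using IPs avoids the failure mode where bare
// node names do not resolve from the host running the proxy.
func discoverClusterIPs() ([]string, error) {
    out, err := exec.Command("pvecm", "status").CombinedOutput()
    if err != nil {
        // Standalone (non-clustered) hosts typically land here.
        return nil, fmt.Errorf("pvecm status: %w", err)
    }
    ipRe := regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)
    seen := map[string]bool{}
    var ips []string
    for _, ip := range ipRe.FindAllString(string(out), -1) {
        if !seen[ip] {
            seen[ip] = true
            ips = append(ips, ip)
        }
    }
    return ips, nil
}

func main() {
    ips, err := discoverClusterIPs()
    if err != nil {
        fmt.Println("standalone node, monitoring local host only:", err)
        return
    }
    fmt.Println("cluster member IPs:", ips)
}
```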
UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client
The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
Made the setup and installation output more concise and reassuring for users, with less verbosity and clearer messaging.
**Setup script improvements:**
- Changed "Container Detection" → "Enhanced Security"
- Simplified prompts: "Enable secure proxy? [Y/n]"
- Cleaned up success messages: "✓ Secure proxy architecture enabled"
- Removed verbose status messages (node-by-node cleanup output)
- Only show essential information users need to see
**install-sensor-proxy.sh improvements:**
- Added --quiet flag to suppress verbose output
- In quiet mode, only shows: "✓ pulse-sensor-proxy installed and running"
- Full output still available when run manually
- Removed redundant "Installation complete!" banners
- Cleaner legacy key cleanup messaging
**Result:**
Users see a clean, professional installation flow that builds confidence. Technical details are hidden unless needed. Messages are clear and reassuring rather than verbose.
When pulse-sensor-proxy is installed, automatically remove the old SSH keys that were previously stored inside the container, since leaving them there is a security risk.
Changes:
**install-sensor-proxy.sh:**
- Checks container for SSH private keys (id_rsa, id_ed25519, etc.)
- Removes any found keys from container
- Warns user that legacy keys were cleaned up
- Explains proxy now handles SSH
**Setup script (config_handlers.go):**
- After successful proxy install, removes old SSH keys from all cluster nodes
- Cleans up authorized_keys entries that match the old container-based key
- Keeps only proxy-managed keys (pulse-sensor-proxy comment)
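A rough sketch of that node-side cleanup, assuming proxy-managed entries are identified purely by their pulse-sensor-proxy comment; the helper name and the placeholder key material are illustrative:

```go
package main

import (
    "fmt"
    "os"
    "strings"
)

// pruneLegacyKeys rewrites an authorized_keys file, dropping entries that
// match the old container-based key while keeping proxy-managed entries
// (identified by their "pulse-sensor-proxy" comment) and anything unrelated.
func pruneLegacyKeys(path, legacyKeyMaterial string) error {
    data, err := os.ReadFile(path)
    if err != nil {
        return err
    }
    var kept []string
    for _, line := range strings.Split(string(data), "\n") {
        trimmed := strings.TrimSpace(line)
        if legacyKeyMaterial != "" &&
            strings.Contains(trimmed, legacyKeyMaterial) &&
            !strings.Contains(trimmed, "pulse-sensor-proxy") {
            continue // old container key: drop it
        }
        kept = append(kept, line)
    }
    return os.WriteFile(path, []byte(strings.Join(kept, "\n")), 0o600)
}

func main() {
    // "AAAAC3...oldkey" is a placeholder for the old container key material.
    if err := pruneLegacyKeys("/root/.ssh/authorized_keys", "AAAAC3...oldkey"); err != nil {
        fmt.Println("cleanup skipped:", err)
    }
}
```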
This provides a clean migration path from the old direct-SSH method to the secure proxy architecture. Users upgrading from pre-v4.24 versions get automatic cleanup of insecure container-stored keys.
Complete the pulse-sensor-proxy rename by updating the installer script name and all references to it.
Updated:
- Renamed scripts/install-temp-proxy.sh → scripts/install-sensor-proxy.sh
- Updated all documentation references
- Updated install.sh references
- Updated build-release.sh comments
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.
Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades
The pulse user in the container (UID 1001) needs to access the
/run/pulse-temp-proxy directory, which is owned by root:root. Changed the
mode from 0770 (owner+group only) to 0775 (adding world read+execute) so the
pulse user can reach the socket.
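A tiny sketch of the intent, assuming the directory is created programmatically at proxy startup (in practice it may be created by the service setup instead):

```go
package main

import (
    "log"
    "os"
)

func main() {
    const dir = "/run/pulse-temp-proxy"
    // 0775: owner and group get rwx, everyone else gets r-x, which is just
    // enough for the unprivileged pulse user (UID 1001) to traverse into the
    // directory and open the socket inside it.
    if err := os.MkdirAll(dir, 0o775); err != nil {
        log.Fatal(err)
    }
    // MkdirAll is subject to the process umask, so pin the mode explicitly.
    if err := os.Chmod(dir, 0o775); err != nil {
        log.Fatal(err)
    }
}
```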
Related to #528
Fixes LXC bind mount issue where socket-level mounts break when the
socket is recreated by systemd. Following Codex's recommendation to
bind mount the directory instead of the file.
Changes:
- Socket path: /run/pulse-temp-proxy/pulse-temp-proxy.sock
- Systemd: RuntimeDirectory=pulse-temp-proxy (auto-creates /run/pulse-temp-proxy)
- Systemd: RuntimeDirectoryMode=0770 for group access
- LXC mount: Bind entire /run/pulse-temp-proxy directory
- Install script: Upgrades old socket-level mounts to directory-level
- Install script: Detects and handles bind mount changes
This survives socket recreations and container restarts. The directory
mount persists even when systemd unlinks/recreates the socket file.
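For context, a hedged sketch of the proxy's listener setup showing why a file-level bind mount goes stale: each start unlinks and recreates the socket inode, while a directory-level mount keeps exposing whatever socket currently lives inside. The paths come from the list above; everything else is illustrative.

```go
package main

import (
    "log"
    "net"
    "os"
)

func main() {
    const sock = "/run/pulse-temp-proxy/pulse-temp-proxy.sock"
    // Remove any stale socket from a previous run. This unlink is what breaks
    // a bind mount pinned to the socket file: the mount keeps pointing at the
    // old inode while a fresh socket appears in its place.
    _ = os.Remove(sock)
    ln, err := net.Listen("unix", sock)
    if err != nil {
        log.Fatal(err)
    }
    defer ln.Close()
    // With the parent directory bind-mounted instead, the container simply
    // sees whichever socket currently lives at this path.
    if err := os.Chmod(sock, 0o660); err != nil {
        log.Fatal(err)
    }
    log.Printf("listening on %s", sock)
}
```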
Related to #528
Allows testing the proxy installation with a locally-built binary instead
of requiring a GitHub release. This enables proper E2E testing before
shipping a release.
Usage:
./install-temp-proxy.sh --ctid 112 --local-binary /path/to/pulse-temp-proxy
The script will:
- Use the provided binary instead of downloading from GitHub
- Still handle all setup (systemd service, SSH keys, bind mounts)
- Allow full integration testing without a release
This solves the chicken-and-egg problem of needing a release to test
the installation process.
Related to #528
Updates build script and release checklist to include pulse-temp-proxy binaries:
- Build pulse-temp-proxy for all architectures (amd64, arm64, armv7)
- Include in tarballs alongside pulse and pulse-docker-agent
- Copy standalone binaries to release/ for install-temp-proxy.sh
- Update release checklist to upload standalone binaries as assets
This ensures install-temp-proxy.sh can download binaries from GitHub releases.
Addresses #528
Introduces pulse-temp-proxy architecture to eliminate SSH key exposure in containers:
**Architecture:**
- pulse-temp-proxy runs on Proxmox host (outside LXC/Docker)
- SSH keys stored on host filesystem (/var/lib/pulse-temp-proxy/ssh/)
- Pulse communicates via unix socket (bind-mounted into container)
- Proxy handles cluster discovery, key rollout, and temperature fetching
**Components:**
- cmd/pulse-temp-proxy: Standalone Go binary with unix socket RPC server
- internal/tempproxy: Client library for Pulse backend
- scripts/install-temp-proxy.sh: Idempotent installer for existing deployments
- scripts/pulse-temp-proxy.service: Systemd service for proxy
**Integration:**
- Pulse automatically detects and uses the proxy when the socket exists (see the sketch after this list)
- Falls back to direct SSH for native installations
- Installer automatically configures proxy for new LXC deployments
- Existing LXC users can upgrade by running install-temp-proxy.sh
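A minimal sketch of that detection logic, under assumed names (only the socket path comes from this change; the fallback wiring is illustrative):

```go
package main

import (
    "fmt"
    "net"
    "os"
    "time"
)

const proxySocket = "/run/pulse-temp-proxy/pulse-temp-proxy.sock"

// temperatureSource decides at runtime whether to use the host-side proxy or
// the legacy direct-SSH path. The proxy is used whenever its socket exists
// and accepts connections; otherwise Pulse behaves as a native install.
func temperatureSource() string {
    if _, err := os.Stat(proxySocket); err != nil {
        return "direct-ssh" // no proxy installed, e.g. bare-metal Pulse
    }
    conn, err := net.DialTimeout("unix", proxySocket, 2*time.Second)
    if err != nil {
        return "direct-ssh" // socket present but proxy not running
    }
    conn.Close()
    return "proxy"
}

func main() {
    fmt.Println("temperature source:", temperatureSource())
}
```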
**Security improvements:**
- Container compromise no longer exposes SSH keys
- SSH keys never enter container filesystem
- Maintains forced command restrictions
- Transparent to users - no workflow changes
**Documentation:**
- Updated TEMPERATURE_MONITORING.md with new architecture
- Added verification steps and upgrade instructions
- Preserved legacy documentation for native installs
Adds automatic data directory separation to prevent mock data from contaminating production alerts and configuration:
- hot-dev.sh: Explicitly sets PULSE_DATA_DIR based on PULSE_MOCK_MODE
- Production: /etc/pulse (real, persistent data)
- Mock: /opt/pulse/tmp/mock-data (isolated, throwaway data)
- clean-mock-alerts.sh: New utility to remove mock contamination from production alerts
- Auto-creates mock data directory when switching to mock mode
This prevents issues where mock alerts appear in production alert history after switching between modes.
Add green checkmark icon to "No active alerts" empty state for better visual feedback. Separate mock and production data directories to prevent contamination. Mock mode now uses /opt/pulse/tmp/mock-data while production uses /etc/pulse. Update toggle script to dynamically set data directory based on mode.
- Strip trailing slash from PULSE_URL in install script to prevent double-slash URLs
- Add path normalization in router for defense-in-depth on public endpoint matching
- Fixes issue #528 where users copying URLs with trailing slashes got 401 errors
The install script now normalizes PULSE_URL with ${PULSE_URL%/} before concatenating
with /download/pulse-docker-agent, preventing https://example.com//download URLs.
The router normalization provides additional resilience for path matching, though the
existing path traversal check already blocks double slashes at ServeHTTP level.
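A small sketch of the normalization idea on the Go side; the shell-side fix is just the ${PULSE_URL%/} trim quoted above, and the helper below is illustrative rather than the actual router code:

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

var dupSlashes = regexp.MustCompile(`/{2,}`)

// normalizePath collapses runs of slashes and trims a trailing slash so that
// "https://example.com//download/pulse-docker-agent" and the canonical form
// resolve to the same route before public-endpoint matching.
func normalizePath(p string) string {
    p = dupSlashes.ReplaceAllString(p, "/")
    if len(p) > 1 {
        p = strings.TrimSuffix(p, "/")
    }
    return p
}

func main() {
    fmt.Println(normalizePath("//download/pulse-docker-agent/")) // /download/pulse-docker-agent
}
```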
Add comprehensive PMG monitoring with mail statistics, queue depth tracking,
spam distribution analysis, and quarantine monitoring. Includes full discovery
support and UI consistency improvements across all Proxmox products.
Backend:
- Add pkg/pmg package with complete API client for PMG operations
- Implement mail statistics collection (inbound/outbound, spam, virus, bounces)
- Add queue depth monitoring (active, deferred, hold, incoming queues)
- Support spam score distribution and quarantine totals
- Add PMG-specific discovery logic to differentiate from PVE on port 8006
- Extend mock data generator with realistic PMG instances and metrics
- Add PMG node configuration support in config system
Frontend:
- Create MailGateway.tsx component with detailed PMG dashboard
- Display mail flow statistics with time-series charts
- Show queue depth with color-coded warnings (>50 messages or >30min age)
- Add spam distribution histogram and quarantine status
- Support cluster node status with individual queue monitoring
- Add PMG to network discovery with purple branding and mail icon
- Implement conditional navigation (hide PMG tab when no instances configured)
- Standardize discovery UI controls across PVE/PBS/PMG settings pages
API:
- Add /api/config/pmg endpoints for node configuration
- Support PMG-specific monitoring toggles (mail stats, queues, quarantine)
- Extend system settings with PMG configuration options
Discovery:
- Detect PMG vs PVE on shared port 8006 using /api2/json/statistics/mail endpoint
- Return 'pmg' type for mail gateway servers in discovery results
- Update DiscoveryModal to display PMG servers with appropriate styling
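A hedged sketch of that port-8006 disambiguation; how each product answers an unauthenticated probe of the mail statistics path is an assumption, and a production check would likely authenticate first and inspect the response body:

```go
package main

import (
    "crypto/tls"
    "fmt"
    "net/http"
    "time"
)

// detectProductType guesses whether a host answering on port 8006 is PMG or
// PVE by probing the PMG-only mail statistics path.
func detectProductType(host string) string {
    client := &http.Client{
        Timeout: 3 * time.Second,
        // Self-signed certificates are the norm on fresh Proxmox installs,
        // so verification is skipped for discovery probes only.
        Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
    }
    resp, err := client.Get("https://" + host + ":8006/api2/json/statistics/mail")
    if err != nil {
        return "unknown"
    }
    defer resp.Body.Close()
    // Assumption: PMG knows this route (it may still demand auth), while PVE
    // has no such path and answers differently.
    if resp.StatusCode == http.StatusOK || resp.StatusCode == http.StatusUnauthorized {
        return "pmg"
    }
    return "pve"
}

func main() {
    fmt.Println(detectProductType("192.168.1.50"))
}
```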
This completes ecosystem monitoring support for all three Proxmox products:
Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
- Consolidate powered-off alerts toggle into single compact row
- Fix encryption key mismatch between dev and production environments
- Always sync production encryption key to prevent decryption errors
- Add sync command to toggle-mock.sh for manual config updates
Implements comprehensive Docker monitoring with a dedicated agent that collects
container metrics and reports them to the main Pulse server. Adds Docker-specific
alert rules and threshold management with a redesigned UI.
Backend changes:
- Add Docker agent binary with container metrics collection
- Implement Docker host and container models with CPU/memory tracking (see the sketch after this list)
- Add Docker-specific alert types (offline, state, health)
- Extend threshold system to support Docker resources
- Add WebSocket message types for Docker agent communication
- Implement Docker agent API endpoints for registration and metrics
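A rough sketch of the kind of per-container sample the agent could report; field and type names here are illustrative, not the actual Pulse models:

```go
package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// ContainerMetrics is an illustrative shape for one container's sample.
type ContainerMetrics struct {
    ID          string  `json:"id"`
    Name        string  `json:"name"`
    State       string  `json:"state"`  // running, exited, ...
    Health      string  `json:"health"` // healthy, unhealthy, none
    CPUPercent  float64 `json:"cpuPercent"`
    MemoryUsage uint64  `json:"memoryUsage"`
    MemoryLimit uint64  `json:"memoryLimit"`
}

// DockerHostReport is the envelope an agent might send over the WebSocket
// on each collection interval.
type DockerHostReport struct {
    Hostname   string             `json:"hostname"`
    AgentID    string             `json:"agentId"`
    Timestamp  time.Time          `json:"timestamp"`
    Containers []ContainerMetrics `json:"containers"`
}

func main() {
    report := DockerHostReport{
        Hostname:  "docker-01",
        AgentID:   "agent-1234",
        Timestamp: time.Now(),
        Containers: []ContainerMetrics{
            {ID: "abc123", Name: "nginx", State: "running", Health: "healthy",
                CPUPercent: 2.3, MemoryUsage: 128 << 20, MemoryLimit: 512 << 20},
        },
    }
    b, _ := json.MarshalIndent(report, "", "  ")
    fmt.Println(string(b))
}
```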
Frontend changes:
- Add Docker monitoring page with host/container views
- Add Docker agent settings panel for configuration
- Reorganize thresholds page with Proxmox/Docker tabs
- Add Docker-specific alert threshold management
- Improve layout consistency with vertical stacking
- Fix defensive null checks and TypeScript errors
This change enables monitoring of Docker containers across multiple hosts
with the same alerting and threshold capabilities as Proxmox resources.
- Create shared NodeGroupHeader component to eliminate code duplication
- Replace vertical line indicator with circular dot matching guest rows
- Update online indicator to use bg-green-500 (matching guest indicators)
- Reduce node row padding from py-2 to py-1 for more compact layout
- Set background to dark:bg-gray-900 to match search bar styling
- Apply changes consistently across Dashboard and Storage tabs
This commit addresses all issues reported in GitHub issue #485:
1. **SMART Status Recognition**
- Fix disk health check to accept both "PASSED" and "OK" status (see the sketch after this list)
- Previously only "PASSED" was recognized as healthy
- Location: internal/monitoring/monitor.go:1255
2. **ZFS Spare Device False Alerts**
- Skip ZFS SPARE devices unless they have actual errors
- SPARE devices are intentional and should not trigger alerts
- Updated in two locations:
- pkg/proxmox/zfs.go:154 (device filtering)
- internal/alerts/alerts.go:1077 (alert generation)
3. **Memory Display Granularity**
- Increase byte formatting precision from 0 to 1 decimal place
- Improves accuracy (e.g., "1.7 GB" instead of "1 GB" for 86% of 2GB)
- Location: frontend-modern/src/utils/format.ts:3
4. **Custom Alert Rules Evaluation**
- Add ReevaluateGuestAlert() method for proper threshold reevaluation
- Add comments explaining custom rules evaluation limitations
- Next poll cycle will properly clear stale alerts with new thresholds
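A condensed sketch of the health check from item 1, assuming the status arrives as a plain string (the real check lives in internal/monitoring/monitor.go):

```go
package main

import (
    "fmt"
    "strings"
)

// isDiskHealthy accepts both status strings Proxmox can expose: ATA devices
// usually report "PASSED" while SAS/SCSI devices report "OK". Anything else
// (e.g. "FAILED", "UNKNOWN") is treated as unhealthy.
func isDiskHealthy(status string) bool {
    switch strings.ToUpper(strings.TrimSpace(status)) {
    case "PASSED", "OK":
        return true
    default:
        return false
    }
}

func main() {
    for _, s := range []string{"PASSED", "OK", "FAILED", ""} {
        fmt.Printf("%-8q healthy=%v\n", s, isDiskHealthy(s))
    }
}
```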
Additional improvements:
- Fix ZFS pool alert locking to prevent deadlocks
- Prevent discovery service from running in mock mode
- Restore discovery service when exiting mock mode
Fixes #485
Make mock mode configuration part of the repository instead of a local-only
file. This ensures consistent mock mode behavior across all environments
(development, CI/CD, demo server) and makes it work out of the box for
new contributors.
Changes:
- Add mock.env to repository with sensible defaults (mock mode OFF by default)
- Support mock.env.local for personal overrides (gitignored)
- Update .gitignore to allow mock.env but exclude .local variants
- Backend loads mock.env then merges mock.env.local overrides
- hot-dev.sh loads both files in correct order
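A simplified sketch of that load order, assuming plain KEY=VALUE files and ignoring quoting edge cases (the real loader may differ):

```go
package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

// loadEnvFile parses a simple KEY=VALUE file into the map, skipping blanks
// and comments. Later calls overwrite earlier values, which is exactly how
// mock.env.local overrides the defaults checked into mock.env.
func loadEnvFile(path string, into map[string]string) error {
    f, err := os.Open(path)
    if err != nil {
        return err
    }
    defer f.Close()
    sc := bufio.NewScanner(f)
    for sc.Scan() {
        line := strings.TrimSpace(sc.Text())
        if line == "" || strings.HasPrefix(line, "#") {
            continue
        }
        if k, v, ok := strings.Cut(line, "="); ok {
            into[strings.TrimSpace(k)] = strings.TrimSpace(v)
        }
    }
    return sc.Err()
}

func main() {
    cfg := map[string]string{}
    _ = loadEnvFile("mock.env", cfg)       // repo defaults (mock mode off)
    _ = loadEnvFile("mock.env.local", cfg) // optional personal overrides, gitignored
    fmt.Println("PULSE_MOCK_MODE =", cfg["PULSE_MOCK_MODE"])
}
```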
Benefits:
- New developers can clone and use mock mode immediately
- Demo server gets consistent mock configuration
- Personal preferences stay private in .local file
- No surprises - mock mode disabled by default in fresh clones
- CI/CD can use mock mode without custom configuration
Documentation:
- Updated README.md to explain mock.env is in repo
- Enhanced MOCK_MODE.md with local override instructions
- Updated claude.md with new configuration strategy
- Added mock.env.local.example for quick setup
Example workflow:
git clone <repo>
npm run mock:on # Works immediately with repo defaults
# Or create personal config:
cp docs/development/mock.env.local.example mock.env.local
# Edit mock.env.local with your preferences
Improve performance when serving /api/state in mock mode by optimizing
alert handling and JSON serialization.
Changes:
- Add UpdateAlertSnapshots() to cache alerts without blocking (see the sketch after this list)
- Use lazy population of alert snapshots to avoid lock contention
- Switch to json.Marshal for better performance with large payloads
- Add debug logging to track /api/state performance
- Simplify GetState() logic in mock mode
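A rough sketch of the snapshot idea behind UpdateAlertSnapshots(), with illustrative types: the alerting path refreshes a cached copy, and /api/state serialises that copy under a short read lock instead of touching the alert manager's lock on every request.

```go
package main

import (
    "encoding/json"
    "fmt"
    "sync"
)

type Alert struct {
    ID    string `json:"id"`
    Level string `json:"level"`
}

// alertCache holds a pre-built snapshot so request handlers never contend
// with the alert manager's internal locking.
type alertCache struct {
    mu       sync.RWMutex
    snapshot []Alert
}

// UpdateAlertSnapshots is called from the polling/alerting path whenever the
// active alert set changes; it swaps in a fresh copy under a short write lock.
func (c *alertCache) UpdateAlertSnapshots(alerts []Alert) {
    cp := make([]Alert, len(alerts))
    copy(cp, alerts)
    c.mu.Lock()
    c.snapshot = cp
    c.mu.Unlock()
}

// Snapshot is what /api/state serialises; json.Marshal on the copied slice is
// cheaper than streaming-encoding live structures while holding a lock.
func (c *alertCache) Snapshot() []byte {
    c.mu.RLock()
    defer c.mu.RUnlock()
    b, _ := json.Marshal(c.snapshot)
    return b
}

func main() {
    var c alertCache
    c.UpdateAlertSnapshots([]Alert{{ID: "cpu-node1", Level: "warning"}})
    fmt.Println(string(c.Snapshot()))
}
```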
Performance improvements:
- Eliminates alert manager lock during /api/state requests
- Reduces JSON encoding overhead for large mock datasets
- Ensures sub-second response times even with 7 nodes and 90+ guests
Testing:
- Mock mode returns state instantly without blocking
- Alert snapshots populate correctly on first request
- Debug logs confirm fast execution path
Implement a hot-reloadable mock mode system that works seamlessly in both
development and production environments without requiring manual restarts
or port changes.
Key Features:
- Backend watches mock.env and auto-reloads when changed (via fsnotify + polling)
- npm commands for easy toggling: mock:on, mock:off, mock:status, mock:edit
- Works in both hot-dev mode and systemd deployments
- Reload completes in 2-5 seconds with no manual intervention
- No port changes or process restarts required
Implementation:
- Extended ConfigWatcher to monitor both .env and mock.env (see the sketch after this list)
- Added callback system to trigger ReloadableMonitor.Reload()
- Enhanced toggle-mock.sh to support both hot-dev and systemd modes
- Updated hot-dev.sh banner to show mock status and commands
- Created comprehensive documentation in docs/development/MOCK_MODE.md
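A pared-down sketch of the watcher-plus-callback wiring, assuming github.com/fsnotify/fsnotify and an injected reload function; the real ConfigWatcher also falls back to polling:

```go
package main

import (
    "log"
    "path/filepath"

    "github.com/fsnotify/fsnotify"
)

// watchEnvFiles invokes reload whenever .env or mock.env changes.
func watchEnvFiles(dir string, reload func()) error {
    w, err := fsnotify.NewWatcher()
    if err != nil {
        return err
    }
    // Watch the directory rather than the individual files so atomic saves
    // (write temp file + rename) are still observed.
    if err := w.Add(dir); err != nil {
        return err
    }
    go func() {
        for event := range w.Events {
            name := filepath.Base(event.Name)
            if name != ".env" && name != "mock.env" {
                continue
            }
            if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
                log.Printf("%s changed, reloading monitor", name)
                reload() // e.g. ReloadableMonitor.Reload()
            }
        }
    }()
    return nil
}

func main() {
    if err := watchEnvFiles("/opt/pulse", func() { log.Println("reload triggered") }); err != nil {
        log.Fatal(err)
    }
    select {} // block forever in this sketch
}
```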
Testing:
- Backend builds successfully
- Watcher initializes and monitors both files
- npm run mock:on/off toggles successfully
- mock.env updates correctly
- Scripts work in both hot-dev and systemd modes
Documentation:
- Added Mock Mode section to README.md
- Created detailed guide in docs/development/MOCK_MODE.md
- Updated claude.md with mock mode architecture and usage
Mock mode continues to return cached data instantly from memory
(no API calls, no locks, no timeouts), ensuring fast /api/state responses.
Additional safeguards to prevent dev/production config conflicts:
1. **hot-dev.sh**: Explicitly export PULSE_DATA_DIR before starting backend
- Ensures backend always uses /opt/pulse/tmp/dev-config in dev mode
- Prevents accidental fallback to /etc/pulse
- Adds logging to show which config directory is being used
2. **sync-production-config.sh**: Smart encryption key handling
- Never overwrites existing dev encryption key
- Warns if production key is newer (unusual scenario)
- Keeps dev key to avoid breaking encrypted configs
- Adds detailed logging of sync decisions
These changes ensure that when Vite restarts:
- Backend always uses the correct dev-config directory
- Sync script never breaks working dev configuration
- All decisions are logged clearly for debugging
Related to previous commit fixing nodes.enc corruption.
This commit addresses critical issues where nodes configuration was being
lost or corrupted, causing user frustration and data loss.
## Changes:
### 1. Sync Script Protection (sync-production-config.sh)
- Never overwrites newer dev config with older production files
- Validates timestamps before syncing
- Shows detailed logging of sync decisions
- Prevents accidental overwrites of working configuration
### 2. Timestamped Backups (persistence.go)
- Creates timestamped backup before EVERY save (e.g., nodes.enc.backup-20251001-073000)
- Maintains "latest" backup for quick recovery
- Auto-cleans old backups (keeps last 10)
- Ensures we can always recover from corruption
### 3. Empty Config Protection (persistence.go)
- BLOCKS attempts to save an empty nodes config when nodes already exist on disk
- Prevents accidental data wipes
- Returns error with clear message about what was blocked
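A condensed sketch of the backup-then-save flow with the empty-config guard, using illustrative names (the real logic in persistence.go operates on encrypted payloads):

```go
package main

import (
    "fmt"
    "os"
    "time"
)

// saveNodes refuses to persist an empty node list over a non-empty existing
// file, and snapshots the current file to a timestamped backup before writing.
func saveNodes(path string, encrypted []byte, nodeCount int) error {
    if existing, err := os.ReadFile(path); err == nil && len(existing) > 0 {
        if nodeCount == 0 {
            return fmt.Errorf("refusing to overwrite %s with an empty nodes config", path)
        }
        // e.g. nodes.enc.backup-20251001-073000
        backup := fmt.Sprintf("%s.backup-%s", path, time.Now().Format("20060102-150405"))
        if err := os.WriteFile(backup, existing, 0o600); err != nil {
            return fmt.Errorf("backup before save failed: %w", err)
        }
    }
    return os.WriteFile(path, encrypted, 0o600)
}

func main() {
    err := saveNodes("/etc/pulse/nodes.enc", nil, 0)
    fmt.Println(err) // blocked if a non-empty config already exists
}
```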
### 4. Enhanced Corruption Recovery (persistence.go)
- Detects "cipher: message authentication failed" errors
- Automatically attempts recovery from backup files
- Renames corrupted files with timestamps for forensics
- Logs detailed recovery process
### 5. Performance Logging (GuestRow.tsx)
- Added timing for individual metadata API calls
- Helps identify performance bottlenecks
## Why This Matters:
Previous behavior allowed:
- Corrupted files to overwrite working configs
- Empty configs to delete all nodes
- No way to recover from corruption
- Race conditions during rapid restarts
New behavior ensures:
- Multiple backup copies always exist
- Corruption auto-recovers from backups
- Empty saves are blocked
- Sync script validates before overwriting
Corrected widespread misinformation claiming API tokens cannot access guest agent data on Proxmox 9.
Changes:
- Rewrote VM_DISK_MONITORING.md with accurate technical explanation
- Deleted VM_DISK_STATS_TROUBLESHOOTING.md (contained false information)
- Updated FAQ.md with correct quick reference and troubleshooting link
- Added comprehensive VM disk troubleshooting section to TROUBLESHOOTING.md
- Fixed README.md troubleshooting reference
- Updated frontend tooltip to show accurate permission requirements
- Corrected backend log messages to remove "known limitation" language
- Updated test-vm-disk.sh diagnostic script with accurate guidance
Key corrections:
- API tokens work fine for guest agent queries on both PVE 8 and 9
- Proxmox API returning disk=0 is normal behavior, not a bug
- Both tokens and passwords work equally well
- Only requirements: guest agent installed + proper permissions
- Permission issues are config problems, not authentication method limitations
Documentation now provides clear user journey: FAQ → Troubleshooting → Full Guide
- Added streaming discovery that shows servers as they're found
- Backend sends WebSocket updates for each discovered server (see the sketch at the end of this entry)
- Frontend displays servers immediately without waiting for full scan
- Created sync-production-config.sh to preserve nodes when switching modes
- Updated toggle-mock.sh to sync config when disabling mock mode
- Dev environment now maintains separate config that syncs from production
- Enabled discovery service in dev environment by default
Addresses real-time discovery UX and mock/production mode configuration persistence.
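A rough sketch of the streaming pattern, with a plain callback standing in for the real WebSocket hub:

```go
package main

import (
    "fmt"
    "net"
    "sync"
    "time"
)

// scanSubnet probes candidate addresses concurrently and calls onFound as
// soon as each server answers, instead of reporting only after the full scan
// finishes.
func scanSubnet(candidates []string, onFound func(addr string)) {
    var wg sync.WaitGroup
    for _, addr := range candidates {
        wg.Add(1)
        go func(addr string) {
            defer wg.Done()
            conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
            if err != nil {
                return
            }
            conn.Close()
            onFound(addr) // in Pulse this would push a WebSocket update to the UI
        }(addr)
    }
    wg.Wait()
}

func main() {
    scanSubnet([]string{"192.168.1.10:8006", "192.168.1.11:8007"}, func(addr string) {
        fmt.Println("discovered:", addr)
    })
}
```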
- Fix port conflict: backend now uses 7656, frontend uses 7655
- Fix mock mode not loading: use load_env_file for proper export
- Fix pipefail crashes on port checks: disable during lsof checks
- Add error handling for /etc/pulse/.env permission issues
- Update .gitignore to exclude sensitive files and temp scripts
The alert acknowledgment endpoints were hanging because GetState() was called
synchronously to broadcast updates via WebSocket, which could take significant
time with many nodes/guests. This caused the HTTP response to timeout, showing
an error to users even though the alert was successfully acknowledged.
Fixed by:
- Sending HTTP response immediately after acknowledging the alert
- Moving WebSocket broadcast to a goroutine to avoid blocking
- Applied fix to all alert endpoints (acknowledge, unacknowledge, clear, bulk ops)
This resolves the issue where users saw 'Failed to acknowledge alert' errors
but the alert was actually acknowledged (disappeared on refresh).
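The shape of the fix, sketched with placeholder handler and helper names: acknowledge, respond, then broadcast from a goroutine so a slow GetState() can no longer stall the HTTP response.

```go
package main

import (
    "encoding/json"
    "log"
    "net/http"
)

func handleAcknowledge(w http.ResponseWriter, r *http.Request) {
    alertID := r.URL.Query().Get("id")
    if err := acknowledgeAlert(alertID); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    // Respond immediately; the client no longer waits on state collection.
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]bool{"acknowledged": true})

    // Broadcast the refreshed state in the background; state collection can be
    // slow with many nodes and guests, so it must not block the HTTP response.
    go func() {
        broadcastState(getState())
    }()
}

// Placeholders for the real alert manager, state collector, and WebSocket hub.
func acknowledgeAlert(id string) error { log.Println("ack", id); return nil }
func getState() any                    { return map[string]string{"status": "ok"} }
func broadcastState(s any)             { log.Println("broadcast", s) }

func main() {
    http.HandleFunc("/api/alerts/acknowledge", handleAcknowledge)
    log.Fatal(http.ListenAndServe(":7655", nil))
}
```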
- Skip auth check entirely in App.tsx for development
- Add .env.dev file with DISABLE_AUTH=true and PULSE_MOCK_MODE=true
- Update hot-dev.sh to load .env.dev environment variables
- This ensures the app loads immediately without auth issues
- WebSocket and API now work without authentication in dev mode
- Fixed PBS alert toggle not responding in thresholds settings
- PBS servers now use connectivity toggle like nodes instead of disabled toggle
- Added support for disableConnectivity flag on PBS instances in backend
- Fixed PBS ID format mismatch between frontend and backend
- PBS offline alerts now properly respect the disableConnectivity setting
- Prevents spam alerts by checking disableConnectivity flag for PBS offline alerts
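A minimal sketch of the offline-alert gate, with illustrative types and field names: before raising a connectivity alert for a PBS instance, the alert path consults the per-instance disableConnectivity flag.

```go
package main

import "fmt"

// PBSInstance mirrors the relevant slice of config: a per-instance switch to
// silence connectivity (offline) alerts. Field names here are illustrative.
type PBSInstance struct {
    ID                  string
    Name                string
    DisableConnectivity bool
}

// shouldRaiseOfflineAlert is the guard that stops alert spam: an offline PBS
// only produces an alert when connectivity alerts are enabled for it.
func shouldRaiseOfflineAlert(pbs PBSInstance, online bool) bool {
    if online || pbs.DisableConnectivity {
        return false
    }
    return true
}

func main() {
    pbs := PBSInstance{ID: "pbs-main", Name: "backup-01", DisableConnectivity: true}
    fmt.Println(shouldRaiseOfflineAlert(pbs, false)) // false: alerts disabled for this instance
}
```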