Commit Graph

324 Commits

Author SHA1 Message Date
rcourtman
105215b5c3 Improve sensor proxy socket verification 2025-11-14 20:14:25 +00:00
rcourtman
1068eafbfa Related to #710: harden Windows installer arch detection 2025-11-14 10:50:56 +00:00
rcourtman
93cde2439d docs: highlight runbooks in index and script verification checklist 2025-11-14 10:39:10 +00:00
rcourtman
36429f6540 Related to #701: improve sensor proxy installer caching 2025-11-14 00:51:54 +00:00
rcourtman
3553977141 Refine Windows host installer logging (related to #709) 2025-11-13 23:09:22 +00:00
rcourtman
61f011af1d Improve temperature proxy diagnostics and tests 2025-11-13 22:31:53 +00:00
rcourtman
59c630633f Add localhost to allowed_source_subnets for self-monitoring
The HTTP mode installer now includes 127.0.0.1/32 in allowed_source_subnets
to permit self-monitoring queries from localhost. This fixes 403 Forbidden
errors when nodes query their own sensor-proxy instance.

Related to HTTP mode implementation for external PVE hosts.
2025-11-13 19:34:11 +00:00
rcourtman
aa357e5013 Fix HTTP mode for pulse-sensor-proxy and improve installer safety
## HTTP Server Fixes
- Add source IP middleware to enforce allowed_source_subnets
- Fix missing source subnet validation for external HTTP requests
- HTTP health endpoint now respects subnet restrictions

## Installer Improvements
- Auto-configure allowed_source_subnets with Pulse server IP
- Add cluster node hostnames to allowed_nodes (not just IPs)
- Fix node validation to accept both hostnames and IPs
- Add Pulse server reachability check before installation
- Add port availability check for HTTP mode
- Add automatic rollback on service startup failure
- Add HTTP endpoint health check after installation
- Fix config backup and deduplication (prevent duplicate keys)
- Fix IPv4 validation with loopback rejection
- Improve registration retry logic with detailed errors
- Add automatic LXC bind mount cleanup on uninstall

## Temperature Collection Fixes
- Add local temperature collection for self-monitoring nodes
- Fix node identifier matching (use hostname not SSH host)
- Fix JSON double-encoding in HTTP client response

Related to #XXX (temperature monitoring fixes)
2025-11-13 18:22:36 +00:00
rcourtman
559dfd0235 Add HTTP mode support to sensor-proxy installer
Implements complete HTTP mode installation workflow for external PVE hosts.

New installer features:
- `--http-mode` flag: Enable HTTP server mode for remote temperature monitoring
- `--http-addr <addr>` flag: Configure listen address (default :8443)
- Auto-generates self-signed TLS certificates (4096-bit RSA, 10-year validity)
- Registers with Pulse API and receives authentication token
- Configures systemd service with proper security hardening

Installation workflow (HTTP mode):
1. Validate --pulse-server parameter is provided
2. Generate TLS certificate with SAN (hostname + IPs)
3. Call Pulse API POST /api/temperature-proxy/register
4. Receive and store auth token securely (mode 600)
5. Append HTTP config to config.yaml
6. Update systemd service with TLS paths
7. Start service

TLS certificate generation:
- Uses openssl req with RSA 4096-bit keys
- 10-year validity period
- SubjectAltName includes hostname + all IPs
- Files stored in /etc/pulse-sensor-proxy/tls/
- Permissions: 640 root:pulse-sensor-proxy
- Logs SHA256 fingerprint for audit

API registration:
- Calls POST /api/temperature-proxy/register
- Payload: {"hostname": "...", "proxy_url": "https://..."}
- Response: {"token": "...", "pve_instance": "..."}
- Aborts installation on registration failure (fail-fast)
- Token stored in config.yaml

Systemd service updates:
- Adds ReadOnlyPaths=/etc/pulse-sensor-proxy/tls for HTTP mode
- RestrictAddressFamilies already includes AF_INET/AF_INET6
- Maintains all existing security hardening

Error handling:
- Validates required parameters before starting
- Aborts on TLS generation failure
- Aborts on API registration failure
- Provides actionable troubleshooting guidance
- Logs clear error messages

Security:
- Tokens stored with mode 600, owned by service user
- TLS keys protected with mode 640
- Service runs as unprivileged pulse-sensor-proxy user
- Full systemd hardening maintained

Usage example:
  curl -fsSL https://pulse-server/download/install-sensor-proxy.sh | \
    bash -s -- --http-mode --pulse-server https://pulse.example.com:7655

Related to #571
2025-11-13 16:33:12 +00:00
rcourtman
aba0eabd21 Fix #571: Installer now auto-configures allowed_nodes for temperature monitoring
When cluster IPC validation fails (due to systemd hardening), the proxy
falls back to allowlist-based validation. The installer now automatically
populates allowed_nodes with:

- Cluster mode: All discovered cluster member IPs
- Standalone mode: localhost IP addresses (including 127.0.0.1/localhost)
- Fallback mode: localhost IPs when pvecm unavailable

This ensures out-of-the-box temperature monitoring works on fresh installs
without manual configuration.
2025-11-13 15:37:30 +00:00
rcourtman
4bb8ab15a7 Fix temperature monitoring for clustered and LXC Proxmox environments (addresses #571)
Root cause: pulse-sensor-proxy runs with strict systemd hardening that prevents
access to Proxmox corosync IPC (abstract UNIX sockets). When pvecm fails with
IPC errors, the code incorrectly treated it as "standalone mode" and only
discovered localhost addresses, rejecting legitimate cluster members and external
nodes.

Changes:

1. **Distinguish IPC failures from true standalone mode**
   - Detect ipcc_send_rec and access control list errors specifically
   - These indicate a cluster exists but isn't accessible (LXC, systemd restrictions)
   - Return error to disable cluster validation instead of misusing standalone logic

2. **Graceful degradation when cluster validation fails**
   - When cluster IPC is unavailable, fall through to permissive mode
   - Log debug message suggesting allowed_nodes configuration
   - Allows requests to proceed rather than blocking all temperature monitoring

3. **Improve local address discovery for true standalone nodes**
   - Use Go's native net.Interfaces() instead of shelling out to 'ip addr'
   - More reliable and works with AF_NETLINK restrictions
   - Add helpful logging when only hostnames are discovered

4. **Systemd hardening adjustments**
   - Add AF_NETLINK to RestrictAddressFamilies (for net.Interfaces())
   - Remove RemoveIPC=true (attempted fix for corosync, insufficient)
   - Add ReadWritePaths=-/run/corosync (optional path, corosync uses abstract sockets anyway)

Result: Temperature monitoring now works in:
- Clustered Proxmox hosts (falls back to permissive when IPC blocked)
- LXC containers (correctly detects IPC failure, allows requests)
- Standalone nodes (proper local address discovery with IPs)

Workaround for maximum security: Configure allowed_nodes in /etc/pulse-sensor-proxy/config.yaml
when cluster validation cannot be used.
2025-11-13 13:25:27 +00:00
rcourtman
573851a388 Fix temperature monitoring on standalone Proxmox nodes (addresses #571)
Root cause: The systemd service hardening blocked AF_NETLINK sockets,
preventing IP address discovery on standalone nodes. The proxy could
only discover hostnames, causing node_not_cluster_member rejections
when users configured Pulse with IP addresses.

Changes:
1. Add AF_NETLINK to RestrictAddressFamilies in all systemd services
   - pulse-sensor-proxy.service
   - install-sensor-proxy.sh (both modes)
   - pulse-sensor-cleanup.service

2. Replace shell-based 'ip addr' with Go native net.Interfaces() API
   - More reliable and doesn't require external commands
   - Works even with strict systemd restrictions
   - Properly filters loopback, link-local, and down interfaces

3. Improve error logging and user guidance
   - Warn when no IP addresses can be discovered
   - Provide clear instructions about allowed_nodes workaround
   - Include address counts in logs for debugging

This fix ensures standalone Proxmox nodes can properly validate
temperature requests by IP address without requiring manual
allowed_nodes configuration.
2025-11-13 13:02:15 +00:00
rcourtman
ac94f54b54 Fix remote sync check in release trigger script
- Replace unreliable git fetch --dry-run check
- Use git rev-parse to compare local and remote commits
- Prevents false warnings about diverged branches
2025-11-13 11:43:36 +00:00
rcourtman
dd33891095 Add pre-flight validation script for releases
- Check VERSION file matches before triggering workflow
- Validate working directory is clean
- Confirm on main branch and up to date
- Load release notes from /tmp/release_notes_X.Y.Z.md
- Prevents wasting CI time on misconfigured releases
2025-11-13 11:36:53 +00:00
rcourtman
d3875eaae5 Dramatically improve temperature proxy installation robustness
Users were abandoning Pulse due to catastrophic temperature monitoring setup failures. This commit addresses the root causes:

**Problem 1: Silent Failures**
- Installations reported "SUCCESS" even when proxy never started
- UI showed green checkmarks with no temperature data
- Zero feedback when things went wrong

**Problem 2: Missing Diagnostics**
- Service failures logged only in journald
- Users saw "Something going on with the proxy" with no actionable guidance
- No way to troubleshoot from error messages

**Problem 3: Standalone Node Issues**
- Proxy daemon logged continuous pvecm errors as warnings
- "ipcc_send_rec" and "Unknown error -1" messages confused users
- These are expected for non-clustered/LXC setups

**Solutions Implemented:**

1. **Health Gate in install.sh (lines 1588-1629)**
   - Verify service is running after installation
   - Check socket exists on host
   - Confirm socket visible inside container via bind mount
   - Fail loudly with specific diagnostics if any check fails

2. **Actionable Error Messages in install-sensor-proxy.sh (lines 822-877)**
   - When service fails to start: dump full systemctl status + 40 lines of logs
   - When socket missing: show permissions, service status, and remediation command
   - Include common issues checklist (missing user, permission errors, lm-sensors, etc.)
   - Direct link to troubleshooting docs

3. **Better Standalone Node Detection in ssh.go (lines 585-595)**
   - Recognize "Unknown error -1" and "Unable to load access control list" as LXC indicators
   - Log at INFO level (not WARN) since this is expected behavior
   - Clarify message: "using localhost for temperature collection"

**Impact:**
- Eliminates "green checkmark but no temps" scenario
- Users get immediate actionable feedback on failures
- Standalone/LXC installations work silently without error spam
- Reduces support burden from #571 (15+ comments of user frustration)

Related to #571
2025-11-13 10:14:19 +00:00
rcourtman
60de1cade6 Polish release notes fallback 2025-11-13 09:10:43 +00:00
rcourtman
57c58eb7a6 Add deterministic release notes fallback 2025-11-13 00:00:25 +00:00
rcourtman
85217afb1d Improve release notes fallback 2025-11-12 23:40:26 +00:00
rcourtman
305a5b17bc Handle Snap Docker home restrictions (Related to #693) 2025-11-12 19:20:04 +00:00
rcourtman
18bf8d2b3a Handle docker validation under errexit
Related to #693
2025-11-12 18:34:53 +00:00
rcourtman
4a0637957c Related to #698: harden installer release detection 2025-11-12 17:56:16 +00:00
rcourtman
798f91792b Improve sensor-proxy release detection (related to #701) 2025-11-12 17:49:20 +00:00
rcourtman
95c8fe9c2d Add Snap Docker support to install-docker-agent.sh
Snap-installed Docker does not automatically create a docker group,
causing permission denied errors when the pulse-docker service user
tries to access /var/run/docker.sock.

Changes:
- Auto-detect Snap Docker installations
- Create docker group if missing when Snap Docker is detected
- Restart Snap Docker after group creation to refresh socket ACLs
- Add socket access validation before starting the service
- Handle symlinked Docker sockets in systemd unit ReadWritePaths
- Document troubleshooting steps in DOCKER_MONITORING.md
2025-11-11 23:07:29 +00:00
rcourtman
8a5be2c0da Release workflow guardrails (related to #695) 2025-11-11 22:34:00 +00:00
rcourtman
535ae1c70f Fix Windows/macOS host agent downloads for bare metal installs (related to #684)
Bare metal installations couldn't serve Windows host agent downloads because
the Windows and macOS binaries weren't included in the universal tarball. The
download endpoint would return 404 when Windows users tried to install the
host agent from a bare metal Pulse deployment (Proxmox LXC, Debian VM, etc.).

Changes:
- build-release.sh: Copy Windows/macOS host agent binaries into universal tarball
- build-release.sh: Create symlinks for Windows binaries without .exe extension
- validate-release.sh: Add Windows 386 binary and symlink to Docker validation
- validate-release.sh: Add explicit validation that universal tarball contains all Windows/macOS binaries

The universal tarball now matches the Docker image, ensuring both deployment
methods can serve the complete set of downloadable binaries for the /download/
endpoint.
2025-11-11 21:26:33 +00:00
rcourtman
399534dc7f Fix SELinux compatibility in host agent installer
Replace mv with install command to ensure correct SELinux context.
The mv command preserves the user_tmp_t label from /tmp, which
prevents systemd from executing the binary on SELinux systems.

The install command creates a new file with the correct label for
/usr/local/bin. Added automatic restorecon call for SELinux systems
to ensure policy compliance.

Related to #688
2025-11-11 21:13:33 +00:00
rcourtman
58717c759a Generate both checksums.txt and .sha256 files for backward compatibility
Following best practices for release format transitions:
- build-release.sh now generates both formats from same sha256sum run
- Workflow uploads both checksums.txt and individual .sha256 files
- Validation ensures both formats exist and match

This provides a safe transition period for users with older install scripts
while maintaining the cleaner checksums.txt format going forward. After 2-3
releases when most users have updated scripts, we can remove .sha256 generation.

Related: Install script already supports both formats (falls back gracefully).
2025-11-11 20:31:15 +00:00
rcourtman
c7895839fb Fix validation: Linux host-agent binaries are in main tarballs
Linux host-agent binaries don't have separate archives - they're included in
the main pulse-v*.tar.gz files. Only macOS and Windows have separate archives.
2025-11-11 19:25:14 +00:00
rcourtman
3ea15b1e79 Update validation script to match new asset list
Removed validation checks for standalone binaries that are no longer
uploaded to GitHub releases. These binaries are only needed in Docker
images for the /download/ endpoint.

Updated required assets list to include all versioned tarballs/zips
instead of standalone binaries.
2025-11-11 17:50:02 +00:00
rcourtman
19e86f4560 Reduce release assets by removing duplicates
Removed:
- Individual .sha256 files (checksums.txt already contains all checksums)
- Standalone binaries without version numbers (users should download versioned tarballs/zips)

Standalone binaries are only needed in Docker images for the /download/ endpoint.
GitHub releases should only contain versioned archives for user downloads.

This reduces release assets from ~54 files to ~19 files per release.
2025-11-11 17:26:00 +00:00
rcourtman
b00b176b9a Exclude development/infrastructure changes from release notes
Users don't care about CI/CD improvements, release workflows, build
processes, or testing infrastructure. Only include user-visible changes.

Related to #671
2025-11-11 17:18:50 +00:00
rcourtman
7ad8d8310b Remove commit hashes from LLM-generated release notes
Commit hashes clutter the release notes and aren't useful for end users.
Only include issue references when explicitly mentioned in commits.

Related to #671
2025-11-11 17:11:02 +00:00
rcourtman
583d21bdf9 Fix commit hash linking in release notes
Remove # symbol from commit hash references so GitHub auto-links them.
Format: (abc123) instead of (#abc123)
Issue references still use #: (#123)

Related to #671
2025-11-11 17:03:39 +00:00
rcourtman
151aaceafc Update release notes template to match established format
- Use exact template format from v4.28.0 and prior releases
- Include all standard sections: New Features, Bug Fixes, Improvements, Breaking Changes
- Add complete installation instructions (systemd, Docker, Manual Binary, Helm)
- Include Downloads section with all artifact types
- Add Notes section for important highlights and upgrade considerations
- Ensure LLM outputs format exactly matching previous releases

Related to #671 (automated release workflow)
2025-11-11 14:05:15 +00:00
rcourtman
c7b64685e0 Add LLM-powered release notes generation
- Create scripts/generate-release-notes.sh to auto-generate release notes from git commits
- Supports both Anthropic Claude and OpenAI APIs
- Uses Claude Haiku 4.5 (claude-haiku-4-5-20251001) for cost efficiency ($1/$5 per million tokens)
- Falls back to OpenAI gpt-4o-mini if Anthropic key not available
- Integrates into release workflow between validation and release creation
- Compares current version with previous git tag to generate changelog
- Outputs categorized, user-friendly release notes with installation instructions

Workflow now automatically:
1. Finds previous release tag
2. Analyzes all commits since last release
3. Generates structured release notes via LLM
4. Uses generated notes for draft release body

Requires ANTHROPIC_API_KEY or OPENAI_API_KEY in GitHub secrets.

Related to #671 (automated release workflow)
2025-11-11 14:01:34 +00:00
rcourtman
d5e67d8e6b Fix critical release workflow issues identified in review
Addresses 3 critical issues from 4-dev team review:

1. CRITICAL: Fix non-deterministic checksum generation (Dev 2 & 3)
   - Add explicit sorting to checksums.txt generation
   - Prevents #671 checksum mismatches between builds
   - Location: scripts/build-release.sh:348

2. CRITICAL: Fix upload/validation race condition (Dev 1)
   - Change validation trigger from 'release: created' to 'workflow_run'
   - Prevents validation from running while assets still uploading
   - Prevents valid releases from being incorrectly deleted
   - Location: .github/workflows/validate-release-assets.yml:4-8

3. CRITICAL: Fix GitHub token exposure in logs (Dev 1)
   - Replace curl commands with gh CLI
   - Prevents token leakage in workflow logs
   - Location: .github/workflows/validate-release-assets.yml:44, 63

All three issues were blocking issues that could cause release failures.
Remaining high/medium priority issues to be addressed in follow-up PRs.
2025-11-11 11:32:44 +00:00
rcourtman
d472b25a2b Fix validate-release.sh path issues after pushd
The script does pushd into RELEASE_DIR, so tarball paths should not include
the RELEASE_DIR prefix. Also fixed checksum validation glob patterns to
exclude .sha256 files from matching.
2025-11-11 10:54:00 +00:00
rcourtman
8ee3f12efb Fix validation script to check for ./ prefix in tarballs
Tarballs are created with ./bin/pulse paths (relative from inside staging dir)
but validation was looking for bin/pulse paths. Updated all tar -tzf checks
to use correct ./ prefix.
2025-11-11 10:43:26 +00:00
rcourtman
741254e20c Fix validate-release.sh to use RELEASE_DIR path prefix
The validation script was looking for tarballs in the current directory
instead of the release/ directory, causing all validations to fail.
Now properly prepends $RELEASE_DIR to all file paths.
2025-11-11 10:32:36 +00:00
Claude
b3f220f1a1 Add automated release workflow with validation
This commit introduces a comprehensive GitHub Actions workflow for
creating releases, ensuring all artifacts are validated before upload.

Changes:
- Add .github/workflows/release.yml: Manual workflow_dispatch trigger
  that builds, validates, and creates draft releases
- Update scripts/validate-release.sh: Add --skip-docker flag to allow
  validation without Docker image checks

Key features:
- Validation runs BEFORE any assets are uploaded
- If validation fails, no release is created
- checksums.txt and artifacts come from the same build
- No manual steps between validation and upload
- Checksums uploaded first, then all other assets
- Creates draft release for manual review before publishing

The workflow ensures that checksums.txt cannot drift from binaries
by running the entire build-validate-upload pipeline atomically.
2025-11-11 09:22:03 +00:00
rcourtman
5bac91a664 Fix pulse-sensor-proxy configuration not applied in LXC containers (related to #600)
This fixes two bugs that prevented temperature monitoring from working
after running install-sensor-proxy.sh on LXC deployments:

1. CRITICAL: Pulse service not restarted after systemd override
   - The installer wrote PULSE_SENSOR_PROXY_SOCKET env var to systemd
     drop-in and ran daemon-reload, but never restarted Pulse service
   - Running Pulse instances continued using old environment variables
   - Temperatures wouldn't work until manual Pulse restart
   - Now: Automatically restart Pulse if running after writing override

2. Added guard to check if Pulse service exists before configuring
   - Installer would write systemd override even if Pulse not installed
   - Left orphaned drop-in files that confused users
   - Now: Check if pulse.service exists, warn and skip if not found

3. MINOR: Fix inconsistent Docker mount instructions
   - docker-compose.yml showed :ro (read-only) mount
   - Installer output showed :rw (read-write) mount
   - Changed installer to match compose file (:ro is correct and secure)

Impact: Users in #600 reported "socketFound=false" even after running
installer successfully. This was because Pulse never picked up the new
socket path without a restart.
2025-11-09 16:44:08 +00:00
rcourtman
23ce2c6d11 Add support for Windows 32-bit (windows-386) architecture (related to #674)
Adds build support for 32-bit Windows (windows-386) for pulse-host-agent.

Changes:
- Add windows-386 build to Dockerfile host-agent build section
- Add windows-386 binary copy and symlink to Dockerfile
- Add windows-386 build to build-release.sh
- Add windows-386 zip package to release artifacts
- Include windows-386 binary in standalone binary copies

This enables pulse-host-agent to run on 32-bit Windows systems, which are still relevant in legacy/industrial monitoring environments through late 2025.
2025-11-09 08:57:30 +00:00
rcourtman
4834dea05b Add support for linux-386 and linux-armv6 architectures (related to #674)
Adds build support for 32-bit x86 (i386/i686) and ARMv6 (older Raspberry Pi models) architectures across all agents and install scripts.

Changes:
- Add linux-386 and linux-armv6 to build-release.sh builds array
- Update Dockerfile to build docker-agent, host-agent, and sensor-proxy for new architectures
- Update all install scripts to detect and handle i386/i686 and armv6l architectures
- Add architecture normalization in router download endpoints
- Update update manager architecture mapping
- Update validate-release.sh to expect 24 binaries (was 18)

This enables Pulse agents to run on older/legacy hardware including 32-bit x86 systems and Raspberry Pi Zero/Zero W devices.
2025-11-09 08:35:24 +00:00
rcourtman
334b8c727f Fix SMART temperature collection on smartctl 7.4+ (related to #672)
Fixes two critical bugs in refresh_smart_cache() that prevented SMART
temperature collection from working:

1. Invalid smartctl parameter: Changed -n standby,after to -n standby
   The 'after' parameter is not valid in smartctl 7.4 and causes:
   "INVALID ARGUMENT TO -n: standby,after"
   Valid syntax is standby[,STATUS[,STATUS2]] where STATUS must be numeric.

2. Broken process detection: Replaced exec -a with lock file approach
   The original exec -a pulse-sensor-wrapper-refresh bash line replaced
   the subshell with a new bash process that had no script to run, causing
   the function to exit immediately without collecting any SMART data.

   New approach uses a lock file ($CACHE_DIR/smart-refresh.lock) with
   trap-based cleanup to prevent concurrent refresh operations.

Credits to @ZaDarkSide for identifying these issues in PR #672.
2025-11-08 23:40:43 +00:00
rcourtman
16c29463f9 Fix Windows host agent installer reliability (related to #654)
The download endpoint had a dangerous fallback that silently served the
wrong binary when the requested platform/arch combination was missing.
If a Docker image shipped without Windows binaries, the installer would
receive a Linux ELF instead of a Windows PE, causing ERROR_BAD_EXE_FORMAT.

Changes:
- Download handler now operates in strict mode when platform+arch are
  specified, returning 404 instead of serving mismatched binaries
- PowerShell installer validates PE header (MZ signature)
- PowerShell installer verifies PE machine type matches requested arch
- PowerShell installer fetches and verifies SHA256 checksums
- PowerShell installer shows diagnostic info: OS arch, download URL,
  file size for better troubleshooting

This prevents silent failures and provides clear error messages when
binaries are missing or corrupted.
2025-11-07 22:55:03 +00:00
rcourtman
679225510e Silence broken pipe error in sensor proxy self-heal script (related to #628)
The self-heal timer runs 'systemctl list-unit-files | grep -q' every hour.
When grep matches and exits early, systemctl logs "Failed to print table:
Broken pipe" to syslog. This is cosmetic but floods Proxmox logs and
confuses operators.

Changes:
- Redirect stderr from systemctl to /dev/null
- Prevents the broken pipe message from reaching syslog
- Self-heal functionality unchanged

This addresses the concern raised in discussion #628.
2025-11-07 17:46:23 +00:00
rcourtman
97f9de6c95 feat(security): Enhance systemd hardening directives
Adds additional systemd security directives for defense in depth:
- MemoryDenyWriteExecute=true (prevents RWX memory)
- RestrictRealtime=true (denies realtime scheduling)
- ProtectHostname=true (hostname protection)
- ProtectKernelLogs=true (kernel log protection)
- SystemCallArchitectures=native (native syscalls only)

These directives provide additional layers to slow/prevent
post-compromise exploitation of the proxy process.

Related to security audit 2025-11-07.
2025-11-07 17:09:47 +00:00
rcourtman
32e0d453c4 Add Windows ARM64 support for host agent (related to #654)
Windows 11 25H2 ships exclusively on ARM64 hardware. When users on ARM64
attempt to install the host agent, the Service Control Manager fails to
load the amd64 binary with ERROR_BAD_EXE_FORMAT, surfaced as "The Pulse
Host Agent is not compatible with this Windows version".

Changes:
- Dockerfile: Build pulse-host-agent-windows-arm64.exe alongside amd64
- Dockerfile: Copy windows-arm64 binary and create symlink for download endpoint
- install-host-agent.ps1: Use RuntimeInformation.OSArchitecture to detect ARM64
- build-release.sh: Build darwin-amd64, darwin-arm64, windows-amd64, windows-arm64
- build-release.sh: Package Windows binaries as .zip archives
- validate-release.sh: Check for windows-arm64 binary and symlink
- validate-release.sh: Add architecture validation for all darwin/windows variants

The installer now correctly detects ARM64 and downloads the appropriate binary.
2025-11-07 12:18:57 +00:00
rcourtman
2a79d57f73 Add SMART temperature collection for physical disks (related to #652)
Extends temperature monitoring to collect SMART temps for SATA/SAS disks,
addressing issue #652 where physical disk temperatures showed as empty.

Architecture:
- Deploys pulse-sensor-wrapper.sh as SSH forced command on Proxmox nodes
- Wrapper collects both CPU/GPU temps (sensors -j) and disk temps (smartctl)
- Implements 30-min cache with background refresh to avoid performance impact
- Uses smartctl -n standby,after to skip sleeping drives without waking them
- Returns unified JSON: {sensors: {...}, smart: [...]}

Backend changes:
- Add DiskTemp model with device, serial, WWN, temperature, lastUpdated
- Extend Temperature model with SMART []DiskTemp field and HasSMART flag
- Add WWN field to PhysicalDisk for reliable disk matching
- Update parseSensorsJSON to handle both legacy and new wrapper formats
- Rewrite mergeNVMeTempsIntoDisks to match SMART temps by WWN → serial → devpath
- Preserve legacy NVMe temperature support for backward compatibility

Performance considerations:
- SMART data cached for 30 minutes per node to avoid excessive smartctl calls
- Background refresh prevents blocking temperature requests
- Respects drive standby state to avoid spinning up idle arrays
- Staggered disk scanning with 0.1s delay to avoid saturating SATA controllers

Install script:
- Deploys wrapper to /usr/local/bin/pulse-sensor-wrapper.sh
- Updates SSH forced command from "sensors -j" to wrapper script
- Backward compatible - falls back to direct sensors output if wrapper missing

Testing note:
- Requires real hardware with smartmontools installed for full functionality
- Empty smart array returned gracefully when smartctl unavailable
- Legacy sensor-only nodes continue working without changes
2025-11-07 11:46:57 +00:00
rcourtman
fa7ca00250 Fix duplicate checksum in build-release.sh
The checksum generation was including pulse-host-agent-v*-darwin-arm64.tar.gz
twice: once from the *.tar.gz pattern and once from the pulse-host-agent-*
pattern. Fixed by using extglob to exclude .tar.gz and .sha256 files from
the agent binary patterns since tarballs are already matched separately.
2025-11-06 22:19:16 +00:00