feat: Add temperature collection to pulse-host-agent (related to #661)

Implements temperature monitoring in pulse-host-agent to support Docker-in-VM
deployments where the sensor proxy socket cannot cross VM boundaries.

Changes:
- Create internal/sensors package with local collection and parsing
- Add temperature collection to host agent (Linux only, best-effort)
- Support CPU package/core, NVMe, and GPU temperature sensors
- Update TEMPERATURE_MONITORING.md with Docker-in-VM setup instructions
- Update HOST_AGENT.md to document temperature feature

The host agent now automatically collects temperature data on Linux systems
with lm-sensors installed. This provides an alternative path for temperature
monitoring when running Pulse in a VM, avoiding the unix socket limitation.

Temperature collection is best-effort and fails gracefully if lm-sensors is
not available, ensuring other metrics continue to be reported.

Related to #661
rcourtman
2025-11-07 22:54:40 +00:00
parent cb9d8d1ab1
commit 2b7492ac59
5 changed files with 411 additions and 7 deletions


@@ -2,12 +2,30 @@
The Pulse host agent extends monitoring to standalone servers that do not expose
Proxmox or Docker APIs. With it you can surface uptime, OS metadata, CPU load,
memory/disk utilisation, temperature sensors, and connection health for any Linux,
macOS, or Windows machine alongside the rest of your infrastructure. Starting in
v4.26.0 the installer handshakes with Pulse in real time so you can confirm
registration from the UI and receive host-agent alerts alongside your existing
Docker/Proxmox notifications.
## Temperature Monitoring
The host agent automatically collects temperature data on Linux systems with lm-sensors installed:
- **CPU Package Temperature**: Overall CPU temperature
- **Per-Core Temperatures**: Individual CPU core readings
- **NVMe Drive Temperatures**: SSD thermal data
- **GPU Temperatures**: AMD (`amdgpu`) and NVIDIA (`nouveau` driver) GPU sensors
Temperature data appears in the **Servers** tab alongside other host metrics. This is particularly useful for monitoring Proxmox hosts when running Pulse in a VM (where the sensor proxy socket cannot cross VM boundaries).
**Requirements:**
- Linux operating system
- lm-sensors package installed (`apt-get install lm-sensors`)
- Sensors configured (`sensors-detect --auto`)
Temperature collection is automatic and best-effort. If lm-sensors is not installed or sensors are unavailable, the agent continues reporting other metrics normally.
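Readings are sent to Pulse as a flat map from sensor name to degrees Celsius. The Go sketch below mirrors the key naming used by `collectTemperatures` in this commit: `cpu_package`, `cpu_core_<n>`, the lm-sensors chip name for NVMe drives, and `<chip>_<sensor>` for GPUs. The `readings` type and `flatten` function are hypothetical illustrations, not part of the agent's API, and the sample values are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// readings is a hypothetical stand-in for parsed sensor data (all values in °C).
type readings struct {
	CPUPackage float64
	Cores      map[string]float64 // keyed like lm-sensors, e.g. "Core 0"
	NVMe       map[string]float64 // keyed by lm-sensors chip name
	GPU        map[string]float64 // keyed by "<chip>_<sensor>"
}

// flatten builds the single name -> °C map reported to Pulse,
// following the same key conventions as collectTemperatures.
func flatten(r readings) map[string]float64 {
	out := make(map[string]float64)
	if r.CPUPackage > 0 {
		out["cpu_package"] = r.CPUPackage
	}
	for name, t := range r.Cores {
		// "Core 0" -> "core_0" -> "cpu_core_0"
		out["cpu_"+strings.ToLower(strings.ReplaceAll(name, " ", "_"))] = t
	}
	for name, t := range r.NVMe {
		out[name] = t
	}
	for name, t := range r.GPU {
		out[name] = t
	}
	return out
}

func main() {
	m := flatten(readings{
		CPUPackage: 52,
		Cores:      map[string]float64{"Core 0": 50, "Core 1": 52},
		NVMe:       map[string]float64{"nvme-pci-0100": 41},
		GPU:        map[string]float64{"amdgpu-pci-0400_edge": 55},
	})
	// Print keys in sorted order so the output is stable.
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%.0f\n", k, m[k])
	}
}
```

A CPU package reading of zero is treated as "not available" and omitted, matching the `> 0` check in the agent.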
## Prerequisites
- Pulse v4.26.0 or newer (host agent reporting shipped with `/api/agents/host/report`)


@@ -16,14 +16,40 @@ Pulse can display real-time CPU and NVMe temperatures directly in your dashboard
> **Important:** Temperature monitoring setup differs by deployment type:
> - **LXC containers:** Fully automatic via the setup script (Settings → Nodes → Setup Script)
> - **Docker containers:** Requires manual proxy installation (see below), or use pulse-host-agent instead
> - **Docker in VM:** Use pulse-host-agent on the Proxmox host (see [Docker in VM Setup](#docker-in-vm-setup))
> - **Native installs:** Direct SSH, no proxy needed
>
> **For automation (Ansible/Terraform/etc.):** Jump to [Automation-Friendly Installation](#automation-friendly-installation)
## Docker in VM Setup
**Running Pulse in Docker inside a VM on Proxmox?** The proxy socket cannot cross VM boundaries, so use pulse-host-agent instead.
pulse-host-agent runs natively on your Proxmox host and reports temperatures back to Pulse over HTTP(S). This works across VM boundaries without requiring socket mounts or SSH configuration.
**Setup steps:**
1. Install lm-sensors on your Proxmox host (if not already installed):
```bash
apt-get update && apt-get install -y lm-sensors
sensors-detect --auto
```
2. Install pulse-host-agent on your Proxmox host:
```bash
# Generate an API token in Pulse (Settings → Security → API Tokens) with host-agent:report scope
curl -fsSL http://your-pulse-vm:7655/install-host-agent.sh | \
bash -s -- --url http://your-pulse-vm:7655 --token YOUR_API_TOKEN
```
3. Verify that temperatures appear in the Pulse UI under the **Servers** tab
The host agent will report CPU, NVMe, and GPU temperatures alongside other system metrics. No proxy installation or socket mounting needed.
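Before checking the UI, you can confirm on the Proxmox host that `sensors -j` emits temperature readings at all. This Go sketch counts `temp*_input` fields in a `sensors -j` document, walking chip, feature, and subfeature the way the parser in this commit does. The `countTempInputs` helper is hypothetical. A non-zero count means lm-sensors is producing temperatures; it does not guarantee each one maps to a Pulse sensor, since the parser only keeps recognised chips and feature names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// countTempInputs returns how many temp*_input readings a `sensors -j`
// document contains (chip -> feature -> subfeature).
func countTempInputs(raw []byte) (int, error) {
	var chips map[string]interface{}
	if err := json.Unmarshal(raw, &chips); err != nil {
		return 0, fmt.Errorf("invalid sensors JSON: %w", err)
	}
	n := 0
	for _, chip := range chips {
		features, ok := chip.(map[string]interface{})
		if !ok {
			continue
		}
		for _, feature := range features {
			// Non-object entries such as "Adapter": "ISA adapter" are skipped.
			sub, ok := feature.(map[string]interface{})
			if !ok {
				continue
			}
			for key := range sub {
				// Only temperature inputs; fan*_input and in*_input are ignored.
				if strings.HasPrefix(key, "temp") && strings.HasSuffix(key, "_input") {
					n++
				}
			}
		}
	}
	return n, nil
}

func main() {
	// Intended usage (assumption): sensors -j | go run check.go
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	n, err := countTempInputs(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("found %d temperature readings\n", n)
	if n == 0 {
		os.Exit(1)
	}
}
```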
## Quick Start for Docker Deployments
**Running Pulse in Docker directly on Proxmox?** Temperature monitoring requires installing a small service on your Proxmox host that reads hardware sensors. The Pulse container connects to this service through a shared socket.
**Why this is needed:** Docker containers cannot directly access hardware sensors. The proxy runs on your Proxmox host where it has access to sensor data, then shares that data with the Pulse container through the shared socket.


@@ -13,6 +13,7 @@ import (
"time"
"github.com/rcourtman/pulse-go-rewrite/internal/hostmetrics"
"github.com/rcourtman/pulse-go-rewrite/internal/sensors"
agentshost "github.com/rcourtman/pulse-go-rewrite/pkg/agents/host"
"github.com/rs/zerolog"
gohost "github.com/shirou/gopsutil/v4/host"
@@ -220,6 +221,9 @@ func (a *Agent) buildReport(ctx context.Context) (agentshost.Report, error) {
return agentshost.Report{}, fmt.Errorf("collect metrics: %w", err)
}
// Collect temperature data (best effort - don't fail if unavailable)
sensorData := a.collectTemperatures(collectCtx)
report := agentshost.Report{
Agent: agentshost.AgentInfo{
ID: a.agentID,
@@ -248,7 +252,7 @@ func (a *Agent) buildReport(ctx context.Context) (agentshost.Report, error) {
},
Disks: append([]agentshost.Disk(nil), snapshot.Disks...),
Network: append([]agentshost.NetworkInterface(nil), snapshot.Network...),
Sensors: sensorData,
Tags: append([]string(nil), a.cfg.Tags...),
Timestamp: time.Now().UTC(),
}
@@ -304,3 +308,64 @@ func isLoopback(flags []string) bool {
}
return false
}
// collectTemperatures attempts to collect temperature data from the local system.
// Returns an empty Sensors struct if collection fails (best-effort).
func (a *Agent) collectTemperatures(ctx context.Context) agentshost.Sensors {
// Only collect on Linux for now (lm-sensors is Linux-specific)
if a.platform != "linux" {
return agentshost.Sensors{}
}
// Collect sensor JSON output
jsonOutput, err := sensors.CollectLocal(ctx)
if err != nil {
a.logger.Debug().Err(err).Msg("Failed to collect sensor data (lm-sensors may not be installed)")
return agentshost.Sensors{}
}
// Parse the sensor output
tempData, err := sensors.Parse(jsonOutput)
if err != nil {
a.logger.Debug().Err(err).Msg("Failed to parse sensor data")
return agentshost.Sensors{}
}
if !tempData.Available {
a.logger.Debug().Msg("No temperature sensors available on this system")
return agentshost.Sensors{}
}
// Convert to host agent sensor format
result := agentshost.Sensors{
TemperatureCelsius: make(map[string]float64),
}
// Add CPU package temperature
if tempData.CPUPackage > 0 {
result.TemperatureCelsius["cpu_package"] = tempData.CPUPackage
}
// Add individual core temperatures
for coreName, temp := range tempData.Cores {
// Normalize core name (e.g., "Core 0" -> "cpu_core_0")
normalizedName := strings.ToLower(strings.ReplaceAll(coreName, " ", "_"))
result.TemperatureCelsius["cpu_"+normalizedName] = temp
}
// Add NVMe temperatures
for nvmeName, temp := range tempData.NVMe {
result.TemperatureCelsius[nvmeName] = temp
}
// Add GPU temperatures
for gpuName, temp := range tempData.GPU {
result.TemperatureCelsius[gpuName] = temp
}
a.logger.Debug().
Int("temperatureCount", len(result.TemperatureCelsius)).
Msg("Collected temperature data")
return result
}


@@ -0,0 +1,47 @@
package sensors
import (
"context"
"fmt"
"os/exec"
"strings"
"time"
)
// CollectLocal reads sensor data from the local machine using lm-sensors.
// Returns the raw JSON output from `sensors -j` or an error if sensors is not available.
func CollectLocal(ctx context.Context) (string, error) {
// Check if sensors command exists
if _, err := exec.LookPath("sensors"); err != nil {
return "", fmt.Errorf("lm-sensors not installed: %w", err)
}
// Create context with timeout
cmdCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
// Run sensors -j command
// sensors exits non-zero when optional subfeatures fail; "|| true" keeps the JSON for parsing
cmd := exec.CommandContext(cmdCtx, "sh", "-c", "sensors -j 2>/dev/null || true")
output, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("failed to execute sensors: %w", err)
}
outputStr := strings.TrimSpace(string(output))
if outputStr == "" || outputStr == "{}" {
// Try Raspberry Pi temperature method as fallback
cmd = exec.CommandContext(cmdCtx, "cat", "/sys/class/thermal/thermal_zone0/temp")
if rpiOutput, rpiErr := cmd.Output(); rpiErr == nil {
rpiTemp := strings.TrimSpace(string(rpiOutput))
if rpiTemp != "" {
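// Convert to pseudo-sensors format for compatibility.
// sysfs reports millidegrees Celsius, so emit the raw value as a JSON string:
// extractTempInput divides string readings by 1000, whereas a bare JSON
// number would be taken as degrees (e.g. 45123 °C).
return fmt.Sprintf(`{"cpu_thermal-virtual-0":{"temp1":{"temp1_input":"%s"}}}`, rpiTemp), nil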
}
}
return "", fmt.Errorf("sensors returned empty output")
}
return outputStr, nil
}

internal/sensors/parser.go

@@ -0,0 +1,248 @@
package sensors
import (
"encoding/json"
"fmt"
"math"
"strings"
"github.com/rs/zerolog/log"
)
// TemperatureData contains parsed temperature readings from sensors
type TemperatureData struct {
CPUPackage float64 // Overall CPU package temperature
CPUMax float64 // Maximum CPU temperature
Cores map[string]float64 // Per-core temperatures (e.g., "Core 0": 45.0)
NVMe map[string]float64 // NVMe drive temperatures keyed by chip (e.g., "nvme-pci-0100": 42.0)
GPU map[string]float64 // GPU temperatures keyed by chip_sensor (e.g., "amdgpu-pci-0400_edge": 55.0)
Available bool // Whether any temperature data was found
}
// Parse extracts temperature data from sensors -j JSON output
func Parse(jsonStr string) (*TemperatureData, error) {
if strings.TrimSpace(jsonStr) == "" {
return nil, fmt.Errorf("empty sensors output")
}
var sensorsData map[string]interface{}
if err := json.Unmarshal([]byte(jsonStr), &sensorsData); err != nil {
return nil, fmt.Errorf("failed to parse sensors JSON: %w", err)
}
data := &TemperatureData{
Cores: make(map[string]float64),
NVMe: make(map[string]float64),
GPU: make(map[string]float64),
}
foundCPUChip := false
// Parse each sensor chip
for chipName, chipData := range sensorsData {
chipMap, ok := chipData.(map[string]interface{})
if !ok {
continue
}
chipLower := strings.ToLower(chipName)
// Handle CPU temperature sensors
if isCPUChip(chipLower) {
foundCPUChip = true
parseCPUTemps(chipMap, data)
}
// Handle NVMe temperature sensors
if strings.Contains(chipLower, "nvme") {
parseNVMeTemps(chipName, chipMap, data)
}
// Handle GPU temperature sensors
if strings.Contains(chipLower, "amdgpu") || strings.Contains(chipLower, "nouveau") {
parseGPUTemps(chipName, chipMap, data)
}
}
// If no package temperature was found, derive one from the hottest core
if data.CPUPackage == 0 && len(data.Cores) > 0 {
for _, temp := range data.Cores {
if temp > data.CPUMax {
data.CPUMax = temp
}
}
// Use max core temp as package temp if not available
data.CPUPackage = data.CPUMax
}
data.Available = foundCPUChip || len(data.NVMe) > 0 || len(data.GPU) > 0
log.Debug().
Bool("available", data.Available).
Float64("cpuPackage", data.CPUPackage).
Float64("cpuMax", data.CPUMax).
Int("coreCount", len(data.Cores)).
Int("nvmeCount", len(data.NVMe)).
Int("gpuCount", len(data.GPU)).
Msg("Parsed temperature data")
return data, nil
}
func isCPUChip(chipLower string) bool {
cpuChips := []string{
"coretemp", "k10temp", "zenpower", "k8temp", "acpitz",
"it87", "nct6687", "nct6775", "nct6776", "nct6779",
"nct6791", "nct6792", "nct6793", "nct6795", "nct6796",
"nct6797", "nct6798", "w83627", "f71882",
"cpu_thermal", "rpitemp",
}
for _, chip := range cpuChips {
if strings.Contains(chipLower, chip) {
return true
}
}
return false
}
func parseCPUTemps(chipMap map[string]interface{}, data *TemperatureData) {
foundPackageTemp := false
var chipletTemps []float64
for sensorName, sensorData := range chipMap {
sensorMap, ok := sensorData.(map[string]interface{})
if !ok {
continue
}
sensorNameLower := strings.ToLower(sensorName)
// Look for Package id (Intel) or Tdie/Tctl (AMD)
if strings.Contains(sensorName, "Package id") ||
strings.Contains(sensorName, "Tdie") ||
strings.Contains(sensorNameLower, "tctl") {
if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
data.CPUPackage = tempVal
foundPackageTemp = true
if tempVal > data.CPUMax {
data.CPUMax = tempVal
}
}
}
// Look for AMD chiplet temperatures
if strings.HasPrefix(sensorName, "Tccd") {
if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) && tempVal > 0 {
chipletTemps = append(chipletTemps, tempVal)
if tempVal > data.CPUMax {
data.CPUMax = tempVal
}
}
}
// Look for SuperIO chip CPU temperature fields
if strings.Contains(sensorNameLower, "cputin") ||
strings.Contains(sensorNameLower, "cpu temperature") ||
(strings.Contains(sensorNameLower, "temp") && strings.Contains(sensorNameLower, "cpu")) {
if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) && tempVal > 0 {
if !foundPackageTemp {
data.CPUPackage = tempVal
foundPackageTemp = true
}
if tempVal > data.CPUMax {
data.CPUMax = tempVal
}
}
}
// Look for individual core temperatures
if strings.Contains(sensorName, "Core ") {
if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
data.Cores[sensorName] = tempVal
if tempVal > data.CPUMax {
data.CPUMax = tempVal
}
}
}
}
// If no package temp but we have chiplet temps, use highest chiplet
if !foundPackageTemp && len(chipletTemps) > 0 {
for _, temp := range chipletTemps {
if temp > data.CPUPackage {
data.CPUPackage = temp
}
}
}
}
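On AMD `k10temp` chips, `parseCPUTemps` uses `Tctl`/`Tdie` when present and falls back to the hottest `Tccd*` chiplet otherwise. A minimal standalone sketch of that selection rule, assuming features have already been reduced to name -> °C; `pickAMDPackage` is hypothetical. Unlike the loop above, it checks `Tdie` before `Tctl` explicitly so the result does not depend on Go's map iteration order when both are present.

```go
package main

import (
	"fmt"
	"strings"
)

// pickAMDPackage returns the package temperature for an AMD chip:
// Tdie, then Tctl, otherwise the hottest positive Tccd* chiplet.
// ok is false when none of these readings exist.
func pickAMDPackage(features map[string]float64) (float64, bool) {
	// Deterministic preference order for package-level sensors.
	for _, name := range []string{"Tdie", "Tctl"} {
		if t, ok := features[name]; ok {
			return t, true
		}
	}
	best, found := 0.0, false
	for name, t := range features {
		if strings.HasPrefix(name, "Tccd") && t > best {
			best, found = t, true
		}
	}
	return best, found
}

func main() {
	// Illustrative readings from a chip that exposes only chiplet sensors.
	fmt.Println(pickAMDPackage(map[string]float64{"Tccd1": 61.5, "Tccd2": 58.0})) // 61.5 true
}
```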
func parseNVMeTemps(chipName string, chipMap map[string]interface{}, data *TemperatureData) {
for sensorName, sensorData := range chipMap {
sensorMap, ok := sensorData.(map[string]interface{})
if !ok {
continue
}
// Look for Composite temperature (main NVMe temp)
if strings.Contains(sensorName, "Composite") {
if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
data.NVMe[chipName] = tempVal
log.Debug().
Str("chip", chipName).
Float64("temp", tempVal).
Msg("Found NVMe temperature")
}
}
}
}
func parseGPUTemps(chipName string, chipMap map[string]interface{}, data *TemperatureData) {
for sensorName, sensorData := range chipMap {
sensorMap, ok := sensorData.(map[string]interface{})
if !ok {
continue
}
sensorNameLower := strings.ToLower(sensorName)
// Look for GPU temperature fields
if strings.Contains(sensorNameLower, "edge") ||
strings.Contains(sensorNameLower, "junction") ||
strings.Contains(sensorNameLower, "mem") ||
strings.Contains(sensorNameLower, "temp1") {
if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
// Key by chip and sensor name (e.g., "amdgpu-pci-0400_edge")
key := fmt.Sprintf("%s_%s", chipName, sensorName)
data.GPU[key] = tempVal
log.Debug().
Str("chip", chipName).
Str("sensor", sensorName).
Float64("temp", tempVal).
Msg("Found GPU temperature")
}
}
}
}
func extractTempInput(sensorMap map[string]interface{}) float64 {
// Look for temp*_input field (the actual temperature reading)
for key, value := range sensorMap {
if strings.HasSuffix(key, "_input") {
switch v := value.(type) {
case float64:
return v
case int:
return float64(v)
case string:
// Raspberry Pi reports in millidegrees as string
var milliTemp float64
if _, err := fmt.Sscanf(v, "%f", &milliTemp); err == nil {
// Convert from millidegrees to degrees
return milliTemp / 1000.0
}
}
}
}
return math.NaN()
}