mirror of
https://github.com/rcourtman/Pulse.git
synced 2026-02-18 00:17:39 +01:00
feat: Add temperature collection to pulse-host-agent (related to #661)
Implements temperature monitoring in pulse-host-agent to support Docker-in-VM deployments where the sensor proxy socket cannot cross VM boundaries.

Changes:
- Create internal/sensors package with local collection and parsing
- Add temperature collection to host agent (Linux only, best-effort)
- Support CPU package/core, NVMe, and GPU temperature sensors
- Update TEMPERATURE_MONITORING.md with Docker-in-VM setup instructions
- Update HOST_AGENT.md to document temperature feature

The host agent now automatically collects temperature data on Linux systems with lm-sensors installed. This provides an alternative path for temperature monitoring when running Pulse in a VM, avoiding the unix socket limitation.

Temperature collection is best-effort and fails gracefully if lm-sensors is not available, ensuring other metrics continue to be reported.

Related to #661
@@ -2,12 +2,30 @@
 The Pulse host agent extends monitoring to standalone servers that do not expose
 Proxmox or Docker APIs. With it you can surface uptime, OS metadata, CPU load,
-memory/disk utilisation, and connection health for any Linux, macOS, or Windows
-machine alongside the rest of your infrastructure. Starting in v4.26.0 the
-installer handshakes with Pulse in real time so you can confirm registration
-from the UI and receive host-agent alerts alongside your existing
+memory/disk utilisation, temperature sensors, and connection health for any Linux,
+macOS, or Windows machine alongside the rest of your infrastructure. Starting in
+v4.26.0 the installer handshakes with Pulse in real time so you can confirm
+registration from the UI and receive host-agent alerts alongside your existing
 Docker/Proxmox notifications.

+## Temperature Monitoring
+
+The host agent automatically collects temperature data on Linux systems with lm-sensors installed:
+
+- **CPU Package Temperature**: Overall CPU temperature
+- **Per-Core Temperatures**: Individual CPU core readings
+- **NVMe Drive Temperatures**: SSD thermal data
+- **GPU Temperatures**: AMD (amdgpu) and NVIDIA (nouveau) GPU sensors
+
+Temperature data appears in the **Servers** tab alongside other host metrics. This is particularly useful for monitoring Proxmox hosts when running Pulse in a VM (where the sensor proxy socket cannot cross VM boundaries).
+
+**Requirements:**
+- Linux operating system
+- lm-sensors package installed (`apt-get install lm-sensors`)
+- Sensors configured (`sensors-detect --auto`)
+
+Temperature collection is automatic and best-effort. If lm-sensors is not installed or sensors are unavailable, the agent continues reporting other metrics normally.
+
 ## Prerequisites

 - Pulse v4.26.0 or newer (host agent reporting shipped with `/api/agents/host/report`)
@@ -16,14 +16,40 @@ Pulse can display real-time CPU and NVMe temperatures directly in your dashboard
 > **Important:** Temperature monitoring setup differs by deployment type:
 > - **LXC containers:** Fully automatic via the setup script (Settings → Nodes → Setup Script)
-> - **Docker containers:** Requires manual proxy installation (see below)
+> - **Docker containers:** Requires manual proxy installation (see below) OR use pulse-host-agent
+> - **Docker in VM:** Use pulse-host-agent on the Proxmox host (see [Docker in VM Setup](#docker-in-vm-setup))
 > - **Native installs:** Direct SSH, no proxy needed
 >
 > **For automation (Ansible/Terraform/etc.):** Jump to [Automation-Friendly Installation](#automation-friendly-installation)

+## Docker in VM Setup
+
+**Running Pulse in Docker inside a VM on Proxmox?** The proxy socket cannot cross VM boundaries, so use pulse-host-agent instead.
+
+pulse-host-agent runs natively on your Proxmox host and reports temperatures back to Pulse over HTTPS. This works across VM boundaries without requiring socket mounts or SSH configuration.
+
+**Setup steps:**
+
+1. Install lm-sensors on your Proxmox host (if not already installed):
+   ```bash
+   apt-get update && apt-get install -y lm-sensors
+   sensors-detect --auto
+   ```
+
+2. Install pulse-host-agent on your Proxmox host:
+   ```bash
+   # Generate an API token in Pulse (Settings → Security → API Tokens) with host-agent:report scope
+   curl -fsSL http://your-pulse-vm:7655/install-host-agent.sh | \
+     bash -s -- --url http://your-pulse-vm:7655 --token YOUR_API_TOKEN
+   ```
+
+3. Verify temperatures appear in Pulse UI under the Servers tab
+
+The host agent will report CPU, NVMe, and GPU temperatures alongside other system metrics. No proxy installation or socket mounting needed.
+
 ## Quick Start for Docker Deployments

-**Running Pulse in Docker?** Temperature monitoring requires installing a small service on your Proxmox host that reads hardware sensors. The Pulse container connects to this service through a shared socket.
+**Running Pulse in Docker directly on Proxmox?** Temperature monitoring requires installing a small service on your Proxmox host that reads hardware sensors. The Pulse container connects to this service through a shared socket.

 **Why this is needed:** Docker containers cannot directly access hardware sensors. The proxy runs on your Proxmox host where it has access to sensor data, then shares that data with the Pulse container through a secure connection.
@@ -13,6 +13,7 @@ import (
 	"time"

 	"github.com/rcourtman/pulse-go-rewrite/internal/hostmetrics"
+	"github.com/rcourtman/pulse-go-rewrite/internal/sensors"
 	agentshost "github.com/rcourtman/pulse-go-rewrite/pkg/agents/host"
 	"github.com/rs/zerolog"
 	gohost "github.com/shirou/gopsutil/v4/host"
@@ -220,6 +221,9 @@ func (a *Agent) buildReport(ctx context.Context) (agentshost.Report, error) {
 		return agentshost.Report{}, fmt.Errorf("collect metrics: %w", err)
 	}

+	// Collect temperature data (best effort - don't fail if unavailable)
+	sensorData := a.collectTemperatures(collectCtx)
+
 	report := agentshost.Report{
 		Agent: agentshost.AgentInfo{
 			ID: a.agentID,
@@ -248,7 +252,7 @@ func (a *Agent) buildReport(ctx context.Context) (agentshost.Report, error) {
 		},
 		Disks: append([]agentshost.Disk(nil), snapshot.Disks...),
 		Network: append([]agentshost.NetworkInterface(nil), snapshot.Network...),
-		Sensors: agentshost.Sensors{},
+		Sensors: sensorData,
 		Tags: append([]string(nil), a.cfg.Tags...),
 		Timestamp: time.Now().UTC(),
 	}
@@ -304,3 +308,64 @@ func isLoopback(flags []string) bool {
 	}
 	return false
 }
+
+// collectTemperatures attempts to collect temperature data from the local system.
+// Returns an empty Sensors struct if collection fails (best-effort).
+func (a *Agent) collectTemperatures(ctx context.Context) agentshost.Sensors {
+	// Only collect on Linux for now (lm-sensors is Linux-specific)
+	if a.platform != "linux" {
+		return agentshost.Sensors{}
+	}
+
+	// Collect sensor JSON output
+	jsonOutput, err := sensors.CollectLocal(ctx)
+	if err != nil {
+		a.logger.Debug().Err(err).Msg("Failed to collect sensor data (lm-sensors may not be installed)")
+		return agentshost.Sensors{}
+	}
+
+	// Parse the sensor output
+	tempData, err := sensors.Parse(jsonOutput)
+	if err != nil {
+		a.logger.Debug().Err(err).Msg("Failed to parse sensor data")
+		return agentshost.Sensors{}
+	}
+
+	if !tempData.Available {
+		a.logger.Debug().Msg("No temperature sensors available on this system")
+		return agentshost.Sensors{}
+	}
+
+	// Convert to host agent sensor format
+	result := agentshost.Sensors{
+		TemperatureCelsius: make(map[string]float64),
+	}
+
+	// Add CPU package temperature
+	if tempData.CPUPackage > 0 {
+		result.TemperatureCelsius["cpu_package"] = tempData.CPUPackage
+	}
+
+	// Add individual core temperatures
+	for coreName, temp := range tempData.Cores {
+		// Normalize core name (e.g., "Core 0" -> "cpu_core_0")
+		normalizedName := strings.ToLower(strings.ReplaceAll(coreName, " ", "_"))
+		result.TemperatureCelsius["cpu_"+normalizedName] = temp
+	}
+
+	// Add NVMe temperatures
+	for nvmeName, temp := range tempData.NVMe {
+		result.TemperatureCelsius[nvmeName] = temp
+	}
+
+	// Add GPU temperatures
+	for gpuName, temp := range tempData.GPU {
+		result.TemperatureCelsius[gpuName] = temp
+	}
+
+	a.logger.Debug().
+		Int("temperatureCount", len(result.TemperatureCelsius)).
+		Msg("Collected temperature data")
+
+	return result
+}
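The key scheme that `collectTemperatures` builds can be tried in isolation. The following is a minimal sketch rather than the agent's actual code: `flattenTemps` is a hypothetical helper, and the chip names in `main` are illustrative sample values, not output captured from a real host.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// flattenTemps is a hypothetical helper that mirrors the key scheme in the diff:
// the package reading becomes "cpu_package", "Core N" becomes "cpu_core_n",
// and NVMe/GPU names are copied through unchanged.
func flattenTemps(cpuPackage float64, cores, nvme, gpu map[string]float64) map[string]float64 {
	out := make(map[string]float64)
	if cpuPackage > 0 {
		out["cpu_package"] = cpuPackage
	}
	for name, t := range cores {
		key := "cpu_" + strings.ToLower(strings.ReplaceAll(name, " ", "_"))
		out[key] = t
	}
	for name, t := range nvme {
		out[name] = t
	}
	for name, t := range gpu {
		out[name] = t
	}
	return out
}

func main() {
	flat := flattenTemps(
		52.0,
		map[string]float64{"Core 0": 48.0, "Core 1": 50.0},
		map[string]float64{"nvme-pci-0100": 41.0},
		map[string]float64{"amdgpu-pci-0400_edge": 55.0},
	)

	// Sort keys so the printed order is stable across runs.
	keys := make([]string, 0, len(flat))
	for k := range flat {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%.1f\n", k, flat[k])
	}
}
```

Because every reading lands in one flat map, two sensors that normalize to the same key would overwrite each other; the chip-qualified NVMe and GPU names make that unlikely in practice.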
47
internal/sensors/collector.go
Normal file
@@ -0,0 +1,47 @@
+package sensors
+
+import (
+	"context"
+	"fmt"
+	"os/exec"
+	"strings"
+	"time"
+)
+
+// CollectLocal reads sensor data from the local machine using lm-sensors.
+// Returns the raw JSON output from `sensors -j` or an error if sensors is not available.
+func CollectLocal(ctx context.Context) (string, error) {
+	// Check if sensors command exists
+	if _, err := exec.LookPath("sensors"); err != nil {
+		return "", fmt.Errorf("lm-sensors not installed: %w", err)
+	}
+
+	// Create context with timeout
+	cmdCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
+	defer cancel()
+
+	// Run sensors -j command
+	// sensors exits non-zero when optional subfeatures fail; "|| true" keeps the JSON for parsing
+	cmd := exec.CommandContext(cmdCtx, "sh", "-c", "sensors -j 2>/dev/null || true")
+	output, err := cmd.Output()
+	if err != nil {
+		return "", fmt.Errorf("failed to execute sensors: %w", err)
+	}
+
+	outputStr := strings.TrimSpace(string(output))
+	if outputStr == "" || outputStr == "{}" {
+		// Try Raspberry Pi temperature method as fallback
+		cmd = exec.CommandContext(cmdCtx, "cat", "/sys/class/thermal/thermal_zone0/temp")
+		if rpiOutput, rpiErr := cmd.Output(); rpiErr == nil {
+			rpiTemp := strings.TrimSpace(string(rpiOutput))
+			if rpiTemp != "" {
+				// Convert to pseudo-sensors format for compatibility
+				// Raspberry Pi reports in millidegrees Celsius; emit the value as a JSON
+				// string (%q) so the parser's string branch divides it by 1000
+				return fmt.Sprintf(`{"cpu_thermal-virtual-0":{"temp1":{"temp1_input":%q}}}`, rpiTemp), nil
+			}
+		}
+		return "", fmt.Errorf("sensors returned empty output")
+	}
+
+	return outputStr, nil
+}
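The fallback path in `CollectLocal` reads the kernel thermal-zone file, whose value is expressed in millidegrees Celsius. Below is a minimal sketch of the conversion, assuming a hypothetical `milliToCelsius` helper that is not part of this commit; the input string stands in for the file contents.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// milliToCelsius (hypothetical) converts raw thermal-zone contents such as
// "45000\n" into degrees Celsius.
func milliToCelsius(raw string) (float64, error) {
	milli, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
	if err != nil {
		return 0, fmt.Errorf("parse thermal zone value %q: %w", raw, err)
	}
	return milli / 1000.0, nil
}

func main() {
	// Sample value standing in for /sys/class/thermal/thermal_zone0/temp.
	c, err := milliToCelsius("45000\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f\n", c)
}
```

Doing the division at the point where the file is read keeps the unit handling in one place, whereas the commit defers it to the parser's string branch.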
248
internal/sensors/parser.go
Normal file
@@ -0,0 +1,248 @@
+package sensors
+
+import (
+	"encoding/json"
+	"fmt"
+	"math"
+	"strings"
+
+	"github.com/rs/zerolog/log"
+)
+
+// TemperatureData contains parsed temperature readings from sensors
+type TemperatureData struct {
+	CPUPackage float64            // Overall CPU package temperature
+	CPUMax     float64            // Maximum CPU temperature
+	Cores      map[string]float64 // Per-core temperatures (e.g., "Core 0": 45.0)
+	NVMe       map[string]float64 // NVMe drive temperatures (e.g., "nvme0": 42.0)
+	GPU        map[string]float64 // GPU temperatures (e.g., "amdgpu-pci-0400": 55.0)
+	Available  bool               // Whether any temperature data was found
+}
+
+// Parse extracts temperature data from sensors -j JSON output
+func Parse(jsonStr string) (*TemperatureData, error) {
+	if strings.TrimSpace(jsonStr) == "" {
+		return nil, fmt.Errorf("empty sensors output")
+	}
+
+	var sensorsData map[string]interface{}
+	if err := json.Unmarshal([]byte(jsonStr), &sensorsData); err != nil {
+		return nil, fmt.Errorf("failed to parse sensors JSON: %w", err)
+	}
+
+	data := &TemperatureData{
+		Cores: make(map[string]float64),
+		NVMe:  make(map[string]float64),
+		GPU:   make(map[string]float64),
+	}
+
+	foundCPUChip := false
+
+	// Parse each sensor chip
+	for chipName, chipData := range sensorsData {
+		chipMap, ok := chipData.(map[string]interface{})
+		if !ok {
+			continue
+		}
+
+		chipLower := strings.ToLower(chipName)
+
+		// Handle CPU temperature sensors
+		if isCPUChip(chipLower) {
+			foundCPUChip = true
+			parseCPUTemps(chipMap, data)
+		}
+
+		// Handle NVMe temperature sensors
+		if strings.Contains(chipName, "nvme") {
+			parseNVMeTemps(chipName, chipMap, data)
+		}
+
+		// Handle GPU temperature sensors
+		if strings.Contains(chipLower, "amdgpu") || strings.Contains(chipLower, "nouveau") {
+			parseGPUTemps(chipName, chipMap, data)
+		}
+	}
+
+	// If we got CPU temps, calculate max from cores if package not available
+	if data.CPUPackage == 0 && len(data.Cores) > 0 {
+		for _, temp := range data.Cores {
+			if temp > data.CPUMax {
+				data.CPUMax = temp
+			}
+		}
+		// Use max core temp as package temp if not available
+		data.CPUPackage = data.CPUMax
+	}
+
+	data.Available = foundCPUChip || len(data.NVMe) > 0 || len(data.GPU) > 0
+
+	log.Debug().
+		Bool("available", data.Available).
+		Float64("cpuPackage", data.CPUPackage).
+		Float64("cpuMax", data.CPUMax).
+		Int("coreCount", len(data.Cores)).
+		Int("nvmeCount", len(data.NVMe)).
+		Int("gpuCount", len(data.GPU)).
+		Msg("Parsed temperature data")
+
+	return data, nil
+}
+
+func isCPUChip(chipLower string) bool {
+	cpuChips := []string{
+		"coretemp", "k10temp", "zenpower", "k8temp", "acpitz",
+		"it87", "nct6687", "nct6775", "nct6776", "nct6779",
+		"nct6791", "nct6792", "nct6793", "nct6795", "nct6796",
+		"nct6797", "nct6798", "w83627", "f71882",
+		"cpu_thermal", "rpitemp",
+	}
+
+	for _, chip := range cpuChips {
+		if strings.Contains(chipLower, chip) {
+			return true
+		}
+	}
+	return false
+}
+
+func parseCPUTemps(chipMap map[string]interface{}, data *TemperatureData) {
+	foundPackageTemp := false
+	var chipletTemps []float64
+
+	for sensorName, sensorData := range chipMap {
+		sensorMap, ok := sensorData.(map[string]interface{})
+		if !ok {
+			continue
+		}
+
+		sensorNameLower := strings.ToLower(sensorName)
+
+		// Look for Package id (Intel) or Tdie/Tctl (AMD)
+		if strings.Contains(sensorName, "Package id") ||
+			strings.Contains(sensorName, "Tdie") ||
+			strings.Contains(sensorNameLower, "tctl") {
+			if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
+				data.CPUPackage = tempVal
+				foundPackageTemp = true
+				if tempVal > data.CPUMax {
+					data.CPUMax = tempVal
+				}
+			}
+		}
+
+		// Look for AMD chiplet temperatures
+		if strings.HasPrefix(sensorName, "Tccd") {
+			if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) && tempVal > 0 {
+				chipletTemps = append(chipletTemps, tempVal)
+				if tempVal > data.CPUMax {
+					data.CPUMax = tempVal
+				}
+			}
+		}
+
+		// Look for SuperIO chip CPU temperature fields
+		if strings.Contains(sensorNameLower, "cputin") ||
+			strings.Contains(sensorNameLower, "cpu temperature") ||
+			(strings.Contains(sensorNameLower, "temp") && strings.Contains(sensorNameLower, "cpu")) {
+			if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) && tempVal > 0 {
+				if !foundPackageTemp {
+					data.CPUPackage = tempVal
+					foundPackageTemp = true
+				}
+				if tempVal > data.CPUMax {
+					data.CPUMax = tempVal
+				}
+			}
+		}
+
+		// Look for individual core temperatures
+		if strings.Contains(sensorName, "Core ") {
+			if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
+				data.Cores[sensorName] = tempVal
+				if tempVal > data.CPUMax {
+					data.CPUMax = tempVal
+				}
+			}
+		}
+	}
+
+	// If no package temp but we have chiplet temps, use highest chiplet
+	if !foundPackageTemp && len(chipletTemps) > 0 {
+		for _, temp := range chipletTemps {
+			if temp > data.CPUPackage {
+				data.CPUPackage = temp
+			}
+		}
+	}
+}
+
+func parseNVMeTemps(chipName string, chipMap map[string]interface{}, data *TemperatureData) {
+	for sensorName, sensorData := range chipMap {
+		sensorMap, ok := sensorData.(map[string]interface{})
+		if !ok {
+			continue
+		}
+
+		// Look for Composite temperature (main NVMe temp)
+		if strings.Contains(sensorName, "Composite") {
+			if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
+				data.NVMe[chipName] = tempVal
+				log.Debug().
+					Str("chip", chipName).
+					Float64("temp", tempVal).
+					Msg("Found NVMe temperature")
+			}
+		}
+	}
+}
+
+func parseGPUTemps(chipName string, chipMap map[string]interface{}, data *TemperatureData) {
+	for sensorName, sensorData := range chipMap {
+		sensorMap, ok := sensorData.(map[string]interface{})
+		if !ok {
+			continue
+		}
+
+		sensorNameLower := strings.ToLower(sensorName)
+
+		// Look for GPU temperature fields
+		if strings.Contains(sensorNameLower, "edge") ||
+			strings.Contains(sensorNameLower, "junction") ||
+			strings.Contains(sensorNameLower, "mem") ||
+			strings.Contains(sensorNameLower, "temp1") {
+			if tempVal := extractTempInput(sensorMap); !math.IsNaN(tempVal) {
+				// Key is the chip name joined with the sensor name (e.g., "amdgpu-pci-0400_edge")
+				key := fmt.Sprintf("%s_%s", chipName, sensorName)
+				data.GPU[key] = tempVal
+				log.Debug().
+					Str("chip", chipName).
+					Str("sensor", sensorName).
+					Float64("temp", tempVal).
+					Msg("Found GPU temperature")
+			}
+		}
+	}
+}
+
+func extractTempInput(sensorMap map[string]interface{}) float64 {
+	// Look for temp*_input field (the actual temperature reading)
+	for key, value := range sensorMap {
+		if strings.HasSuffix(key, "_input") {
+			switch v := value.(type) {
+			case float64:
+				return v
+			case int:
+				return float64(v)
+			case string:
+				// Raspberry Pi reports in millidegrees as string
+				var milliTemp float64
+				if _, err := fmt.Sscanf(v, "%f", &milliTemp); err == nil {
+					// Convert from millidegrees to degrees
+					return milliTemp / 1000.0
+				}
+			}
+		}
+	}
+	return math.NaN()
+}
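To make the JSON shape that `Parse` walks more concrete, here is a minimal sketch that decodes a hand-written sample in the layout `sensors -j` produces (chip, then feature, then `*_input` subfeature). `readInputs` and `sample` are hypothetical names introduced for illustration; the readings are invented, and the sketch skips the CPU/NVMe/GPU classification that the commit performs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// sample is an illustrative, hand-written document in the chip -> feature ->
// subfeature layout; the values are made up.
const sample = `{
  "coretemp-isa-0000": {
    "Adapter": "ISA adapter",
    "Package id 0": {"temp1_input": 52.0, "temp1_max": 100.0},
    "Core 0": {"temp2_input": 48.0}
  },
  "nvme-pci-0100": {
    "Adapter": "PCI adapter",
    "Composite": {"temp1_input": 41.85}
  }
}`

// readInputs (hypothetical) returns every "*_input" reading keyed by "chip/feature".
func readInputs(raw string) (map[string]float64, error) {
	var chips map[string]map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &chips); err != nil {
		return nil, fmt.Errorf("decode sensors JSON: %w", err)
	}
	out := make(map[string]float64)
	for chip, features := range chips {
		for feature, v := range features {
			fields, ok := v.(map[string]interface{})
			if !ok {
				continue // non-object entries such as the "Adapter" string
			}
			for name, val := range fields {
				if f, ok := val.(float64); ok && strings.HasSuffix(name, "_input") {
					out[chip+"/"+feature] = f
				}
			}
		}
	}
	return out, nil
}

func main() {
	temps, err := readInputs(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	keys := make([]string, 0, len(temps))
	for k := range temps {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %.2f\n", k, temps[k])
	}
}
```

Note that `encoding/json` decodes every JSON number into `float64` when the target is `interface{}`, so only the `float64` and `string` branches of a type switch like the one in `extractTempInput` can be reached for decoded values.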