Sensor Proxy Configuration Management

This guide covers safe configuration management for pulse-sensor-proxy, including the CLI tools introduced in v4.31.1 to prevent config corruption.

Overview

Starting with v4.31.1, pulse-sensor-proxy uses a two-file configuration system:

  1. Main config: /etc/pulse-sensor-proxy/config.yaml - Contains all settings except allowed nodes
  2. Allowed nodes: /etc/pulse-sensor-proxy/allowed_nodes.yaml - Separate file for the authorized node list

This separation prevents corruption from concurrent updates by the installer, control-plane sync, and self-heal timer.

Architecture

Why Two Files?

Earlier versions stored allowed_nodes: inline in config.yaml, causing corruption when:

  • The installer updated node lists
  • The self-heal timer ran (every 5 minutes)
  • Control-plane sync modified the list
  • Version detection had edge cases

Multiple code paths (shell, Python, Go) would race to update the same YAML file, creating duplicate allowed_nodes: keys that broke YAML parsing.

New System (v4.31.1+)

Phase 1 (Migration):

  • Force file-based mode exclusively
  • Installer migrates inline blocks to allowed_nodes.yaml
  • Self-heal timer includes corruption detection and repair

Phase 2 (Atomic Operations):

  • Go CLI replaces all shell/Python config manipulation
  • File locking prevents concurrent writes
  • Atomic writes (temp file + rename) ensure consistency
  • systemd validation prevents startup with corrupt config

Configuration CLI Reference

Validate Configuration

Check config files for errors before restarting the service:

# Validate both config.yaml and allowed_nodes.yaml
pulse-sensor-proxy config validate

# Validate specific config file
pulse-sensor-proxy config validate --config /path/to/config.yaml

# Validate specific allowed_nodes file
pulse-sensor-proxy config validate --allowed-nodes /path/to/allowed_nodes.yaml

Exit codes:

  • 0 = valid
  • Non-zero = validation failed (check stderr for details)

Common validation errors:

  • "duplicate allowed_nodes blocks" - Run migration (see below)
  • "failed to parse YAML" - Syntax error in config file
  • "read_timeout must be positive" - Invalid timeout value
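
The error strings above can be turned into next-step hints in a wrapper script. The following is a minimal sketch: the function name explain_validation_error is hypothetical, and it assumes the validator's stderr contains the messages exactly as listed above.

```shell
# Hypothetical helper: map a validation error message to a suggested next step.
# ASSUMPTION: the message text matches the strings listed in this guide.
explain_validation_error() {
    case "$1" in
        *"duplicate allowed_nodes blocks"*)
            echo "run the migration to move allowed_nodes into allowed_nodes.yaml" ;;
        *"failed to parse YAML"*)
            echo "fix the YAML syntax error in the config file" ;;
        *"read_timeout must be positive"*)
            echo "set read_timeout to a positive value" ;;
        *)
            echo "unrecognized error; read the full stderr output" ;;
    esac
}
```

One way to use it is to capture only stderr from the validator: explain_validation_error "$(pulse-sensor-proxy config validate 2>&1 >/dev/null)".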

Manage Allowed Nodes

The CLI provides two modes:

Merge mode (default): Adds nodes to existing list

# Add single node
pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.10

# Add multiple nodes
pulse-sensor-proxy config set-allowed-nodes \
  --merge 192.168.0.1 \
  --merge 192.168.0.2 \
  --merge node1.local

Replace mode: Overwrites entire list

# Replace with new list
pulse-sensor-proxy config set-allowed-nodes --replace \
  --merge 192.168.0.1 \
  --merge 192.168.0.2

# Clear the list (empty is valid for IPC-only clusters)
pulse-sensor-proxy config set-allowed-nodes --replace

Custom paths:

# Use non-default path
pulse-sensor-proxy config set-allowed-nodes \
  --allowed-nodes /custom/path.yaml \
  --merge 192.168.0.10

How It Works

  1. File locking: Uses flock(LOCK_EX) on separate .lock file
  2. Atomic writes: Writes to temp file, syncs, then renames
  3. Deduplication: Automatically removes duplicate entries
  4. Normalization: Trims whitespace, sorts entries
  5. Empty lists allowed: Useful for security lockdown or IPC-based discovery
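
The same pattern can be approximated in shell to see how the pieces fit together. This is an illustrative sketch only, not the proxy's Go implementation: write_list_atomically is a hypothetical name, it relies on the util-linux flock command, and it writes a plain one-entry-per-line file rather than the real allowed_nodes.yaml format.

```shell
# Hypothetical helper: lock, normalize, then atomically replace a list file.
# ASSUMPTION: output is a plain text list, NOT the proxy's YAML format.
write_list_atomically() {
    local target="$1"
    shift
    local lock="${target}.lock"
    local tmp

    (
        # Exclusive lock on a separate lock file, held via file descriptor 9.
        flock -x 9 || exit 1

        # Create the temp file next to the target so the rename stays on
        # one filesystem and is therefore atomic.
        tmp="$(mktemp "${target}.tmp.XXXXXX")" || exit 1

        # Trim whitespace, drop empty entries, sort, and deduplicate.
        printf '%s\n' "$@" \
            | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
            | sort -u > "$tmp"

        # Flush to disk, then swap the new file into place.
        sync
        mv -f "$tmp" "$target"
    ) 9>"$lock"
}
```

Readers of the target file see either the old contents or the new contents, never a partial write. Calling the function with no entries produces an empty list, which mirrors the empty-list case above.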

Common Tasks

Adding Nodes After Cluster Expansion

When you add a new node to your Proxmox cluster:

# Add the new node to allowed list
pulse-sensor-proxy config set-allowed-nodes --merge new-node.local

# Validate config
pulse-sensor-proxy config validate

# Restart proxy to apply
sudo systemctl restart pulse-sensor-proxy

# Verify in Pulse UI
# Check Settings → Diagnostics → Temperature Proxy

Removing Decommissioned Nodes

When removing a node from your cluster:

# Get current list
cat /etc/pulse-sensor-proxy/allowed_nodes.yaml

# Replace with updated list (without old node)
pulse-sensor-proxy config set-allowed-nodes --replace \
  --merge 192.168.0.1 \
  --merge 192.168.0.2
  # (omit the decommissioned node)

# Validate and restart
pulse-sensor-proxy config validate
sudo systemctl restart pulse-sensor-proxy

Note: The proxy cleanup system automatically removes SSH keys from deleted nodes. See temperature monitoring docs for details.
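
This guide does not document a dedicated remove flag, so the replacement list has to be rebuilt by hand. The sketch below prints the current entries minus one node. The function name list_nodes_except is hypothetical, and it assumes allowed_nodes.yaml holds a simple YAML list of unquoted "- entry" lines without trailing comments; check your file before relying on it.

```shell
# Hypothetical helper: print every list entry in FILE except DROP.
# ASSUMPTION: entries are plain "- value" lines (no quotes, no inline comments).
list_nodes_except() {
    local file="$1" drop="$2"
    sed -n '/^[[:space:]]*-[[:space:]]/{
        s/^[[:space:]]*-[[:space:]]*//
        s/[[:space:]]*$//
        p
    }' "$file" | grep -Fxv -- "$drop" || true
}

# Possible usage (bash), feeding the result back through the CLI:
#   args=()
#   while read -r n; do args+=(--merge "$n"); done \
#     < <(list_nodes_except /etc/pulse-sensor-proxy/allowed_nodes.yaml old-node.local)
#   pulse-sensor-proxy config set-allowed-nodes --replace "${args[@]}"
```

The trailing "|| true" keeps the helper from reporting failure when removing the node leaves the list empty.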

Migrating from Inline Config

If you're running an older version with inline allowed_nodes: in config.yaml:

# Upgrade to latest version (auto-migrates)
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
  sudo bash -s -- --standalone --pulse-server http://your-pulse:7655

# Verify migration
pulse-sensor-proxy config validate

# Check that allowed_nodes only appears in allowed_nodes.yaml
grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/*.yaml
# Should show a single match, in allowed_nodes.yaml, for example:
#   /etc/pulse-sensor-proxy/allowed_nodes.yaml:3:allowed_nodes:
# Should NOT show any match in config.yaml

Changing Other Config Settings

For settings in config.yaml (not allowed_nodes):

# Stop the service first
sudo systemctl stop pulse-sensor-proxy

# Edit config.yaml manually
sudo nano /etc/pulse-sensor-proxy/config.yaml

# Validate before starting
pulse-sensor-proxy config validate

# Start service
sudo systemctl start pulse-sensor-proxy

# Check for errors
sudo systemctl status pulse-sensor-proxy
journalctl -u pulse-sensor-proxy -n 50

Safe to edit in config.yaml:

  • allowed_source_subnets
  • allowed_peers (UID/GID permissions)
  • rate_limit settings
  • metrics_address
  • http_* settings (HTTPS mode)
  • pulse_control_plane block

Never edit manually:

  • allowed_nodes: (use the CLI instead; on v4.31.1+ the list lives in allowed_nodes.yaml, not config.yaml)
  • Lock files (.lock)

Troubleshooting

Config Validation Fails

Symptom: pulse-sensor-proxy config validate returns error

Diagnosis:

# Run validation with full output
pulse-sensor-proxy config validate 2>&1

# Check for duplicate blocks
grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml

# Check YAML syntax
python3 -c "import yaml; yaml.safe_load(open('/etc/pulse-sensor-proxy/config.yaml'))"

Common fixes:

  • Duplicate blocks: Run migration (upgrade to v4.31.1+)
  • YAML syntax errors: Fix indentation, remove tabs, check colons
  • Missing required fields: Add read_timeout, write_timeout

Service Won't Start After Config Change

Diagnosis:

# Check systemd logs
journalctl -u pulse-sensor-proxy -n 100

# Look for validation errors
journalctl -u pulse-sensor-proxy | grep -i "validation\|corrupt\|duplicate"

# Try starting in foreground for better errors
sudo -u pulse-sensor-proxy /opt/pulse/sensor-proxy/bin/pulse-sensor-proxy  # legacy installs: /usr/local/bin/pulse-sensor-proxy

Fix:

# Validate config first
pulse-sensor-proxy config validate

# If validation passes but service fails, check permissions
ls -la /etc/pulse-sensor-proxy/
ls -la /var/lib/pulse-sensor-proxy/

# Ensure proxy user owns files
sudo chown -R pulse-sensor-proxy:pulse-sensor-proxy /etc/pulse-sensor-proxy/
sudo chown -R pulse-sensor-proxy:pulse-sensor-proxy /var/lib/pulse-sensor-proxy/

Lock File Errors

Symptom: failed to acquire file lock or failed to open lock file

Cause: The lock file has the wrong permissions, or another process (possibly hung) still holds the lock

Fix:

# Check lock file permissions (should be 0600)
ls -la /etc/pulse-sensor-proxy/*.lock

# Fix permissions
sudo chmod 0600 /etc/pulse-sensor-proxy/*.lock
sudo chown pulse-sensor-proxy:pulse-sensor-proxy /etc/pulse-sensor-proxy/*.lock

# If stale lock, identify holder
sudo lsof /etc/pulse-sensor-proxy/allowed_nodes.yaml.lock

# Kill stale process if needed (use with caution)
sudo kill <PID>

Prevention: Locks are automatically released when process exits. Don't manually delete lock files.

Allowed Nodes List is Empty

Symptom: allowed_nodes.yaml exists but has no entries

Is this a problem? Not necessarily:

  • Empty list is valid for clusters using IPC discovery (pvecm status)
  • Control-plane mode populates the list automatically
  • Standalone nodes require manual node entries

To populate manually:

# Add your cluster nodes
pulse-sensor-proxy config set-allowed-nodes --replace \
  --merge 192.168.0.1 \
  --merge 192.168.0.2 \
  --merge 192.168.0.3

# Verify
cat /etc/pulse-sensor-proxy/allowed_nodes.yaml

Best Practices

General Guidelines

  1. Always validate before restarting:

    pulse-sensor-proxy config validate && sudo systemctl restart pulse-sensor-proxy
    
  2. Use the CLI for allowed_nodes changes:

    • Don't edit allowed_nodes.yaml manually
    • Use config set-allowed-nodes instead
  3. Stop service before editing config.yaml:

    • Prevents race conditions with running process
    • systemd validation will catch errors on startup
  4. Back up config before major changes:

    sudo cp /etc/pulse-sensor-proxy/config.yaml /etc/pulse-sensor-proxy/config.yaml.backup
    sudo cp /etc/pulse-sensor-proxy/allowed_nodes.yaml /etc/pulse-sensor-proxy/allowed_nodes.yaml.backup
    
  5. Monitor after changes:

    journalctl -u pulse-sensor-proxy -f
    # Check Pulse UI: Settings → Diagnostics → Temperature Proxy
    

Automation Scripts

When scripting config changes:

#!/bin/bash
set -euo pipefail

# Function to safely update allowed nodes
update_allowed_nodes() {
    local nodes=("$@")

    # Build command as an array so each node stays a single argument
    local cmd=(pulse-sensor-proxy config set-allowed-nodes --replace)
    local node
    for node in "${nodes[@]}"; do
        cmd+=(--merge "$node")
    done

    # Execute with validation
    if "${cmd[@]}"; then
        echo "Allowed nodes updated successfully"
    else
        echo "Failed to update allowed nodes" >&2
        return 1
    fi

    # Validate
    if ! pulse-sensor-proxy config validate; then
        echo "Config validation failed after update" >&2
        return 1
    fi

    # Restart service
    if sudo systemctl restart pulse-sensor-proxy; then
        echo "Service restarted successfully"
    else
        echo "Service restart failed" >&2
        return 1
    fi

    # Wait for service to be active
    sleep 2
    if systemctl is-active --quiet pulse-sensor-proxy; then
        echo "Service is running"
    else
        echo "Service failed to start" >&2
        journalctl -u pulse-sensor-proxy -n 20
        return 1
    fi
}

# Example usage
update_allowed_nodes "192.168.0.1" "192.168.0.2" "node3.local"
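
A fixed two-second sleep can report a failure on slow hosts or miss a service that crashes shortly after starting. One alternative is to poll with a timeout. The helper below is a sketch with a hypothetical name, wait_for; it retries any command once per second until it succeeds or the timeout is reached.

```shell
# Hypothetical helper: retry a command once per second for up to TIMEOUT seconds.
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached first.
wait_for() {
    local timeout="$1"
    shift
    local waited=0

    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the script above, the sleep and is-active check could then become: wait_for 15 systemctl is-active --quiet pulse-sensor-proxy. The 15-second limit is an arbitrary example value.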

Monitoring Config Health

Add to your monitoring system:

# Check for config corruption (should return 0)
pulse-sensor-proxy config validate
echo $?

# Check for inline or duplicate blocks (should print 0)
grep -c "^allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml

# Check lock file permissions (should be 0600)
stat -c "%a" /etc/pulse-sensor-proxy/*.lock

# Check service is running
systemctl is-active pulse-sensor-proxy
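
The two file-based checks can be wrapped in a single function that exits non-zero when something is wrong, which is the shape most monitoring systems expect. This is a sketch under assumptions: check_config_files is a hypothetical name, it uses GNU stat, and it treats a missing config.yaml as "no inline blocks" rather than as an error.

```shell
# Hypothetical helper: file-level health check for the proxy config directory.
# Returns 0 when no problems are found, non-zero otherwise.
check_config_files() {
    local dir="${1:-/etc/pulse-sensor-proxy}"
    local problems=0 inline mode lock

    # Any inline allowed_nodes: key in config.yaml indicates an unmigrated
    # or corrupted config.
    inline="$(grep -c '^allowed_nodes:' "$dir/config.yaml" 2>/dev/null || true)"
    if [ "${inline:-0}" -ne 0 ]; then
        echo "config.yaml has $inline inline allowed_nodes block(s)" >&2
        problems=$((problems + 1))
    fi

    # Lock files should have mode 0600.
    for lock in "$dir"/*.lock; do
        [ -e "$lock" ] || continue
        mode="$(stat -c '%a' "$lock")"
        if [ "$mode" != "600" ]; then
            echo "$lock has mode $mode, expected 600" >&2
            problems=$((problems + 1))
        fi
    done

    [ "$problems" -eq 0 ]
}
```

It complements, rather than replaces, pulse-sensor-proxy config validate and the systemctl is-active check.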

Migration Path

Upgrading from Pre-v4.31.1

Automatic migration (recommended):

# Simply reinstall - migration runs automatically
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
  sudo bash -s -- --standalone --pulse-server http://your-pulse:7655

# Verify
pulse-sensor-proxy config validate
sudo systemctl status pulse-sensor-proxy

Manual migration (if needed):

# 1. Stop service
sudo systemctl stop pulse-sensor-proxy

# 2. Extract allowed_nodes from config.yaml
grep -A 100 "^allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml > /tmp/nodes.txt

# 3. Parse and add to allowed_nodes.yaml
# (Example for simple list - adjust for your format)
pulse-sensor-proxy config set-allowed-nodes --replace \
  --merge node1.local \
  --merge node2.local

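# 4. Remove allowed_nodes from config.yaml
# Edit manually, or filter a backup copy through awk. This drops every inline
# allowed_nodes: block with its list items and keeps the key that follows it.
sudo cp -p /etc/pulse-sensor-proxy/config.yaml /etc/pulse-sensor-proxy/config.yaml.pre-migration
sudo awk '/^allowed_nodes:/ {skip=1; next} /^[a-z_]/ {skip=0} !skip' \
  /etc/pulse-sensor-proxy/config.yaml.pre-migration | \
  sudo tee /etc/pulse-sensor-proxy/config.yaml > /dev/null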

# 5. Add reference to allowed_nodes.yaml
echo "allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml" | \
  sudo tee -a /etc/pulse-sensor-proxy/config.yaml

# 6. Validate
pulse-sensor-proxy config validate

# 7. Start service
sudo systemctl start pulse-sensor-proxy
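
Step 3 leaves the parsing to the reader. For a simple inline list, the entries can be pulled out with awk. The sketch below uses a hypothetical function name, extract_inline_nodes, and assumes each entry is an unquoted "- value" line under a top-level allowed_nodes: key; it also collects entries from duplicate blocks and deduplicates them.

```shell
# Hypothetical helper: print the unique entries of all inline allowed_nodes:
# blocks in a config file, one per line.
# ASSUMPTION: entries are plain "- value" lines (no quotes, no inline comments).
extract_inline_nodes() {
    awk '
        /^allowed_nodes:/ { inblock = 1; next }   # start of an inline block
        /^[a-z_]/         { inblock = 0 }         # another top-level key ends it
        inblock && /^[ \t]*-[ \t]/ {
            sub(/^[ \t]*-[ \t]*/, "")             # strip the list marker
            sub(/[ \t]*$/, "")                    # strip trailing whitespace
            print
        }
    ' "$1" | sort -u
}
```

Run it against the original config.yaml (or a backup taken before step 4) and pass each printed entry to set-allowed-nodes with --merge.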

Support

If config management issues persist after following this guide:

  1. Collect diagnostics:

    pulse-sensor-proxy config validate > /tmp/validate.log 2>&1
    sudo systemctl status pulse-sensor-proxy > /tmp/status.log
    journalctl -u pulse-sensor-proxy -n 200 > /tmp/journal.log
    grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/*.yaml > /tmp/grep.log
    
  2. File an issue at https://github.com/rcourtman/Pulse/issues

  3. Include:

    • Pulse version
    • Sensor proxy version (pulse-sensor-proxy --version)
    • Output from diagnostic commands above
    • Steps that led to the issue