mirror of https://github.com/rcourtman/Pulse.git synced 2026-02-18 00:17:39 +01:00

Files

rcourtman 8ca31003a0 docs: document TLS certificate file permissions for HTTPS setup

Add comprehensive documentation for HTTPS/TLS configuration including:
- File ownership and permission requirements (pulse user)
- Common troubleshooting steps for startup failures
- Complete setup examples for systemd and Docker
- Validation commands for certificate/key verification

Related to discussion #634

2025-11-05 23:08:02 +00:00

20 KiB

Raw Blame History

Pulse Troubleshooting Guide

Common Issues and Solutions

Correlate API Calls with Logs

Every API response includes an X-Request-ID header. When escalating issues, capture that value and use it to search the backend logs or log file. The same identifier is emitted as request_id in structured logs.

# Capture a request ID
curl -i https://pulse.example.com/api/state | grep X-Request-ID

# Search the rotating log file
grep 'request_id=abc123' /var/log/pulse/pulse.log

# Docker / kubectl example
docker logs pulse | grep 'request_id=abc123'

Include the X-Request-ID in support tickets or incident notes so responders can jump straight to the relevant log lines.

Authentication Problems

Forgot Password / Lost Access

Solution: Use the built-in recovery endpoint

Pulse ships with a guarded recovery API that lets you regain access without wiping configuration.

From the Pulse host (localhost only)
Generate a short-lived recovery token or temporarily disable auth:

# Create a 30 minute recovery token (returns JSON with the token value)
curl -s -X POST http://localhost:7655/api/security/recovery \
  -H 'Content-Type: application/json' \
  -d '{"action":"generate_token","duration":30}'

# OR force local-only recovery access (writes .auth_recovery in the data dir)
curl -s -X POST http://localhost:7655/api/security/recovery \
  -H 'Content-Type: application/json' \
  -d '{"action":"disable_auth"}'

If you generated a token, use it from a trusted workstation:

curl -s -X POST https://pulse.example.com/api/security/recovery \
  -H 'Content-Type: application/json' \
  -H 'X-Recovery-Token: YOUR_TOKEN' \
  -d '{"action":"disable_auth"}'

The token is single-use and expires automatically.

Log in and reset credentials using Settings → Security, then re-enable auth:
```
curl -s -X POST http://localhost:7655/api/security/recovery \
  -H 'Content-Type: application/json' \
  -d '{"action":"enable_auth"}'
```
Alternatively, delete /etc/pulse/.auth_recovery (or /data/.auth_recovery for Docker) and restart Pulse.

Only fall back to nuking /etc/pulse if the recovery endpoint is unreachable.

Prevention:

Use a password manager
Store exported configuration backups securely
Generate API tokens for automation instead of sharing passwords

Symptoms: "Invalid username or password" error despite correct credentials

Common causes and solutions:

Truncated bcrypt hash (most common)
- Check hash is exactly 60 characters: echo -n "$PULSE_AUTH_PASS" | wc -c
- Look for error in logs: Bcrypt hash appears truncated!
- Solution: Use full 60-character hash or Quick Security Setup
Docker Compose $ character issue
- Docker Compose interprets $ as variable expansion
- Wrong: PULSE_AUTH_PASS='$2a$12$hash...'
- Right: PULSE_AUTH_PASS='$$2a$$12$$hash...' (escape with $$)
- Alternative: Use a .env file where no escaping is needed
Environment variable not loaded
- Check if variable is set: docker exec pulse env | grep PULSE_AUTH
- Verify quotes around hash: Must use single quotes
- Restart container after changes

Password change fails

Error: exec: "sudo": executable file not found

Solution: Update to v4.3.8+ which removes sudo requirement. For older versions:

# Manually update .env file
docker exec pulse sh -c "echo \"PULSE_AUTH_PASS='new-hash'\" >> /data/.env"
docker restart pulse

Symptoms: Can't access Pulse after upgrade, no credentials work

Solution:

If upgrading from pre-v4.5.0, you need to complete security setup first
Clear browser cache and cookies
Access http://your-ip:7655 to see setup wizard
Complete setup, then restart container

Docker-Specific Issues

No .env file in /data

This is expected behavior when using environment variables. The .env file is only created by:

Quick Security Setup wizard
Password change through UI
Manual creation

If you provide auth via -e flags or docker-compose environment section, no .env is created.

Container won't start

Check logs: docker logs pulse

Common issues:

Port already in use: Change port mapping
Volume permissions: Ensure volume is writable
Invalid environment variables: Check syntax

Port change didn't take effect

Confirm which service name is active
```
systemctl status pulse 2>/dev/null \\
  || systemctl status pulse-backend 2>/dev/null \\
  || systemctl status pulse-hot-dev
```
- Docker: the container port mapping controls the public port (-p host:7655).
- Kubernetes (Helm chart): service is svc/pulse; update service.port in your values file and run helm upgrade.
Verify configuration/environment overrides
```
sudo systemctl show pulse --property=Environment
```
Helm users can run kubectl get svc pulse -n <namespace> -o yaml to confirm the current port.
Check for port conflicts
```
sudo lsof -i :8080
```
Post-change validation
- Restart the service (systemctl restart, docker restart, or helm upgrade).
- v4.24.0 logs these restarts/upgrades in Settings → System → Updates and /api/updates/history; capture the event_id for your change notes.

Installation Issues

Binary not found (v4.3.7)

Error: /opt/pulse/pulse: No such file or directory

Cause: v4.3.7 install script bug

Solution: Update to v4.3.8 or manually fix:

sudo mkdir -p /opt/pulse/bin
sudo mv /opt/pulse/pulse /opt/pulse/bin/pulse
sudo systemctl daemon-reload
sudo systemctl restart pulse

Service name confusion

Pulse uses different service names depending on installation method:

Default systemd install: pulse
Legacy installs (pre-v4.7): pulse-backend
Hot dev environment: pulse-hot-dev
Docker: N/A (container name)

To check which you have:

systemctl status pulse 2>/dev/null \
  || systemctl status pulse-backend 2>/dev/null \
  || systemctl status pulse-hot-dev

Notification Issues

Emails not sending

Check email configuration in Settings → Alerts
Verify SMTP settings and credentials
Check logs for errors: docker logs pulse | grep -i email
Test with a simple webhook first

Webhook not working

Verify URL is accessible from Pulse server
Check for SSL certificate issues
Try a test service like webhook.site
Check logs for response codes (temporarily set LOG_LEVEL=debug via Settings → System → Logging or export LOG_LEVEL=debug and restart; review webhook.delivery entries, then revert to info)

HTTPS/TLS Configuration Issues

Pulse fails to start after enabling HTTPS

Symptoms: Service exits with status code 1, continuous restart attempts, or permission denied errors in logs

Common causes and solutions:

Certificate files not readable by pulse user (most common)
- Pulse runs as the pulse user and needs read access to certificate files
- Check file ownership: ls -l /etc/pulse/*.pem
- Solution:
```
sudo chown pulse:pulse /etc/pulse/cert.pem /etc/pulse/key.pem
sudo chmod 644 /etc/pulse/cert.pem   # Certificate
sudo chmod 600 /etc/pulse/key.pem    # Private key
```
Invalid certificate or key file
- Verify certificate format: openssl x509 -in /etc/pulse/cert.pem -text -noout
- Verify private key: openssl rsa -in /etc/pulse/key.pem -check -noout
- Ensure certificate and key match:
```
openssl x509 -noout -modulus -in /etc/pulse/cert.pem | openssl md5
openssl rsa -noout -modulus -in /etc/pulse/key.pem | openssl md5
# Both should output the same hash
```
File paths incorrect
- Verify paths in environment variables match actual file locations
- Check for typos in TLS_CERT_FILE and TLS_KEY_FILE
- Use absolute paths, not relative paths

Check startup logs

# View recent service logs
journalctl -u pulse -n 50

# Follow logs in real-time during restart
journalctl -u pulse -f

Complete HTTPS setup example:

# 1. Place certificate files
sudo cp mycert.pem /etc/pulse/cert.pem
sudo cp mykey.pem /etc/pulse/key.pem

# 2. Set ownership and permissions
sudo chown pulse:pulse /etc/pulse/cert.pem /etc/pulse/key.pem
sudo chmod 644 /etc/pulse/cert.pem
sudo chmod 600 /etc/pulse/key.pem

# 3. Configure systemd service
sudo systemctl edit pulse
# Add:
# [Service]
# Environment="HTTPS_ENABLED=true"
# Environment="TLS_CERT_FILE=/etc/pulse/cert.pem"
# Environment="TLS_KEY_FILE=/etc/pulse/key.pem"

# 4. Reload and restart
sudo systemctl daemon-reload
sudo systemctl restart pulse

# 5. Verify service started successfully
sudo systemctl status pulse

See Configuration Guide for complete HTTPS setup documentation.

Temperature Monitoring Issues

Temperature data flickers after adding nodes

Symptoms: Dashboard temperatures alternate between values and --, or new nodes never show readings. Proxy logs contain limiter.rejection messages.

Diagnosis:

Confirm you are running a build with commit 46b8b8d or later (defaults are 1 rps, burst 5). Older binaries throttle multi-node clusters aggressively.

Check limiter metrics:

curl -s http://127.0.0.1:9127/metrics \
  | grep -E 'pulse_proxy_limiter_(rejects|penalties)_total'

Any recent increment indicates rate-limit saturation.

Inspect scheduler health for temperature pollers (breaker.state should be closed and deadLetter.present must be false).

Fix: Increase the proxy burst/interval in /etc/pulse-sensor-proxy/config.yaml:

rate_limit:
  per_peer_interval_ms: 500   # medium cluster (≈10 nodes)
  per_peer_burst: 10

Restart pulse-sensor-proxy, verify limiter counters stop increasing, and confirm the dashboard stabilises. Document the change in your operations log.

VM Disk Monitoring Issues

VMs show "-" for disk usage

This is normal and expected - VMs require QEMU Guest Agent to report disk usage.

Quick fix:

Install guest agent in VM: apt install qemu-guest-agent (Linux) or virtio-win tools (Windows)
Enable in Proxmox: VM → Options → QEMU Guest Agent → Enable
Restart the VM
Wait 10 seconds for Pulse to poll again

Detailed troubleshooting:

See VM Disk Monitoring Guide for full setup instructions.

How to diagnose VM disk issues

Step 1: Check if guest agent is running

On Proxmox host:

# Check if agent is enabled in VM config
qm config <VMID> | grep agent

# Test if agent responds
qm agent <VMID> ping

# Get filesystem info (what Pulse uses)
qm agent <VMID> get-fsinfo

Inside the VM:

# Linux
systemctl status qemu-guest-agent

# Windows (PowerShell)
Get-Service QEMU-GA

Step 2: Run diagnostic script

# On Proxmox host
curl -sSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/test-vm-disk.sh | bash

Or if Pulse is installed:

/opt/pulse/scripts/test-vm-disk.sh

Ceph Cluster Data Missing

Symptoms: Ceph pools or health section missing in Storage view even though the cluster uses Ceph.

Checklist:

Confirm the Proxmox node exposes Ceph-backed storage (Datacenter → Storage). Types must be rbd, cephfs, or ceph.
Ensure Pulse has permission to call /cluster/ceph/status (Pulse’s Proxmox account needs Sys.Audit as part of PVEAuditor, provided by the setup script).
Check the backend logs for Ceph status unavailable – preserving previous Ceph state. Intermittent errors are usually network timeouts; steady errors point to permissions.

Run from the Pulse host:

curl -sk https://pve-node:8006/api2/json/cluster/ceph/status \
  -H "Authorization: PVEAPIToken=pulse-monitor@pam!token=<value>"

If this fails, verify firewall / token scope.

Tip: Pulse polls Ceph after storage refresh. If you recently added Ceph storage, wait one poll cycle or restart the backend to force detection.

Backup View Filters Not Working

Symptoms: Backup chart does not highlight the selected time range or the grid ignores the picker.

Checklist:

Make sure you are running Pulse v4.29.0 or newer (the interactive picker was introduced alongside the new timeline). Check Settings → System → About.
Verify your browser is not forcing Legacy mode – if the top-right toggle shows “Lightweight UI”, switch back to default.
When filters appear stuck:
- Click Reset Filters in the toolbar.
- Clear any search chips under the chart.
- Pick a preset (24h / 7d / 30d) to re-seed the view, then move back to Custom.
If the grid still shows stale data, open DevTools console and ensure no errors mentioning chartsSelection appear. Any error here usually means a stale service worker; hard refresh (Ctrl+Shift+R) clears it.

Tip: Selecting bars in the chart cross-highlights matching rows. If that does not happen, confirm you do not have browser extensions that block pointer events on canvas elements.

Container Agent Shows Hosts Offline

Symptoms: /containers tab marks hosts as offline or missing container metrics.

Checklist:

Run the agent manually with verbose logs:
```
sudo /usr/local/bin/pulse-docker-agent --interval 15s --debug
```
Look for HTTP 401 (token mismatch) or socket errors.
Confirm the host sees Docker:
```
sudo docker info | head -n 20
```
Make sure the agent ID is stable. If running inside transient containers, set --agent-id explicitly so Pulse does not treat each restart as a new host.
Verify Pulse shows a recent heartbeat (lastSeen) in /api/state → dockerHosts. Hosts are marked offline after 4× the configured interval with no update.
For reverse proxies/TLS issues, append --insecure temporarily to confirm whether certificate validation is the culprit.

Restart loops: The Containers workspace Issues column lists the last exit codes. Investigate recurring non-zero codes in docker logs <container> and adjust restart policy if needed.

Step 3: Check Pulse logs

# Docker
docker logs pulse | grep -i "guest agent\|fsinfo"

# Systemd
journalctl -u pulse -f | grep -i "guest agent\|fsinfo"

Look for specific error reasons:

agent-not-running - Agent service not started in VM
agent-disabled - Not enabled in VM config
agent-timeout - Agent not responding (may need restart)
permission-denied - Check permissions (see below)
no-filesystems - Agent returned no usable filesystem data

Permission denied errors

If Pulse logs show permission denied when querying guest agent:

Check permissions:

# On Proxmox host
pveum user permissions pulse-monitor@pam

Required permissions:

Proxmox 9: VM.GuestAgent.Audit privilege (Pulse setup adds this via the PulseMonitor role)
Proxmox 8: VM.Monitor privilege (Pulse setup adds this via the PulseMonitor role)
All versions: Sys.Audit is recommended for Ceph metrics and applied when available

Fix permissions:

Re-run the Pulse setup script on the Proxmox node:

curl -sSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/setup-pve.sh | bash

Or manually:

# Shared read-only access
pveum aclmod / -user pulse-monitor@pam -role PVEAuditor

# Extra privileges for guest metrics and Ceph
EXTRA_PRIVS=()

# Sys.Audit (Ceph, cluster status)
if pveum role list 2>/dev/null | grep -q "Sys.Audit"; then
  EXTRA_PRIVS+=(Sys.Audit)
else
  if pveum role add PulseTmpSysAudit -privs Sys.Audit 2>/dev/null; then
    EXTRA_PRIVS+=(Sys.Audit)
    pveum role delete PulseTmpSysAudit 2>/dev/null
  fi
fi

# VM guest agent / monitor privileges
VM_PRIV=""
if pveum role list 2>/dev/null | grep -q "VM.Monitor"; then
  VM_PRIV="VM.Monitor"
elif pveum role list 2>/dev/null | grep -q "VM.GuestAgent.Audit"; then
  VM_PRIV="VM.GuestAgent.Audit"
else
  if pveum role add PulseTmpVMMonitor -privs VM.Monitor 2>/dev/null; then
    VM_PRIV="VM.Monitor"
    pveum role delete PulseTmpVMMonitor 2>/dev/null
  elif pveum role add PulseTmpGuestAudit -privs VM.GuestAgent.Audit 2>/dev/null; then
    VM_PRIV="VM.GuestAgent.Audit"
    pveum role delete PulseTmpGuestAudit 2>/dev/null
  fi
fi

if [ -n "$VM_PRIV" ]; then
  EXTRA_PRIVS+=("$VM_PRIV")
fi

if [ ${#EXTRA_PRIVS[@]} -gt 0 ]; then
  PRIV_STRING="${EXTRA_PRIVS[*]}"
  pveum role delete PulseMonitor 2>/dev/null
  pveum role add PulseMonitor -privs "$PRIV_STRING"
  pveum aclmod / -user pulse-monitor@pam -role PulseMonitor
fi

Important: Both API tokens and passwords work fine for guest agent access. If you see permission errors, it's a permission configuration issue, not an authentication method limitation.

Guest agent installed but no disk data

If agent responds to ping but returns no filesystem info:

Check agent version - Update to latest:

# Linux
apt update && apt install --only-upgrade qemu-guest-agent
systemctl restart qemu-guest-agent

Check filesystem permissions - Agent needs read access to filesystem data
Windows VMs - Ensure VirtIO drivers are up to date from latest virtio-win ISO
Special filesystems only - If VM only has special filesystems (tmpfs, ISO mounts), this is normal for Live systems

Specific VM types

Cloud images:

Most have guest agent pre-installed but disabled
Enable with: systemctl enable --now qemu-guest-agent

Windows VMs:

Must install VirtIO guest tools
Ensure "QEMU Guest Agent" service is running
May need "QEMU Guest Agent VSS Provider" for full functionality

Container-based VMs (Docker/Kubernetes hosts):

Will show high disk usage due to container layers
This is accurate - containers consume real disk space
Consider monitoring container disk separately

Performance Issues

High CPU usage

Polling interval is fixed at 10 seconds (matches Proxmox update cycle)
Check number of monitored nodes
Disable unused features (snapshots, backups monitoring)

High memory usage

Normal for monitoring many nodes
Check metrics retention settings
Restart container to clear any memory leaks

Network Issues

Cannot connect to Proxmox nodes

Verify Proxmox API is accessible:
```
curl -k https://proxmox-ip:8006
```
Check credentials have proper permissions (PVEAuditor minimum)
Verify network connectivity between Pulse and Proxmox
Check for firewall rules blocking port 8006

PBS connection issues

Ensure API token has Datastore.Audit permission
Check PBS is accessible on port 8007
Verify token format: user@realm!tokenid=secret

Update Issues

Updates not showing

Check update channel in Settings → System
Verify internet connectivity
Check GitHub API rate limits
Manual update: Pull latest Docker image or run install script

Update fails to apply

Docker: Pull new image and recreate container Native: Run install script again or check logs

Data Recovery

Lost authentication

See Forgot Password / Lost Access section above.

Recommended approach: Start fresh. Delete your Pulse data and restart.

Corrupt configuration

Restore from backup or delete config files to start fresh:

# Docker
docker exec pulse rm /data/*.json /data/*.enc
docker restart pulse

# Native
sudo rm /etc/pulse/*.json /etc/pulse/*.enc
sudo systemctl restart pulse

Getting Help

Collect diagnostic information

# Version
curl http://localhost:7655/api/version

# Logs (last 100 lines)
docker logs --tail 100 pulse  # Docker
journalctl -u pulse -n 100    # Native

# Environment
docker exec pulse env | grep -E "PULSE|API"  # Docker
systemctl show pulse --property=Environment  # Native

Report issues

When reporting issues, include:

Pulse version
Deployment type (Docker/LXC/Manual)
Error messages from logs
Steps to reproduce
Expected vs actual behavior

Report at: https://github.com/rcourtman/Pulse/issues

20 KiB Raw Blame History Unescape Escape

Pulse Troubleshooting Guide

Common Issues and Solutions

Correlate API Calls with Logs

Authentication Problems

Forgot Password / Lost Access

Cannot login after setting up security

Password change fails

Can't access Pulse - stuck at login

Docker-Specific Issues

No .env file in /data

Container won't start

Port change didn't take effect

Installation Issues

Binary not found (v4.3.7)

Service name confusion

Notification Issues

Emails not sending

Webhook not working

HTTPS/TLS Configuration Issues

Pulse fails to start after enabling HTTPS

Temperature Monitoring Issues

Temperature data flickers after adding nodes

VM Disk Monitoring Issues

VMs show "-" for disk usage

How to diagnose VM disk issues

Ceph Cluster Data Missing

Backup View Filters Not Working

Container Agent Shows Hosts Offline

Permission denied errors

Guest agent installed but no disk data

Specific VM types

Performance Issues

High CPU usage

High memory usage

Network Issues

Cannot connect to Proxmox nodes

PBS connection issues

Update Issues

Updates not showing

Update fails to apply

Data Recovery

Lost authentication

Corrupt configuration

Getting Help

Collect diagnostic information

Report issues

20 KiB

Raw Blame History