Commit Graph

12 Commits

Author SHA1 Message Date
courtmanr@gmail.com
d3d06bb32c Refactor FAQ and API docs to be concise and modern 2025-11-25 00:14:22 +00:00
rcourtman
a1dc451ed4 Document alert reliability features and DLQ API
Add comprehensive documentation for new alert system reliability features:

**API Documentation (docs/API.md):**
- Dead Letter Queue (DLQ) API endpoints
  - GET /api/notifications/dlq - Retrieve failed notifications
  - GET /api/notifications/queue/stats - Queue statistics
  - POST /api/notifications/dlq/retry - Retry DLQ items
  - POST /api/notifications/dlq/delete - Delete DLQ items
- Prometheus metrics endpoint documentation
  - 18 metrics covering alerts, notifications, and queue health
  - Example Prometheus configuration
  - Example PromQL queries for common monitoring scenarios

**Configuration Documentation (docs/CONFIGURATION.md):**
- Alert TTL configuration
  - maxAlertAgeDays, maxAcknowledgedAgeDays, autoAcknowledgeAfterHours
- Flapping detection configuration
  - flappingEnabled, flappingWindowSeconds, flappingThreshold, flappingCooldownMinutes
- Usage examples and common scenarios
- Best practices for preventing notification storms

All new features are fully documented with examples and default values.
2025-11-06 17:34:05 +00:00
rcourtman
becda56897 Fix critical rollback download URL bug and doc inconsistencies
Issues found during systematic audit after #642:

1. CRITICAL BUG - Rollback downloads were completely broken:
   - Code constructed: pulse-linux-amd64 (no version, no .tar.gz)
   - Actual asset name: pulse-v4.26.1-linux-amd64.tar.gz
   - This would cause 404 errors on all rollback attempts
   - Fixed: Construct correct tarball URL with version
   - Added: Extract tarball after download to get binary

2. TEMPERATURE_MONITORING.md referenced non-existent v4.27.0:
   - Changed to use /latest/download/ for future-proof docs

3. API.md example had wrong filename format:
   - Changed pulse-linux-amd64.tar.gz to pulse-v4.30.0-linux-amd64.tar.gz
   - Ensures example matches actual release asset naming

The rollback bug would have affected any user attempting to roll back
to a previous version via the UI or API.
2025-11-06 14:25:32 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
5a2d808aa1 Harden setup token flow and enforce encrypted persistence 2025-10-25 16:00:37 +00:00
rcourtman
cee24ff7e0 docs: refresh API token scope guidance 2025-10-23 13:44:19 +00:00
rcourtman
e0396c1362 docs: update documentation for diagnostics improvements
Add comprehensive operator documentation for the new observability features
introduced in the previous commit.

**New Documentation:**
- docs/monitoring/PROMETHEUS_METRICS.md - Complete reference for all 18 new
  Prometheus metrics with alert suggestions

**Updated Documentation:**
- docs/API.md - Document X-Request-ID and X-Diagnostics-Cached-At headers,
  explain diagnostics endpoint caching behavior
- docs/TROUBLESHOOTING.md - Add section on correlating API calls with logs
  using request IDs
- docs/operations/ADAPTIVE_POLLING_ROLLOUT.md - Update monitoring checklists
  with new per-node and scheduler metrics
- docs/CONFIGURATION.md - Clarify LOG_FILE dual-output behavior and rotation
  defaults

These updates ensure operators understand:
- How to set up monitoring/alerting for new metrics
- How to configure file logging with rotation
- How to troubleshoot using request correlation
- What metrics are available for dashboards

Related to: 495e6c794 (feat: comprehensive diagnostics improvements)
2025-10-21 12:45:19 +00:00
rcourtman
fd0a4f2b0a docs: update documentation for v4.24.0 features
Updates documentation to reflect features implemented in recent commits:

**Security & API Enhancements:**
- Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After)
- Audit logging for rollback actions and scheduler health
- Runtime logging configuration tracking

**Scheduler Health API:**
- Document new v4.24.0 endpoint features
- Per-instance circuit breaker status
- Dead-letter queue tracking
- Staleness metrics
- Enhanced response format with backward compatibility

**Version & Health Endpoints:**
- Updated /api/version response fields
- Optional health endpoint fields
- Deployment type and update availability

**Configuration & Installation:**
- HTTP config fetch via PULSE_INIT_CONFIG_URL
- Updated environment variable documentation
- Enhanced FAQ entries

**Monitoring & Operations:**
- Adaptive polling architecture documentation
- Rollback procedure references
- Production deployment guidance

All documentation changes align with implemented features from commits:
- 656ae0d25 (PMG test fix)
- dec85a4ef (PBS/PMG stubs + HTTP config)
- Earlier commits: scheduler health API, rollback, rate limiting
2025-10-20 16:08:10 +00:00
rcourtman
3a4fc044ea Add guest agent caching and update doc hints (refs #560) 2025-10-16 08:15:49 +00:00
rcourtman
4838793677 feat: enhance alerts system with tests and improved thresholds
- Add comprehensive test coverage for alerts package with 285+ new tests
- Implement ThresholdsTable component with metric thresholds display
- Enhance Alerts page UI with improved layout and metric filtering
- Add frontend component tests for Alerts page and ThresholdsTable
- Set up Vitest testing infrastructure for SolidJS components
- Improve config persistence with better validation
- Expand discovery tests with 333+ test cases
- Update API, configuration, and Docker monitoring documentation
2025-10-15 22:25:04 +00:00
rcourtman
261bd7ac74 Adopt multi-token auth across docs, UI, and tooling 2025-10-14 15:47:49 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00