Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-02-18 00:17:39 +01:00

Author	SHA1	Message	Date
rcourtman	ee0e89871d	fix: reduce metrics memory 86x by reverting buffer and adding LTTB downsampling The in-memory metrics buffer was changed from 1000 to 86400 points per metric to support 30-day sparklines, but this pre-allocated ~18 MB per guest (7 slices × 86400 × 32 bytes). With 50 guests that's 920 MB — explaining why users needed to double their LXC memory after upgrading to 5.1.0. - Revert in-memory buffer to 1000 points / 24h retention - Remove eager slice pre-allocation (use append growth instead) - Add LTTB (Largest Triangle Three Buckets) downsampling algorithm - Chart endpoints now use a two-tier strategy: in-memory for ranges ≤ 2h, SQLite persistent store + LTTB for longer ranges - Reduce frontend ring buffer from 86400 to 2000 points Related to #1190	2026-02-04 19:49:52 +00:00
rcourtman	80cdfab536	Update metrics docs with canonical resourceType values - Use canonical types (vm, container, dockerContainer) instead of aliases (guest, docker) in examples - Document that guest/docker aliases are accepted by the API - Clarify persistent store type mapping in data flow doc	2026-02-01 22:26:04 +00:00
rcourtman	ad4acf1222	chore: add frontend utilities and metrics documentation - Add useResizeObserver and useTooltip React hooks - Add utility functions for anomaly colors, error extraction, text width, and threshold colors - Add METRICS_DATA_FLOW.md documentation - Ignore SQLite temp files (.db-shm, .db-wal)	2026-01-22 13:48:41 +00:00
rcourtman	0ca6001bad	docs: update documentation after sensor proxy deprecation Update docs to reflect the simplified temperature monitoring architecture: - Remove references to pulse-sensor-proxy throughout - Update TEMPERATURE_MONITORING.md to focus on unified agent approach - Update CONFIGURATION.md, DEPLOYMENT_MODELS.md, FAQ.md - Remove SECURITY_CHANGELOG.md (proxy-specific security notes) - Clarify current recommended setup in various guides	2026-01-21 12:00:59 +00:00
rcourtman	ee63d438cc	docs: standardize markdown syntax and remove deprecated sensor-proxy docs	2026-01-20 09:43:49 +00:00
rcourtman	3f0808e9f9	docs: comprehensive core and Pro documentation overhaul - Major updates to README.md and docs/README.md for Pulse v5 - Added technical deep-dives for Pulse Pro (docs/PULSE_PRO.md) and AI Patrol (docs/AI.md) - Updated Prometheus metrics documentation and Helm schema for metrics separation - Refreshed security, installation, and deployment documentation for unified agent models - Cleaned up legacy summary files	2026-01-07 17:38:27 +00:00
rcourtman	dcdbee3c5c	feat: Add in-app help system with HelpIcon component Add contextual help icons throughout the UI to improve feature discoverability. Users can click (?) icons to see explanations with examples for settings they might not understand. - HelpIcon component with click-to-open popover - Centralized help content registry in /content/help/ - FeatureTip component for dismissible contextual tips - Help added to: alert delay, AI endpoints, update channel	2026-01-07 09:22:23 +00:00
rcourtman	773376fa5d	docs: add deep dive summaries for notifications, discovery, and agent exec	2026-01-02 11:18:28 +00:00
rcourtman	2b48b0a459	feat: add --kube-include-all-deployments flag for Kubernetes agent Adds IncludeAllDeployments option to show all deployments, not just problem ones (where replicas don't match desired). This provides parity with the existing --kube-include-all-pods flag. - Add IncludeAllDeployments to kubernetesagent.Config - Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var - Update collectDeployments to respect the new flag - Add test for IncludeAllDeployments functionality - Update UNIFIED_AGENT.md documentation Addresses feedback from PR #855	2025-12-18 20:58:30 +00:00
courtmanr@gmail.com	fd39196166	refactor: finalize documentation overhaul - Refactor specialized docs for conciseness and clarity - Rename files to UPPER_CASE.md convention - Verify accuracy against codebase - Fix broken links	2025-11-25 00:45:20 +00:00
rcourtman	6eb1a10d9b	Refactor: Code cleanup and localStorage consolidation This commit includes comprehensive codebase cleanup and refactoring: ## Code Cleanup - Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate) - Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo) - Clean up commented-out code blocks across multiple files - Remove unused TypeScript exports (helpTextClass, private tag color helpers) - Delete obsolete test files and components ## localStorage Consolidation - Centralize all storage keys into STORAGE_KEYS constant - Update 5 files to use centralized keys: * utils/apiClient.ts (AUTH, LEGACY_TOKEN) * components/Dashboard/Dashboard.tsx (GUEST_METADATA) * components/Docker/DockerHosts.tsx (DOCKER_METADATA) * App.tsx (PLATFORMS_SEEN) * stores/updates.ts (UPDATES) - Benefits: Single source of truth, prevents typos, better maintainability ## Previous Work Committed - Docker monitoring improvements and disk metrics - Security enhancements and setup fixes - API refactoring and cleanup - Documentation updates - Build system improvements ## Testing - All frontend tests pass (29 tests) - All Go tests pass (15 packages) - Production build successful - Zero breaking changes Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)	2025-11-04 21:50:46 +00:00
rcourtman	e0396c1362	docs: update documentation for diagnostics improvements Add comprehensive operator documentation for the new observability features introduced in the previous commit. New Documentation: - docs/monitoring/PROMETHEUS_METRICS.md - Complete reference for all 18 new Prometheus metrics with alert suggestions Updated Documentation: - docs/API.md - Document X-Request-ID and X-Diagnostics-Cached-At headers, explain diagnostics endpoint caching behavior - docs/TROUBLESHOOTING.md - Add section on correlating API calls with logs using request IDs - docs/operations/ADAPTIVE_POLLING_ROLLOUT.md - Update monitoring checklists with new per-node and scheduler metrics - docs/CONFIGURATION.md - Clarify LOG_FILE dual-output behavior and rotation defaults These updates ensure operators understand: - How to set up monitoring/alerting for new metrics - How to configure file logging with rotation - How to troubleshoot using request correlation - What metrics are available for dashboards Related to: `495e6c794` (feat: comprehensive diagnostics improvements)	2025-10-21 12:45:19 +00:00
rcourtman	2f43d67af9	docs: simplify Mermaid diagrams for better readability The previous diagrams were too complex and overwhelming. Simplified all diagrams to show core concepts clearly: - Adaptive polling: reduced to basic scheduler→queue→workers flow - Temperature proxy: simplified to 3-box trust boundary view - Sensor proxy sequence: simplified to essential request flow - Webhook pipeline: reduced to template→send→retry flow - Script library: simplified to code→test→bundle→dist flow Fixed parsing error in temperature proxy diagram (parentheses in edge label causing render failure). Diagrams should clarify architecture, not recreate implementation.	2025-10-21 10:50:40 +00:00
rcourtman	10d52244f8	docs: remove internal 'Phase 2' reference from adaptive polling docs Replace internal development phase reference with clear description of what the adaptive polling scheduler does. 'Phase 2' is internal jargon that provides no value to users.	2025-10-21 10:45:46 +00:00
rcourtman	85ffe10aed	docs: add Mermaid diagrams to improve visual documentation Enhance documentation with six Mermaid diagrams to better explain complex system implementations: - Adaptive polling lifecycle flowchart showing enqueue→execute→feedback cycle with scheduler, priority queue, and worker interactions - Circuit breaker state machine diagram illustrating Closed↔Open↔Half-open transitions with triggers and recovery paths - Temperature proxy architecture diagram highlighting trust boundaries, security controls, and data flow between host/container/cluster - Sensor proxy request flow sequence diagram showing auth, rate limiting, validation, and SSH execution pipeline - Alert webhook pipeline flowchart detailing template resolution, URL rendering, HTTP dispatch, and retry logic - Script library workflow diagram illustrating dev→test→bundle→distribute lifecycle emphasizing modular design These visualizations make it easier for operators and contributors to understand Pulse's sophisticated architectural patterns.	2025-10-21 10:40:33 +00:00
rcourtman	fd0a4f2b0a	docs: update documentation for v4.24.0 features Updates documentation to reflect features implemented in recent commits: Security & API Enhancements: - Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After) - Audit logging for rollback actions and scheduler health - Runtime logging configuration tracking Scheduler Health API: - Document new v4.24.0 endpoint features - Per-instance circuit breaker status - Dead-letter queue tracking - Staleness metrics - Enhanced response format with backward compatibility Version & Health Endpoints: - Updated /api/version response fields - Optional health endpoint fields - Deployment type and update availability Configuration & Installation: - HTTP config fetch via PULSE_INIT_CONFIG_URL - Updated environment variable documentation - Enhanced FAQ entries Monitoring & Operations: - Adaptive polling architecture documentation - Rollback procedure references - Production deployment guidance All documentation changes align with implemented features from commits: - `656ae0d25` (PMG test fix) - `dec85a4ef` (PBS/PMG stubs + HTTP config) - Earlier commits: scheduler health API, rollback, rate limiting	2025-10-20 16:08:10 +00:00
rcourtman	160adeb3b8	feat: add scheduler health API endpoint (Phase 2 Task 8) Task 8 of 10 complete. Exposes read-only scheduler health data including: - Queue depth and distribution by instance type - Dead-letter queue inspection (top 25 tasks with error details) - Circuit breaker states (instance-level) - Staleness scores per instance New API endpoint: GET /api/monitoring/scheduler/health (requires authentication) New snapshot methods: - StalenessTracker.Snapshot() - exports all staleness data - TaskQueue.Snapshot() - queue depth & per-type distribution - TaskQueue.PeekAll() - dead-letter task inspection - circuitBreaker.State() - exports state, failures, retryAt - Monitor.SchedulerHealth() - aggregates all health data Documentation updated with API spec, field descriptions, and usage examples.	2025-10-20 15:13:38 +00:00
rcourtman	5fbdf6099f	docs: add adaptive polling architecture guide (Phase 2 Task 10) Comprehensive documentation for Phase 2 adaptive polling: - Architecture overview with component diagram - Configuration guide (env vars, defaults, feature flag) - Prometheus metrics reference (7 new metrics) - Circuit breaker & backoff behavior explanation - Dead-letter queue operational guidance - Rollout plan (dev/QA → staged → full) - Troubleshooting guide for common issues Task 10 of 10 complete. Phase 2: 8/10 tasks implemented (80%).	2025-10-20 15:13:37 +00:00

18 Commits
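
Note on commit ee0e89871d: the message names LTTB (Largest Triangle Three Buckets) downsampling but does not show it. Below is a minimal Python sketch of the general technique, not the code from the Pulse repository (which is not reproduced on this page). The function name `lttb`, the `(x, y)` tuple representation, and the `threshold` parameter are assumptions made for illustration. The idea: always keep the first and last points, split the remaining points into `threshold - 2` buckets, and from each bucket keep the point that forms the largest triangle with the previously kept point and the average of the next bucket.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp, value); illustrative representation


def lttb(points: List[Point], threshold: int) -> List[Point]:
    """Downsample `points` (sorted by x) to at most `threshold` points."""
    n = len(points)
    # Nothing to do if the series is already small enough, and the
    # algorithm needs at least first + one bucket + last.
    if threshold >= n or threshold < 3:
        return list(points)

    sampled = [points[0]]                    # always keep the first point
    bucket_size = (n - 2) / (threshold - 2)  # > 1 because threshold < n
    prev = 0                                 # index of the last kept point

    for i in range(threshold - 2):
        # Average of the NEXT bucket acts as the third triangle vertex.
        next_start = int((i + 1) * bucket_size) + 1
        next_end = min(int((i + 2) * bucket_size) + 1, n)
        span = points[next_start:next_end]
        avg_x = sum(p[0] for p in span) / len(span)
        avg_y = sum(p[1] for p in span) / len(span)

        # Candidates come from the CURRENT bucket.
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1

        ax, ay = points[prev]
        best_idx, best_area = start, -1.0
        for j in range(start, end):
            bx, by = points[j]
            # Twice the triangle area; the constant factor does not
            # change which candidate wins.
            area = abs((ax - avg_x) * (by - ay) - (ax - bx) * (avg_y - ay))
            if area > best_area:
                best_idx, best_area = j, area

        sampled.append(points[best_idx])
        prev = best_idx

    sampled.append(points[-1])               # always keep the last point
    return sampled
```

For example, `lttb(series, 1000)` reduces a long series read from a persistent store to 1000 points. Because each bucket keeps the point with the largest triangle, an isolated spike in otherwise flat data is preserved.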