Files
Pulse/cmd/eval
rcourtman a04d41ce2c Add end-to-end evaluation framework for AI assistant testing
Implement comprehensive eval framework for testing Pulse Assistant:

Core components:
- Runner: Executes scenarios against live API with SSE stream parsing
- Assertions: Reusable checks (tool usage, content, duration, errors)
- Scenarios: Multi-step test workflows with configurable assertions

Basic scenarios:
- QuickSmokeTest: Minimal functionality verification
- ReadOnlyInfrastructure: List, logs, status operations
- RoutingValidation: Command routing to correct targets
- LogTailing: Bounded log commands complete properly
- Discovery: Infrastructure discovery capabilities

Advanced scenarios:
- TroubleshootingScenario: Multi-step investigation workflow
- DeepDiveScenario: Thorough single-service investigation
- ConfigInspectionScenario: Reading configuration files
- ResourceAnalysisScenario: Cross-container resource comparison
- MultiNodeScenario: Operations across Proxmox nodes
- DockerInDockerScenario: Docker containers inside LXCs
- ContextChainScenario: Context retention across turns

Usage: go test ./internal/ai/eval -live -run TestQuickSmokeTest
2026-01-28 16:49:24 +00:00
..