Pulse

gitea-mirror/Pulse

Fork 0

mirror of https://github.com/rcourtman/Pulse.git synced 2026-02-18 00:17:39 +01:00

Commit Graph

Author	SHA1	Message	Date
rcourtman	17208cbf9d	docs: update AI evaluation matrix and approval workflow documentation	2026-01-30 19:00:40 +00:00
rcourtman	0e880f3c89	feat(eval): improve patrol eval with polling-based completion Refactor patrol eval runner to use a dual approach: 1. Poll GET /api/ai/patrol/status until Running=false (primary signal) 2. Best-effort SSE stream connection for tool event visibility Changes: - Add status polling loop with configurable timeout - Make SSE stream optional (may not connect in time) - Add Completed flag to PatrolRunResult - Improve assertion error messages - Add new scenarios and assertions This is more reliable than relying solely on SSE stream which may timeout waiting for headers during slow patrol initialization.	2026-01-29 08:20:39 +00:00
rcourtman	44fecc37c0	feat(eval): enhance AI eval harness with retries and reporting - Add retry logic for transient failures (phantom, stream, empty response) - Add environment variable overrides for infrastructure naming - Add JSON report output per scenario - Expand assertions with new validation types - Add more comprehensive test scenarios - Add docs/EVAL.md with usage documentation The eval harness now better handles flaky AI responses and provides detailed reports for debugging.	2026-01-28 21:24:12 +00:00

Author

SHA1

Message

Date

rcourtman

17208cbf9d

docs: update AI evaluation matrix and approval workflow documentation

2026-01-30 19:00:40 +00:00

rcourtman

0e880f3c89

feat(eval): improve patrol eval with polling-based completion

Refactor patrol eval runner to use a dual approach:
1. Poll GET /api/ai/patrol/status until Running=false (primary signal)
2. Best-effort SSE stream connection for tool event visibility

Changes:
- Add status polling loop with configurable timeout
- Make SSE stream optional (may not connect in time)
- Add Completed flag to PatrolRunResult
- Improve assertion error messages
- Add new scenarios and assertions

This is more reliable than relying solely on SSE stream which
may timeout waiting for headers during slow patrol initialization.

2026-01-29 08:20:39 +00:00

rcourtman

44fecc37c0

feat(eval): enhance AI eval harness with retries and reporting

- Add retry logic for transient failures (phantom, stream, empty response)
- Add environment variable overrides for infrastructure naming
- Add JSON report output per scenario
- Expand assertions with new validation types
- Add more comprehensive test scenarios
- Add docs/EVAL.md with usage documentation

The eval harness now better handles flaky AI responses and provides
detailed reports for debugging.

2026-01-28 21:24:12 +00:00

3 Commits