- Discovery: classify transient errors (429, timeout, connection refused, etc.)
and return IsError:true so models stop retrying rate-limited calls
- Agentic loop: detect identical tool calls repeated >3 times and block with a
  LOOP_DETECTED error, forcing the model to try a different approach (see the
  sketch after this list)
- OpenAI provider: skip tool_choice for DeepSeek Reasoner which doesn't support it
- Read-only classifier: fix curl -I case sensitivity (uppercase flags are now
  lowercased),
add iostat/vmstat/mpstat/sar/lxc-ls/lxc-info/nc -z to allowlist,
fix 2>&1 false positive in input redirect detection
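A minimal sketch of the loop-detection guard mentioned above; loopGuard,
fingerprint, and maxRepeats are illustrative names, not the actual
implementation:

```go
import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
)

const maxRepeats = 3

// fingerprint identifies a tool call by its name plus raw arguments.
func fingerprint(name, rawArgs string) string {
	sum := sha256.Sum256([]byte(name + "\x00" + rawArgs))
	return hex.EncodeToString(sum[:])
}

type loopGuard struct {
	counts map[string]int
}

// check returns a LOOP_DETECTED error once an identical call has been seen
// more than maxRepeats times, forcing the model to change approach.
func (g *loopGuard) check(name, rawArgs string) error {
	if g.counts == nil {
		g.counts = make(map[string]int)
	}
	key := fingerprint(name, rawArgs)
	g.counts[key]++
	if g.counts[key] > maxRepeats {
		return errors.New("LOOP_DETECTED: identical tool call repeated more than 3 times; try a different approach")
	}
	return nil
}
```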
The previous reconciliation logic (issue #1052) used per-dataset statfs
values for Total and Used. On Proxmox systems, statfs on a mounted
dataset (e.g. rpool/ROOT/pve-1) only reports that dataset's own usage,
completely missing zvols (VM disk images) and other datasets. This caused
storage bars to show ~0% usage (a few GB of OS files) when the pool
actually had terabytes of VM data allocated.
Fix: derive usable pool capacity from the ratio of dataset Free (usable
pool-available from statfs) to zpool Free (raw pool-available from zpool
list). This ratio converts raw zpool Size to usable total, and Used is
computed as Total - Free. This captures all pool consumers including
zvols, handles RAIDZ parity overhead and mirrors uniformly, and produces
correct usage percentages.
Verified with tests for RAIDZ, mirrors, and both with zvols present.
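A minimal sketch of the ratio derivation, assuming raw byte counts from
zpool list (Size, Free) and from statfs on the mounted dataset (Free);
function and parameter names are illustrative:

```go
// usablePoolCapacity converts raw zpool numbers into usable totals via the
// statfs-free / zpool-free ratio, which absorbs RAIDZ parity and mirror
// overhead uniformly.
func usablePoolCapacity(zpoolSize, zpoolFree, datasetFree uint64) (total, used uint64) {
	if zpoolFree == 0 {
		return 0, 0 // completely full (or unreadable) pool: avoid dividing by zero
	}
	ratio := float64(datasetFree) / float64(zpoolFree) // usable bytes per raw byte
	total = uint64(float64(zpoolSize) * ratio)         // raw size -> usable total
	used = total - datasetFree                         // counts zvols and all datasets
	return total, used
}
```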
Fix URL path matching order in deriveTabFromPath() where clicking
'System Logs' would incorrectly navigate to 'General'. The generic
/settings/system check was matching before /settings/system-logs
because the latter contains the former as a substring.
Moved specific system-* path checks before the generic fallback.
Refactor patrol eval runner to use a dual approach:
1. Poll GET /api/ai/patrol/status until Running=false (primary signal)
2. Best-effort SSE stream connection for tool event visibility
Changes:
- Add status polling loop with configurable timeout
- Make SSE stream optional (may not connect in time)
- Add Completed flag to PatrolRunResult
- Improve assertion error messages
- Add new scenarios and assertions
This is more reliable than relying solely on the SSE stream, which may time
out waiting for headers during slow patrol initialization.
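A rough sketch of the polling side; the endpoint path comes from the list
above, while the payload shape and helper names are assumed:

```go
import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

type patrolStatus struct {
	Running bool `json:"Running"` // field name assumed from the status payload
}

// fetchStatus polls the status endpoint once.
func fetchStatus(ctx context.Context, base string) (patrolStatus, error) {
	var st patrolStatus
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, base+"/api/ai/patrol/status", nil)
	if err != nil {
		return st, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return st, err
	}
	defer resp.Body.Close()
	return st, json.NewDecoder(resp.Body).Decode(&st)
}

// waitForPatrol polls until Running=false or the timeout elapses.
func waitForPatrol(ctx context.Context, base string, timeout time.Duration) (bool, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return false, ctx.Err()
		case <-ticker.C:
			st, err := fetchStatus(ctx, base)
			if err != nil {
				continue // tolerate transient errors while polling
			}
			if !st.Running {
				return true, nil
			}
		}
	}
}
```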
Send an SSE comment immediately when a client connects to the patrol
stream endpoint. This flushes HTTP headers so clients receive the
200 response right away, rather than blocking until the first event.
This fixes eval tests where the stream connection would time out
waiting for headers while patrol was still initializing.
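The flush pattern, roughly (handler name illustrative; the SSE comment line
itself is standard and ignored by clients):

```go
import (
	"fmt"
	"net/http"
)

func patrolStreamHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	// An SSE comment carries no event but forces the 200 and headers out now.
	fmt.Fprint(w, ": connected\n\n")
	flusher.Flush()

	// ...write real events as patrol produces them...
}
```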
When pruning older messages to fit context limits, we may cut off
a user message that preceded an assistant message with tool calls.
This leaves an orphaned tool call sequence at the start.
Extend pruneMessagesForModel to:
- Skip leading assistant messages with tool calls
- Also skip the tool results that follow them
This ensures a clean message sequence for all providers.
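A sketch of the skip logic under assumed message shapes (the real chat types
differ):

```go
type Message struct {
	Role      string // "user", "assistant", or "tool"
	ToolCalls []string
}

// dropLeadingOrphans advances past assistant messages whose preceding user
// turn was pruned away, together with the tool results that answer them.
func dropLeadingOrphans(msgs []Message) []Message {
	i := 0
	for i < len(msgs) {
		if msgs[i].Role == "assistant" && len(msgs[i].ToolCalls) > 0 {
			i++ // orphaned assistant message with tool calls
			for i < len(msgs) && msgs[i].Role == "tool" {
				i++ // and its tool results
			}
			continue
		}
		break
	}
	return msgs[i:]
}
```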
Gemini requires that model messages with function calls be immediately
followed by user messages containing the matching function responses.
When message pruning or errors leave orphaned function calls,
Gemini rejects the request.
Add sanitizeGeminiContents() to:
- Strip orphaned function calls (keeping text content)
- Remove orphaned function responses without preceding calls
- Log when sanitization occurs for debugging
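A sketch of the sanitizer under simplified content shapes (the real genai
types differ):

```go
type FunctionCall struct{ Name string }
type FunctionResponse struct{ Name string }

type Part struct {
	Text             string
	FunctionCall     *FunctionCall
	FunctionResponse *FunctionResponse
}

type Content struct {
	Role  string // "model" or "user"
	Parts []Part
}

func hasCall(c Content) bool {
	for _, p := range c.Parts {
		if p.FunctionCall != nil {
			return true
		}
	}
	return false
}

func hasResponse(c Content) bool {
	for _, p := range c.Parts {
		if p.FunctionResponse != nil {
			return true
		}
	}
	return false
}

func sanitizeGeminiContents(contents []Content) []Content {
	out := make([]Content, 0, len(contents))
	for i, c := range contents {
		if c.Role == "model" && hasCall(c) &&
			(i+1 >= len(contents) || !hasResponse(contents[i+1])) {
			// Orphaned call: keep text parts, drop the function calls.
			var kept []Part
			for _, p := range c.Parts {
				if p.FunctionCall == nil {
					kept = append(kept, p)
				}
			}
			if len(kept) == 0 {
				continue
			}
			c.Parts = kept
		}
		if hasResponse(c) && (len(out) == 0 || !hasCall(out[len(out)-1])) {
			continue // orphaned response without a preceding call
		}
		out = append(out, c)
	}
	return out
}
```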
Add comprehensive patrol evaluation framework:
- patrol.go: Runner for patrol scenarios with streaming support
- patrol_assertions.go: Assertions for tool usage, findings, timing
- patrol_scenarios.go: Scenarios for basic, investigation, finding quality
- eval_test.go: Unit tests for patrol eval runner
Scenarios:
- patrol-basic: Verifies patrol completes with tools and findings
- patrol-investigation: Ensures investigation before reporting
- patrol-finding-quality: Validates finding structure and evidence
Run with: go run ./cmd/eval -scenario patrol
- Replace output-parsing approach with tool-based finding creation
- PatrolService now uses runAIAnalysis with proper scope handling
- Add tool event streaming (tool_start, tool_end) to patrol events
- Expose GetExecutor() on chat.Service for patrol integration
- Remove regex-based finding extraction in favor of patrol tools
The patrol now uses the same agentic loop as chat, with the LLM calling
patrol_report_finding to create findings rather than outputting JSON
that gets parsed. This is more reliable and consistent with the tool model.
Add three new patrol tools that enable the LLM to create findings via
tool calls instead of relying on output parsing:
- patrol_report_finding: Create a structured finding with validation
- patrol_resolve_finding: Mark a finding as resolved
- patrol_get_findings: Query active findings for a resource
These tools are only functional during a patrol run when PatrolFindingCreator
is set on the executor. This approach is more reliable than parsing
JSON from LLM output.
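Roughly how the gating works; the Executor, PatrolFindingCreator, and
Finding shapes here are assumptions:

```go
import (
	"context"
	"errors"
)

type Finding struct {
	ResourceID string
	Severity   string
	Summary    string
}

type PatrolFindingCreator interface {
	CreateFinding(ctx context.Context, f Finding) error
}

type Executor struct {
	FindingCreator PatrolFindingCreator // set only for the duration of a patrol run
}

func (e *Executor) patrolReportFinding(ctx context.Context, f Finding) error {
	if e.FindingCreator == nil {
		return errors.New("patrol_report_finding is only available during a patrol run")
	}
	if f.ResourceID == "" || f.Summary == "" {
		return errors.New("finding requires a resource ID and a summary")
	}
	return e.FindingCreator.CreateFinding(ctx, f)
}
```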
Remove files that were consolidated into other modules:
- chat/patrol.go, patrol_test.go → moved to chat/service.go
- tools_infrastructure.go → merged into tools_storage.go
- tools_intelligence.go → merged into tools_metrics.go
- tools_patrol.go → merged into tools_alerts.go
- tools_profiles.go, tools_profiles_test.go → removed (unused)
Update related test file references.
- Remove hardcoded line numbers from enforcement references
- Update tool classification table with all current tools
- Reflect consolidated tool structure
- Update patrol.go to use chat service for AI execution
- Update service.go with chat service provider integration
- Add patrol streaming endpoint to router
- Add retry logic for transient failures (phantom, stream, empty response)
- Add environment variable overrides for infrastructure naming
- Add JSON report output per scenario
- Expand assertions with new validation types
- Add more comprehensive test scenarios
- Add docs/EVAL.md with usage documentation
The eval harness now better handles flaky AI responses and provides
detailed reports for debugging.
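A sketch of the retry wrapper, treating the transient failure modes listed
above as retryable; names and backoff are illustrative:

```go
import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithRetry re-runs a scenario when it fails transiently; an empty
// response is also treated as a transient failure.
func runWithRetry(ctx context.Context, attempts int, run func(context.Context) (string, error)) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		out, err := run(ctx)
		if err == nil && out != "" {
			return out, nil
		}
		if err == nil {
			err = errors.New("empty response")
		}
		lastErr = err
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(time.Duration(i+1) * time.Second): // linear backoff
		}
	}
	return "", fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}
```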
- Add ExecutePatrolStream method to chat.Service for patrol-specific execution
- Create chat_service_adapter.go to bridge chat.Service to ai.ChatServiceProvider
- Remove standalone patrol.go and patrol_test.go from chat package
- Add PatrolRequest/PatrolResponse types to chat service
- Add context injection for recent message context
This allows patrol to use an isolated agentic loop with its own system prompt
while leveraging the common chat infrastructure.
- Merge tools_infrastructure.go, tools_intelligence.go, tools_patrol.go,
tools_profiles.go into their respective domain tools
- Expand tools_control.go with command execution logic
- Expand tools_discovery.go with resource discovery handlers
- Expand tools_storage.go with storage-related operations
- Expand tools_metrics.go with metrics functionality
- Update tests to match new structure
This consolidation reduces file count and groups related functionality together.
Adds zerolog debug statements throughout the ZFS collection pipeline
(collector.go and zfs.go) to trace partition discovery, dataset
collection, zpool stats fetching, and pool summarization. This will
help diagnose issues like empty storage bars on mirror-vdev pools.
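An example of the kind of trace point added (field names assumed; zerolog's
fluent API is real):

```go
import "github.com/rs/zerolog/log"

func logPoolSummary(pool string, datasets, zvols int, raw, usable uint64) {
	log.Debug().
		Str("pool", pool).
		Int("datasets", datasets).
		Int("zvols", zvols).
		Uint64("rawBytes", raw).
		Uint64("usableBytes", usable).
		Msg("zfs: summarized pool")
}
```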
These memos and helpers are prepared for the patrol run detail panel but are
not yet wired up. Commenting them out to satisfy TypeScript's strict
unused-variable checks.
AI Chat Improvements:
- MentionAutocomplete for @-mentioning resources
- Better tool execution display
- Enhanced chat interface
New Components:
- FindingsPanel for AI findings display
- DiscoveryTab for infrastructure discovery
- PatrolActivitySection for patrol monitoring
- StorageConfigPanel for storage management
API Updates:
- Discovery API integration
- Enhanced AI chat API
- Patrol API improvements
- Monitoring API updates
UI/UX:
- Better AI status indicator
- Improved investigation drawer
- Enhanced settings page
- Better guest drawer integration
Types:
- New discovery types
- Enhanced AI types
- API type improvements
Removed deprecated UnifiedFindingsPanel in favor of new FindingsPanel.
The aidiscovery package has been superseded by the consolidated
tools approach in internal/ai/tools/. Discovery functionality is
now handled through:
- pulse_query tool for resource search and discovery
- pulse_discovery tool for infrastructure scanning
- Better integration with the main AI chat pipeline
Removing:
- commands.go and related tests
- deep_scanner.go and tests
- formatters.go and tests
- service.go and tests
- store.go and tests
- tools_adapter.go
- types.go and tests
Provider updates across all supported backends:
- Anthropic: Better tool call handling
- OpenAI: Improved response parsing
- Gemini: Enhanced compatibility
- Ollama: Local model support improvements
Includes test updates for OpenAI provider.
Implement ResolvedContext to track pinned resources during chat sessions:
- ResolvedTarget captures resource ID, type, node, and provenance info
- Provenance tracking records how targets were resolved (user mention,
tool result, or implicit context)
- Session maintains pinned targets that persist across conversation turns
Add routing contract tests to verify:
- Commands routed to correct container vs host targets
- Provenance properly recorded for different resolution methods
- Context maintained across multi-turn conversations
This provides an audit trail for which resources were accessed and how they
were identified, supporting safety verification and debugging.
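A sketch of the pinned-target record; field and constant names are
assumptions, not the actual types:

```go
import "time"

type Provenance string

const (
	FromUserMention     Provenance = "user_mention"
	FromToolResult      Provenance = "tool_result"
	FromImplicitContext Provenance = "implicit_context"
)

type ResolvedTarget struct {
	ResourceID string
	Type       string // e.g. "container", "host"
	Node       string
	Source     Provenance // how the target was resolved
	ResolvedAt time.Time
}

type Session struct {
	Pinned []ResolvedTarget
}

// Pin records a resolved target so later turns reuse it and the audit
// trail shows how it entered the conversation.
func (s *Session) Pin(t ResolvedTarget) {
	t.ResolvedAt = time.Now()
	s.Pinned = append(s.Pinned, t)
}
```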
Implement a state machine that enforces structural safety guarantees:
- RESOLVING: Initial state, must discover resources before writing
- READING: Read tools allowed after discovery
- WRITING: Transitions to VERIFYING after any write operation
- VERIFYING: Must perform read verification before next write
This prevents:
- Write operations without resource discovery
- Consecutive writes without verification
- Final answers without post-write verification
The FSM is enforced at the tool execution layer, providing defense-in-depth
that doesn't rely on prompt instructions alone.
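A sketch of the transition rules; the state names follow the list above,
while the code is an assumption about the enforcement layer:

```go
import "errors"

type fsmState int

const (
	Resolving fsmState = iota // must discover resources first
	Reading                   // reads allowed after discovery
	Writing                   // a write is in flight
	Verifying                 // a read must confirm the last write
)

type toolKind int

const (
	discoverTool toolKind = iota
	readTool
	writeTool
)

// next returns the state after executing a tool, or an error when the
// structural guarantees would be violated.
func (s fsmState) next(k toolKind) (fsmState, error) {
	switch {
	case k == writeTool && s == Resolving:
		return s, errors.New("write blocked: no resource discovery yet")
	case k == writeTool && s == Verifying:
		return s, errors.New("write blocked: verify the previous write first")
	case k == writeTool:
		return Verifying, nil // every write must be followed by a read check
	case k == readTool && s == Verifying:
		return Reading, nil // the verification read clears the obligation
	case k == discoverTool && s == Resolving:
		return Reading, nil
	default:
		return s, nil
	}
}
```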
- #1163: Add node badges to storage resources in threshold tables
(ResourceTable.tsx, ResourceCard.tsx)
- #1162: Fix PBS backup alerts showing datastore as node name
(alerts.go - use "Unknown" for orphaned backups)
- #1153: Fix memory leaks in tracking maps
- Add max 48 sample limit for pmgQuarantineHistory
- Add max 10 entry limit for flappingHistory
- Add cleanup for dockerUpdateFirstSeen
- Add cleanupTrackingMaps() for auth, polling, and circuit breaker maps
Note: #1149 fix (chat sessions null check) is in AISettings.tsx
which has other pending changes - will be committed separately.
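The bounded-history idea behind the #1153 fixes, sketched with assumed
names:

```go
// appendBounded keeps only the newest max samples for a key, e.g.
// appendBounded(pmgQuarantineHistory, node, sample, 48).
func appendBounded(history map[string][]int, key string, sample, max int) {
	h := append(history[key], sample)
	if len(h) > max {
		h = h[len(h)-max:] // drop the oldest entries beyond the cap
	}
	history[key] = h
}
```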
- Fixed --disable-docker not being passed to systemd service file. Related to #1151
- Added init: true requirement to HTTPS/TLS docs for Docker. Related to #1166
Remove API functions that are defined but never called:
- ai.ts: OAuth flow, execute/executeStream, chat session sync
- charts.ts: getStorageCharts, getMetricsStoreStats
- notifications.ts: queue/DLQ management, health check
- updates.ts: update history functions
Also removes unused type definitions (MetricsStoreStats, UpdateHistoryEntry)
- Remove deprecated config.ModelInfo type (use providers.ModelInfo)
- Remove deprecated GetAvailableModels function (always returned nil)
- Remove associated test
- Update AISettingsResponse to use providers.ModelInfo
- Remove unused animations.css (all classes were unused)
- Replace console.log with logger in UnifiedHistoryChart
- Remove deprecated isEnterprise export from license store