feat: auto-fallback Gemini→Ollama + model warmup on chat open

Dual-provider architecture:
- Both Gemini and Ollama initialize at startup (if configured)
- Primary (Gemini) tried first for every request
- On any error (429, 503, timeout), automatically falls back to Ollama
- No manual switching needed — completely transparent to the user
- Log shows: "Primary failed (gemini: ...), falling back to ollama: ..."

Warmup:
- POST /api/chat/warmup called silently when chat panel opens
- Pre-loads Ollama model in background (10-15s) while user reads welcome
- By the time user types, model is ready for instant response
- Warms up fallback provider specifically (Gemini doesn't need it)

Timeout:
- Agent context increased to 60s (Ollama first response can be slow)
- Each request creates a fresh session (stateless for fallback compat)
This commit is contained in:
juanatsap
2026-04-08 14:57:38 +01:00
parent 8205a22972
commit 160be31b31
3 changed files with 142 additions and 92 deletions
+1
View File
@@ -23,6 +23,7 @@ func Setup(cvHandler *handlers.CVHandler, healthHandler *handlers.HealthHandler,
// Chat endpoint with rate limiting (30 requests/hour per IP)
chatRateLimiter := middleware.NewRateLimiter(c.RateLimitChatRequests, c.RateLimitChatWindow)
mux.Handle("/api/chat", chatRateLimiter.Middleware(http.HandlerFunc(chatHandler.HandleChat)))
mux.HandleFunc("/api/chat/warmup", chatHandler.HandleWarmup) // Pre-load model on chat open
// Public routes
mux.HandleFunc("/", cvHandler.Home)