feat: auto-fallback Gemini→Ollama + model warmup on chat open

Dual-provider architecture:
- Both Gemini and Ollama initialize at startup (if configured)
- Primary (Gemini) tried first for every request
- On any error (429, 503, timeout), automatically falls back to Ollama
- No manual switching needed; completely transparent to the user
- Log shows: "Primary failed (gemini: ...), falling back to ollama: ..."

Warmup:
- POST /api/chat/warmup called silently when chat panel opens
- Pre-loads Ollama model in background (10-15s) while user reads welcome
- By the time user types, model is ready for instant response
- Warms up fallback provider specifically (Gemini doesn't need it)

Timeout:
- Agent context increased to 60s (Ollama first response can be slow)
- Each request creates a fresh session (stateless for fallback compat)
2026-04-08 14:57:38 +01:00
parent 8205a22972
commit 160be31b31
3 changed files with 142 additions and 92 deletions
@@ -23,6 +23,7 @@ func Setup(cvHandler *handlers.CVHandler, healthHandler *handlers.HealthHandler,
 	// Chat endpoint with rate limiting (30 requests/hour per IP)
 	chatRateLimiter := middleware.NewRateLimiter(c.RateLimitChatRequests, c.RateLimitChatWindow)
 	mux.Handle("/api/chat", chatRateLimiter.Middleware(http.HandlerFunc(chatHandler.HandleChat)))
+	mux.HandleFunc("/api/chat/warmup", chatHandler.HandleWarmup) // Pre-load model on chat open

 	// Public routes
 	mux.HandleFunc("/", cvHandler.Home)