feat: auto-fallback Gemini→Ollama + model warmup on chat open
Dual-provider architecture:
- Both Gemini and Ollama initialize at startup (if configured)
- The primary provider (Gemini) is tried first for every request
- On any error (e.g. 429, 503, timeout), the request automatically falls back to Ollama
- No manual switching needed; the fallback is transparent to the user
- Log shows: "Primary failed (gemini: ...), falling back to ollama: ..."

Warmup:
- POST /api/chat/warmup is called silently when the chat panel opens
- Pre-loads the Ollama model in the background (10-15s) while the user reads the welcome message
- By the time the user types, the model is ready to respond immediately
- Warms up the fallback provider specifically (Gemini doesn't need it)

Timeout:
- Agent context timeout increased to 60s (Ollama's first response can be slow)
- Each request creates a fresh session (stateless, for fallback compatibility)
@@ -23,6 +23,7 @@ func Setup(cvHandler *handlers.CVHandler, healthHandler *handlers.HealthHandler,
 	// Chat endpoint with rate limiting (30 requests/hour per IP)
 	chatRateLimiter := middleware.NewRateLimiter(c.RateLimitChatRequests, c.RateLimitChatWindow)
 	mux.Handle("/api/chat", chatRateLimiter.Middleware(http.HandlerFunc(chatHandler.HandleChat)))
+	mux.HandleFunc("/api/chat/warmup", chatHandler.HandleWarmup) // Pre-load model on chat open
 
 	// Public routes
 	mux.HandleFunc("/", cvHandler.Home)