# Performance Optimization Report

**Date:** 2025-12-08
**Objective:** Instrument and reduce latency for the Sage AI service.
## 🚀 Key Improvements
| Scenario | Original Latency | Final Latency | Speedup |
|---|---|---|---|
| Simple Greeting | 12.47s | 3.56s - 4.60s | ~3x Faster |
| Complex Command | 22.36s | 6.85s - 8.07s | ~3x Faster |
| Context Query | 17.36s | 3.60s - 4.73s | ~4x Faster |
## 🛠️ Changes Implemented
### 1. SmartRouter & EntityExtractor Upgrade
- Old: `gpt-5-mini` (Router) + `gpt-5-nano` (Extractor)
- New: `gemini-2.5-flash-lite` for both.
- Impact: Reduced combined routing + extraction overhead from ~8s to ~2-4s.
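One way the combined overhead can approach a single call is to run the router and extractor concurrently on the shared lite model. A minimal sketch under that assumption, with `call_llm`, `route_and_extract`, and the prompt strings as hypothetical stand-ins for the real client and prompts:

```python
import asyncio

MODEL = "gemini-2.5-flash-lite"  # both stages now share one fast model

async def call_llm(model: str, prompt: str) -> str:
    """Stand-in for the real LLM client call (hypothetical signature)."""
    await asyncio.sleep(0)  # placeholder for network latency
    return f"{model}:{prompt}"

async def route_and_extract(message: str) -> tuple[str, str]:
    # Firing both lightweight calls concurrently means the combined
    # wall-clock cost is roughly one round-trip, not two.
    route, entities = await asyncio.gather(
        call_llm(MODEL, f"Classify intent: {message}"),
        call_llm(MODEL, f"Extract entities: {message}"),
    )
    return route, entities
```

Whether the production `SmartMessageRouter` runs the two calls concurrently or sequentially, moving both to the same lite model keeps the per-stage cost low.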
### 2. Database Optimization
- Problem: The `RelationshipState` service was using slow Supabase HTTP calls, causing a 5s delay on every first message.
- Fix: Refactored to `RelationshipStateServiceSQL` using a direct SQLAlchemy connection.
- Impact: Context loading is now sub-100ms (warm) / ~700ms (cold).
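The shape of the refactor can be sketched as follows. Table and column names are hypothetical; the point is replacing a per-message HTTP request with one query over a warm pooled connection:

```python
from sqlalchemy import create_engine, text

def make_engine(dsn: str):
    # Direct pooled DB connection; replaces the per-request HTTP client.
    # pool_pre_ping drops dead connections before they are handed out.
    return create_engine(dsn, pool_pre_ping=True)

def load_relationship_state(engine, user_id: str):
    # One SQL round-trip on a pooled connection: sub-100ms when the
    # pool is warm, ~700ms when a fresh connection must be opened.
    with engine.connect() as conn:
        row = conn.execute(
            text("SELECT state FROM relationship_state WHERE user_id = :uid"),
            {"uid": user_id},
        ).fetchone()
    return row[0] if row else None
```

The warm/cold split in the measurements falls out of the pool: the cold number includes connection setup, the warm number is just the query.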
### 3. Stability Fixes
- Fixed a `ForeignKeyViolation` that was causing 500 errors during session creation (by using `uuid5` instead of the DB ID).
- Fixed an `AttributeError` in the entity extraction context.
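The `uuid5` fix likely looks something like this sketch (the namespace string and function names are hypothetical): deriving the session key deterministically from an identifier the caller already has, instead of referencing a DB-assigned ID that may not exist yet.

```python
import uuid

# Hypothetical namespace constant; the real service would pin its own.
SESSION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sage.sessions")

def session_key(external_user_id: str) -> uuid.UUID:
    # uuid5 is deterministic: the same external ID always yields the
    # same session key, so no insert depends on a row that might not
    # have been created yet (the source of the ForeignKeyViolation).
    return uuid.uuid5(SESSION_NAMESPACE, external_user_id)
```

Because `uuid5` is a pure hash of its inputs, retries and concurrent requests converge on the same key rather than racing to create conflicting rows.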
### 4. Instrumentation
- Added granular timing logs to `SmartMessageRouter` and `ContextAggregator`.
- Enabled manual Keywords AI tracing for Gemini calls to ensure observability.
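A minimal sketch of per-stage timing logs like those added here, using a context manager so each stage reports its own wall-clock cost (the logger name and stage labels are illustrative):

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("sage.timing")

@contextmanager
def timed(stage: str):
    # Wrap any pipeline stage to log its duration in milliseconds.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1fms", stage, elapsed_ms)

# Usage inside the router, e.g.:
# with timed("smart_router.classify"):
#     route = classify(message)
```

Granular timings like these are what make it possible to attribute the remaining latency to specific stages in the analysis below.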
## 🔍 Bottleneck Analysis (Remaining)
- External APIs: Operations involving Google Places (e.g., "add [Venue]") still incur ~5-8s latency due to external API round-trips and response generation.
- LLM Generation: Generating the final text response (Stage 2) still takes ~1-2s, which is unavoidable for high-quality responses.
## ✅ Next Steps
- Monitor Keywords AI dashboard to verify Gemini traces.
- Consider moving "Response Generation" (Stage 2) from GPT-5 to Gemini 2.5 Flash if further speed is needed.