Performance Optimization Report

Date: 2025-12-08
Objective: Instrument and reduce latency for the Sage AI service.

🚀 Key Improvements

| Scenario        | Original Latency | Final Latency  | Speedup    |
| --------------- | ---------------- | -------------- | ---------- |
| Simple Greeting | 12.47s           | 3.56s – 4.60s  | ~3x faster |
| Complex Command | 22.36s           | 6.85s – 8.07s  | ~3x faster |
| Context Query   | 17.36s           | 3.60s – 4.73s  | ~4x faster |

🛠️ Changes Implemented

1. SmartRouter & EntityExtractor Upgrade

  • Old: gpt-5-mini (Router) + gpt-5-nano (Extractor)
  • New: gemini-2.5-flash-lite for both.
  • Impact: Reduced combined routing + extraction overhead from ~8s to ~2-4s.
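In config terms, the swap amounts to pointing both stages at a single fast model. The model names come from this report; the config shape and keys below are a hypothetical sketch, not the service's actual settings module:

```python
# Before: two different OpenAI models, one per stage (names from the report;
# this dict layout is an assumption for illustration).
OLD_MODELS = {
    "router": "gpt-5-mini",
    "extractor": "gpt-5-nano",
}

# After: both stages share gemini-2.5-flash-lite, so routing and extraction
# pay the latency profile of one fast model instead of two slower ones.
NEW_MODELS = {
    "router": "gemini-2.5-flash-lite",
    "extractor": "gemini-2.5-flash-lite",
}
```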

2. Database Optimization

  • Problem: RelationshipState service was using slow Supabase HTTP calls, causing a 5s delay on every first message.
  • Fix: Refactored to RelationshipStateServiceSQL using direct SQLAlchemy connection.
  • Impact: Context loading is now sub-100ms (warm) / ~700ms (cold).
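The refactor above boils down to replacing a per-request HTTP round-trip with one SQL query on a pooled SQLAlchemy connection. The sketch below is illustrative only: the table name, column names, and query are assumptions, and an in-memory SQLite engine stands in for the real Postgres DSN so the example is self-contained (in production this would be something like `create_engine("postgresql+psycopg2://...", pool_size=5, pool_pre_ping=True)`):

```python
from sqlalchemy import create_engine, text

# SQLite in memory as a stand-in; a module-level engine keeps a warm
# connection pool, so only the first (cold) query pays connection setup.
engine = create_engine("sqlite:///:memory:")

# Hypothetical schema and seed row for the sketch.
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE relationship_state (user_id TEXT PRIMARY KEY, state TEXT)"
    ))
    conn.execute(text(
        "INSERT INTO relationship_state VALUES ('u1', 'close_friend')"
    ))

def load_relationship_state(user_id: str):
    """One SQL round-trip on a pooled connection instead of an HTTP call."""
    with engine.connect() as conn:
        row = conn.execute(
            text("SELECT state FROM relationship_state WHERE user_id = :uid"),
            {"uid": user_id},
        ).first()
    return row[0] if row else None
```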

3. Stability Fixes

  • Fixed a ForeignKeyViolation that was causing 500 errors during session creation, by deriving the session ID with uuid5 instead of relying on the DB-generated ID.
  • Fixed an AttributeError in the entity-extraction context.
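The uuid5 approach sidesteps the foreign-key race because the session ID is computed from keys the service already has, rather than read back from the database. A minimal sketch, assuming a hypothetical namespace and key format (neither is specified in this report):

```python
import uuid

# Assumed namespace for Sage session IDs; any fixed UUID works, as long as
# every service instance uses the same one.
SESSION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "sage.ai/sessions")

def session_id_for(user_id: str, conversation_key: str) -> uuid.UUID:
    """Deterministic: the same inputs always yield the same UUID, so the
    session row can be created (or upserted) without first querying the DB
    for a generated ID."""
    return uuid.uuid5(SESSION_NAMESPACE, f"{user_id}:{conversation_key}")
```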

4. Instrumentation

  • Added granular timing logs to SmartMessageRouter and ContextAggregator.
  • Enabled manual Keywords AI tracing for Gemini calls to ensure observability.
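The granular timing logs can be implemented as a small decorator that records wall-clock duration per stage. This is a sketch of the pattern, not the actual instrumentation code; `route_message` is a stand-in for the real `SmartMessageRouter` call:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sage.timing")

def timed(stage: str):
    """Log the wall-clock duration of a pipeline stage."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                log.info("%s took %.1f ms", stage, elapsed_ms)
        return wrapper
    return decorator

@timed("route_message")
def route_message(text: str) -> str:
    # Stand-in for the real routing LLM call.
    return "simple_greeting" if "hello" in text.lower() else "complex_command"
```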

🔍 Bottleneck Analysis (Remaining)

  • External APIs: Operations involving Google Places (e.g., "add [Venue]") still incur ~5-8s latency due to external API round-trips and response generation.
  • LLM Generation: Generating the final text response (Stage 2) still takes ~1-2s, which is unavoidable for high-quality responses.

✅ Next Steps

  • Monitor Keywords AI dashboard to verify Gemini traces.
  • Consider moving Stage 2 "Response Generation" (currently GPT-5) to Gemini 2.5 Flash if further speedups are needed.