# Performance Optimization Report

**Date:** 2025-12-08
**Objective:** Instrument and reduce latency for the Sage AI service.
## 🚀 Key Improvements
| Scenario | Original Latency | Final Latency | Speedup |
|---|---|---|---|
| Simple Greeting | 12.47s | 3.56s - 4.60s | ~3x Faster |
| Complex Command | 22.36s | 6.85s - 8.07s | ~3x Faster |
| Context Query | 17.36s | 3.60s - 4.73s | ~4x Faster |
## 🛠️ Changes Implemented
### 1. SmartRouter & EntityExtractor Upgrade
- Old: `gpt-5-mini` (Router) + `gpt-5-nano` (Extractor)
- New: `gemini-2.5-flash-lite` for both.
- Impact: Reduced combined routing + extraction overhead from ~8s to ~2-4s.
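One way the combined overhead can approach a single call is to run the router and extractor concurrently on the shared lite model. A minimal sketch under that assumption, with `call_llm`, `route_and_extract`, and the prompt strings as hypothetical stand-ins for the real client and prompts:

```python
import asyncio

MODEL = "gemini-2.5-flash-lite"  # both stages now share one fast model

async def call_llm(model: str, prompt: str) -> str:
    """Stand-in for the real LLM client call (hypothetical signature)."""
    await asyncio.sleep(0)  # placeholder for network latency
    return f"{model}:{prompt}"

async def route_and_extract(message: str) -> tuple[str, str]:
    # Firing both lightweight calls concurrently means the combined
    # wall-clock cost is roughly one round-trip, not two.
    route, entities = await asyncio.gather(
        call_llm(MODEL, f"Classify intent: {message}"),
        call_llm(MODEL, f"Extract entities: {message}"),
    )
    return route, entities
```

Whether the production `SmartMessageRouter` runs the two calls concurrently or sequentially, moving both to the same lite model keeps the per-stage cost low.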
### 2. Database Optimization
- Problem: The `RelationshipState` service was using slow Supabase HTTP calls, causing a 5s delay on every first message.
- Fix: Refactored to `RelationshipStateServiceSQL` using a direct SQLAlchemy connection.
- Impact: Context loading is now sub-100ms (warm) / ~700ms (cold).
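The shape of the refactor can be sketched as follows. Table and column names are hypothetical; the point is replacing a per-message HTTP request with one query over a warm pooled connection:

```python
from sqlalchemy import create_engine, text

def make_engine(dsn: str):
    # Direct pooled DB connection; replaces the per-request HTTP client.
    # pool_pre_ping drops dead connections before they are handed out.
    return create_engine(dsn, pool_pre_ping=True)

def load_relationship_state(engine, user_id: str):
    # One SQL round-trip on a pooled connection: sub-100ms when the
    # pool is warm, ~700ms when a fresh connection must be opened.
    with engine.connect() as conn:
        row = conn.execute(
            text("SELECT state FROM relationship_state WHERE user_id = :uid"),
            {"uid": user_id},
        ).fetchone()
    return row[0] if row else None
```

The warm/cold split in the measurements falls out of the pool: the cold number includes connection setup, the warm number is just the query.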
### 3. Stability Fixes
- Fixed a `ForeignKeyViolation` that was causing 500 errors during session creation (by using `uuid5` instead of the DB ID).
- Fixed an `AttributeError` in the entity extraction context.
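The `uuid5` fix likely looks something like this sketch (the namespace string and function names are hypothetical): deriving the session key deterministically from an identifier the caller already has, instead of referencing a DB-assigned ID that may not exist yet.

```python
import uuid

# Hypothetical namespace constant; the real service would pin its own.
SESSION_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sage.sessions")

def session_key(external_user_id: str) -> uuid.UUID:
    # uuid5 is deterministic: the same external ID always yields the
    # same session key, so no insert depends on a row that might not
    # have been created yet (the source of the ForeignKeyViolation).
    return uuid.uuid5(SESSION_NAMESPACE, external_user_id)
```

Because `uuid5` is a pure hash of its inputs, retries and concurrent requests converge on the same key rather than racing to create conflicting rows.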
### 4. Instrumentation
- Added granular timing logs to `SmartMessageRouter` and `ContextAggregator`.
- Enabled manual Keywords AI tracing for Gemini calls to ensure observability.
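A minimal sketch of per-stage timing logs like those added here, using a context manager so each stage reports its own wall-clock cost (the logger name and stage labels are illustrative):

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("sage.timing")

@contextmanager
def timed(stage: str):
    # Wrap any pipeline stage to log its duration in milliseconds.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.1fms", stage, elapsed_ms)

# Usage inside the router, e.g.:
# with timed("smart_router.classify"):
#     route = classify(message)
```

Granular timings like these are what make it possible to attribute the remaining latency to specific stages in the analysis below.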
## 🔍 Bottleneck Analysis (Remaining)
- External APIs: Operations involving Google Places (e.g., "add [Venue]") still incur ~5-8s latency due to external API round-trips and response generation.
- LLM Generation: Generating the final text response (Stage 2) still takes ~1-2s, which is unavoidable for high-quality responses.
## ✅ Next Steps
- Monitor Keywords AI dashboard to verify Gemini traces.
- Consider moving "Response Generation" (Stage 2) from GPT-5 to Gemini 2.5 Flash if further speed is needed.