mem0 Ingestion Hardening

Incident Summary

  • mem0.add rejected payloads because Sage passed entire Gmail conversation threads through untouched during ingestion.
  • request.text contained roughly 700k tokens, and the classifier metadata echoed the same message body, blowing past mem0's limits (100k tokens of text, 2k characters of metadata).

Immediate Safeguards

  • Instrumented MemoryService to log payload length, approximate tokens, and metadata size per namespace.
  • Sanitized orchestrator metadata to strip message bodies and trim verbose fields before delegating to mem0.
  • Added a shared limiter in MemoryService that:
      • Summarizes text above 8k chars with a head/tail excerpt and a truncation notice.
      • Prunes or collapses metadata above ~1.9k chars while preserving type, timestamp, and confidence.
      • Emits warnings when truncation occurs so we can trace degradations.
  • Added regression tests so any future change that violates these guarantees fails in CI.
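The shared limiter described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual MemoryService code; names like `MAX_TEXT_CHARS`, `clamp_text`, and `clamp_metadata` are assumptions.

```python
# Hypothetical sketch of the shared payload limiter; constants and function
# names are illustrative, not the real MemoryService internals.
import json
import logging

log = logging.getLogger("memory_service.limiter")

MAX_TEXT_CHARS = 8_000       # summarize text above this length
MAX_METADATA_CHARS = 1_900   # prune metadata above roughly this serialized size
PRESERVED_KEYS = ("type", "timestamp", "confidence")

def clamp_text(text: str) -> str:
    """Replace oversized text with a head/tail excerpt and a truncation notice."""
    if len(text) <= MAX_TEXT_CHARS:
        return text
    half = MAX_TEXT_CHARS // 2
    head, tail = text[:half], text[-half:]
    # Emit a warning so truncation-related quality degradations stay traceable.
    log.warning("truncated text payload: %d chars -> ~%d", len(text), MAX_TEXT_CHARS)
    return f"{head}\n...[truncated {len(text) - MAX_TEXT_CHARS} chars]...\n{tail}"

def clamp_metadata(metadata: dict) -> dict:
    """Collapse metadata to the preserved keys when it exceeds the budget."""
    if len(json.dumps(metadata)) <= MAX_METADATA_CHARS:
        return metadata
    pruned = {k: v for k, v in metadata.items() if k in PRESERVED_KEYS}
    log.warning("pruned metadata keys: %s", sorted(set(metadata) - set(pruned)))
    return pruned
```

Keeping both clamps behind one module means every namespace that delegates to mem0 inherits the same guarantees, which is what the regression tests pin down.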
Staged Hardening Plan

  1. Dual-rail ingestion
      • Persist raw, unbounded artifacts (emails, attachments, transcripts) in low-cost object storage keyed by namespace + timestamp.
      • Store only compact summaries and pointers inside mem0.
  2. Async summarization pipeline
      • Use a background worker to summarize oversized captures via an inexpensive LLM tier or a heuristic compressor.
      • Backfill mem0 with the condensed memory once ready; retry with backoff on failures without blocking the live conversation loop.
  3. Policy engine
      • Define retention policies per source (e.g., email vs. chat) to decide which fields must be redacted, how aggressively to summarize, and when to discard.
  4. Observability
      • Emit metrics for truncation counts, original payload sizes, and summary latency so the team can tune limits ahead of user reports.
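The dual-rail split in step 1 can be sketched as below. The `object_store` and `mem0_client` interfaces here are assumptions standing in for whatever storage and mem0 wrapper Sage actually uses; the placeholder summary would come from the async pipeline in step 2.

```python
# Illustrative dual-rail ingestion sketch; ObjectStore and mem0_client
# interfaces are assumptions, not the real Sage internals.
import hashlib
from datetime import datetime, timezone

def dual_rail_ingest(namespace: str, raw_text: str, object_store, mem0_client) -> str:
    """Persist the raw artifact, then store only a summary plus pointer in mem0."""
    # Rail 1: raw, unbounded artifact goes to cheap object storage,
    # keyed by namespace + timestamp (plus a content hash for uniqueness).
    digest = hashlib.sha256(raw_text.encode()).hexdigest()[:12]
    key = f"{namespace}/{datetime.now(timezone.utc).isoformat()}-{digest}"
    object_store.put(key, raw_text)

    # Rail 2: mem0 only sees a compact summary and the pointer back to the raw copy.
    summary = raw_text[:500]  # placeholder; the async summarizer would replace this
    mem0_client.add(
        text=summary,
        metadata={"raw_artifact_key": key, "original_chars": len(raw_text)},
    )
    return key
```

Because mem0 holds only the pointer, a later policy change can re-summarize or redact from the raw artifact without re-ingesting the source.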

This staged approach keeps mem0 lean for retrieval quality, prevents sudden quota violations, and preserves raw artifacts for future reprocessing or analytics without impacting conversational latency.
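As a minimal sketch of the per-source retention policies, a declarative table like the one below could drive both the redaction and summarization steps. The field names, budgets, and fallback rule are assumptions for illustration.

```python
# Hypothetical per-source retention policy table; all values are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class RetentionPolicy:
    redact_fields: tuple          # fields stripped before summarization
    summary_budget_chars: int     # how aggressively to summarize
    ttl_days: object              # None means keep indefinitely

POLICIES = {
    "email": RetentionPolicy(
        redact_fields=("body", "attachments"),
        summary_budget_chars=2_000,
        ttl_days=365,
    ),
    "chat": RetentionPolicy(
        redact_fields=(),
        summary_budget_chars=4_000,
        ttl_days=None,
    ),
}

def policy_for(source: str) -> RetentionPolicy:
    """Fall back to the strictest (smallest-budget) policy for unknown sources."""
    return POLICIES.get(
        source,
        min(POLICIES.values(), key=lambda p: p.summary_budget_chars),
    )
```

Defaulting unknown sources to the strictest policy keeps a new ingestion path from silently reintroducing the original quota blowout.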