
PRD — Memory System

Doc owner: Justin Audience: Eng, Design, Product, Security Status: v2 (February 2026 — all mem0 references replaced with Supermemory; multi-query retrieval documented; Event Tracker documented; nightly dedup specified; provider-agnostic forgetting/boundary interface; future_event memory type added; latency metrics split raw vs pipeline; GDPR mapped to endpoints; Section 6 rewritten from migration plan to current architecture) Depends on: PRD 3 (Progression & Loadout)

v1 changelog: Initial memory system spec — dual storage, namespace isolation, 8 memory types, classification pipeline, recall, forgetting, quality metrics, GDPR compliance


Implementation Status

Section Status Notes
Supermemory semantic search ✅ Shipped app/memory/supermemory_service.py — <300ms raw latency
Multi-query retrieval (Tolan-inspired) ✅ Shipped app/memory/query_synthesizer.py — GPT-5-nano auxiliary question generation
Namespace isolation ✅ Shipped Service-layer enforcement, not prompt-level
Memory classification pipeline ✅ Shipped Piggybacked on response generation, no extra LLM call
Photo memory extraction (GPT-5 Vision) ✅ Shipped Visual memory pipeline with trait extraction
"Forget that" command ✅ Shipped Soft-delete with 30-day hard-delete
Nightly deduplication ✅ Shipped app/memory/memory_maintenance.py — runs at 3am, cosine 0.85 threshold
Event Tracker ("I Remember") ✅ Shipped app/memory/event_tracker.py — future event detection + 3-touch proactive sequence
Persona memory (JSON + semantic) ✅ Shipped Dual storage for persona facts
GDPR data export ✅ Shipped app/api/account_routes.py — JSON export with provenance
Boundary enforcement ✅ Shipped boundary_manager.py — service-layer, provider-agnostic
Group memory extraction-only policy ❌ Not Shipped Needs logistics_only filter (see PRD 5 v2)
Memory health dashboard ❌ Not Shipped No admin UI for memory metrics
Confidence decay ❌ Not Shipped Old memories don't decay in priority (Phase 2 candidate)
Memory pinning ❌ Not Shipped No user-pinned high-priority memories (Phase 2 candidate)

References

This PRD uses standardized terminology, IDs, pricing, and model references defined in the companion documents:

Document What it Covers
REFERENCE_GLOSSARY_AND_IDS.md Canonical terms: workflow vs miniapp vs superpower, ID formats
REFERENCE_PRICING.md Canonical pricing: $7.99/mo + $50/yr, free tier limits
REFERENCE_MODEL_ROUTING.md Pipeline stage → model tier mapping
REFERENCE_DEPENDENCY_GRAPH.md PRD blocking relationships and priority order
REFERENCE_FEATURE_FLAGS.md All feature flags by category
REFERENCE_TELEMETRY.md Amplitude event catalog and gaps

Executive Summary

Memory is Ikiro's core differentiator. Every other AI assistant forgets you between sessions. Sage remembers your inside jokes, your roommate's name, your exam schedule, your fear of deep water, and the time you cried about your parents' divorce. This PRD is the unified specification for how memories are created, classified, stored, searched, recalled, and deleted.

The system has three distinct memory domains — User Memories (what Sage knows about you), Persona Memories (what Sage knows about herself), and Group Memories (what Sage knows about a group's plans) — each with strict namespace isolation. Memories carry provenance (where they came from), sensitivity flags (how carefully to handle them), and validity windows (when they expire).

Current state: Supermemory-powered semantic search operational at <300ms raw latency (25x faster than previous mem0 system). Multi-query retrieval via auxiliary question synthesis improves recall breadth. Namespace isolation enforced at service layer. Photo memory extraction live. "Forget that" command functional. Event Tracker detects future events for proactive surfacing. Nightly deduplication runs at 3am. Persona memory system with JSON + Supermemory dual storage deployed.


1) Memory Taxonomy

1.1 User Memories (About the User)

Type Examples Sensitivity Typical Lifespan
Emotional "scared about the job interview," "fought with roommate," "cried about parents" High Indefinite (decays in recall priority, never auto-deleted)
Factual "studies CS at UCLA," "allergic to peanuts," "drives a Honda Civic" Medium Indefinite
Deadline "STAT210 quiz on Nov 7," "rent due the 1st," "internship app deadline March 15" Medium Until valid_to passes
Inside Joke "the 'quack' thing from the duck incident," "calling Thursdays 'struggle bus day'" Low Indefinite
Preference "hates being asked 'how are you,'" "prefers morning check-ins," "vegetarian" Low Indefinite
People "best friend Kai," "boss named Maria," "toxic ex named Jake" High Indefinite
Visual "photo of sunset at Malibu," "messy bookshelf in dorm room" Low-Medium Indefinite
Future Event "bar exam on March 20," "friend's wedding June 12," "job interview next Thursday" Medium Until event passes (then deprioritized, not deleted). Proactively resurfaced via Event Tracker.
Boundary "don't bring up my weight," "stop checking my calendar" Critical Until explicitly revoked

1.2 Persona Memories (About Sage/Echo)

Type Examples Source
Experience "spent a summer at a coffee shop," "once tried to learn guitar" JSON file (authored)
Preference "thinks pineapple pizza is amazing" JSON file (authored)
Opinion "believes people overthink career decisions" JSON file (authored)
Learned Fact "someone shared they're afraid of deep water" Conversation (auto-classified, anonymized)
Interest "obsessed with true crime podcasts" JSON file (authored)

1.3 Group Memories (About Group Plans)

Type Examples Lifespan
Plan State "Saturday dinner at 7pm, Kai is driving" Until event passes
Poll Results "voted 7pm: Jess, Kai. voted 8pm: Marcus" Until poll closed + 7 days
Checklist "Vegas list: chargers (Kai), gum (Jess)" Until trip ends
Commitment "Marcus said he'd make the reservation" Until fulfilled or expired
Group Boundary "User A: don't bring up my work stuff" Indefinite

Group memory extraction policy: Group memories use an extraction-only policy — only logistics-relevant content (plan details, poll results, checklists, roles, logistics facts) is stored. Emotional disclosures, casual banter, and sensitive personal content from group members are seen but NOT extracted. See PRD 5 v2 Section 4.1 for the full policy. Gated by group_memory_logistics_only feature flag.


2) Memory Lifecycle

2.1 Creation

Automatic extraction (primary path):

User message → Intent/Emotion classifier → Memory classifier → Store if significant

The memory classifier runs on every user message (piggybacked on response generation, no extra LLM call). It evaluates:

  • Is this memorizable? (factual, emotional, deadline, preference, or noise?)
  • Importance score (1-10) — must be ≥4 to store
  • Memory type — which category from the taxonomy
  • Sensitivity level — low/medium/high/critical
  • Existing memory check — does this duplicate or update an existing memory?

Manual creation paths:

  • Photo upload → GPT-5 Vision extracts visual memories
  • OAuth data → Calendar events become deadline memories, email patterns become people memories
  • Admin panel → Direct memory injection for persona memories

2.2 Classification Pipeline

class MemoryClassifier:
    async def classify(self, message: str, context: ConversationContext) -> MemoryCandidate | None:
        # Stage 1: Quick filter (regex + keyword, no LLM)
        if self.is_noise(message):  # "lol", "ok", "haha", single emoji
            return None

        # Stage 2: LLM classification (piggybacked on response generation)
        classification = await self.llm_classify(message, context)

        # Stage 3: Deduplication check
        existing = await self.search_similar(classification.text, threshold=0.85)
        if existing:
            return self.merge_or_skip(existing, classification)

        # Stage 4: Importance threshold
        if classification.importance < 4:
            return None

        return classification

Deduplication: Before storing, search Supermemory for semantically similar memories (cosine similarity >0.85). If found:

  • Same fact, same timeframe → skip (duplicate)
  • Same topic, new information → update existing memory
  • Same topic, contradicting information → store new, flag old as potentially outdated

2.3 Storage

Dual storage model:

[Supermemory (Semantic Search)]    [Supabase PostgreSQL (Structured)]
├── Vector embeddings              ├── Memory metadata
├── Semantic search index          ├── Provenance records
├── Full text content              ├── Sensitivity flags
└── Namespace isolation            ├── Validity windows
                                   ├── Boundary records
                                   └── Audit trail

Why dual? Supermemory excels at semantic search ("find memories about the user's family") with <300ms latency. PostgreSQL excels at structured queries ("all memories created this week," "all boundaries for user X," "all high-sensitivity memories for GDPR export").

Namespace strategy:

Context Namespace Pattern Example
User direct chat user_{phone}_persona_{persona_id} user_+15551234567_persona_sage
Persona's own memories persona_life_{persona_id} persona_life_sage
Group chat group_{chat_guid} group_iMessage_abc123
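A minimal builder matching these patterns might look like this (the `mode` strings and the function name are illustrative; only the namespace formats come from the table above):

```python
def build_namespace(mode: str, *, user_id: str = "", persona_id: str = "",
                    chat_guid: str = "") -> str:
    """Build the Supermemory namespace key for a given context."""
    if mode == "direct":
        return f"user_{user_id}_persona_{persona_id}"
    if mode == "persona_life":
        return f"persona_life_{persona_id}"
    if mode == "group":
        return f"group_{chat_guid}"
    raise ValueError(f"Unknown mode: {mode}")
```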

2.4 Recall

Every user message triggers a memory search. The orchestrator queries Supermemory with the user's message plus recent conversation context as the search query.

Search parameters:

async def recall_memories(
    user_id: str,
    persona_id: str,
    query: str,
    mode: str,
    chat_guid: str | None = None,  # required when mode == "group"
) -> list[Memory]:
    namespace = f"user_{user_id}_persona_{persona_id}" if mode == "direct" else f"group_{chat_guid}"

    results = await supermemory.search(
        query=query,
        namespace=namespace,
        limit=10,
        threshold=0.6  # minimum relevance score
    )

    # Post-filter
    results = filter_by_validity(results)             # Remove expired memories
    results = filter_by_boundary(results)             # Respect "forget" commands
    results = rank_by_recency_and_relevance(results)  # Recent + relevant first

    return results[:5]  # Top 5 injected into context
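`rank_by_recency_and_relevance` is left abstract above. One plausible sketch blends the provider's relevance score with an exponential recency decay (the half-life and weight here are assumptions, not shipped values):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Memory:
    text: str
    relevance: float        # similarity score from the search provider (0-1)
    created_at: datetime

def rank_by_recency_and_relevance(memories: list[Memory], *,
                                  half_life_days: float = 30.0,
                                  recency_weight: float = 0.3) -> list[Memory]:
    """Sort by a blend of relevance and recency: newer memories get a boost
    that halves every `half_life_days`."""
    now = datetime.now(timezone.utc)

    def score(m: Memory) -> float:
        age_days = (now - m.created_at).total_seconds() / 86400
        recency = 0.5 ** (age_days / half_life_days)  # 1.0 today, 0.5 after one half-life
        return (1 - recency_weight) * m.relevance + recency_weight * recency

    return sorted(memories, key=score, reverse=True)
```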

Context injection: Top 5 recalled memories are injected into the system prompt as a "WHAT YOU KNOW ABOUT THIS PERSON" section. The LLM decides naturally whether and how to reference them.

Persona memory recall: Separately, 3-5 relevant persona memories are injected as "ABOUT YOU" section — enabling the companion to reference their own experiences naturally.
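Assembling the two prompt sections is a simple formatting step. A sketch (the section headers come from the spec above; the function itself is illustrative):

```python
def format_memory_sections(user_memories: list[str],
                           persona_memories: list[str]) -> str:
    """Render recalled memories as the two system-prompt sections."""
    lines = ["WHAT YOU KNOW ABOUT THIS PERSON:"]
    lines += [f"- {m}" for m in user_memories[:5]]       # top 5 user memories
    lines.append("")
    lines.append("ABOUT YOU:")
    lines += [f"- {m}" for m in persona_memories[:5]]    # 3-5 persona memories
    return "\n".join(lines)
```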

2.4.1 Multi-Query Retrieval (Tolan-Inspired)

Single-query search often misses relevant memories when the user's message is ambiguous or touches multiple topics. The multi-query retrieval pipeline synthesizes auxiliary search queries for broader recall coverage.

Pipeline:

User message: "i'm so nervous about tomorrow"
  → GPT-5-nano generates 2-3 auxiliary questions:
      Q1: "What events does the user have coming up?"
      Q2: "Has the user expressed anxiety or nervousness before?"
      Q3: "What deadlines or stressful situations has the user mentioned?"
  → Parallel Supermemory searches (Q_original + Q1 + Q2 + Q3)
  → Deduplicate results across all queries (cosine similarity >0.9 = same memory)
  → Rank by combined relevance score + recency
  → Top 5 injected into context

Implementation: app/memory/query_synthesizer.py generates auxiliary questions using GPT-5-nano (~50ms). All queries run in parallel against Supermemory. Results are deduplicated and re-ranked.

Performance: Multi-query adds ~100-150ms to the recall pipeline (synthesis + parallel search + dedup) but significantly improves recall breadth. The total multi-query pipeline target is <500ms (p50), <800ms (p95). Can be disabled via enable_multi_query_retrieval feature flag for latency-critical paths.

When multi-query fires: All 1:1 conversations. Group mode uses single-query only (logistics queries are typically unambiguous, and latency matters more in fast-moving group chats).
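The pipeline above can be sketched with the synthesis and search steps injected as callables (the function names and id-based deduplication are illustrative; production deduplicates by embedding similarity >0.9):

```python
import asyncio
from typing import Awaitable, Callable

Result = tuple[str, float]  # (memory_id, relevance score)

async def multi_query_recall(
    message: str,
    synthesize: Callable[[str], Awaitable[list[str]]],
    search: Callable[[str], Awaitable[list[Result]]],
    top_k: int = 5,
) -> list[Result]:
    """Synthesize auxiliary questions, search all queries in parallel,
    deduplicate across result lists, and rank by relevance."""
    aux = await synthesize(message)                  # 2-3 auxiliary questions
    queries = [message] + aux
    result_lists = await asyncio.gather(*(search(q) for q in queries))

    seen: set[str] = set()
    merged: list[Result] = []
    for results in result_lists:
        for mem_id, score in results:
            if mem_id not in seen:                   # drop cross-query duplicates
                seen.add(mem_id)
                merged.append((mem_id, score))

    merged.sort(key=lambda r: r[1], reverse=True)    # rank by combined relevance
    return merged[:top_k]
```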

2.5 Forgetting (Deletion)

User-initiated:

  • "Forget that" → Soft-delete the most recent memory or the memory most relevant to the preceding conversation
  • "Forget everything about [topic]" → Search and soft-delete all memories matching topic
  • "Stop checking my calendar" → Revoke OAuth + store boundary + delete OAuth-derived memories

Provider-agnostic deletion interface:

The forgetting system is defined as a service-layer contract, not coupled to any specific storage provider. All deletion operations go through the MemoryService interface:

class MemoryService:
    async def delete_by_id(self, memory_id: str) -> bool:
        """Soft-delete a specific memory. Returns True if found and deleted."""

    async def delete_by_topic(self, user_id: str, topic: str) -> int:
        """Search and soft-delete all memories matching topic. Returns count deleted."""

    async def delete_all(self, user_id: str) -> int:
        """Soft-delete all memories for a user (account deletion path). Returns count."""

    async def hard_delete_expired(self) -> int:
        """Hard-delete all memories past the 30-day grace period. Called by nightly job."""

Soft delete behavior (enforced regardless of provider):

  • Soft-deleted memories are excluded from all search results immediately
  • Hard-deleted after 30 days (GDPR compliance window)
  • User can request immediate hard deletion via account settings (POST /account/delete)
  • Deletion context is logged in the audit trail for compliance

System-initiated:

  • Deadline and future_event memories auto-expire when valid_to passes (not deleted, just deprioritized in recall)
  • Group memories for completed events expire after 7 days
  • No automatic deletion of emotional or factual memories (users control their own data)

2.6 Nightly Deduplication

Implementation: app/memory/memory_maintenance.py runs at 3am UTC via the background scheduler.

Algorithm:

  1. For each active user, retrieve all non-deleted memories from Supermemory
  2. Compute pairwise cosine similarity between memory embeddings
  3. Memories with similarity >0.85 (memory_dedup_similarity_threshold) are candidates for merging
  4. Conflict resolution:
     • Same fact, same timeframe → keep the newer one, delete the older
     • Same topic, different details → merge into a single updated memory (newer detail wins)
     • Emotional memories with different contexts → keep both (emotional memories are context-sensitive)
     • Contradicting facts → keep the newer one, mark the older as superseded
  5. Log all dedup actions to memory_audit table and fire memory_deduplicated telemetry event
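Steps 2-3 reduce to a pairwise cosine pass. A minimal sketch in pure Python (no vector library; O(n²) per user, which is acceptable at ~50 memories but would need approximate nearest-neighbour search for much larger users):

```python
def dedup_candidates(embeddings: dict[str, list[float]],
                     threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, similarity) for every pair above the threshold."""
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    ids = sorted(embeddings)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:                 # each unordered pair once
            sim = cosine(embeddings[a], embeddings[b])
            if sim > threshold:
                pairs.append((a, b, sim))
    return pairs
```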

Performance: Per-user batching. Average user with ~50 memories completes in <2 seconds. Users with 500+ memories may take 10-15 seconds. Non-blocking — does not affect real-time conversation latency.

Safeguards: Boundary memories are never deduplicated. Pinned memories (future) are never deduplicated. The dedup job is gated by enable_memory_deduplication feature flag.


3) Memory Quality & Evaluation

3.1 Recall Quality Metrics

Metric Target Measurement
Precision@5 ≥0.8 Of the top 5 recalled memories, ≥4 are relevant to the query
Recall accuracy ≥85% Semantic similarity between recalled fact and ground truth
Raw search latency (p50) <300ms Single Supermemory search (no multi-query)
Raw search latency (p95) <600ms Single Supermemory search (no multi-query)
Multi-query pipeline latency (p50) <500ms Full pipeline: synthesis + parallel search + dedup + rank
Multi-query pipeline latency (p95) <800ms Full pipeline end-to-end
False positive rate <10% Memories recalled that are irrelevant or outdated
Boundary violation rate 0% Zero tolerance — deleted/boundary memories must never appear

3.2 Evaluation Framework

Seeded eval set: 200 test scenarios with known-correct memory recalls:

{
  "scenario": "User asks about upcoming deadlines",
  "user_message": "what do I have due this week?",
  "expected_memories": ["STAT210 quiz Thursday", "essay draft due Friday"],
  "should_not_recall": ["birthday party last month", "fight with roommate"]
}

Automated testing: Run eval set nightly against production Supermemory instance. Alert if Precision@5 drops below 0.75.
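A per-scenario Precision@5 scorer over this eval set might look like the following (exact string matching is a simplification here; the real harness would score recalls by semantic similarity):

```python
def precision_at_5(recalled: list[str], expected: list[str],
                   should_not: list[str]) -> float:
    """Fraction of the top-5 recalled memories that are expected.

    A memory on the should_not list counts as irrelevant even if it also
    appears in expected."""
    top5 = recalled[:5]
    if not top5:
        return 0.0
    relevant = sum(1 for m in top5 if m in expected and m not in should_not)
    return relevant / len(top5)
```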

Manual review: Weekly sample of 50 real conversations, scored by:

  • Did Sage reference a memory when she should have?
  • Was the memory reference accurate?
  • Was it contextually appropriate (not forced or awkward)?
  • Did Sage avoid referencing memories she shouldn't have?

3.3 Memory Health Dashboard (Admin)

  • Total memories per user (distribution)
  • Memory type breakdown (emotional vs factual vs deadline)
  • Deduplication rate (% of candidate memories that were duplicates)
  • Recall hit rate (% of conversations where at least one memory was recalled)
  • Boundary enforcement audit (verify no deleted memories appeared in responses)
  • Stale memory count (memories older than 90 days never recalled)

4) Privacy Model

4.1 Sensitivity Levels

Level Examples Handling
Critical Boundaries, OAuth revocations, explicit "don't mention" requests Hard enforcement — checked before every response. Violation = P0 bug.
High Emotional disclosures, mental health, relationships, financial stress Referenced only in 1:1. Never in groups. Never in logs without redaction.
Medium Schedule, deadlines, school/work info, people's names Referenced in 1:1. Not in groups unless user explicitly shares.
Low Preferences, inside jokes, general interests Can be referenced freely in 1:1. Inside jokes never leak to groups.

4.2 Namespace Isolation (Enforced at Service Layer)

class MemoryService:
    async def search(self, user_id, persona_id, query, mode, chat_guid=None):
        if mode == "group":
            # ONLY search group namespace — never user's private memories
            namespace = f"group_{chat_guid}"
        elif mode == "direct":
            # Search user's private namespace
            namespace = f"user_{user_id}_persona_{persona_id}"
        else:
            raise ValueError(f"Unknown mode: {mode}")

        # This is the ONLY place namespace is determined
        # The orchestrator and persona engine cannot override this
        return await self.supermemory.search(query=query, namespace=namespace, limit=10)

Design principle: Namespace isolation is enforced at the service layer, not the prompt layer. Even if the LLM is instructed to ignore boundaries, it physically cannot access memories outside its namespace because the search API never returns them.

4.3 GDPR Compliance

Right Implementation Endpoint
Right to access Export all memories as JSON with provenance metadata GET /account/data-export (app/api/account_routes.py)
Right to rectification User says "actually that's wrong" → memory updated. Admin panel override for manual corrections. Conversational (orchestrator). Admin: PUT /memories/{id} (future)
Right to erasure "Forget everything" → soft-delete all → hard-delete at 30 days. Account deletion → immediate hard-delete of all data. DELETE /memories/all (soft) / POST /account/delete (hard, 30-day grace)
Right to portability Same export as Right to access — JSON with provenance, sensitivity, timestamps GET /account/data-export
Right to object Boundaries system — "don't track my [topic]" creates a user_boundary record that blocks future extraction and recall Conversational ("stop tracking X"). API: POST /boundaries (future)

4.4 Persona Memory Privacy

When Sage learns from conversations, strict anonymization applies:

User Said Sage Stores
"John at Google told me about the layoffs" "Had a conversation about tech industry layoffs"
"I'm terrified of deep water" "Someone shared they're afraid of deep water"
"My therapist says I should journal" "Heard that journaling can be therapeutic"
"I live at 123 Main St" NOT STORED (PII filter catches address patterns)
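The PII filter behind the last row could be sketched as a pattern list (these three patterns are illustrative only; a shipped filter would cover many more PII classes and likely combine patterns with model-based detection):

```python
import re

# Illustrative patterns: street address, US phone number, email.
_PII_PATTERNS = [
    re.compile(r"\b\d+\s+\w+\s+(St|Street|Ave|Avenue|Blvd|Rd|Road)\b", re.I),
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    re.compile(r"\b[\w.+-]+@[\w-]+\.\w{2,}\b"),
]

def contains_pii(text: str) -> bool:
    """True if the text matches any known PII pattern (so it is NOT stored)."""
    return any(p.search(text) for p in _PII_PATTERNS)
```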

5) Memory-Driven Features

5.1 Persona Modifiers (PRD 3 Integration)

Before each conversation, derive_persona_modifiers() analyzes the user's memory profile and injects behavioral instructions. See PRD 3 Section 3 for full specification.

5.2 Emotional Hook Rate

Target: ≥50% of replies reference a past shared moment. Tracked by:

def has_memory_reference(response: str, recalled_memories: list) -> bool:
    # Check if response contains natural reference to any recalled memory
    # Uses embedding similarity between response and memory content
    for memory in recalled_memories:
        if cosine_similarity(embed(response), embed(memory.text)) > 0.7:
            return True
    return False

5.3 Inside Joke Detection

Inside jokes are detected and stored with special handling:

  • Triggered when a callback to a shared moment gets a positive reaction (laughter, "lmao," "I can't")
  • Stored with type: "inside_joke" and high recall priority
  • Referenced at higher frequency than other memory types (drives perceived intimacy)
  • Weighted 5x in the progression formula (PRD 3)
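The positive-reaction trigger can be approximated with a cue list (the cues and regex below are illustrative; the shipped detector may rely on the LLM classifier instead of patterns):

```python
import re

# Illustrative laughter/reaction cues that mark a callback as landing well.
_POSITIVE_REACTIONS = re.compile(
    r"(lmao|lmfao|lol+|hahah*|i can'?t|dying|😂|💀)", re.IGNORECASE
)

def is_positive_reaction(message: str) -> bool:
    """True if the message reads as a positive reaction to a callback."""
    return bool(_POSITIVE_REACTIONS.search(message))
```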

5.4 Photo Memory Pipeline

User sends photo in iMessage
  → Edge agent forwards to backend
  → GPT-5 Vision analysis:
      - Scene description
      - Objects detected
      - People (count, not identity)
      - Mood/emotion of scene
      - Location cues
  → Memory classifier determines if memorable
  → Store as visual memory with extracted context
  → Companion acknowledges: "saved! I'll remember this"
  → Later recall: "remember that sunset photo you sent? that was gorgeous"

5.5 Event Tracker ("I Remember" Proactivity)

Implementation: app/memory/event_tracker.py

The Event Tracker detects future events mentioned in conversation and stores them as type: future_event memories with proactive surfacing schedules. This bridges PRD 7 (memory) and PRD 10 (proactive intelligence).

Detection: During memory classification, the classifier identifies future-looking statements:

  • "my bar exam is on March 20"
  • "friend's wedding is June 12"
  • "job interview next Thursday"
  • "finals start in two weeks"

These are stored with type: future_event, an importance_score (1-10 based on emotional weight), and a valid_to date (the event date).

3-Touch Proactive Sequence ("Hype Person"):

For high-importance events (importance ≥7), the Event Tracker schedules a 3-touch sequence via the proactive scheduler (PRD 10):

Touch 1 — Day before: "hey just so you know i'm thinking about
           your [event] tomorrow. you've got this 💪"

Touch 2 — Morning of: "today's the day! go crush your [event].
           i'm rooting for you 🎉"

Touch 3 — Day after:  "sooo how'd [event] go?? tell me everything"

Each touch is companion-specific (Sage = warm encouragement, Vex = hype energy, Echo = calm support). The sequence is non-blocking — if the user messages organically before a touch fires, the companion weaves the event into natural conversation instead.

Lower-importance events (importance 4-6): Single-touch reminder the day before, no follow-up sequence.
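The importance-based touch schedule can be sketched as a pure date calculation (dates only; actual send times and message copy come from the proactive scheduler in PRD 10):

```python
from datetime import date, timedelta

def schedule_touches(event_date: date, importance: int) -> list[date]:
    """Dates on which proactive touches fire for a future_event memory."""
    if importance >= 7:
        return [event_date - timedelta(days=1),   # Touch 1: day before
                event_date,                       # Touch 2: morning of
                event_date + timedelta(days=1)]   # Touch 3: day after
    if importance >= 4:
        return [event_date - timedelta(days=1)]   # single-touch reminder
    return []                                     # below the storage threshold anyway
```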

Telemetry: event_tracker_detected (detection) and event_tracker_resurfaced (proactive touch fired).


6) Current Architecture & Optimization Roadmap

6.1 Current: Supermemory (Shipped)

  • Provider: Supermemory cloud
  • Performance: <300ms raw search latency (p50), <500ms multi-query pipeline (p50)
  • Strengths: 25x faster than previous mem0 system, managed infrastructure, namespace isolation, semantic search with high recall accuracy
  • Migration: Completed. mem0 exists as legacy behind a feature flag (enable_legacy_mem0) but is not used in production.
  • Cost: ~$0.15/user/month at current usage patterns

6.2 Architecture

[Supermemory (Semantic Search)]   [Supabase PostgreSQL (Structured)]   [Background Jobs]
├── Vector embeddings             ├── memory_metadata table            ├── Nightly dedup (3am)
├── Semantic search index         ├── user_boundaries table            ├── Event Tracker scheduling
├── Full text content             ├── memory_audit table               ├── Hard-delete expired (30d)
├── Namespace isolation           ├── Provenance records               └── Memory health checks
└── Multi-query parallel search   └── GDPR operations

6.3 Optimization Roadmap (Future)

Optimization Trigger Benefit
pgvector in Supabase Supermemory costs exceed $500/month or need sub-100ms recall Self-hosted vector search, eliminates external dependency
Redis hot-cache P95 latency exceeds 800ms Cache last 10 recalled memories per user, 5-min TTL, reduces repeat searches
Graph relationships Memory-driven features need "related memories" traversal "roommate fight → apartment stress → deadline anxiety" as connected graph
Tiered storage Users with 1000+ memories Hot memories (recalled in last 30d) in fast tier, cold memories in cheap storage

7) Data Model

Core Tables

-- Memory metadata (Supabase — complements Supermemory)
CREATE TABLE memory_metadata (
    id UUID PRIMARY KEY,
    supermemory_id TEXT,  -- Reference to Supermemory record
    user_id UUID REFERENCES users(id),
    namespace TEXT NOT NULL,
    memory_type TEXT CHECK (memory_type IN ('emotional', 'factual', 'deadline', 'future_event', 'inside_joke', 'preference', 'people', 'visual', 'boundary', 'persona_experience', 'persona_opinion', 'group_plan')),
    sensitivity TEXT CHECK (sensitivity IN ('low', 'medium', 'high', 'critical')),
    importance INTEGER CHECK (importance BETWEEN 1 AND 10),
    content_preview TEXT,  -- First 100 chars (for admin UI)
    provenance JSONB,  -- {source, snippet, confidence}
    valid_from TIMESTAMPTZ,
    valid_to TIMESTAMPTZ,
    pii_tags TEXT[] DEFAULT '{}',
    recall_count INTEGER DEFAULT 0,
    last_recalled_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT now(),
    deleted_at TIMESTAMPTZ,
    deleted_by TEXT  -- 'user_command', 'system_expiry', 'admin', 'gdpr_request'
);

-- Boundary records (enforced before every response)
CREATE TABLE user_boundaries (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id),
    scope TEXT CHECK (scope IN ('global', 'group', 'topic')),
    chat_guid TEXT,  -- NULL for global boundaries
    boundary_text TEXT NOT NULL,
    boundary_type TEXT CHECK (boundary_type IN ('forget_topic', 'revoke_oauth', 'no_proactive', 'custom')),
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Memory audit trail
CREATE TABLE memory_audit (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    memory_id UUID,
    action TEXT CHECK (action IN ('created', 'recalled', 'updated', 'soft_deleted', 'hard_deleted', 'exported')),
    actor TEXT,  -- 'system', 'user', 'admin'
    context TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);

8) API Endpoints

Endpoint Auth Purpose
GET /memories User List user's memories (paginated, filterable by type/date)
GET /memories/search User Semantic search across user's memories
GET /memories/export User GDPR export — all memories as JSON
DELETE /memories/{id} User Soft-delete specific memory
DELETE /memories/topic/{topic} User Forget all memories about a topic
DELETE /memories/all User Soft-delete everything (account deletion path)
GET /memories/health Admin Memory system health dashboard data
GET /memories/eval Admin Run evaluation suite, return metrics

9) Success Metrics

Metric Target Why
Recall accuracy ≥85% Core quality bar
Precision@5 ≥0.8 Relevant memories surfaced
Emotional hook rate ≥50% of responses reference a memory Drives perceived intimacy
Raw search latency (p50) <300ms Single Supermemory search baseline
Multi-query pipeline latency (p50) <500ms Full recall pipeline including auxiliary question synthesis
Boundary violation rate 0% Zero tolerance
Memories per active user (30d) 30-100 Healthy memory accumulation
Deduplication rate >80% Not storing redundant memories
"Forget that" success rate 100% User trust

Feature Flags & Gating

Flag Key Default Purpose
enable_memory_system true Master switch for memory storage and recall
memory_importance_threshold 4 Minimum importance score (1-10) to store
memory_search_limit 10 Max memories returned per search
memory_top_k_injected 5 Top-K memories injected into context
enable_multi_query_retrieval true Tolan-inspired auxiliary question synthesis
enable_event_tracker true Proactive "I Remember" event detection
enable_memory_deduplication true Nightly deduplication at 3am
enable_photo_memory true GPT-5 Vision memory extraction
memory_dedup_similarity_threshold 0.85 Cosine similarity for dedup matching

See REFERENCE_FEATURE_FLAGS.md for the full catalog.


Telemetry

Event Trigger Properties
memory_created New memory stored user_id, memory_type, importance, sensitivity
memory_recalled Memory surfaced in conversation user_id, memory_id, relevance_score, age_days
memory_deleted User says "forget that" user_id, memory_id, deletion_type (specific/topic/all)
memory_deduplicated Nightly dedup removes duplicate user_id, original_id, duplicate_id, similarity
memory_search Recall query executed user_id, query_count (multi-query), results_count, latency_ms
event_tracker_detected Future event identified in conversation user_id, event_type, importance_score
event_tracker_resurfaced "I Remember" proactively recalls event user_id, event_id, days_until_event
photo_memory_extracted Photo processed into memory user_id, traits_extracted, processing_time_ms
boundary_set User sets a "don't mention X" boundary user_id, boundary_type, scope
boundary_enforced Boundary prevented a recall user_id, boundary_id, blocked_memory_id

Needed but not yet tracked:

  • memory_health_check — periodic system-wide recall quality metrics
  • memory_cap_reached — Removed. PRD 12 v2 eliminated the memory cap. Memory is unlimited for all users.

See REFERENCE_TELEMETRY.md for the full event catalog.


Definition of Done

  • Raw search latency: <300ms (p50), <600ms (p95)
  • Multi-query pipeline latency: <500ms (p50), <800ms (p95)
  • Precision@5: ≥0.8 (of top 5 recalled, ≥4 relevant)
  • Boundary violation rate: 0% (deleted/boundary memories never appear)
  • "Forget that" succeeds 100% of the time with immediate effect
  • Namespace isolation enforced at service layer (not prompt level)
  • Nightly deduplication runs without errors and reduces redundancy >80%
  • Multi-query retrieval improves recall accuracy over single-query baseline
  • GDPR export includes all memories with provenance metadata
  • Photo memory pipeline processes images within 5 seconds
  • All memory operations tracked via telemetry events
  • Provider-agnostic interfaces: PRD references memory operations, not mem0/Supermemory specifics

10) Open Questions

Resolved

  • When should we migrate from mem0 to Supermemory or self-hosted? Resolved. Migration to Supermemory completed. mem0 is legacy behind enable_legacy_mem0 flag. See Section 6 for optimization roadmap if Supermemory costs exceed thresholds.
  • How do we handle contradictory memories? Resolved. Nightly deduplication (Section 2.6) handles this: contradicting facts → keep the newer one, mark the older as superseded. Real-time classification also flags contradictions at ingest (Section 2.2).

Phase 2 Candidates

  • Confidence decay: Should old memories be recalled with less certainty over time? Currently all memories have equal recall weight regardless of age. Decay would deprioritize stale memories in search ranking without deleting them. Tracked in Implementation Status as Phase 2 candidate.
  • Memory pinning: Should users be able to "pin" important memories that always have high recall priority? Would require UI in a future memory viewer. Tracked in Implementation Status as Phase 2 candidate.

Still Open

  • Should the memory viewer show everything, or curate a "highlights" view? (Depends on admin dashboard timeline — Phase 7.)
  • How do we handle memories about other people? ("My friend Kai is going through a breakup" — is this Kai's data or the user's?) Current approach: stored as the user's memory about their social context, with people type. If Kai is also an Ikiro user, their own namespace is separate. No cross-user deduplication.