
PRD — Gateway API & Developer Platform

Doc owner: Justin
Audience: Eng, Design, Product, GTM, Partnerships
Status: v1 (February 2026)
Depends on: prd.md (core platform), Portable_Persona_PRD.md, PRD 1 (Composable MiniApp System)


Implementation Status

| Section | Status | Notes |
| --- | --- | --- |
| Internal orchestrator | ✅ Shipped | `app/orchestrator/two_stage_handler.py` — classification + generation |
| FastAPI backend | ✅ Shipped | `app/main.py` — production-deployed |
| Supabase JWT auth | ✅ Shipped | `app/identity/supabase_client.py` |
| Public /v1/chat endpoint | ❌ Not Shipped | Internal only — no public API |
| OpenAI/Anthropic compat endpoints | ❌ Not Shipped | No drop-in compatibility layer |
| MCP Adapter | ❌ Not Shipped | No MCP tool exposure |
| Precompose endpoint | ❌ Not Shipped | No prompt-pack export |
| Proposals system (approve/reject) | ❌ Not Shipped | No write proposals |
| Audit API | ❌ Not Shipped | No audit event API |
| Webhooks | ❌ Not Shipped | No webhook system |
| API key management | ❌ Not Shipped | No org-scoped API keys |
| Developer console | ❌ Not Shipped | No self-service portal |
| SDKs (TypeScript, Python) | ❌ Not Shipped | No client libraries |
| Multi-tenancy (org isolation) | ❌ Not Shipped | Single-tenant consumer only |

Note: This is entirely future-phase work. The internal orchestrator exists but nothing has been extracted into a public platform.


References

This PRD uses standardized terminology, IDs, pricing, and model references defined in the companion documents:

| Document | What it Covers |
| --- | --- |
| REFERENCE_GLOSSARY_AND_IDS.md | Canonical terms: workflow vs miniapp vs superpower, ID formats |
| REFERENCE_PRICING.md | Canonical pricing: $7.99/mo + $50/yr, free tier limits |
| REFERENCE_MODEL_ROUTING.md | Pipeline stage → model tier mapping |
| REFERENCE_DEPENDENCY_GRAPH.md | PRD blocking relationships and priority order |
| REFERENCE_FEATURE_FLAGS.md | All feature flags by category |
| REFERENCE_TELEMETRY.md | Amplitude event catalog and gaps |

Executive Summary

The Gateway API is how Ikiro becomes a platform. It exposes the same Persona Passport + Memory Vault + Superpower Runtime that powers Sage in iMessage as a public API any application can call. In one /v1/chat request, the Gateway auto-hydrates the persona, recalls memories, enforces policy, calls the model, and returns the response with citations and proposed actions.

The MCP Adapter extends this into Claude, ChatGPT, and any MCP-compatible client — making custom personas available as tools inside existing AI assistants.

Business case: Consumer revenue proves the model. The Gateway is where enterprise revenue scales — brands, support teams, HR, and dev teams that want consistent persona + memory across their products without building infrastructure.

Current state: Internal orchestrator handles all processing. No public API. The Portable Persona PRD defines target contracts. This PRD specifies the migration from internal to public platform.


1) Product Surfaces

1.1 Gateway API (Primary)

Single HTTP endpoint wrapping the full orchestration:

```
POST /v1/chat → [Gateway]
  → Load persona → Recall memories → Apply policy
  → Compose prompt → Call model → Extract proposed writes
  → Return response + citations + proposals
```

One call replaces: prompt engineering + memory retrieval + persona management + policy enforcement + action extraction + audit logging.

1.2 Compatibility Endpoints (Drop-In)

| Endpoint | Compatible With | Extra Headers |
| --- | --- | --- |
| `/v1/chat/completions` | OpenAI API | `x-persona-id`, `x-subject-id`, `x-memory-scopes` |
| `/v1/messages` | Anthropic API | `x-persona-id`, `x-subject-id`, `x-memory-scopes` |

Migration: change base URL, add 3 headers. Response shapes match vendor formats with added citations and proposedWrites.
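The migration above can be sketched as a request builder. The base URL (`https://api.example-gateway.dev`) and the `buildCompatRequest` helper are illustrative assumptions, not shipped artifacts; the three header names come from the table above.

```typescript
// Sketch of the drop-in migration: keep the OpenAI-style body, swap the
// base URL, and add the three persona headers. The host below is a
// placeholder; the real base URL would come from the developer console.
interface CompatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildCompatRequest(
  apiKey: string,
  personaId: string,
  subjectId: string,
  memoryScopes: string[],
  payload: object,
  baseUrl = "https://api.example-gateway.dev" // hypothetical host
): CompatRequest {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      // The three extra headers from the compatibility table:
      "x-persona-id": personaId,
      "x-subject-id": subjectId,
      "x-memory-scopes": memoryScopes.join(","),
    },
    body: JSON.stringify(payload),
  };
}
```

The request body stays exactly what an OpenAI client would send today, which is what makes the migration a base-URL swap rather than a rewrite.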

1.3 MCP Adapter

Primary tool: orchestrate.answer — single call returns answer + citations + proposed writes.

Advanced tools: memory.recall, persona.get, memory.upsert(proposal), memory.forget.

1.4 Precompose (Fallback)

POST /v1/packs/precompose returns {system_prompt, context_messages[], ttl} for developers who want to call LLM providers directly. Loses centralized approvals, caching, and audit.
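Consuming the pack might look like the sketch below: prepend the returned system prompt and context messages to the developer's own messages before calling a provider directly. The field names follow the response shape above; `assembleProviderMessages` is a hypothetical helper, not part of any shipped SDK.

```typescript
// Shape of the precompose response described above.
interface PrecomposeResponse {
  system_prompt: string;
  context_messages: Array<{ role: string; content: string }>;
  ttl: number; // seconds the pack may be reused before recomposing
}

// Merge the pack with the caller's own conversation before sending it
// straight to an LLM provider (bypassing the Gateway's runtime).
function assembleProviderMessages(
  pack: PrecomposeResponse,
  userMessages: Array<{ role: string; content: string }>
): Array<{ role: string; content: string }> {
  return [
    { role: "system", content: pack.system_prompt },
    ...pack.context_messages,
    ...userMessages,
  ];
}
```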


2) API Contracts

2.1 POST /v1/chat

Request:

```json
{
  "model": "anthropic/claude-sonnet-4-5-20250929",
  "persona_id": "brand-default",
  "subject_id": "usr_123",
  "memory_scopes": ["org_kb", "gmail", "calendar"],
  "messages": [{"role": "user", "content": "What's due this week?"}],
  "stream": true,
  "redact_mode": "default"
}
```

Response:

```json
{
  "id": "chat_abc",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "Two items..."}}],
  "citations": [{"factId": "a1", "uri": "gmail://msg_123", "confidence": 0.92}],
  "proposedWrites": [{
    "proposal_id": "prop_789",
    "type": "calendar.create",
    "payload": {"title": "STAT210 Quiz", "when": "2025-11-07T10:00:00-08:00"},
    "provenance": {"from": "gmail:msg_123", "confidence": 0.91}
  }],
  "usage": {"prompt_tokens": 1234, "completion_tokens": 567, "memory_recalls": 3, "cache_hit": false}
}
```

2.2 Proposals

  • `POST /v1/proposals/:id/approve` — execute write, persist assertion, log audit.
  • `POST /v1/proposals/:id/reject` — discard, log rejection.
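A client might triage `proposedWrites` from a /v1/chat response before hitting these endpoints. The sketch below is illustrative: `triageProposals` and the 0.9 threshold are assumptions, and whether confidence-based auto-approval is allowed at all is an open question in section 8.

```typescript
type Decision = "approve" | "review";

// Matches the proposedWrites entries in the /v1/chat response above.
interface ProposedWrite {
  proposal_id: string;
  type: string;
  provenance: { from: string; confidence: number };
}

// Hypothetical client-side triage: auto-approve only high-confidence
// writes, route everything else to a human reviewer.
function triageProposals(
  writes: ProposedWrite[],
  autoApproveThreshold = 0.9 // assumed org policy value, not spec
): Array<{ proposal_id: string; decision: Decision }> {
  return writes.map((w) => ({
    proposal_id: w.proposal_id,
    decision:
      w.provenance.confidence >= autoApproveThreshold ? "approve" : "review",
  }));
}
```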

2.3 Audit

GET /v1/audit?subject_id=usr_123&action=write.approved&limit=50 — paginated audit events with provenance.

2.4 Webhooks

Events: proposal.created, write.approved, write.rejected, write.failed, ingestion.error, persona.updated, memory.created, memory.deleted. Delivered with HMAC signature.
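Receivers would verify the HMAC signature before trusting a delivery. A minimal sketch, assuming SHA-256 and a hex-encoded signature; the PRD does not yet pin down the algorithm or header name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the signature the Gateway would attach to a delivery.
// SHA-256 + hex encoding is an assumption, not a published spec.
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Verify an incoming webhook against the shared secret.
function verifyWebhook(
  secret: string,
  rawBody: string,
  signatureHeader: string
): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "hex");
  const received = Buffer.from(signatureHeader, "hex");
  // Constant-time comparison prevents timing attacks on the signature.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification must run against the raw request body, before any JSON parsing or re-serialization, or the computed digest will not match.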


3) Authentication & Multi-Tenancy

API Keys

  • Org API Key: Full access within org scope
  • Scoped API Key: Limited to specific persona_ids, subject_ids, or scopes
  • Rotation: Old key valid 24h after rotation (graceful migration)

Capability Tokens (Client-Side)

Short-lived tokens (15min) for browser/mobile SDKs. Created via server-side key exchange. Scoped to a single subject_id and persona_id.
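The server-side exchange might look like the sketch below, assuming a simple HMAC-signed token format; a production Gateway would more likely issue JWTs. All names here are illustrative.

```typescript
import { createHmac } from "node:crypto";

// Mint a 15-minute capability token scoped to one subject and persona.
// Format (base64url payload + HMAC) is an assumption for illustration.
function mintCapabilityToken(
  orgSecret: string,
  subjectId: string,
  personaId: string,
  now = Date.now()
): string {
  const payload = JSON.stringify({
    sub: subjectId,
    persona: personaId,
    exp: now + 15 * 60 * 1000, // 15-minute lifetime per the spec above
  });
  const b64 = Buffer.from(payload).toString("base64url");
  const sig = createHmac("sha256", orgSecret).update(b64).digest("base64url");
  return `${b64}.${sig}`;
}

// Gateway-side check: signature must match and the token must not be expired.
function isTokenValid(orgSecret: string, token: string, now = Date.now()): boolean {
  const [b64, sig] = token.split(".");
  const expected = createHmac("sha256", orgSecret).update(b64).digest("base64url");
  if (sig !== expected) return false;
  const payload = JSON.parse(Buffer.from(b64, "base64url").toString());
  return payload.exp > now;
}
```

The point of the exchange is that `orgSecret` never reaches the browser or mobile client; only the short-lived, narrowly scoped token does.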

Multi-Tenancy

  • Tenant isolation via Row Level Security (Supabase) or per-tenant schema
  • API key → org_id mapping at gateway edge
  • Rate limits per org (configurable)
  • Memory namespaces include org_id prefix: org_{org_id}_user_{subject_id}

4) Developer Experience

4.1 Onboarding Flow

  1. Sign up at developer console → create org → get API key
  2. Create first persona via Persona Studio (visual) or API
  3. Connect data sources (Gmail/Calendar/Drive) for the org or per-subject
  4. Make first /v1/chat call (quickstart: curl, Node, Python — see Portable Persona PRD Appendix A)
  5. Handle proposals (approve/reject) or configure auto-approve policies

4.2 SDKs

| Language | Package | Priority |
| --- | --- | --- |
| TypeScript/Node | `@ikiro/sdk` | P0 (ships with API) |
| Python | `ikiro` | P0 (ships with API) |
| Go | `ikiro-go` | P1 |

SDK features: typed request/response, streaming support, webhook verification, proposal helpers, retry logic.
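The retry behavior could follow a standard exponential-backoff pattern, sketched below; the attempt count and delay values are assumptions, not spec.

```typescript
// Retry an async call with exponential backoff. An SDK version would
// also inspect the error (retry on 429/5xx, fail fast on 4xx); this
// sketch retries any failure for simplicity.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Backoff doubles each attempt: 250ms, 500ms, 1000ms, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```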

4.3 Documentation

  • Quickstart: 5-minute guide from API key to first response
  • Concepts: Persona Passport, Memory Vault, Policy Overlays, Proposals
  • API Reference: OpenAPI 3.1 spec, auto-generated from code
  • Guides: "Build a support bot with memory," "Add persona to your existing ChatGPT app," "MCP integration for Claude"
  • Changelog: Versioned API changes with migration guides

4.4 Developer Console

Web UI for API management:

  • API key creation and rotation
  • Usage dashboard (calls, tokens, cost, latency)
  • Persona Studio (visual persona editor)
  • Memory Viewer (browse subject memories)
  • Webhook configuration
  • Audit log viewer
  • Policy Manager (org → team → user overlays)


5) Pricing Model

5.1 Tiers

| Tier | Price | Includes | Target |
| --- | --- | --- | --- |
| Free | $0/mo | 1,000 calls/mo, 1 persona, 100 memories/subject | Hobbyists, testing |
| Pro | $49/mo | 50,000 calls/mo, unlimited personas, 10,000 memories/subject, webhooks | Startups, small teams |
| Business | $199/mo | 500,000 calls/mo, policy overlays, audit API, priority support, SSO | Mid-market |
| Enterprise | Custom | Unlimited, dedicated infra, SLA, VPC, CMK, HIPAA BAA | Large orgs |

5.2 Usage-Based Components

  • LLM tokens: Pass-through cost + 20% markup (developer sees total cost in dashboard)
  • Memory operations: Included in tier limits; overage at $0.001/operation
  • Connectors: Gmail/Calendar/Drive included; Slack/Notion/custom at $10/mo each (P1)

6) Migration Path (Internal → Public)

Phase 1: Internal Refactor (Weeks 1-3)

  • Extract orchestrator into standalone Gateway service
  • Add API key authentication layer
  • Add rate limiting and usage tracking
  • Ensure all internal iMessage/Telegram paths call Gateway (dogfood)

Phase 2: Beta API (Weeks 4-6)

  • Deploy /v1/chat and compat endpoints
  • TypeScript + Python SDKs
  • Developer console (key management, usage dashboard)
  • Documentation site
  • 10 beta partners (hand-picked)

Phase 3: MCP Adapter (Weeks 7-8)

  • orchestrate.answer MCP tool
  • Advanced MCP tools (recall, forget)
  • Testing with Claude Desktop and ChatGPT plugins
  • MCP installation guide

Phase 4: GA Launch (Weeks 9-12)

  • Public sign-up
  • Pricing enforcement
  • Webhook system
  • Audit API
  • SOC 2 Type I in progress
  • Developer marketing (blog posts, example apps, community)

7) Success Metrics

| Metric | Target | Timeframe |
| --- | --- | --- |
| Beta partners activated | 10 | Month 1 |
| API calls / month | 100K | Month 3 |
| Developer NPS | >40 | Month 3 |
| Time to first successful call | <10 minutes | Ongoing |
| p95 latency (`/v1/chat`) | <4 seconds | Ongoing |
| Paid tier conversion | >5% of free users | Month 6 |
| MCP adapter installs | 500 | Month 6 |

Feature Flags & Gating

| Flag Key | Default | Purpose |
| --- | --- | --- |
| `enable_gateway_api` | false | Master switch for public API endpoints |
| `enable_mcp_adapter` | false | MCP tool exposure |
| `enable_precompose` | false | Prompt-pack export endpoint |
| `enable_proposals` | false | Write proposal system (approve/reject) |
| `enable_webhooks` | false | Webhook delivery system |
| `gateway_rate_limit_free` | 1000 | Monthly call limit for free API tier |
| `gateway_rate_limit_pro` | 50000 | Monthly call limit for Pro tier |
| `gateway_rate_limit_business` | 500000 | Monthly call limit for Business tier |
| `enable_audit_api` | false | Audit event query endpoint |
| `enable_multi_tenancy` | false | Org-level isolation and API key scoping |

See REFERENCE_FEATURE_FLAGS.md for the full catalog.


Telemetry

| Event | Trigger | Properties |
| --- | --- | --- |
| `gateway_chat_request` | /v1/chat called | org_id, persona_id, model, stream, latency_ms |
| `gateway_chat_cached` | Cache hit on memory recall | org_id, cache_key, ttl_remaining |
| `gateway_proposal_created` | Write proposal generated | org_id, proposal_type, confidence |
| `gateway_proposal_resolved` | Proposal approved/rejected | org_id, proposal_id, decision, latency_ms |
| `gateway_webhook_sent` | Webhook delivered | org_id, event_type, status_code, retry_count |
| `gateway_auth_failed` | API key authentication fails | org_id, reason (invalid/expired/rate_limited) |
| `gateway_rate_limited` | Rate limit enforced | org_id, tier, limit, current_usage |
| `mcp_tool_called` | MCP adapter tool invoked | tool_name, persona_id, latency_ms |
Needed but not yet tracked: nothing is shipped, so every event above is speculative. Implementation should include them from day one.

See REFERENCE_TELEMETRY.md for the full event catalog.


Definition of Done

  • /v1/chat endpoint returns responses with citations and proposed writes
  • OpenAI/Anthropic compatibility endpoints pass conformance tests
  • MCP adapter exposes orchestrate.answer tool and passes Claude Desktop testing
  • API key management: create, rotate (24h grace), scope to persona/subject
  • Multi-tenancy: org isolation via RLS, no data leakage between orgs
  • Rate limiting enforced per org per tier
  • Webhook delivery with HMAC signatures and retry logic
  • Audit API returns paginated events with provenance
  • SDKs (TypeScript + Python) with typed requests, streaming, and retry
  • p95 latency: <4 seconds for /v1/chat
  • Developer console: key management, usage dashboard, persona studio access
  • All endpoints gated by feature flags

8) Open Questions

  • Should the Gateway support model routing (auto-select cheapest model that meets quality threshold)?
  • How do we handle persona versioning in the API? (Persona changes mid-conversation)
  • Should proposals auto-approve based on confidence threshold, or always require explicit approval?
  • What's the right caching strategy for memory recalls? (Same subject, similar query within 5 min → cache hit)
  • Should the MCP adapter expose superpowers as individual MCP tools, or only through orchestrate.answer?
  • Data residency: when do we need EU-hosted clusters? (Trigger: first EU enterprise customer)
  • Should we offer a self-hosted Gateway for high-security customers? (VPC-deployed Docker image)