
PRD — Gateway API & Developer Platform

Doc owner: Justin
Audience: Eng, Design, Product, GTM, Partnerships
Status: v1 (February 2026)
Depends on: prd.md (core platform), Portable_Persona_PRD.md, PRD 1 (Composable MiniApp System)


Implementation Status

| Section | Status | Notes |
| --- | --- | --- |
| Internal orchestrator | ✅ Shipped | `app/orchestrator/two_stage_handler.py` — classification + generation |
| FastAPI backend | ✅ Shipped | `app/main.py` — production-deployed |
| Supabase JWT auth | ✅ Shipped | `app/identity/supabase_client.py` |
| Public /v1/chat endpoint | ❌ Not Shipped | Internal only — no public API |
| OpenAI/Anthropic compat endpoints | ❌ Not Shipped | No drop-in compatibility layer |
| MCP Adapter | ❌ Not Shipped | No MCP tool exposure |
| Precompose endpoint | ❌ Not Shipped | No prompt-pack export |
| Proposals system (approve/reject) | ❌ Not Shipped | No write proposals |
| Audit API | ❌ Not Shipped | No audit event API |
| Webhooks | ❌ Not Shipped | No webhook system |
| API key management | ❌ Not Shipped | No org-scoped API keys |
| Developer console | ❌ Not Shipped | No self-service portal |
| SDKs (TypeScript, Python) | ❌ Not Shipped | No client libraries |
| Multi-tenancy (org isolation) | ❌ Not Shipped | Single-tenant consumer only |

Note: This is entirely future-phase work. The internal orchestrator exists but nothing has been extracted into a public platform.


References

This PRD uses standardized terminology, IDs, pricing, and model references defined in the companion documents:

| Document | What it Covers |
| --- | --- |
| REFERENCE_GLOSSARY_AND_IDS.md | Canonical terms: workflow vs miniapp vs superpower, ID formats |
| REFERENCE_PRICING.md | Canonical pricing: $7.99/mo + $50/yr, free tier limits |
| REFERENCE_MODEL_ROUTING.md | Pipeline stage → model tier mapping |
| REFERENCE_DEPENDENCY_GRAPH.md | PRD blocking relationships and priority order |
| REFERENCE_FEATURE_FLAGS.md | All feature flags by category |
| REFERENCE_TELEMETRY.md | Amplitude event catalog and gaps |

Executive Summary

The Gateway API is how Ikiro becomes a platform. It exposes the same Persona Passport + Memory Vault + Superpower Runtime that powers Sage in iMessage as a public API any application can call. In one /v1/chat request, the Gateway auto-hydrates the persona, recalls memories, enforces policy, calls the model, and returns the response with citations and proposed actions.

The MCP Adapter extends this into Claude, ChatGPT, and any MCP-compatible client — making custom personas available as tools inside existing AI assistants.

Business case: Consumer revenue proves the model. The Gateway is where enterprise revenue scales — brands, support teams, HR, and dev teams that want consistent persona + memory across their products without building infrastructure.

Current state: Internal orchestrator handles all processing. No public API. The Portable Persona PRD defines target contracts. This PRD specifies the migration from internal to public platform.


1) Product Surfaces

1.1 Gateway API (Primary)

Single HTTP endpoint wrapping the full orchestration:

```
POST /v1/chat → [Gateway]
  → Load persona → Recall memories → Apply policy
  → Compose prompt → Call model → Extract proposed writes
  → Return response + citations + proposals
```

One call replaces: prompt engineering + memory retrieval + persona management + policy enforcement + action extraction + audit logging.

1.2 Compatibility Endpoints (Drop-In)

| Endpoint | Compatible With | Extra Headers |
| --- | --- | --- |
| `/v1/chat/completions` | OpenAI API | `x-persona-id`, `x-subject-id`, `x-memory-scopes` |
| `/v1/messages` | Anthropic API | `x-persona-id`, `x-subject-id`, `x-memory-scopes` |

Migration: change base URL, add 3 headers. Response shapes match vendor formats with added citations and proposedWrites.
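The migration above can be sketched as a request builder. The base URL (`https://api.example-gateway.dev`) and the `buildCompatRequest` helper are illustrative assumptions, not shipped artifacts; the three header names come from the table above.

```typescript
// Sketch of the drop-in migration: keep the OpenAI-style body, swap the
// base URL, and add the three persona headers. The host below is a
// placeholder; the real base URL would come from the developer console.
interface CompatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildCompatRequest(
  apiKey: string,
  personaId: string,
  subjectId: string,
  memoryScopes: string[],
  payload: object,
  baseUrl = "https://api.example-gateway.dev" // hypothetical host
): CompatRequest {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      // The three extra headers from the compatibility table:
      "x-persona-id": personaId,
      "x-subject-id": subjectId,
      "x-memory-scopes": memoryScopes.join(","),
    },
    body: JSON.stringify(payload),
  };
}
```

The request body stays exactly what an OpenAI client would send today, which is what makes the migration a base-URL swap rather than a rewrite.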

1.3 MCP Adapter

Primary tool: orchestrate.answer — single call returns answer + citations + proposed writes.

Advanced tools: memory.recall, persona.get, memory.upsert(proposal), memory.forget.

1.4 Precompose (Fallback)

POST /v1/packs/precompose returns {system_prompt, context_messages[], ttl} for developers who want to call LLM providers directly. Loses centralized approvals, caching, and audit.
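Consuming the pack might look like the sketch below: prepend the returned system prompt and context messages to the developer's own messages before calling a provider directly. The field names follow the response shape above; `assembleProviderMessages` is a hypothetical helper, not part of any shipped SDK.

```typescript
// Shape of the precompose response described above.
interface PrecomposeResponse {
  system_prompt: string;
  context_messages: Array<{ role: string; content: string }>;
  ttl: number; // seconds the pack may be reused before recomposing
}

// Merge the pack with the caller's own conversation before sending it
// straight to an LLM provider (bypassing the Gateway's runtime).
function assembleProviderMessages(
  pack: PrecomposeResponse,
  userMessages: Array<{ role: string; content: string }>
): Array<{ role: string; content: string }> {
  return [
    { role: "system", content: pack.system_prompt },
    ...pack.context_messages,
    ...userMessages,
  ];
}
```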


2) API Contracts

2.1 POST /v1/chat

Request:

```json
{
  "model": "anthropic/claude-sonnet-4-5-20250929",
  "persona_id": "brand-default",
  "subject_id": "usr_123",
  "memory_scopes": ["org_kb", "gmail", "calendar"],
  "messages": [{"role": "user", "content": "What's due this week?"}],
  "stream": true,
  "redact_mode": "default"
}
```

Response:

```json
{
  "id": "chat_abc",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "Two items..."}}],
  "citations": [{"factId": "a1", "uri": "gmail://msg_123", "confidence": 0.92}],
  "proposedWrites": [{
    "proposal_id": "prop_789",
    "type": "calendar.create",
    "payload": {"title": "STAT210 Quiz", "when": "2025-11-07T10:00:00-08:00"},
    "provenance": {"from": "gmail:msg_123", "confidence": 0.91}
  }],
  "usage": {"prompt_tokens": 1234, "completion_tokens": 567, "memory_recalls": 3, "cache_hit": false}
}
```

2.2 Proposals

  • `POST /v1/proposals/:id/approve` — execute write, persist assertion, log audit.
  • `POST /v1/proposals/:id/reject` — discard, log rejection.
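A client might triage `proposedWrites` from a /v1/chat response before hitting these endpoints. The sketch below is illustrative: `triageProposals` and the 0.9 threshold are assumptions, and whether confidence-based auto-approval is allowed at all is an open question in section 8.

```typescript
type Decision = "approve" | "review";

// Matches the proposedWrites entries in the /v1/chat response above.
interface ProposedWrite {
  proposal_id: string;
  type: string;
  provenance: { from: string; confidence: number };
}

// Hypothetical client-side triage: auto-approve only high-confidence
// writes, route everything else to a human reviewer.
function triageProposals(
  writes: ProposedWrite[],
  autoApproveThreshold = 0.9 // assumed org policy value, not spec
): Array<{ proposal_id: string; decision: Decision }> {
  return writes.map((w) => ({
    proposal_id: w.proposal_id,
    decision:
      w.provenance.confidence >= autoApproveThreshold ? "approve" : "review",
  }));
}
```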

2.3 Audit

GET /v1/audit?subject_id=usr_123&action=write.approved&limit=50 — paginated audit events with provenance.

2.4 Webhooks

Events: proposal.created, write.approved, write.rejected, write.failed, ingestion.error, persona.updated, memory.created, memory.deleted. Delivered with HMAC signature.
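Receivers would verify the HMAC signature before trusting a delivery. A minimal sketch, assuming SHA-256 and a hex-encoded signature; the PRD does not yet pin down the algorithm or header name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the signature the Gateway would attach to a delivery.
// SHA-256 + hex encoding is an assumption, not a published spec.
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Verify an incoming webhook against the shared secret.
function verifyWebhook(
  secret: string,
  rawBody: string,
  signatureHeader: string
): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "hex");
  const received = Buffer.from(signatureHeader, "hex");
  // Constant-time comparison prevents timing attacks on the signature.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification must run against the raw request body, before any JSON parsing or re-serialization, or the computed digest will not match.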


3) Authentication & Multi-Tenancy

API Keys

  • Org API Key: Full access within org scope
  • Scoped API Key: Limited to specific persona_ids, subject_ids, or scopes
  • Rotation: Old key valid 24h after rotation (graceful migration)

Capability Tokens (Client-Side)

Short-lived tokens (15min) for browser/mobile SDKs. Created via server-side key exchange. Scoped to a single subject_id and persona_id.
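The server-side exchange might look like the sketch below, assuming a simple HMAC-signed token format; a production Gateway would more likely issue JWTs. All names here are illustrative.

```typescript
import { createHmac } from "node:crypto";

// Mint a 15-minute capability token scoped to one subject and persona.
// Format (base64url payload + HMAC) is an assumption for illustration.
function mintCapabilityToken(
  orgSecret: string,
  subjectId: string,
  personaId: string,
  now = Date.now()
): string {
  const payload = JSON.stringify({
    sub: subjectId,
    persona: personaId,
    exp: now + 15 * 60 * 1000, // 15-minute lifetime per the spec above
  });
  const b64 = Buffer.from(payload).toString("base64url");
  const sig = createHmac("sha256", orgSecret).update(b64).digest("base64url");
  return `${b64}.${sig}`;
}

// Gateway-side check: signature must match and the token must not be expired.
function isTokenValid(orgSecret: string, token: string, now = Date.now()): boolean {
  const [b64, sig] = token.split(".");
  const expected = createHmac("sha256", orgSecret).update(b64).digest("base64url");
  if (sig !== expected) return false;
  const payload = JSON.parse(Buffer.from(b64, "base64url").toString());
  return payload.exp > now;
}
```

The point of the exchange is that `orgSecret` never reaches the browser or mobile client; only the short-lived, narrowly scoped token does.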

Multi-Tenancy

  • Tenant isolation via Row Level Security (Supabase) or per-tenant schema
  • API key → org_id mapping at gateway edge
  • Rate limits per org (configurable)
  • Memory namespaces include org_id prefix: org_{org_id}_user_{subject_id}

4) Developer Experience

4.1 Onboarding Flow

  1. Sign up at developer console → create org → get API key
  2. Create first persona via Persona Studio (visual) or API
  3. Connect data sources (Gmail/Calendar/Drive) for the org or per-subject
  4. Make first /v1/chat call (quickstart: curl, Node, Python — see Portable Persona PRD Appendix A)
  5. Handle proposals (approve/reject) or configure auto-approve policies

4.2 SDKs

| Language | Package | Priority |
| --- | --- | --- |
| TypeScript/Node | `@ikiro/sdk` | P0 (ships with API) |
| Python | `ikiro` | P0 (ships with API) |
| Go | `ikiro-go` | P1 |

SDK features: typed request/response, streaming support, webhook verification, proposal helpers, retry logic.
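The retry behavior could follow a standard exponential-backoff pattern, sketched below; the attempt count and delay values are assumptions, not spec.

```typescript
// Retry an async call with exponential backoff. An SDK version would
// also inspect the error (retry on 429/5xx, fail fast on 4xx); this
// sketch retries any failure for simplicity.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Backoff doubles each attempt: 250ms, 500ms, 1000ms, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```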

4.3 Documentation

  • Quickstart: 5-minute guide from API key to first response
  • Concepts: Persona Passport, Memory Vault, Policy Overlays, Proposals
  • API Reference: OpenAPI 3.1 spec, auto-generated from code
  • Guides: "Build a support bot with memory," "Add persona to your existing ChatGPT app," "MCP integration for Claude"
  • Changelog: Versioned API changes with migration guides

4.4 Developer Console

Web UI for API management:

  • API key creation and rotation
  • Usage dashboard (calls, tokens, cost, latency)
  • Persona Studio (visual persona editor)
  • Memory Viewer (browse subject memories)
  • Webhook configuration
  • Audit log viewer
  • Policy Manager (org → team → user overlays)


5) Pricing Model

5.1 Tiers

| Tier | Price | Includes | Target |
| --- | --- | --- | --- |
| Free | $0/mo | 1,000 calls/mo, 1 persona, 100 memories/subject | Hobbyists, testing |
| Pro | $49/mo | 50,000 calls/mo, unlimited personas, 10,000 memories/subject, webhooks | Startups, small teams |
| Business | $199/mo | 500,000 calls/mo, policy overlays, audit API, priority support, SSO | Mid-market |
| Enterprise | Custom | Unlimited, dedicated infra, SLA, VPC, CMK, HIPAA BAA | Large orgs |

5.2 Usage-Based Components

  • LLM tokens: Pass-through cost + 20% markup (developer sees total cost in dashboard)
  • Memory operations: Included in tier limits; overage at $0.001/operation
  • Connectors: Gmail/Calendar/Drive included; Slack/Notion/custom at $10/mo each (P1)

6) Migration Path (Internal → Public)

Phase 1: Internal Refactor (Weeks 1-3)

  • Extract orchestrator into standalone Gateway service
  • Add API key authentication layer
  • Add rate limiting and usage tracking
  • Ensure all internal iMessage/Telegram paths call Gateway (dogfood)

Phase 2: Beta API (Weeks 4-6)

  • Deploy /v1/chat and compat endpoints
  • TypeScript + Python SDKs
  • Developer console (key management, usage dashboard)
  • Documentation site
  • 10 beta partners (hand-picked)

Phase 3: MCP Adapter (Weeks 7-8)

  • orchestrate.answer MCP tool
  • Advanced MCP tools (recall, forget)
  • Testing with Claude Desktop and ChatGPT plugins
  • MCP installation guide

Phase 4: GA Launch (Weeks 9-12)

  • Public sign-up
  • Pricing enforcement
  • Webhook system
  • Audit API
  • SOC 2 Type I in progress
  • Developer marketing (blog posts, example apps, community)

7) Success Metrics

| Metric | Target | Timeframe |
| --- | --- | --- |
| Beta partners activated | 10 | Month 1 |
| API calls / month | 100K | Month 3 |
| Developer NPS | >40 | Month 3 |
| Time to first successful call | <10 minutes | Ongoing |
| p95 latency (`/v1/chat`) | <4 seconds | Ongoing |
| Paid tier conversion | >5% of free users | Month 6 |
| MCP adapter installs | 500 | Month 6 |

Feature Flags & Gating

| Flag Key | Default | Purpose |
| --- | --- | --- |
| `enable_gateway_api` | false | Master switch for public API endpoints |
| `enable_mcp_adapter` | false | MCP tool exposure |
| `enable_precompose` | false | Prompt-pack export endpoint |
| `enable_proposals` | false | Write proposal system (approve/reject) |
| `enable_webhooks` | false | Webhook delivery system |
| `gateway_rate_limit_free` | 1000 | Monthly call limit for free API tier |
| `gateway_rate_limit_pro` | 50000 | Monthly call limit for Pro tier |
| `gateway_rate_limit_business` | 500000 | Monthly call limit for Business tier |
| `enable_audit_api` | false | Audit event query endpoint |
| `enable_multi_tenancy` | false | Org-level isolation and API key scoping |

See REFERENCE_FEATURE_FLAGS.md for the full catalog.


Telemetry

| Event | Trigger | Properties |
| --- | --- | --- |
| `gateway_chat_request` | /v1/chat called | org_id, persona_id, model, stream, latency_ms |
| `gateway_chat_cached` | Cache hit on memory recall | org_id, cache_key, ttl_remaining |
| `gateway_proposal_created` | Write proposal generated | org_id, proposal_type, confidence |
| `gateway_proposal_resolved` | Proposal approved/rejected | org_id, proposal_id, decision, latency_ms |
| `gateway_webhook_sent` | Webhook delivered | org_id, event_type, status_code, retry_count |
| `gateway_auth_failed` | API key authentication fails | org_id, reason (invalid/expired/rate_limited) |
| `gateway_rate_limited` | Rate limit enforced | org_id, tier, limit, current_usage |
| `mcp_tool_called` | MCP adapter tool invoked | tool_name, persona_id, latency_ms |
Needed but not yet tracked: nothing is shipped, so every event above is speculative. Implementation should include them from day one.

See REFERENCE_TELEMETRY.md for the full event catalog.


Definition of Done

  • /v1/chat endpoint returns responses with citations and proposed writes
  • OpenAI/Anthropic compatibility endpoints pass conformance tests
  • MCP adapter exposes orchestrate.answer tool and passes Claude Desktop testing
  • API key management: create, rotate (24h grace), scope to persona/subject
  • Multi-tenancy: org isolation via RLS, no data leakage between orgs
  • Rate limiting enforced per org per tier
  • Webhook delivery with HMAC signatures and retry logic
  • Audit API returns paginated events with provenance
  • SDKs (TypeScript + Python) with typed requests, streaming, and retry
  • p95 latency: <4 seconds for /v1/chat
  • Developer console: key management, usage dashboard, persona studio access
  • All endpoints gated by feature flags

8) Open Questions

  • Should the Gateway support model routing (auto-select cheapest model that meets quality threshold)?
  • How do we handle persona versioning in the API? (Persona changes mid-conversation)
  • Should proposals auto-approve based on confidence threshold, or always require explicit approval?
  • What's the right caching strategy for memory recalls? (Same subject, similar query within 5 min → cache hit)
  • Should the MCP adapter expose superpowers as individual MCP tools, or only through orchestrate.answer?
  • Data residency: when do we need EU-hosted clusters? (Trigger: first EU enterprise customer)
  • Should we offer a self-hosted Gateway for high-security customers? (VPC-deployed Docker image)