PRD — Gateway API & Developer Platform¶
Doc owner: Justin
Audience: Eng, Design, Product, GTM, Partnerships
Status: v1 (February 2026)
Depends on: prd.md (core platform), Portable_Persona_PRD.md, PRD 1 (Composable MiniApp System)
Implementation Status¶
| Section | Status | Notes |
|---|---|---|
| Internal orchestrator | ✅ Shipped | app/orchestrator/two_stage_handler.py — classification + generation |
| FastAPI backend | ✅ Shipped | app/main.py — production-deployed |
| Supabase JWT auth | ✅ Shipped | app/identity/supabase_client.py |
| Public /v1/chat endpoint | ❌ Not Shipped | Internal only — no public API |
| OpenAI/Anthropic compat endpoints | ❌ Not Shipped | No drop-in compatibility layer |
| MCP Adapter | ❌ Not Shipped | No MCP tool exposure |
| Precompose endpoint | ❌ Not Shipped | No prompt-pack export |
| Proposals system (approve/reject) | ❌ Not Shipped | No write proposals |
| Audit API | ❌ Not Shipped | No audit event API |
| Webhooks | ❌ Not Shipped | No webhook system |
| API key management | ❌ Not Shipped | No org-scoped API keys |
| Developer console | ❌ Not Shipped | No self-service portal |
| SDKs (TypeScript, Python) | ❌ Not Shipped | No client libraries |
| Multi-tenancy (org isolation) | ❌ Not Shipped | Single-tenant consumer only |
Note: This is entirely future-phase work. The internal orchestrator exists but nothing has been extracted into a public platform.
References¶
This PRD uses standardized terminology, IDs, pricing, and model references defined in the companion documents:
| Document | What it Covers |
|---|---|
| REFERENCE_GLOSSARY_AND_IDS.md | Canonical terms: workflow vs miniapp vs superpower, ID formats |
| REFERENCE_PRICING.md | Canonical pricing: $7.99/mo + $50/yr, free tier limits |
| REFERENCE_MODEL_ROUTING.md | Pipeline stage → model tier mapping |
| REFERENCE_DEPENDENCY_GRAPH.md | PRD blocking relationships and priority order |
| REFERENCE_FEATURE_FLAGS.md | All feature flags by category |
| REFERENCE_TELEMETRY.md | Amplitude event catalog and gaps |
Executive Summary¶
The Gateway API is how Ikiro becomes a platform. It exposes the same Persona Passport + Memory Vault + Superpower Runtime that powers Sage in iMessage as a public API any application can call. One /v1/chat request; the Gateway auto-hydrates persona, recalls memories, enforces policy, calls the model, and returns the response with citations and proposed actions.
The MCP Adapter extends this into Claude, ChatGPT, and any MCP-compatible client — making custom personas available as tools inside existing AI assistants.
Business case: Consumer revenue proves the model. The Gateway is where enterprise revenue scales — brands, support teams, HR, and dev teams that want consistent persona + memory across their products without building infrastructure.
Current state: Internal orchestrator handles all processing. No public API. The Portable Persona PRD defines target contracts. This PRD specifies the migration from internal to public platform.
1) Product Surfaces¶
1.1 Gateway API (Primary)¶
Single HTTP endpoint wrapping the full orchestration:
POST /v1/chat → [Gateway]
→ Load persona → Recall memories → Apply policy
→ Compose prompt → Call model → Extract proposed writes
→ Return response + citations + proposals
One call replaces: prompt engineering + memory retrieval + persona management + policy enforcement + action extraction + audit logging.
1.2 Compatibility Endpoints (Drop-In)¶
| Endpoint | Compatible With | Extra Headers |
|---|---|---|
| /v1/chat/completions | OpenAI API | x-persona-id, x-subject-id, x-memory-scopes |
| /v1/messages | Anthropic API | x-persona-id, x-subject-id, x-memory-scopes |
Migration: change base URL, add 3 headers. Response shapes match vendor formats with added citations and proposedWrites.
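As a rough illustration of the migration step, the three extra headers could be assembled by a small helper. This is a sketch only: the helper name `persona_headers` and the comma-separated encoding of `x-memory-scopes` are assumptions, since this PRD names the headers but does not define their wire format.

```python
from typing import Dict, Iterable


def persona_headers(persona_id: str, subject_id: str,
                    memory_scopes: Iterable[str]) -> Dict[str, str]:
    """Build the three extra headers used by the compatibility endpoints.

    Assumption: scopes are sent as a single comma-separated value.
    """
    return {
        "x-persona-id": persona_id,
        "x-subject-id": subject_id,
        "x-memory-scopes": ",".join(memory_scopes),
    }


headers = persona_headers("brand-default", "usr_123", ["org_kb", "gmail"])
```

An existing OpenAI or Anthropic client would merge these into its default headers after switching the base URL.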
1.3 MCP Adapter¶
Primary tool: orchestrate.answer — single call returns answer + citations + proposed writes.
Advanced tools: memory.recall, persona.get, memory.upsert(proposal), memory.forget.
1.4 Precompose (Fallback)¶
POST /v1/packs/precompose returns {system_prompt, context_messages[], ttl} for developers who want to call LLM providers directly. The trade-off: callers on this path lose centralized approvals, caching, and audit.
2) API Contracts¶
2.1 POST /v1/chat¶
Request:
{
  "model": "anthropic/claude-sonnet-4-5-20250929",
  "persona_id": "brand-default",
  "subject_id": "usr_123",
  "memory_scopes": ["org_kb", "gmail", "calendar"],
  "messages": [{"role": "user", "content": "What's due this week?"}],
  "stream": true,
  "redact_mode": "default"
}
Response:
{
  "id": "chat_abc",
  "choices": [{"index": 0, "message": {"role": "assistant", "content": "Two items..."}}],
  "citations": [{"factId": "a1", "uri": "gmail://msg_123", "confidence": 0.92}],
  "proposedWrites": [{
    "proposal_id": "prop_789",
    "type": "calendar.create",
    "payload": {"title": "STAT210 Quiz", "when": "2025-11-07T10:00:00-08:00"},
    "provenance": {"from": "gmail:msg_123", "confidence": 0.91}
  }],
  "usage": {"prompt_tokens": 1234, "completion_tokens": 567, "memory_recalls": 3, "cache_hit": false}
}
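One way a client might consume this response is to pull out the answer text and sort the proposed writes by provenance confidence. The sketch below is illustrative: `answer_text`, `split_proposals`, and the 0.9 threshold are hypothetical, and whether confidence should drive auto-approval at all is still listed as an open question in this PRD.

```python
from typing import Any, Dict, List, Tuple

# Trimmed copy of the sample response.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "id": "chat_abc",
    "choices": [{"index": 0,
                 "message": {"role": "assistant", "content": "Two items..."}}],
    "proposedWrites": [{
        "proposal_id": "prop_789",
        "type": "calendar.create",
        "provenance": {"from": "gmail:msg_123", "confidence": 0.91},
    }],
}


def answer_text(response: Dict[str, Any]) -> str:
    """Return the assistant message from the first choice."""
    return response["choices"][0]["message"]["content"]


def split_proposals(response: Dict[str, Any],
                    threshold: float) -> Tuple[List[str], List[str]]:
    """Split proposal IDs into (at or above threshold, below threshold)."""
    confident: List[str] = []
    needs_review: List[str] = []
    for proposal in response.get("proposedWrites", []):
        confidence = proposal["provenance"]["confidence"]
        bucket = confident if confidence >= threshold else needs_review
        bucket.append(proposal["proposal_id"])
    return confident, needs_review


confident, needs_review = split_proposals(SAMPLE_RESPONSE, threshold=0.9)
```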
2.2 Proposals¶
POST /v1/proposals/:id/approve — execute write, persist assertion, log audit.
POST /v1/proposals/:id/reject — discard, log rejection.
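The approve/reject semantics can be pictured as a tiny in-memory state machine. Everything here is hypothetical scaffolding (the `ProposalStore` class, its fields, and the list-based audit log); only the two outcomes and the `write.approved` / `write.rejected` action names come from this document. Executing the actual write is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProposalStore:
    """Hypothetical in-memory stand-in for the proposals backend."""
    status: Dict[str, str] = field(default_factory=dict)
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def create(self, proposal_id: str) -> None:
        self.status[proposal_id] = "pending"

    def _resolve(self, proposal_id: str, new_status: str, event: str) -> None:
        # Assumption: a proposal can be resolved only once.
        if self.status.get(proposal_id) != "pending":
            raise ValueError(f"{proposal_id} is not pending")
        self.status[proposal_id] = new_status
        self.audit_log.append({"proposal_id": proposal_id, "action": event})

    def approve(self, proposal_id: str) -> None:
        # The real endpoint would also execute the write and persist it.
        self._resolve(proposal_id, "approved", "write.approved")

    def reject(self, proposal_id: str) -> None:
        self._resolve(proposal_id, "rejected", "write.rejected")


store = ProposalStore()
store.create("prop_789")
store.approve("prop_789")
```

Rejecting a second resolution attempt is a design assumption; the PRD does not yet say whether repeated approve calls should be idempotent or an error.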
2.3 Audit¶
GET /v1/audit?subject_id=usr_123&action=write.approved&limit=50 — paginated audit events with provenance.
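For reference, the query string in that example could be built with the standard library as follows. The function name and the default `limit` of 50 are assumptions; only the three query parameters appear in this document, and the pagination mechanism is not specified here.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def audit_query_path(subject_id: str, action: Optional[str] = None,
                     limit: int = 50) -> str:
    """Build the path and query string for an audit request."""
    params: Dict[str, str] = {"subject_id": subject_id}
    if action is not None:
        params["action"] = action
    params["limit"] = str(limit)
    return "/v1/audit?" + urlencode(params)


path = audit_query_path("usr_123", action="write.approved", limit=50)
```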
2.4 Webhooks¶
Events: proposal.created, write.approved, write.rejected, write.failed, ingestion.error, persona.updated, memory.created, memory.deleted. Delivered with HMAC signature.
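A receiver would typically verify the HMAC signature before trusting a webhook body. The sketch below assumes HMAC-SHA256 over the raw body with a hex-encoded digest; the hash algorithm, the encoding, and the secret format are assumptions, because this PRD only states that deliveries carry an HMAC signature.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received)


secret = b"whsec_example"  # hypothetical shared secret
body = b'{"event": "write.approved", "proposal_id": "prop_789"}'
signature = sign_payload(secret, body)
```

Verification should run on the raw bytes as received, since re-serializing the JSON can change the byte sequence and invalidate the signature.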
3) Authentication & Multi-Tenancy¶
API Keys¶
- Org API Key: Full access within org scope
- Scoped API Key: Limited to specific persona_ids, subject_ids, or scopes
- Rotation: Old key valid 24h after rotation (graceful migration)
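The rotation rule can be expressed as a small validity check. This is a minimal sketch: the function name is hypothetical, and treating a key as expired at exactly 24 hours is an assumption about the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ROTATION_GRACE = timedelta(hours=24)  # grace window from the rotation rule


def key_is_valid(rotated_at: Optional[datetime], now: datetime) -> bool:
    """A key is valid if never rotated, or rotated less than 24h ago."""
    if rotated_at is None:
        return True
    return now - rotated_at < ROTATION_GRACE


rotated = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
```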
Capability Tokens (Client-Side)¶
Short-lived tokens (15min) for browser/mobile SDKs. Created via server-side key exchange. Scoped to a single subject_id and persona_id.
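To make the scoping concrete, here is one possible shape for such a token: a base64-encoded JSON payload plus an HMAC signature. The format, claim names, and function names are all assumptions for illustration; a production design might use a standard such as JWT instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

TOKEN_TTL_SECONDS = 15 * 60  # 15-minute lifetime


def mint_token(server_key: bytes, subject_id: str, persona_id: str,
               now: Optional[float] = None) -> str:
    """Create a signed token scoped to one subject and one persona."""
    issued = time.time() if now is None else now
    claims = {"sub": subject_id, "persona": persona_id,
              "exp": issued + TOKEN_TTL_SECONDS}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(server_key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def check_token(server_key: bytes, token: str, subject_id: str,
                persona_id: str, now: Optional[float] = None) -> bool:
    """Accept only an unexpired, untampered token with matching scope."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(server_key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    return (claims["sub"] == subject_id
            and claims["persona"] == persona_id
            and current < claims["exp"])


key = b"server-side-secret"  # hypothetical; never shipped to clients
token = mint_token(key, "usr_123", "brand-default", now=1_000.0)
```

The signature is checked before the payload is decoded, so a tampered token is rejected without parsing attacker-controlled JSON.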
Multi-Tenancy¶
- Tenant isolation via Row Level Security (Supabase) or per-tenant schema
- API key → org_id mapping at gateway edge
- Rate limits per org (configurable)
- Memory namespaces include org_id prefix: org_{org_id}_user_{subject_id}
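A minimal sketch of the namespace format, with a prefix check that a namespace belongs to a given org. The function names are hypothetical, and the example IDs are made up.

```python
def memory_namespace(org_id: str, subject_id: str) -> str:
    """Build a namespace in the org_{org_id}_user_{subject_id} format."""
    if not org_id or not subject_id:
        raise ValueError("org_id and subject_id must be non-empty")
    return f"org_{org_id}_user_{subject_id}"


def in_org(namespace: str, org_id: str) -> bool:
    """Prefix check that a namespace belongs to the given org."""
    return namespace.startswith(f"org_{org_id}_user_")


ns = memory_namespace("acme", "usr_123")
```

One caveat with a string-prefix scheme: if an org_id could itself contain the substring `_user_`, two orgs' namespaces could collide under this check. Restricting the characters allowed in org IDs would avoid that, though this PRD does not yet define an ID character set here.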
4) Developer Experience¶
4.1 Onboarding Flow¶
- Sign up at developer console → create org → get API key
- Create first persona via Persona Studio (visual) or API
- Connect data sources (Gmail/Calendar/Drive) for the org or per-subject
- Make first /v1/chat call (quickstart: curl, Node, Python; see Portable Persona PRD Appendix A)
- Handle proposals (approve/reject) or configure auto-approve policies
4.2 SDKs¶
| Language | Package | Priority |
|---|---|---|
| TypeScript/Node | @ikiro/sdk | P0 (ships with API) |
| Python | ikiro | P0 (ships with API) |
| Go | ikiro-go | P1 |
SDK features: typed request/response, streaming support, webhook verification, proposal helpers, retry logic.
4.3 Documentation¶
- Quickstart: 5-minute guide from API key to first response
- Concepts: Persona Passport, Memory Vault, Policy Overlays, Proposals
- API Reference: OpenAPI 3.1 spec, auto-generated from code
- Guides: "Build a support bot with memory," "Add persona to your existing ChatGPT app," "MCP integration for Claude"
- Changelog: Versioned API changes with migration guides
4.4 Developer Console¶
Web UI for API management:
- API key creation and rotation
- Usage dashboard (calls, tokens, cost, latency)
- Persona Studio (visual persona editor)
- Memory Viewer (browse subject memories)
- Webhook configuration
- Audit log viewer
- Policy Manager (org → team → user overlays)
5) Pricing Model¶
5.1 Tiers¶
| Tier | Price | Includes | Target |
|---|---|---|---|
| Free | $0/mo | 1,000 calls/mo, 1 persona, 100 memories/subject | Hobbyists, testing |
| Pro | $49/mo | 50,000 calls/mo, unlimited personas, 10,000 memories/subject, webhooks | Startups, small teams |
| Business | $199/mo | 500,000 calls/mo, policy overlays, audit API, priority support, SSO | Mid-market |
| Enterprise | Custom | Unlimited, dedicated infra, SLA, VPC, CMK, HIPAA BAA | Large orgs |
5.2 Usage-Based Components¶
- LLM tokens: Pass-through cost + 20% markup (developer sees total cost in dashboard)
- Memory operations: Included in tier limits; overage at $0.001/operation
- Connectors: Gmail/Calendar/Drive included; Slack/Notion/custom at $10/mo each (P1)
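The token and memory-operation components can be worked through with a short calculation. This is a sketch under stated assumptions: `included_ops` is a hypothetical parameter, because the tier table does not state a per-tier memory-operation allowance, and connector fees are left out.

```python
from decimal import Decimal

LLM_MARKUP = Decimal("0.20")       # 20% markup on pass-through LLM cost
OVERAGE_PER_OP = Decimal("0.001")  # $0.001 per extra memory operation


def usage_charge(llm_cost: Decimal, memory_ops: int,
                 included_ops: int) -> Decimal:
    """Return usage-based charges in dollars for one billing period."""
    llm_total = llm_cost * (1 + LLM_MARKUP)
    extra_ops = max(0, memory_ops - included_ops)
    return llm_total + extra_ops * OVERAGE_PER_OP


charge = usage_charge(Decimal("10.00"), memory_ops=1500, included_ops=1000)
```

With $10.00 of pass-through LLM cost and 500 memory operations over the allowance, the charge is $12.00 plus $0.50, or $12.50. `Decimal` is used so that currency arithmetic does not pick up binary floating-point rounding.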
6) Migration Path (Internal → Public)¶
Phase 1: Internal Refactor (Weeks 1-3)¶
- Extract orchestrator into standalone Gateway service
- Add API key authentication layer
- Add rate limiting and usage tracking
- Ensure all internal iMessage/Telegram paths call Gateway (dogfood)
Phase 2: Beta API (Weeks 4-6)¶
- Deploy /v1/chat and compat endpoints
- TypeScript + Python SDKs
- Developer console (key management, usage dashboard)
- Documentation site
- 10 beta partners (hand-picked)
Phase 3: MCP Adapter (Weeks 7-8)¶
- orchestrate.answer MCP tool
- Advanced MCP tools (recall, forget)
- Testing with Claude Desktop and ChatGPT plugins
- MCP installation guide
Phase 4: GA Launch (Weeks 9-12)¶
- Public sign-up
- Pricing enforcement
- Webhook system
- Audit API
- SOC 2 Type I in progress
- Developer marketing (blog posts, example apps, community)
7) Success Metrics¶
| Metric | Target | Timeframe |
|---|---|---|
| Beta partners activated | 10 | Month 1 |
| API calls / month | 100K | Month 3 |
| Developer NPS | >40 | Month 3 |
| Time to first successful call | <10 minutes | Ongoing |
| p95 latency (/v1/chat) | <4 seconds | Ongoing |
| Paid tier conversion | >5% of free users | Month 6 |
| MCP adapter installs | 500 | Month 6 |
Feature Flags & Gating¶
| Flag Key | Default | Purpose |
|---|---|---|
| enable_gateway_api | false | Master switch for public API endpoints |
| enable_mcp_adapter | false | MCP tool exposure |
| enable_precompose | false | Prompt-pack export endpoint |
| enable_proposals | false | Write proposal system (approve/reject) |
| enable_webhooks | false | Webhook delivery system |
| gateway_rate_limit_free | 1000 | Monthly call limit for free API tier |
| gateway_rate_limit_pro | 50000 | Monthly call limit for Pro tier |
| gateway_rate_limit_business | 500000 | Monthly call limit for Business tier |
| enable_audit_api | false | Audit event query endpoint |
| enable_multi_tenancy | false | Org-level isolation and API key scoping |
See REFERENCE_FEATURE_FLAGS.md for the full catalog.
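To show how the master switch and the per-tier limits might combine, here is a minimal gating check. The function name, the flag dictionary shape, and the tier keys are assumptions; the limit values and the unlimited Enterprise tier come from this PRD.

```python
from typing import Dict

MONTHLY_CALL_LIMITS: Dict[str, int] = {
    "free": 1_000,        # gateway_rate_limit_free
    "pro": 50_000,        # gateway_rate_limit_pro
    "business": 500_000,  # gateway_rate_limit_business
}


def call_allowed(flags: Dict[str, bool], tier: str,
                 calls_this_month: int) -> bool:
    """Allow a call only if the API is enabled and the tier has quota left."""
    if not flags.get("enable_gateway_api", False):
        return False
    if tier == "enterprise":
        return True  # the tier table lists Enterprise as unlimited
    # Unknown tiers raise KeyError rather than silently passing.
    return calls_this_month < MONTHLY_CALL_LIMITS[tier]


enabled = {"enable_gateway_api": True}
```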
Telemetry¶
| Event | Trigger | Properties |
|---|---|---|
| gateway_chat_request | /v1/chat called | org_id, persona_id, model, stream, latency_ms |
| gateway_chat_cached | Cache hit on memory recall | org_id, cache_key, ttl_remaining |
| gateway_proposal_created | Write proposal generated | org_id, proposal_type, confidence |
| gateway_proposal_resolved | Proposal approved/rejected | org_id, proposal_id, decision, latency_ms |
| gateway_webhook_sent | Webhook delivered | org_id, event_type, status_code, retry_count |
| gateway_auth_failed | API key authentication fails | org_id, reason (invalid/expired/rate_limited) |
| gateway_rate_limited | Rate limit enforced | org_id, tier, limit, current_usage |
| mcp_tool_called | MCP adapter tool invoked | tool_name, persona_id, latency_ms |
Needed but not yet tracked: all of the events above. None is emitted today because nothing in this PRD has shipped, so the list is a proposal. Implementation should emit these events from day one.
See REFERENCE_TELEMETRY.md for the full event catalog.
Definition of Done¶
- /v1/chat endpoint returns responses with citations and proposed writes
- OpenAI/Anthropic compatibility endpoints pass conformance tests
- MCP adapter exposes orchestrate.answer tool and passes Claude Desktop testing
- API key management: create, rotate (24h grace), scope to persona/subject
- Multi-tenancy: org isolation via RLS, no data leakage between orgs
- Rate limiting enforced per org per tier
- Webhook delivery with HMAC signatures and retry logic
- Audit API returns paginated events with provenance
- SDKs (TypeScript + Python) with typed requests, streaming, and retry
- p95 latency: <4 seconds for /v1/chat
- Developer console: key management, usage dashboard, persona studio access
- All endpoints gated by feature flags
8) Open Questions¶
- Should the Gateway support model routing (auto-select cheapest model that meets quality threshold)?
- How do we handle persona versioning in the API? (Persona changes mid-conversation)
- Should proposals auto-approve based on confidence threshold, or always require explicit approval?
- What's the right caching strategy for memory recalls? (Same subject, similar query within 5 min → cache hit)
- Should the MCP adapter expose superpowers as individual MCP tools, or only through orchestrate.answer?
- Data residency: when do we need EU-hosted clusters? (Trigger: first EU enterprise customer)
- Should we offer a self-hosted Gateway for high-security customers? (VPC-deployed Docker image)