
PRD — Composable MiniApp System (Superpowers Engine)

Doc owner: Justin
Audience: Eng, Design, Product
Status: v3 (February 2026 — All 13 analysis recommendations implemented)
Depends on: prd.md (core platform), COMPOSABLE_MINIAPP_SYSTEM.md (implementation reference)


Implementation Status

| Section | Status | Notes |
|---|---|---|
| Config-driven JSON superpowers | ✅ Shipped | app/miniapps/generator/app_generator.py — 15 operation types, 45 filters |
| Multi-turn creation flow | ✅ Shipped | app/miniapps/generator/creation_handler.py — confirm/refine/cancel |
| First-party miniapps (Trip Planner) | ✅ Shipped | app/miniapps/handlers/trip_planner.py |
| Bill Split miniapp | ✅ Shipped | app/miniapps/handlers/bill_split.py |
| Block-based UI system | ✅ Shipped | Header, navigation, content blocks, stats grids, card lists |
| Capability hints | ✅ Shipped | 7-day cooldown, contextual discovery |
| Dry-run testing | ⚠️ Partial | Mock responses implemented, needs UX polish |
| completion_criteria & escalation | ⚠️ Specified | Schema defined, runtime evaluation pending |
| Version pinning | ❌ Not Shipped | No superpower versioning system |
| Idempotency keys | ❌ Not Shipped | No replay protection on operations |
| Fork/remix from marketplace | ❌ Not Shipped | Depends on PRD 2 |

References

This PRD uses standardized terminology, IDs, pricing, and model references defined in the companion documents:

| Document | What it Covers |
|---|---|
| REFERENCE_GLOSSARY_AND_IDS.md | Canonical terms: workflow vs miniapp vs superpower, ID formats |
| REFERENCE_PRICING.md | Canonical pricing: $7.99/mo + $50/yr, free tier limits |
| REFERENCE_MODEL_ROUTING.md | Pipeline stage → model tier mapping |
| REFERENCE_DEPENDENCY_GRAPH.md | PRD blocking relationships and priority order |
| REFERENCE_FEATURE_FLAGS.md | All feature flags by category |
| REFERENCE_TELEMETRY.md | Amplitude event catalog and gaps |

Executive Summary

The Composable MiniApp System is the runtime engine that powers all "superpowers" inside Ikiro. It lets anyone — the team, creators, or Sage herself — define sophisticated mini-applications purely through JSON configuration. No code required.

Users create superpowers by chatting with Sage: "build me an app that splits restaurant bills from a photo." Sage translates that into a structured JSON definition, which the GenericConfigHandler interprets at runtime. The result: a receipt scanner → math engine → Venmo link generator → group message sender — all from a conversation.

This PRD covers the technical platform only. For how superpowers are discovered and distributed, see PRD 2 (Marketplace). For how users earn, equip, and level up superpowers, see PRD 3 (Progression & Loadout).


Terminology

"Superpower" is the user-facing term for a custom mini-application. Technically, superpowers are implemented as MiniApp records in the database and interpreted by the GenericConfigHandler. The codebase uses "MiniApp" for technical consistency across first-party Python apps and user-created JSON configs. PRDs and user-facing features use "superpower" for marketing clarity.


1) What Exists Today

Platform Capabilities (All Shipped)

15 Operation Types:

| Category | Operations | What They Do |
|---|---|---|
| State | state_update, emit_event, emit_events | Session data, analytics events (single/batch) |
| AI/LLM | llm_extract, vision_analysis | GPT-5 structured extraction, image analysis |
| Integrations | gmail_search, calendar_query, web_search | OAuth-scoped data access |
| Flow Control | conditional, loop | If/else branching, iteration over collections |
| UGC Composable | calculate, conversation_flow, build_url, send_message, schedule_task | Math, multi-turn dialogs, deep links, cross-chat DMs, deferred actions |

45 Template Filters across collections, strings, numbers, dates, logic, types, and UGC-specific transforms.

State Machines with auto-transitions for multi-phase apps (e.g., scanning → splitting → settling).

Security Controls: AST-based math (no eval), domain allowlists for URLs, recipient validation for messages, rate limits, HMAC webhook verification, payload size caps, action/operation limits.
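For illustration, the AST-based approach behind the math controls can be sketched as follows. This is a minimal sketch, not SafeMathParser itself (the real parser supports functions like round() and min(); names and structure here are illustrative): parse the expression once, then walk only an allowlist of node types, so there is no path to eval/exec.

```python
import ast
import operator

# Allowlisted arithmetic operators — anything else raises.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    """Evaluate a pure-arithmetic expression without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        # Calls, attribute access, names, etc. are all rejected here.
        raise ValueError(f"Disallowed expression node: {type(node).__name__}")
    return walk(ast.parse(expr, mode="eval"))
```

Any attempt to smuggle in a function call (e.g. `__import__('os')`) hits the final `raise` because Call nodes are not in the allowlist.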

Infrastructure: Webhook ingestion, app analytics, versioning with rollback, starter templates, dry-run testing sandbox.

Test Coverage: 437 unit tests across 31 test files.

Three-Tier Hierarchy

| Tier | Description | Examples |
|---|---|---|
| First-Party | Python handlers with full framework access | Trip Planner, Calendar Stress, Gmail Mind Reader |
| UGC Superpowers | JSON definitions interpreted by GenericConfigHandler | User-created apps via Sage conversations |
| Marketplace Apps | Published UGC superpowers with admin review | Community-shared apps |

2) What's Next: Remaining Phases

Phase 9: Inter-App Communication (P2 — High Impact, High Effort)

Problem: Apps are isolated silos. A Bill Split app can't tell a Budget Tracker "hey, this person just paid $47 for dinner."

Solution: Event bus for typed inter-app events with explicit permission model.

Example: Bill Split emits bill_settled → Budget Tracker records expense automatically

Full specification: See PRD_1B_INTER_APP_EVENTS.md for:

- Event schema registry
- Permission model (manifest declarations)
- Subscription matching and routing
- Security boundaries
- Testing and debugging


Phase 10: Richer UI Components (P1 — High Impact, Medium Effort)

Current State: Block-based UI system ships with header blocks, navigation, content blocks, stats grids, and card lists. First-party miniapps use structured view-based components.

What's Missing: Advanced interactive components for complex data visualization and input collection.

What to build:

- Input components: Date pickers, number steppers, toggles, radio buttons (renderable in iMessage via rich link micro-pages)
- Data visualization: Bar/pie/line charts for budget breakdowns, mood trends, habit streaks
- Image gallery: For photo-based apps (receipt scanning results, trip mood boards)
- Map embed: Location pins for trip planners, restaurant finders, venue suggestions
- Approval dialogs: Confirm/cancel patterns for destructive or financial actions
- Progress indicators: Multi-step flows show where you are (Step 2 of 4)

Rendering strategy:

- iMessage: Generate web preview cards with interactive elements via hosted micro-pages (user taps link → mobile-optimized mini-UI → result flows back to chat)
- Web portal: Native React components
- Fallback: Always degrade to current text/block-based UI for pure chat contexts

UI block schema extension:

```json
{
  "type": "chart",
  "chart_type": "bar|pie|line",
  "data": "{{$state.spending_by_category | json}}",
  "title": "This Week's Spending"
}
```
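In pure chat contexts, a chart block like this would need to degrade to text. A sketch of what that fallback could look like — the function name is an assumption, and it assumes `data` has already been interpolated into a `{label: value}` dict by the template engine:

```python
def render_chart_fallback(block: dict) -> str:
    """Degrade a chart UI block to a plain-text bar chart.

    Hypothetical sketch: not the shipped renderer. Assumes `data`
    arrives as an already-interpolated {label: value} mapping.
    """
    lines = [block.get("title", "Chart")]
    data = block.get("data", {})
    peak = max(data.values(), default=1) or 1  # avoid division by zero
    for label, value in data.items():
        bar = "▇" * max(1, round(10 * value / peak))
        lines.append(f"{label:<12} {bar} {value}")
    return "\n".join(lines)
```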


Phase 13: Cron Triggers (P3 — Medium Impact, Medium Effort)

Problem: Apps can only react to user messages or webhooks. No way to run something every morning at 7am or every Monday at 9am.

Solution: Reuse existing SchedulerService (from PRD 10 Proactive Intelligence) to support superpower cron schedules.

Implementation: Superpowers reuse the same APScheduler infrastructure as workflows, sharing:

- Redis distributed locking (multi-worker safe)
- Circuit breakers (failure tracking)
- Budget enforcement (cost control)
- Timezone awareness

Schema:

```json
{
  "triggers": [
    {
      "type": "cron",
      "schedule": "0 7 * * 1-5",
      "action_id": "weekday_morning_brief"
    },
    {
      "type": "system_event",
      "event": "relationship_stage_change",
      "action_id": "celebrate_milestone"
    }
  ]
}
```

Resource Limits:

- Max 3 cron triggers per superpower (cost control)
- Max 5 executions per day per superpower (prevents runaway costs)
- Respects user quiet hours (no triggers during sleep)
- Disabled by default (user must opt-in per superpower)
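Under those limits, the scheduler-side gate for each cron tick might look like this sketch. Field names follow the SuperpowerCronSchedule model in this section; the quiet-hours window (22:00–07:00) is an assumed default, not a shipped value:

```python
from datetime import datetime, time

MAX_DAILY_EXECUTIONS = 5  # per superpower, from the limits above

def may_fire(schedule, now: datetime,
             quiet_start: time = time(22, 0),
             quiet_end: time = time(7, 0)) -> bool:
    """Return True if a cron tick is allowed to execute (sketch)."""
    if not schedule.enabled:                          # opt-in required
        return False
    if schedule.executions_today >= MAX_DAILY_EXECUTIONS:
        return False                                  # runaway-cost cap
    t = now.time()
    in_quiet = (t >= quiet_start or t < quiet_end)    # window wraps midnight
    return not in_quiet
```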

Data Model:

```python
class SuperpowerCronSchedule(Base):
    __tablename__ = "superpower_cron_schedules"

    id = Column(UUID, primary_key=True)
    superpower_id = Column(String, ForeignKey("mini_apps.id"))
    user_id = Column(UUID, ForeignKey("users.id"))
    action_id = Column(String)  # Which action to trigger
    cron_expression = Column(String)
    enabled = Column(Boolean, default=False)  # Opt-in required
    executions_today = Column(Integer, default=0)
    last_execution = Column(DateTime)
```

Interaction with Edge Agent: Cron triggers generate schedule_message commands sent to the Mac mini edge agent, which handles local execution and delivery.


3) Vibe-Coding: How Users Create Superpowers

The creation flow is conversational. Users never see or write JSON.

Flow:

1. User tells Sage what they want: "I want something that checks my email every morning and tells me if anything is urgent"
2. Sage asks clarifying questions: "want me to check just unread, or everything from the last 24h? should I flag things from your boss differently?"
3. Sage generates the JSON definition behind the scenes
4. Sage offers a test run: "try it now — say 'test morning email scanner' and I'll run it in sandbox mode"
5. User approves → app is saved to their superpower loadout
6. Optionally: user submits to marketplace for others to use

What Sage generates (keyword triggers only — cron is Phase 13):

```json
{
  "name": "Morning Email Scanner",
  "description": "Scans your inbox for urgent emails on demand",
  "completion_criteria": "User has received a categorized list of urgent vs. non-urgent emails from the last 24h with at least one actionable suggestion",
  "escalation": {
    "on_oauth_failure": "Tell user their email connection expired and offer to reconnect",
    "on_empty_result": "Confirm inbox was checked successfully",
    "on_llm_extraction_failure": "Fall back to showing raw email subjects with senders"
  },
  "triggers": [
    {
      "type": "keyword",
      "phrases": ["check my email", "anything urgent", "scan inbox"],
      "action_id": "scan_inbox"
    }
  ],
  "actions": [
    {
      "id": "scan_inbox",
      "oauth_required": ["gmail"],
      "operations": [
        {
          "type": "gmail_search",
          "query": "is:unread newer_than:24h",
          "output": "emails"
        },
        {
          "type": "llm_extract",
          "input": "{{$op.emails}}",
          "output_schema": {
            "type": "object",
            "properties": {
              "urgent": {"type": "array"},
              "can_wait": {"type": "array"}
            }
          },
          "output": "sorted"
        },
        {
          "type": "conditional",
          "condition": "{{$op.sorted.urgent | length}} > 0",
          "then": [
            {
              "type": "state_update",
              "data": {"has_urgent": true}
            }
          ],
          "else": [
            {
              "type": "state_update",
              "data": {"has_urgent": false}
            }
          ]
        }
      ],
      "response": {
        "template": "{{$state.has_urgent | ternary:'🚨 You have {{$op.sorted.urgent | length}} urgent emails':'✨ Inbox is chill — nothing urgent in the last 24h'}}"
      }
    }
  ]
}
```

Note: Cron triggers ("type": "cron") appear in the Phase 13 schema earlier in this document, but Sage won't generate them until Phase 13 ships.


Definition of Done & Escalation Rules (v2 — Schema Defined, Runtime Pending)

Every superpower must define two mandatory fields that govern quality and failure handling. This is inspired by agent role cards — the insight being that LLMs are smart enough to execute tasks, but they need explicit rules for when things go wrong and what "success" actually looks like.

```json
{
  "name": "Morning Email Scanner",
  "completion_criteria": "User has received a categorized list of urgent vs. non-urgent emails from the last 24h with at least one actionable suggestion",
  "escalation": {
    "on_oauth_failure": "Tell user their email connection expired and offer to reconnect — don't retry silently or guess",
    "on_empty_result": "Confirm inbox was checked successfully, don't say 'no emails found' without verifying access worked",
    "on_ambiguous_input": "Ask one clarifying question about time range or sender — don't assume",
    "on_llm_extraction_failure": "Fall back to showing raw email subjects with senders instead of failing silently"
  },
  "actions": [...]
}
```

completion_criteria (required string): A plain-language statement of what constitutes a successful execution. The GenericConfigHandler evaluates this at the end of each action chain. If the response doesn't meet the criteria (e.g., the LLM extraction returned empty when emails existed), the handler retries once with a modified prompt before returning a degraded response. This prevents the common failure mode where a superpower runs to completion but produces garbage output.
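Since the runtime evaluation hasn't shipped, the shape it could take is only a sketch. Here the judge and re-run steps are injected callables, and every name is an assumption, not the shipped API:

```python
import asyncio

async def evaluate_completion(response, criteria, judge, rerun):
    """Sketch of the pending completion_criteria check.

    `judge(response, criteria) -> bool` and `rerun() -> response` are
    hypothetical injected callables (e.g. a small LLM judge and a
    retry with a modified prompt). Retries at most once, then
    degrades rather than failing.
    """
    if criteria is None:
        return response, "ok"
    if await judge(response, criteria):
        return response, "ok"
    retry = await rerun()                 # single retry with modified prompt
    if await judge(retry, criteria):
        return retry, "ok"
    return response, "degraded"           # surface original + degraded flag
```

The key design point from the text survives in the sketch: one retry, then a degraded response, never a silent garbage success.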

escalation (required object): A map of failure scenarios to prescribed behaviors. Every key is a failure class, every value is what Sage should say or do. The handler checks escalation rules before generating error responses.

Standard escalation keys every superpower should address:

| Key | When It Fires | Bad Default (without escalation) |
|---|---|---|
| on_oauth_failure | Token expired, revoked, or scope insufficient | Silent failure or cryptic error |
| on_empty_result | Query returned no data | "Nothing found" when the real issue is access |
| on_ambiguous_input | User's request could mean multiple things | Guessing wrong and acting on it |
| on_rate_limit | External API rate limited | Retry loop or timeout |
| on_partial_failure | Some operations succeeded, some failed | Showing partial results without context |
| on_llm_extraction_failure | LLM couldn't parse structured data | Empty response or hallucinated structure |

Sage-generated superpowers auto-populate escalation rules based on the operations used. If an app uses gmail_search, Sage includes on_oauth_failure and on_empty_result automatically. Creators can override.
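That auto-population can be a simple operation-to-keys mapping. A sketch — the mapping below is illustrative, not the shipped table:

```python
# Which escalation keys each operation type implies (illustrative).
DEFAULT_ESCALATIONS = {
    "gmail_search": ["on_oauth_failure", "on_empty_result"],
    "calendar_query": ["on_oauth_failure", "on_empty_result"],
    "llm_extract": ["on_llm_extraction_failure"],
    "web_search": ["on_rate_limit", "on_empty_result"],
}

def required_escalation_keys(definition: dict) -> set:
    """Collect the escalation keys a definition should address,
    based on the operations its actions use."""
    keys = set()
    for action in definition.get("actions", []):
        for op in action.get("operations", []):
            keys.update(DEFAULT_ESCALATIONS.get(op.get("type"), []))
    return keys
```

Sage would then fill in default behaviors for each collected key, and creators can override any of them.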

Guardrails on AI-generated apps:

- All generated JSON is validated against the app schema before saving — including completion_criteria and at least 2 escalation rules
- OAuth scopes must match what the user has already granted (or triggers consent flow)
- Domain allowlist and security controls apply identically to AI-generated and hand-authored apps
- Dry-run sandbox is mandatory before first save

Implementation Status: Schema defined in PRD v2. Runtime evaluation pending (see todo: build completion_criteria evaluation in GenericConfigHandler).


4) Platform Limits & Governance

Resource Limits (Per App)

| Resource | Limit | Rationale |
|---|---|---|
| Actions | 30 max | Prevents unbounded complexity |
| Operations per action | 20 max | Keeps execution predictable |
| UI blocks per response | 10 max | Prevents message spam |
| Scheduled tasks per session | 10 pending max | Prevents queue abuse |
| Cron triggers per app | 3 max | Cost control (Phase 13) |
| Webhook endpoints per app | 5 max | Rate limit surface area |
| State size per session | 64KB max | Memory pressure |
| Template nesting depth | 5 levels max | Prevents infinite recursion |
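Enforcing the structural limits above at save time might look like the following sketch. The constants mirror the table; the function name and error format are assumptions:

```python
# Per-app limits from the table above (illustrative validator).
MAX_ACTIONS = 30
MAX_OPS_PER_ACTION = 20

def validate_limits(definition: dict) -> list:
    """Return a list of limit-violation messages (empty = valid)."""
    errors = []
    actions = definition.get("actions", [])
    if len(actions) > MAX_ACTIONS:
        errors.append(f"too many actions: {len(actions)} > {MAX_ACTIONS}")
    for action in actions:
        ops = action.get("operations", [])
        if len(ops) > MAX_OPS_PER_ACTION:
            errors.append(
                f"action {action.get('id')}: {len(ops)} operations > {MAX_OPS_PER_ACTION}"
            )
    return errors
```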

Operation-Specific Rate Limits

| Operation | Rate Limit | Enforcement Location |
|---|---|---|
| send_message | 5 messages per action execution | SendMessageExecutor |
| schedule_task | 10 pending tasks per session, 168-hour max delay | ScheduleTaskExecutor |
| Cron triggers | 5 executions per day per superpower | SchedulerService (Phase 13) |

Security Model

All UGC apps run inside the GenericConfigHandler sandbox. There is no escape to arbitrary code execution. The security boundaries are:

  • No eval/exec — math uses AST-based SafeMathParser
  • No network access except through declared operations (gmail_search, calendar_query, web_search, build_url)
  • No filesystem access — apps operate purely on session state
  • OAuth scopes are declared and enforced — an app can't access Gmail without gmail scope in its manifest and the user's granted consent
  • All outbound URLs are domain-allowlisted — only approved payment/map/calendar domains (see Domain Allowlist below)
  • All cross-chat messages are recipient-validated — only session participants
  • Webhook payloads are HMAC-verified and size-capped (64KB)

Domain Allowlist (for build_url operation)

The build_url operation restricts deep links to the following domains for security (prevents phishing via UGC superpowers):

Payment platforms:

- venmo.com
- cash.app
- paypal.com

Navigation/Maps:

- maps.google.com
- calendar.google.com
- maps.apple.com

Messaging:

- wa.me (WhatsApp)

To request additions: Submit via internal security review process. New domains must be:

1. HTTPS-only
2. From reputable providers (not user-controlled)
3. No user-generated content in path/params that could enable XSS
4. Business justification (which superpowers need it?)
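The allowlist and HTTPS-only checks described here are small; a sketch of how a build_url executor could apply them (the real implementation may differ, and the function name is an assumption):

```python
from urllib.parse import urlencode

# Allowlisted domains from the list above.
ALLOWED_DOMAINS = {
    "venmo.com", "cash.app", "paypal.com",
    "maps.google.com", "calendar.google.com", "maps.apple.com",
    "wa.me",
}

def build_url(domain: str, path: str, params: dict) -> str:
    """Build an HTTPS deep link, rejecting non-allowlisted domains."""
    if domain not in ALLOWED_DOMAINS:
        raise ValueError(f"Domain not allowlisted: {domain}")
    query = urlencode(params)  # params are URL-encoded by default
    return f"https://{domain}{path}" + (f"?{query}" if query else "")
```

Because the scheme is hard-coded to `https://`, HTTP links are impossible by construction, which is simpler than validating them after the fact.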

App Review (for Marketplace)

See PRD 2 (Marketplace) for the full review workflow. Key points:

- Admin review required before public listing
- Automated schema validation + security lint
- Version-level review (each update goes through review)
- Suspension mechanism for post-publish issues


5) Install Semantics

1:1 Conversations

  • Superpowers are per-user installs (UserCustomMiniAppInstall.user_id)
  • Only the user's installed superpowers can be triggered
  • Subscription required for access (see PRD 12)

Group Conversations

  • Superpowers are room-scoped (room owner's installs apply to all members)
  • Any member can trigger the room owner's installed superpowers
  • Requires room owner subscription (enforced via subscription_access_service.py)
  • Free users in a paid owner's group get full superpower access
  • Viral mechanic: Free users see superpowers in action → want to subscribe for their own groups

First-Party MiniApps (Polls, Checklists, Trip Planner)

  • Enabled as room capabilities (not individual installs)
  • Gated by room owner subscription
  • Always available to all room members

6) Dry-Run Testing Sandbox

Before saving a superpower, users can test it in sandbox mode with mock responses.

User Flow:

1. User creates superpower via chat with Sage
2. Sage says: "want to test it? say 'test morning email scanner'"
3. User triggers test run
4. Sage executes with dry_run=True, returns mock data
5. Sage: "that was a test run — no real actions happened. say 'save it' to make it real"

Implementation:

```python
# In GenericConfigHandler
async def _execute_operation(self, op, context, dry_run=False):
    if dry_run:
        if op.type == "send_message":
            return {"sent": False, "preview": op.message}
        elif op.type == "gmail_search":
            return {"emails": [{"subject": "Sample Email", "from": "boss@company.com"}]}
        elif op.type == "vision_analysis":
            return {"detected": "receipt", "merchant": "Sample Restaurant", "total": 47.52}
        # ... mock responses for all 15 operation types
    else:
        # Real execution via operation_registry
        ...
```

Scope:

- Works for all 15 operations with predefined mock responses
- No external API calls made
- No messages sent, no state persisted
- User sees realistic preview without side effects

Current Status: Mock response framework exists. Needs UX polish (explicit "test mode" indicator in responses).


7) Operation Definition of Done

Each operation type has explicit acceptance criteria that must be validated before shipping:

calculate

  • ✅ Never uses eval — AST-based SafeMathParser only
  • ✅ Supports distribute mode (splits total evenly so shares sum exactly)
  • ✅ Handles division-by-zero gracefully (returns error, not crash)
  • ✅ Supports functions: round(), ceil(), floor(), min(), max(), sum(), len(), abs()
  • ✅ Returns numbers rounded to specified decimal places
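One common way to honor the distribute guarantee (shares sum exactly to the total) is largest-remainder allocation in integer cents. A sketch of that approach — not necessarily how calculate implements it:

```python
def distribute(total_cents: int, n: int) -> list:
    """Split total_cents into n shares that sum exactly to the total.

    Works in integer cents and hands the remainder out one cent at a
    time, so rounding can never create or destroy money.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    base, remainder = divmod(total_cents, n)
    # The first `remainder` shares get one extra cent each.
    return [base + 1 if i < remainder else base for i in range(n)]
```

For example, splitting $47.53 three ways yields shares that differ by at most one cent yet always total 4753 cents exactly.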

send_message

  • ✅ Rate limited to 5 messages per action execution
  • ✅ Recipient validation (only session participants)
  • ✅ No message sent without explicit user consent in superpower flow
  • ✅ Respects user's notification preferences (quiet hours)

gmail_search

  • ✅ OAuth scope gmail required and enforced
  • ✅ Token expiry handled gracefully (escalation rule)
  • ✅ Query results returned in structured format
  • ✅ Empty results distinguished from access failures

vision_analysis

  • ✅ GPT-5 Vision model used
  • ✅ Image size limits enforced (max 20MB)
  • ✅ Returns structured JSON (not free-form text)
  • ✅ Graceful degradation on unrecognizable images

conditional

  • ✅ Condition expression evaluated via template engine
  • ✅ Both then and else branches supported
  • ✅ Nested conditionals supported up to 5 levels
  • ✅ No execution if condition evaluation fails

loop

  • ✅ Iterates over collections in state or operation results
  • ✅ Max 100 iterations per loop (prevents infinite loops)
  • ✅ Supports break on condition
  • ✅ Loop variable accessible in nested operations
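The loop semantics above can be sketched as a small bounded executor (names are illustrative; the real operation works on state/operation-result collections):

```python
MAX_ITERATIONS = 100  # hard cap from the criteria above

def run_loop(items, body, break_when=None):
    """Iterate over `items`, applying `body` to each, with the bounded
    semantics above: at most MAX_ITERATIONS passes, optional early
    break when `break_when(item)` is true."""
    results = []
    for i, item in enumerate(items):
        if i >= MAX_ITERATIONS:
            break                      # hard cap prevents runaway loops
        if break_when and break_when(item):
            break                      # "break on condition" support
        results.append(body(item))
    return results
```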

build_url

  • ✅ Domain allowlist enforced (no arbitrary URLs)
  • ✅ Params are URL-encoded by default
  • ✅ Malformed URLs return error (don't generate invalid links)
  • ✅ HTTPS-only (no HTTP allowed)

schedule_task

  • ✅ Max 10 pending tasks per session
  • ✅ Max delay: 168 hours (7 days)
  • ✅ Task context serialized and deserialized correctly
  • ✅ Failed tasks retry up to max_retries times

All Operations (Universal Requirements)

  • ✅ Template interpolation works on all input fields
  • ✅ Output stored in $op.{output_name} for chaining
  • ✅ Errors logged with operation_id and context
  • ✅ Dry-run mode supported with realistic mock data
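The $op.{output_name} chaining contract can be sketched as a tiny loop. Here `execute` stands in for the real per-type executors, and all names are illustrative:

```python
def run_operations(operations, execute):
    """Run operations in order, storing each result under its declared
    `output` name so later operations can reference it as $op.<name>.

    `execute(op, op_results) -> result` is an injected callable
    (assumption) standing in for the real operation executors.
    """
    op_results = {}
    for op in operations:
        result = execute(op, op_results)
        if "output" in op:
            op_results[op["output"]] = result  # available to later ops
    return op_results
```

In the real handler, template interpolation resolves `{{$op.<name>}}` references in each operation's fields before execution; this sketch just shows where the chained values live.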

8) Forward-Compatibility & Schema Versioning

Schema Versioning (Ship in v1.1)

All superpower definitions include a schema_version field:

```json
{
  "schema_version": "1.0",
  "metadata": {...}
}
```

Versioning Rules:

- Breaking changes increment minor version (1.0 → 1.1)
- Platform supports N-1 versions (current + previous)
- Additive changes (new operations, new filters) don't break existing apps

Deprecation Timeline (16 Weeks)

When deprecating an operation, filter, or schema field:

  • Week 0: Announce deprecation in changelog and dev documentation
  • Week 4: Add warning logs when deprecated feature is used
  • Week 8: Show warning to app creator in marketplace dashboard
  • Week 12: Disable by default (feature flag to re-enable)
  • Week 16: Full removal from platform

Auto-Migration for Simple Changes

For renames and simple structure changes, provide auto-migration:

```python
def migrate_v1_to_v1_1(definition):
    """Auto-migrate v1.0 definitions to v1.1"""
    if definition.get('schema_version') == '1.0':
        for action in definition.get('actions', []):
            for op in action.get('operations', []):
                # Example: Rename operation type
                if op['type'] == 'gmail_query':  # Old name
                    op['type'] = 'gmail_search'  # New name
        definition['schema_version'] = '1.1'
    return definition
```

Graceful Degradation

If an operation is removed entirely:

- Replace with state_update that logs deprecation error
- User sees: "This app uses a deprecated feature '[operation_name]'. Update needed."
- Creator gets email: "Your app '[App Name]' needs updating"
- App remains installed but shows error on invocation

Migration Tools

  • Command: /superpower upgrade [name] — auto-migrates app to latest schema version
  • Creator Dashboard: Shows apps needing updates with one-click upgrade button
  • Email Notifications: Sent at weeks 8, 12, and 15 for apps using deprecated features

9) Architecture

Execution Flow

```
User Message / Webhook POST / Cron Tick
  → SmartMessageRouter (intent classification) OR HMAC verification OR scheduler
  → MiniApp Handler Selection (first-party or GenericConfigHandler)
  → GenericConfigHandler
    1. Check conversation flow resumption (multi-turn)
    2. Filter actions by state machine (if defined)
    3. Match action via triggers (intent/phrase/keyword/state/routing_action/cron)
    4. Verify OAuth scopes
    5. Execute operations sequentially
       - Template interpolation on every field
       - Operation chaining via $op.{output}
       - Conditional branching and loops
       - Pause support for conversation_flow
    6. Fire inter-app events (if any emitted — Phase 9)
    7. Evaluate completion_criteria (if runtime evaluation ships)
    8. Check escalation rules on errors
    9. Evaluate auto-transitions (state machine)
    10. Interpolate response template
    11. Render UI blocks
  → MiniAppResponse (state + UI + messages)
```

Data Model

Terminology Note: Database uses MiniApp table names. User-facing features call them "superpowers."

Core tables (existing):

- mini_apps — app definitions (JSON), metadata, creator
- user_custom_miniapp_installs — user ↔ app relationship (per-user installs)
- miniapp_sessions — per-user runtime state
- miniapp_analytics_events — analytics event store
- miniapp_versions — version history with review status
- miniapp_templates — starter templates for common use cases
- miniapp_webhooks — webhook configurations

New tables (for remaining phases):

- superpower_event_subscriptions — inter-app event wiring (Phase 9)
- superpower_cron_schedules — cron job definitions and next-fire timestamps (Phase 13)


10) Success Metrics

| Metric | Target | Why |
|---|---|---|
| Apps created via vibe-coding | 50+ in first month | Validates creation UX |
| App completion rate (start → save) | >60% | Sage is generating usable apps |
| Dry-run pass rate | >80% on first attempt | Schema validation is helpful, not blocking |
| Sandbox test → save conversion | >70% | Users trust what they test |
| Operations per app (median) | 4-8 | Sweet spot of useful but not overwhelming |
| p95 execution time | <4s | Responsive enough for chat context |
| Security incidents from UGC apps | 0 | Sandbox is working |

Feature Flags & Gating

| Flag Key | Default | Purpose |
|---|---|---|
| enable_superpower_system | true | Master switch for superpower runtime |
| max_superpower_slots_free | 1 | Slot limit for free-tier users |
| max_superpower_slots_paid | 3 | Slot limit for Superpowers+ subscribers (grows to 6 at L10 per PRD 3) |
| enable_capability_hints | true | Contextual superpower discovery prompts |
| superpower_creation_enabled | true | Allow users to create new superpowers via chat |
| enable_bill_split | true | Bill Split miniapp availability |
| enable_trip_planner | true | Trip Planner miniapp availability |

See REFERENCE_FEATURE_FLAGS.md for the full catalog.


Telemetry

| Event | Trigger | Properties |
|---|---|---|
| superpower_created | User finishes creation flow | superpower_id, operation_count, creation_method |
| superpower_installed | User equips a superpower | superpower_id, slot_index, source (marketplace/created) |
| superpower_triggered | User invokes a superpower | superpower_id, trigger_type, latency_ms |
| superpower_error | Runtime error during execution | superpower_id, error_type, operation_id |
| capability_hint_shown | Hint displayed to user | hint_type, superpower_id |
| capability_hint_acted | User acts on a hint | hint_type, action (installed/dismissed) |

Needed but not yet tracked:

- superpower_tier_up — when a superpower advances from Bronze → Silver → Gold
- superpower_uninstalled — when a user removes a superpower
- superpower_creation_abandoned — when creation flow is started but not completed
- superpower_dry_run_executed — when user tests an app in sandbox mode

See REFERENCE_TELEMETRY.md for the full event catalog.


Definition of Done

  • All 15 operation types advertised to LLM in app_generator.py system prompt
  • completion_criteria and escalation fields added to CustomAppDefinition schema
  • completion_criteria runtime evaluation in GenericConfigHandler
  • Escalation rule routing on operation failures
  • All shipped features have integration tests in tests/integration/
  • Runtime sandbox prevents superpowers from accessing unauthorized data
  • Idempotency keys prevent duplicate operations on retry
  • Feature flags gate all new superpower capabilities
  • Telemetry events fire for creation, installation, trigger, and error
  • Free-tier slot limits enforced and tested (1 slot)
  • Paid-tier slot limits enforced and tested (3 starting → 6 at L10, per PRD 3)
  • Superpower creation flow handles edge cases (cancel, timeout, invalid config)
  • Performance: superpower execution completes within 4 seconds (p95), matching the Success Metrics target
  • Dry-run mode has clear UX indicators (test mode banner in responses)
  • Domain allowlist documented in security review process docs

Open Questions

  • Should first-party miniapps (Python handlers) migrate to JSON definitions over time for consistency, or keep the hybrid model?
  • What's the right UX for "this app needs Gmail access but you haven't connected Gmail yet" — inline OAuth flow or redirect to settings?
  • Should inter-app events (Phase 9) be synchronous (blocking) or async (fire-and-forget)?
  • Template filter library: should we open-source it and accept community contributions?