Feature Flags Operations Guide¶

Last Updated: 2026-01-27
Owner: Backend Platform Team
LaunchDarkly Project: Ikiro (key: default)

Overview¶

This guide covers the LaunchDarkly feature flags integration for the Archety backend. Feature flags enable runtime configuration changes, gradual rollouts, and emergency toggles without requiring code deployments.

Quick Start¶

Environments¶

Environment	LaunchDarkly Key	Railway Service	Superpowers Default
Development	`test`	archety-backend-dev	ON
Production	`production`	archety-backend-prod	OFF

Setup¶

Get LaunchDarkly SDK Key
1. Log into the LaunchDarkly dashboard: https://app.launchdarkly.com
2. Navigate to Account Settings → Projects → Ikiro
3. Copy the SDK key for your environment:
   - Development: sdk-fc1e5790-5356-425b-92b8-ad5eb8994be7
   - Production: sdk-8d22b6a2-b8a0-4a10-84f2-fed696c451c6

Configure Environment Variable (Railway)

# Set in Railway dashboard or via CLI
railway variables set LAUNCHDARKLY_SDK_KEY=sdk-<your-key-here>

Verify Integration

# Check feature flags status (after deploy)
curl https://archety-backend-dev.up.railway.app/health

# Check logs for initialization
# Should see: "LaunchDarkly client initialized successfully"

Fallback Behavior¶

Important: The system is designed to work WITHOUT LaunchDarkly. If the SDK key is not configured or LaunchDarkly is unavailable, all flags will use their default values defined in code.
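The fallback path described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in app/feature_flags: the helper name `evaluate_flag`, the `DEFAULTS` table, and the `remote_lookup` callable are all hypothetical stand-ins.

```python
import os

# Hypothetical in-code defaults; the real values live in app/feature_flags/flags.py.
DEFAULTS = {"cache.response.enabled": True, "llm.force_basic_mode": False}


def evaluate_flag(key, remote_lookup=None, env=None):
    """Return the remote value when available, otherwise the in-code default."""
    env = os.environ if env is None else env
    if not env.get("LAUNCHDARKLY_SDK_KEY") or remote_lookup is None:
        return DEFAULTS[key]  # SDK key missing: never call LaunchDarkly
    try:
        return remote_lookup(key)
    except Exception:
        return DEFAULTS[key]  # LaunchDarkly unavailable: fall back to the default
```

Passing `env` and `remote_lookup` explicitly keeps the sketch testable without a real SDK key or network access.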

Management Scripts¶

Located in scripts/:

Script	Purpose
`create_launchdarkly_flags.py`	Create all flags in LaunchDarkly (run once)
`configure_launchdarkly_defaults.py`	Set environment-specific defaults
`update_launchdarkly_descriptions.py`	Update flag documentation

Usage:

# Requires API Access Token (not SDK key)
# Get from: Account Settings → Authorization → Access Tokens

LAUNCHDARKLY_API_KEY=api-xxxxx python scripts/create_launchdarkly_flags.py --skip-existing
LAUNCHDARKLY_API_KEY=api-xxxxx python scripts/configure_launchdarkly_defaults.py
LAUNCHDARKLY_API_KEY=api-xxxxx python scripts/update_launchdarkly_descriptions.py

Architecture¶

Components¶

LD Client (app/feature_flags/ld_client.py)
- Lazy initialization with timeout
- Circuit breaker for resilience
- Automatic fallback to defaults

Flag Definitions (app/feature_flags/flags.py)
- Centralized flag constants
- Default values and metadata
- Namespaced keys (e.g., llm.*, cache.*)

Integration Points
- LLM Client: Token limits, emergency kill-switch
- Cache Manager: TTL configuration
- (Future) Workflow Engine: Feature gating

Circuit Breaker¶

The LD client includes a circuit breaker to prevent cascading failures:

Threshold: 3 consecutive failures
Timeout: 60 seconds (before attempting half-open)
Behavior: On open, return default values immediately
Logging: Warning logged when circuit opens
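The behavior listed above can be sketched as a small state machine. This is an illustrative sketch under the stated parameters (3 consecutive failures, 60-second timeout), not the code in ld_client.py; the class name, method names, and the injectable `clock` are assumptions made for the example.

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; half-opens after `timeout` seconds."""

    def __init__(self, threshold=3, timeout=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.timeout = timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.opened_at is None:
            return True  # closed: requests flow normally
        # open: allow probe requests once the timeout has elapsed (half-open)
        return self.clock() - self.opened_at >= self.timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the timer


def call_with_breaker(breaker, fetch, default):
    """Return fetch() when the breaker permits it, otherwise the default value."""
    if not breaker.allow_request():
        return default  # circuit open: answer immediately with the default
    try:
        value = fetch()
    except Exception:
        breaker.record_failure()
        return default
    breaker.record_success()
    return value
```

A failed probe in the half-open state reopens the circuit and restarts the 60-second timer, while a successful probe closes it.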

Flag Reference¶

LLM Token Limits¶

Control maximum token counts for different LLM contexts:

Flag Key	Default	Range	Purpose
`llm.max_tokens.reflex`	500	100-2000	Ultra-fast reflex responses
`llm.max_tokens.deep_reasoning`	2000	500-4000	Complex reasoning tasks
`llm.max_tokens.default`	1000	200-3000	Standard conversation
`llm.max_tokens.vision`	1500	500-3000	Vision/multimodal analysis

Example:

from app.feature_flags import get_config, LLMTokenLimits

max_tokens = get_config(
    LLMTokenLimits.REFLEX_MAX_TOKENS,
    default=500,
    min_value=100,
    max_value=2000
)

Pipeline Flags¶

Control LLM pipeline behavior and rollouts:

Flag Key	Type	Default	Purpose
`pipeline.classifier_v2_enabled`	Boolean	false	Enable combined classifier v2
`llm.force_basic_mode`	Boolean	false	Emergency kill-switch - disable expensive features

Emergency Kill-Switch: When llm.force_basic_mode is enabled:
- All token limits capped at 200
- Expensive features disabled
- Only basic responses allowed
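One way the 200-token cap could be applied on top of a configured limit is sketched below. The function name and the choice to take the minimum of the two values are assumptions for illustration; the actual LLM client may enforce the cap differently.

```python
BASIC_MODE_TOKEN_CAP = 200  # cap stated in this guide


def effective_max_tokens(configured_limit, force_basic_mode):
    """Apply the kill-switch cap on top of a configured token limit."""
    if force_basic_mode:
        # Never raise a limit that is already below the cap.
        return min(configured_limit, BASIC_MODE_TOKEN_CAP)
    return configured_limit
```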

Cache Flags¶

Control caching behavior and TTLs:

Flag Key	Default	Range	Purpose
`cache.response.enabled`	true	-	Enable Redis-backed caching
`cache.memory.ttl_seconds`	300	60-3600	Memory search cache TTL
`cache.llm_response.ttl_seconds`	3600	300-7200	LLM response cache TTL

Workflow Flags¶

Control workflow engine features:

Flag Key	Type	Default	Purpose
`workflow.calendar_v2`	Boolean	false	Canary deployment for calendar v2
`workflow.email_urgency`	Boolean	true	Email urgency detection

Context Management¶

Control conversation context size:

Flag Key	Default	Range	Purpose
`context.reduced_history_enabled`	false	-	Enable conversation truncation
`context.max_messages`	10	3-20	Max messages in context
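A rough sketch of how these two flags might interact is shown below. It assumes truncation keeps the most recent messages and that out-of-range values are clamped to the documented 3-20 range; both are assumptions, and `trim_history` is a hypothetical helper rather than an existing function.

```python
def trim_history(messages, reduced_history_enabled, max_messages=10):
    """Keep only the newest messages when reduced history is enabled."""
    max_messages = max(3, min(max_messages, 20))  # documented range: 3-20
    if not reduced_history_enabled:
        return list(messages)  # flag off: pass the history through unchanged
    return list(messages[-max_messages:])
```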

Media Processing¶

Control photo and media handling:

Flag Key	Default	Range	Purpose
`media.photo_batching_enabled`	false	-	Batch photos for cost savings
`media.photo_batch_size`	3	1-10	Photos per batch
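The batching controlled by these flags could look roughly like the sketch below. The helper `batch_photos` is hypothetical, and treating "batching disabled" as one photo per request is an assumption about the intended behavior.

```python
def batch_photos(photos, batching_enabled, batch_size=3):
    """Group photos into batches of at most `batch_size` items."""
    batch_size = max(1, min(batch_size, 10))  # documented range: 1-10
    if not batching_enabled:
        return [[photo] for photo in photos]  # assumed: one photo per request
    return [photos[i:i + batch_size] for i in range(0, len(photos), batch_size)]
```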

Observability¶

Control monitoring and logging:

Flag Key	Type	Default	Purpose
`metrics.token_monitoring_enabled`	Boolean	true	Token usage tracking
`metrics.detailed_logging_enabled`	Boolean	false	Verbose request logging

Payment & Monetization¶

Control payment processing and revenue features:

Flag Key	Type	Default	Purpose
`payment.stripe_enabled`	Boolean	true	Enable/disable Stripe payments
`payment.trial_enabled`	Boolean	true	Enable trial period for new users
`payment.trial_amount_cents`	Config	500	Trial payment amount ($5.00)
`payment.paywall_enabled`	Boolean	false	Require payment before features
`payment.stripe_test_mode`	Boolean	false	Use Stripe test keys
`payment.topup_enabled`	Boolean	true	Enable credit top-ups
`payment.topup_min_amount_cents`	Config	500	Minimum top-up ($5.00)
`payment.topup_max_amount_cents`	Config	10000	Maximum top-up ($100.00)

Use Cases:
- Emergency Disable: Set payment.stripe_enabled=false if Stripe has issues
- A/B Testing: Test different trial amounts (500¢ vs 1000¢)
- Paywall Control: Enable paywall after beta period ends
- Top-up Limits: Adjust min/max based on user feedback

Example:

from app.feature_flags import get_flag, get_config, PaymentFlags

# Check if payments are enabled
if get_flag(PaymentFlags.STRIPE_PAYMENTS_ENABLED, default=True):
    # Get trial amount from flag
    trial_amount = get_config(
        PaymentFlags.TRIAL_AMOUNT,
        default=500,
        min_value=100,
        max_value=5000
    )

Rate Limiting & Abuse Prevention¶

Control rate limits and abuse prevention:

Flag Key	Type	Default	Purpose
`rate_limit.enabled`	Boolean	true	Enable/disable rate limiting
`rate_limit.message_budget_threshold`	Config	5	Human messages before AI unprompted message
`rate_limit.commands_per_minute`	Config	10	Max commands per user per minute
`rate_limit.miniapp_actions_per_minute`	Config	20	Max mini-app actions per minute
`rate_limit.user_messages_per_minute`	Config	20	Max user messages per minute
`rate_limit.photos_per_hour`	Config	50	Max photo uploads per hour
`rate_limit.strict_mode_enabled`	Boolean	false	Enable stricter limits
`rate_limit.strict_mode_multiplier`	Config	0.5	Multiplier for strict mode (50% of normal)

Use Cases:
- Abuse Response: Enable strict_mode_enabled during attack
- Adjust Limits: Tune message budget based on user feedback
- Cost Control: Reduce photo upload limits if storage costs spike
- VIP Users: Use LaunchDarkly targeting to exempt specific users

Example:

from app.feature_flags import get_flag, get_config, RateLimitFlags

# Check if rate limiting is enabled
if get_flag(RateLimitFlags.RATE_LIMITING_ENABLED, default=True):
    # Get command rate limit
    limit = get_config(
        RateLimitFlags.COMMAND_RATE_LIMIT,
        default=10,
        min_value=1,
        max_value=1000
    )

    # Apply strict mode if enabled
    if get_flag(RateLimitFlags.STRICT_MODE_ENABLED, default=False):
        multiplier = get_config(
            RateLimitFlags.STRICT_MODE_MULTIPLIER,
            default=0.5
        )
        limit = int(limit * multiplier)  # 50% of normal

Free Tier Flags¶

Control free tier functionality and monetization:

Flag Key	Type	Default	Purpose
`free_tier.enabled`	Boolean	true	Master switch for free tier restrictions
`free_tier.daily_message_limit`	Config	50	Daily message limit for free users
`free_tier.superpowers_disabled`	Boolean	true	Disable superpowers for free users in 1:1 chats

Use Cases:
- A/B Testing: Test different message limits (25 vs 50 vs 100)
- Promotion: Temporarily disable free tier during launch
- Conversion Optimization: Adjust limits based on conversion data

Superpower Flags¶

Control individual superpowers and workflows. All superpower flags follow the pattern superpower.{name}.enabled.

Master Switch:

Flag Key	Type	Dev Default	Prod Default	Purpose
`superpower.enabled`	Boolean	true	false	Global enable/disable for ALL superpowers

⚠️ When superpower.enabled is OFF, all individual superpower flags are ignored.
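The precedence between the master switch and the individual flags can be sketched as below. A plain dict stands in for LaunchDarkly, both helper names are hypothetical, and treating a missing individual flag as disabled is an assumption; the real logic lives in the superpower access service.

```python
def superpower_flag_key(name):
    """Build the flag key following the superpower.{name}.enabled pattern."""
    return f"superpower.{name}.enabled"


def is_superpower_enabled(name, flags):
    """`flags` is a dict of flag key -> bool standing in for LaunchDarkly."""
    if not flags.get("superpower.enabled", False):
        return False  # master switch OFF: individual flags are ignored
    return bool(flags.get(superpower_flag_key(name), False))
```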

Proactive Superpowers (Automatic/Scheduled)¶

Flag Key	Description	Trigger
`superpower.travel_brain.enabled`	Flight tracking and departure reminders	Calendar travel events
`superpower.reservation_prep.enabled`	Restaurant details before reservations	Calendar dining events
`superpower.tickets_vault.enabled`	Save and surface tickets before events	User shares tickets
`superpower.birthday_reminder.enabled`	Birthday notifications and gift help	Stored birthday dates
`superpower.post_event_checkin.enabled`	Follow up after important events	Important calendar events
`superpower.cancellation_protector.enabled`	Warn before cancellation deadlines	Bookings with policies
`superpower.membership_renewal.enabled`	Track subscription renewals	Stored memberships
`superpower.package_tracking.enabled`	Delivery notifications	Email tracking numbers
`superpower.call_prep.enabled`	Details before calls, follow-up prompts	Calendar call events
`superpower.focus_protector.enabled`	Guard deep work blocks	Calendar focus time

Daily Workflows (Scheduled)¶

Flag Key	Description	Schedule
`superpower.calendar_morning.enabled`	Morning calendar briefing	User's morning time
`superpower.calendar_events.enabled`	Monitor calendar for changes	Continuous
`superpower.email_urgency.enabled`	Scan email for urgent items	Periodic (requires Gmail OAuth)
`superpower.evening_prep.enabled`	Tomorrow's calendar and pending items	User's evening time
`superpower.destination_prep.enabled`	Trip preparation 2-3 days before	Before travel events

Agent Workflows (User-Triggered)¶

Flag Key	Description	Trigger
`superpower.calendar_stress_agent.enabled`	Analyze calendar for stressful patterns	User asks about stress
`superpower.calendar_query_agent.enabled`	Answer questions about schedule	User asks calendar questions
`superpower.calendar_list_agent.enabled`	List upcoming events	User asks to list events
`superpower.gmail_mindreader_agent.enabled`	Email insights and summaries	User asks about emails

MiniApps (Collaborative Features)¶

Flag Key	Description	Use Case
`superpower.trip_planner.enabled`	Collaborative trip planning with voting	Group trip planning
`superpower.bill_split.enabled`	Split expenses with friends	Splitting dinner bills
`superpower.todo_list.enabled`	Shared task management	Household/project tasks

Life/Wellness Workflows¶

Flag Key	Description
`superpower.mood_tracker.enabled`	Track emotional wellbeing over time
`superpower.budget_tracker.enabled`	Track spending and finances
`superpower.habit_tracker.enabled`	Build and track habits
`superpower.daily_summary.enabled`	Morning/evening briefings

User-Created Superpowers¶

Flag Key	Type	Default	Purpose
`superpower.user_created.enabled`	Boolean	false (prod)	Enable user-created custom superpowers

Risk Level: Medium - user-generated content requires monitoring.

Example - Checking Superpower Access:

from app.services.superpower_access_service import check_superpower_access

# Check if a specific superpower is enabled
result = check_superpower_access("trip_planner", user_id, is_group=False)
if result["allowed"]:
    # Execute superpower
    ...
else:
    # Handle blocked state
    print(result["message"])  # User-friendly explanation

Operations¶

Changing a Flag Value¶

In LaunchDarkly Dashboard:
Navigate to Feature Flags
Find the flag (e.g., llm.max_tokens.reflex)
Click "Edit" → "Variations"
Update value
Click "Save"

Verify Change:

# Check logs for flag evaluation
tail -f logs/app.log | grep "Flag '.*' evaluated"

# Or use admin endpoint
curl http://localhost:8000/admin/feature-flags/status

Rollback if Needed:
LaunchDarkly maintains flag history
You can roll back to the previous value instantly
No code deployment is required

Gradual Rollout Strategy¶

For new features (e.g., pipeline.classifier_v2_enabled):

Start with 0% (default off)
Roll out to internal users: 10% → 25% → 50%
Monitor metrics: token usage, latency, error rates
Full rollout: 100% if healthy
Make default: Update the code default after the flag has been stable at 100% for 1 week
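To illustrate why a percentage rollout stays stable as it widens, here is a minimal sketch of hash-based bucketing. LaunchDarkly performs its own bucketing internally; this is not its algorithm, and `rollout_bucket` and `in_rollout` are hypothetical names used only to show the idea.

```python
import hashlib


def rollout_bucket(flag_key, user_key):
    """Map a (flag, user) pair to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def in_rollout(flag_key, user_key, percentage):
    """True when the user falls inside the first `percentage` buckets."""
    return rollout_bucket(flag_key, user_key) < percentage
```

Because each user's bucket is fixed, anyone included at 10% remains included at 25% and 50%, so users do not flip in and out of the feature as the rollout grows.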

Emergency Response¶

Incident: High LLM costs detected

Immediate Action:
Enable llm.force_basic_mode in LaunchDarkly
Takes effect within ~5 seconds globally

Investigate:

# Check which endpoints are consuming tokens
curl http://localhost:8000/admin/metrics/token-usage

# Check feature flag status
curl http://localhost:8000/admin/feature-flags/status

Targeted Fix:
Lower specific token limits (e.g., llm.max_tokens.deep_reasoning)
Disable expensive features (e.g., workflow.calendar_v2)

Restore:
Disable llm.force_basic_mode
Monitor for 30 minutes
Document incident

Monitoring¶

Health Checks¶

Feature Flags Status Endpoint:

GET /admin/feature-flags/status

Response:

{
  "initialized": true,
  "is_connected": true,
  "circuit_breaker_open": false,
  "timestamp": "2025-01-14T10:30:00Z"
}

Alert Conditions:
- initialized == false → LaunchDarkly not initialized (warn)
- is_connected == false → Using fallback values (warn)
- circuit_breaker_open == true → Multiple failures detected (critical)
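The alert conditions above can be expressed as a small classifier over the status payload. This is a sketch only: `classify_status` is a hypothetical helper, and the severity strings are chosen for the example.

```python
def classify_status(status):
    """Map a /admin/feature-flags/status payload to an alert severity."""
    if status.get("circuit_breaker_open"):
        return "critical"  # multiple failures detected
    if not status.get("initialized") or not status.get("is_connected"):
        return "warning"  # not initialized, or serving fallback values
    return "ok"
```

The circuit breaker is checked first so that the most severe condition wins when several apply at once.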

Metrics to Track¶

Flag Evaluations:
- Track log volume for flag changes
- Alert on a sudden spike in default-value usage

Token Usage:
- Monitor token consumption by endpoint
- Compare before/after flag changes

Error Rates:
- Watch for increased errors after rollout
- Roll back if the error rate increases by more than 1%

Recommended Alerts¶

# Example Sentry/Datadog alert rules
- name: "LaunchDarkly Circuit Breaker Open"
  condition: "circuit_breaker_open == true"
  severity: critical

- name: "LaunchDarkly Disconnected"
  condition: "is_connected == false for 5 minutes"
  severity: warning

- name: "High Token Usage"
  condition: "token_usage > 10M tokens/hour"
  severity: warning
  action: "Check llm.force_basic_mode"

Troubleshooting¶

Issue: Flags Always Using Defaults¶

Symptoms:
- initialized == false in status endpoint
- Logs show "LaunchDarkly SDK key not found"

Solution:
1. Verify LAUNCHDARKLY_SDK_KEY environment variable is set
2. Restart application
3. Check logs for initialization errors

Issue: Circuit Breaker Stuck Open¶

Symptoms:
- circuit_breaker_open == true
- All flags using defaults despite LD being healthy

Solution:
1. Check LaunchDarkly service status
2. Verify network connectivity to LaunchDarkly
3. Restart application (resets circuit breaker)

Issue: Flags Not Updating¶

Symptoms:
- Changed flag in LaunchDarkly but application still uses old value

Solutions:
1. Check if LD client connected: /admin/feature-flags/status
2. Wait up to 60 seconds for client to fetch updates (default polling interval)
3. Verify flag key matches exactly (case-sensitive)
4. Check targeting rules aren't excluding your context

Issue: Application Slower After Enabling Flag¶

Debugging:
1. Check which flag was changed
2. Review flag-controlled code paths
3. Monitor token usage for LLM-related flags
4. Rollback flag temporarily to confirm causation

Best Practices¶

Flag Naming Conventions¶

Use namespaces: category.subcategory.name
Examples: llm.max_tokens.reflex, cache.memory.ttl_seconds
Keep names descriptive and lowercase with underscores
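A naming check along these lines could run in CI or in the flag-creation script. The regular expression below is one interpretation of the convention (lowercase segments with underscores and digits, at least two dot-separated segments, since keys such as workflow.calendar_v2 have only two); `is_valid_flag_key` is a hypothetical helper.

```python
import re

# One or more namespace segments followed by a final name, all lowercase.
FLAG_KEY_PATTERN = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")


def is_valid_flag_key(key):
    """Return True when `key` follows the namespaced lowercase convention."""
    return FLAG_KEY_PATTERN.fullmatch(key) is not None
```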

Flag Lifecycle¶

Creation:
- Add to flags.py with default
- Document purpose and rollout plan
- Create flag in LaunchDarkly

Rollout:
- Start with default OFF (if feature flag)
- Gradual rollout: 10% → 50% → 100%
- Monitor metrics at each stage

Stabilization:
- Run at 100% for 1 week minimum
- Verify no issues

Cleanup:
- Update code default to match production value
- Archive flag in LaunchDarkly (don't delete immediately)
- Remove flag check from code after 1 month

Configuration Validation¶

Always use min/max clamping for numeric configs:

ttl = get_config(
    "cache.memory.ttl_seconds",
    default=300,
    min_value=60,    # Prevent too-short TTL
    max_value=3600   # Prevent too-long TTL
)
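For readers wondering what the clamping protects against, here is a minimal sketch of the idea. It is not the implementation of `get_config`; `clamp_config` is a hypothetical helper, and falling back to the default for non-numeric values is an assumption.

```python
def clamp_config(value, default, min_value=None, max_value=None):
    """Coerce a raw flag value to a number within [min_value, max_value]."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        value = default  # malformed value: use the safe default
    if min_value is not None:
        value = max(value, min_value)
    if max_value is not None:
        value = min(value, max_value)
    return value
```

With this shape, a typo in the dashboard (for example a TTL of 5 or 99999 seconds) is pulled back into the safe range instead of reaching the cache layer.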

Testing Flags Locally¶

# Override flag for local development
import os
os.environ["LAUNCHDARKLY_SDK_KEY"] = ""  # Force defaults

# Or use test SDK key for isolated environment
os.environ["LAUNCHDARKLY_SDK_KEY"] = "sdk-test-..."

Documentation Requirements¶

For each new flag, document:
- Purpose: What does this flag control?
- Default: What's the safe fallback value?
- Rollout Plan: How will this be deployed?
- Metrics: What should we monitor?
- Cleanup Date: When can this flag be removed?

Appendix¶

Runbook Template¶

# Runbook: [Flag Name]

**Flag Key:** `category.subcategory.name`
**Owner:** @username
**Created:** YYYY-MM-DD

## Purpose
[What does this flag control?]

## Rollout Plan
- Week 1: Internal testing (10%)
- Week 2: Gradual rollout (50%)
- Week 3: Full rollout (100%)
- Week 4: Stabilization
- Week 5: Update code defaults

## Metrics to Monitor
- [Metric 1]
- [Metric 2]

## Rollback Procedure
1. Set flag to false/default in LaunchDarkly
2. Monitor for 5 minutes
3. Investigate root cause

## Cleanup Date
[Expected date to remove flag]

Questions? Contact the Backend Platform Team or file an issue in the archety repository.
