Database Migration Guide

Overview

This document explains the database migration strategy for the Archety backend. After experiencing issues with Alembic table conflicts, we use a direct SQL migration approach for production deployments.

Current Approach: Direct SQL Migrations

Why Not Alembic?

  • Issue: Alembic's auto-migration caused table conflicts and inconsistent schema states
  • Solution: Manual SQL migration scripts executed directly on PostgreSQL
  • Trade-off: More manual work, but predictable and explicit

Migration Process

1. Local Development

For local development, we create tables on first run:

# In app/main.py or migration script
from app.models.database import Base
from app.config import engine

Base.metadata.create_all(bind=engine)

2. Production Deployment (Render/Railway)

For production PostgreSQL on Render/Railway, follow this process:

Step 1: Create Migration Script

Create a numbered SQL file in the /migrations/ directory:

-- migrations/001_initial_schema.sql
CREATE TABLE IF NOT EXISTS users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    phone VARCHAR(20) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_active_at TIMESTAMP WITH TIME ZONE
);

CREATE INDEX IF NOT EXISTS idx_users_phone ON users(phone);

Step 2: Test Migration Locally

# Connect to local PostgreSQL
psql -U your_user -d archety_dev

# Run migration
\i migrations/001_initial_schema.sql

# Verify tables created
\dt

# Check schema
\d users

Step 3: Apply to Production

# Connect to production PostgreSQL (get connection string from dashboard)
psql <REDACTED_POSTGRES_URL>

# Run migration
\i migrations/001_initial_schema.sql

# Verify
\dt
\d users

3. Schema Version Tracking

Track applied migrations in a schema_migrations table:

CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    applied_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- After applying each migration
INSERT INTO schema_migrations (version, name) VALUES (1, 'initial_schema');
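With the tracking table in place, finding where a database currently stands is one query. A minimal version check, sketched here with stdlib sqlite3 purely for illustration (production would run the same query through psycopg2 against PostgreSQL; the second migration name below is hypothetical):

```python
import sqlite3

def latest_version(conn) -> int:
    """Return the highest applied migration version, or 0 if none."""
    row = conn.execute("SELECT MAX(version) FROM schema_migrations").fetchone()
    return row[0] or 0

# In-memory stand-in for the production schema_migrations table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO schema_migrations (version, name) VALUES (1, 'initial_schema')")
conn.execute("INSERT INTO schema_migrations (version, name) VALUES (2, 'add_conversations')")
print(latest_version(conn))  # → 2
```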

Current Database Schema

Core Tables

  • users - User accounts (phone number → UUID)
  • conversations - Chat threads (chat_guid → conversation metadata)
  • messages - Message history for conversation continuity
  • relationship_states - Trust/rapport scores per user-persona pair
  • user_boundaries - "Forget that" / privacy boundaries
  • oauth_tokens - Encrypted OAuth tokens (Google Calendar/Gmail)

Group Feature Tables (Added in Phase 5)

  • polls - Group polls with voting data
  • checklists - Shared group checklists with item completion

Edge Agent Tables

  • edge_agents - Registered edge agents (Mac Mini instances)
  • edge_command_queue - Commands pending for edge agents
  • edge_event_log - Events reported by edge agents

Telegram Integration Tables

  • telegram_update_tracking - Deduplication for Telegram webhook
  • telegram_message_log - Message history from Telegram
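The dedup table works by letting the database itself reject repeated webhook deliveries. A sketch of the idea, assuming telegram_update_tracking keys on Telegram's update_id (the column name is an assumption here), shown with stdlib sqlite3 since it accepts the same ON CONFLICT ... DO NOTHING syntax as PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telegram_update_tracking (update_id INTEGER PRIMARY KEY)")

def is_new_update(conn, update_id: int) -> bool:
    """Insert the update id; return False if it was already seen."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO telegram_update_tracking (update_id) VALUES (?) "
        "ON CONFLICT (update_id) DO NOTHING",
        (update_id,),
    )
    # The insert changed nothing if the id was a duplicate
    return conn.total_changes > before

print(is_new_update(conn, 42))  # first delivery → True
print(is_new_update(conn, 42))  # webhook retry → False
```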

Migration Examples

Adding a New Table

-- migrations/005_add_polls.sql
CREATE TABLE IF NOT EXISTS polls (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chat_guid VARCHAR(255) NOT NULL,
    question TEXT NOT NULL,
    options JSONB NOT NULL DEFAULT '[]',
    votes JSONB NOT NULL DEFAULT '{}',
    created_by VARCHAR(20) NOT NULL,
    expires_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    status VARCHAR(20) DEFAULT 'active'
);

CREATE INDEX IF NOT EXISTS idx_polls_chat_guid ON polls(chat_guid);
CREATE INDEX IF NOT EXISTS idx_polls_status ON polls(status);

-- Track migration
INSERT INTO schema_migrations (version, name) VALUES (5, 'add_polls');

Adding a Column

-- migrations/006_add_user_timezone.sql
ALTER TABLE users
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50) DEFAULT 'America/Los_Angeles';

-- Track migration
INSERT INTO schema_migrations (version, name) VALUES (6, 'add_user_timezone');

Modifying a Column

-- migrations/007_expand_phone_length.sql
ALTER TABLE users
ALTER COLUMN phone TYPE VARCHAR(30);

-- Track migration
INSERT INTO schema_migrations (version, name) VALUES (7, 'expand_phone_length');

Schema Drift Prevention

Before Deploying

  1. Compare SQLAlchemy models (app/models/database.py) with production schema
  2. Run this query to check current schema:

    SELECT
        table_name,
        column_name,
        data_type,
        is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position;
    

  3. Generate migration for any differences
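The comparison in step 1 can be partly automated: dump the model's columns and the information_schema result into {table: {column: data_type}} mappings and diff them. A sketch (the tables and types below are illustrative, not the full schema):

```python
def diff_columns(expected: dict, actual: dict) -> dict:
    """Compare {table: {column: data_type}} mappings and report drift."""
    drift = {"missing": [], "extra": [], "type_mismatch": []}
    for table, cols in expected.items():
        for col, dtype in cols.items():
            actual_type = actual.get(table, {}).get(col)
            if actual_type is None:
                drift["missing"].append(f"{table}.{col}")
            elif actual_type != dtype:
                drift["type_mismatch"].append(f"{table}.{col}: {dtype} != {actual_type}")
    for table, cols in actual.items():
        for col in cols:
            if col not in expected.get(table, {}):
                drift["extra"].append(f"{table}.{col}")
    return drift

# "expected" comes from the SQLAlchemy models, "actual" from the query above
expected = {"users": {"id": "uuid", "phone": "character varying", "timezone": "character varying"}}
actual = {"users": {"id": "uuid", "phone": "character varying"}}
print(diff_columns(expected, actual))  # reports users.timezone as missing
```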

After Deploying

  1. Verify migrations applied:

    SELECT * FROM schema_migrations ORDER BY version;
    

  2. Test CRUD operations on new tables/columns:

    # In Python shell
    from app.models.database import Poll
    from app.config import SessionLocal
    
    db = SessionLocal()
    poll = Poll(chat_guid="test", question="Test?", options=["A", "B"])
    db.add(poll)
    db.commit()
    print(f"Poll created: {poll.id}")
    

Rollback Strategy

For Recent Migrations

Create rollback SQL for each migration:

-- migrations/005_add_polls_ROLLBACK.sql
DROP TABLE IF EXISTS polls CASCADE;

-- Remove from tracking
DELETE FROM schema_migrations WHERE version = 5;

For Complex Changes

  1. Create backup before migration:

    -- Backup specific table
    CREATE TABLE polls_backup AS SELECT * FROM polls;
    
    # Full database backup (Render dashboard, or pg_dump from a shell)
    pg_dump -h host -U user database > backup_YYYY-MM-DD.sql
    

  2. Restore if needed:

    -- Restore table
    DROP TABLE polls;
    ALTER TABLE polls_backup RENAME TO polls;
    

UUID Foreign Key Fix (November 2025)

  • Issue: The conversation foreign key was stored as a string instead of a UUID, causing silent lookup failures
  • Fix: Updated all foreign key references to use the UUID type

-- Migration applied directly (not through Alembic)
ALTER TABLE messages
ALTER COLUMN conversation_id TYPE UUID USING conversation_id::uuid;

ALTER TABLE relationship_states
ALTER COLUMN user_id TYPE UUID USING user_id::uuid;

Best Practices

  1. Always use IF NOT EXISTS - Prevents errors on re-run
  2. Add indexes immediately - Don't defer index creation
  3. Use transactions - Wrap migrations in BEGIN/COMMIT
  4. Test locally first - Never apply untested SQL to production
  5. Track version - Always update schema_migrations
  6. Document changes - Add comments explaining why
  7. Keep SQLAlchemy models in sync - Update database.py with any schema changes
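Practice 3 matters because a half-applied migration is worse than none. The effect of wrapping the DDL and the version insert in one transaction can be demonstrated with stdlib sqlite3 (PostgreSQL DDL is also transactional, so the same pattern holds with psycopg2):

```python
import sqlite3

# Autocommit mode so BEGIN/COMMIT/ROLLBACK are managed explicitly, as in the SQL files
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY, name TEXT)")

def apply_migration(conn, version: int, name: str, statements: list) -> bool:
    """Run all statements and record the version atomically; roll back on any failure."""
    try:
        conn.execute("BEGIN")
        for stmt in statements:
            conn.execute(stmt)
        conn.execute(
            "INSERT INTO schema_migrations (version, name) VALUES (?, ?)",
            (version, name),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False

# Second statement is deliberately invalid: neither the table nor the version survives
ok = apply_migration(conn, 5, "add_polls", ["CREATE TABLE polls (id INTEGER)", "BROKEN SQL"])
print(ok)  # → False
print(conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0])  # → 0
```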

Automated Migration Script (Future)

Consider creating a Python migration runner:

# scripts/migrate.py
import psycopg2
from app.config import DATABASE_URL

def run_migration(version: int, sql_file: str) -> None:
    conn = psycopg2.connect(DATABASE_URL)
    try:
        cur = conn.cursor()

        # Ensure the tracking table exists before checking it
        cur.execute("""
            CREATE TABLE IF NOT EXISTS schema_migrations (
                version INTEGER PRIMARY KEY,
                name VARCHAR(255) NOT NULL,
                applied_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
            )
        """)
        conn.commit()

        # Skip if already applied
        cur.execute("SELECT version FROM schema_migrations WHERE version = %s", (version,))
        if cur.fetchone():
            print(f"Migration {version} already applied")
            return

        # Run the migration and record it in the same transaction
        with open(sql_file) as f:
            cur.execute(f.read())
        cur.execute(
            "INSERT INTO schema_migrations (version, name) VALUES (%s, %s)",
            (version, sql_file.split('/')[-1])
        )
        conn.commit()
    finally:
        conn.close()
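A driver for this runner would need to turn filenames like 001_initial_schema.sql into (version, name) pairs and apply them in ascending order. A parsing sketch, assuming the NNN_name.sql convention used throughout this guide:

```python
import re
from pathlib import Path

def parse_migration(filename: str) -> tuple:
    """Extract (version, name) from an 'NNN_name.sql' migration filename."""
    m = re.fullmatch(r"(\d+)_(.+)\.sql", Path(filename).name)
    if not m:
        raise ValueError(f"not a migration file: {filename}")
    return int(m.group(1)), m.group(2)

# Sorting the parsed pairs yields application order
files = ["migrations/005_add_polls.sql", "migrations/001_initial_schema.sql"]
ordered = sorted(parse_migration(f) for f in files)
print(ordered)  # → [(1, 'initial_schema'), (5, 'add_polls')]
```

Each (version, name) pair can then be fed to run_migration in order, and rollback scripts (the *_ROLLBACK.sql files) would be filtered out before parsing.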

Summary

  • Use direct SQL migrations for predictable schema changes
  • Track versions in schema_migrations table
  • Test locally before production
  • Keep SQLAlchemy models in sync with actual schema
  • Document all changes in migration files
  • Create rollback scripts for safety

This approach sacrifices automation for reliability and explicit control over schema changes.