Database Migration Guide

Overview

This document explains the database migration strategy for the Archety backend. After experiencing issues with Alembic table conflicts, we use a direct SQL migration approach for production deployments.

Current Approach: Direct SQL Migrations

Why Not Alembic?

  • Issue: Alembic's auto-migration caused table conflicts and inconsistent schema states
  • Solution: Manual SQL migration scripts executed directly on PostgreSQL
  • Trade-off: More manual work, but predictable and explicit

Migration Process

1. Local Development

For local development, we create tables on first run:

# In app/main.py or migration script
from app.models.database import Base
from app.config import engine

Base.metadata.create_all(bind=engine)

2. Production Deployment (Render/Railway)

For production PostgreSQL on Render/Railway, follow this process:

Step 1: Create Migration Script

Create a numbered SQL file in the /migrations/ directory:

-- migrations/001_initial_schema.sql
CREATE TABLE IF NOT EXISTS users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    phone VARCHAR(20) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_active_at TIMESTAMP WITH TIME ZONE
);

CREATE INDEX IF NOT EXISTS idx_users_phone ON users(phone);

Step 2: Test Migration Locally

# Connect to local PostgreSQL
psql -U your_user -d archety_dev

# Run migration
\i migrations/001_initial_schema.sql

# Verify tables created
\dt

# Check schema
\d users

Step 3: Apply to Production

# Connect to production PostgreSQL (get connection string from dashboard)
psql <REDACTED_POSTGRES_URL>

# Run migration
\i migrations/001_initial_schema.sql

# Verify
\dt
\d users

3. Schema Version Tracking

Track applied migrations in a schema_migrations table:

CREATE TABLE IF NOT EXISTS schema_migrations (
    version INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    applied_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- After applying each migration
INSERT INTO schema_migrations (version, name) VALUES (1, 'initial_schema');
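With the tracking table in place, finding where a database currently stands is one query. A minimal version check, sketched here with stdlib sqlite3 purely for illustration (production would run the same query through psycopg2 against PostgreSQL; the second migration name below is hypothetical):

```python
import sqlite3

def latest_version(conn) -> int:
    """Return the highest applied migration version, or 0 if none."""
    row = conn.execute("SELECT MAX(version) FROM schema_migrations").fetchone()
    return row[0] or 0

# In-memory stand-in for the production schema_migrations table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO schema_migrations (version, name) VALUES (1, 'initial_schema')")
conn.execute("INSERT INTO schema_migrations (version, name) VALUES (2, 'add_conversations')")
print(latest_version(conn))  # → 2
```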

Current Database Schema

Core Tables

  • users - User accounts (phone number → UUID)
  • conversations - Chat threads (chat_guid → conversation metadata)
  • messages - Message history for conversation continuity
  • relationship_states - Trust/rapport scores per user-persona pair
  • user_boundaries - "Forget that" / privacy boundaries
  • oauth_tokens - Encrypted OAuth tokens (Google Calendar/Gmail)

Group Feature Tables (Added in Phase 5)

  • polls - Group polls with voting data
  • checklists - Shared group checklists with item completion

Edge Agent Tables

  • edge_agents - Registered edge agents (Mac Mini instances)
  • edge_command_queue - Commands pending for edge agents
  • edge_event_log - Events reported by edge agents

Telegram Integration Tables

  • telegram_update_tracking - Deduplication for Telegram webhook
  • telegram_message_log - Message history from Telegram
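The dedup table works by letting the database itself reject repeated webhook deliveries. A sketch of the idea, assuming telegram_update_tracking keys on Telegram's update_id (the column name is an assumption here), shown with stdlib sqlite3 since it accepts the same ON CONFLICT ... DO NOTHING syntax as PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telegram_update_tracking (update_id INTEGER PRIMARY KEY)")

def is_new_update(conn, update_id: int) -> bool:
    """Insert the update id; return False if it was already seen."""
    before = conn.total_changes
    conn.execute(
        "INSERT INTO telegram_update_tracking (update_id) VALUES (?) "
        "ON CONFLICT (update_id) DO NOTHING",
        (update_id,),
    )
    # The insert changed nothing if the id was a duplicate
    return conn.total_changes > before

print(is_new_update(conn, 42))  # first delivery → True
print(is_new_update(conn, 42))  # webhook retry → False
```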

Migration Examples

Adding a New Table

-- migrations/005_add_polls.sql
CREATE TABLE IF NOT EXISTS polls (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chat_guid VARCHAR(255) NOT NULL,
    question TEXT NOT NULL,
    options JSONB NOT NULL DEFAULT '[]',
    votes JSONB NOT NULL DEFAULT '{}',
    created_by VARCHAR(20) NOT NULL,
    expires_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    status VARCHAR(20) DEFAULT 'active'
);

CREATE INDEX IF NOT EXISTS idx_polls_chat_guid ON polls(chat_guid);
CREATE INDEX IF NOT EXISTS idx_polls_status ON polls(status);

-- Track migration
INSERT INTO schema_migrations (version, name) VALUES (5, 'add_polls');

Adding a Column

-- migrations/006_add_user_timezone.sql
ALTER TABLE users
ADD COLUMN IF NOT EXISTS timezone VARCHAR(50) DEFAULT 'America/Los_Angeles';

-- Track migration
INSERT INTO schema_migrations (version, name) VALUES (6, 'add_user_timezone');

Modifying a Column

-- migrations/007_expand_phone_length.sql
ALTER TABLE users
ALTER COLUMN phone TYPE VARCHAR(30);

-- Track migration
INSERT INTO schema_migrations (version, name) VALUES (7, 'expand_phone_length');

Schema Drift Prevention

Before Deploying

  1. Compare SQLAlchemy models (app/models/database.py) with production schema
  2. Run this query to check current schema:

    SELECT
        table_name,
        column_name,
        data_type,
        is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position;
    

  3. Generate migration for any differences
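The comparison in step 1 can be partly automated: dump the model's columns and the information_schema result into {table: {column: data_type}} mappings and diff them. A sketch (the tables and types below are illustrative, not the full schema):

```python
def diff_columns(expected: dict, actual: dict) -> dict:
    """Compare {table: {column: data_type}} mappings and report drift."""
    drift = {"missing": [], "extra": [], "type_mismatch": []}
    for table, cols in expected.items():
        for col, dtype in cols.items():
            actual_type = actual.get(table, {}).get(col)
            if actual_type is None:
                drift["missing"].append(f"{table}.{col}")
            elif actual_type != dtype:
                drift["type_mismatch"].append(f"{table}.{col}: {dtype} != {actual_type}")
    for table, cols in actual.items():
        for col in cols:
            if col not in expected.get(table, {}):
                drift["extra"].append(f"{table}.{col}")
    return drift

# "expected" comes from the SQLAlchemy models, "actual" from the query above
expected = {"users": {"id": "uuid", "phone": "character varying", "timezone": "character varying"}}
actual = {"users": {"id": "uuid", "phone": "character varying"}}
print(diff_columns(expected, actual))  # reports users.timezone as missing
```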

After Deploying

  1. Verify migrations applied:

    SELECT * FROM schema_migrations ORDER BY version;
    

  2. Test CRUD operations on new tables/columns:

    # In Python shell
    from app.models.database import Poll
    from app.config import SessionLocal
    
    db = SessionLocal()
    poll = Poll(chat_guid="test", question="Test?", options=["A", "B"])
    db.add(poll)
    db.commit()
    print(f"Poll created: {poll.id}")
    

Rollback Strategy

For Recent Migrations

Create rollback SQL for each migration:

-- migrations/005_add_polls_ROLLBACK.sql
DROP TABLE IF EXISTS polls CASCADE;

-- Remove from tracking
DELETE FROM schema_migrations WHERE version = 5;

For Complex Changes

  1. Create backup before migration:

    -- Backup specific table
    CREATE TABLE polls_backup AS SELECT * FROM polls;
    
    # Full database backup (Render dashboard, or pg_dump from a shell)
    pg_dump -h host -U user database > backup_YYYY-MM-DD.sql
    

  2. Restore if needed:

    -- Restore table
    DROP TABLE polls;
    ALTER TABLE polls_backup RENAME TO polls;
    

UUID Foreign Key Fix (November 2025)

  • Issue: The conversation foreign key was stored as a string instead of a UUID, causing silent lookup failures
  • Fix: Updated all foreign key references to use the UUID type

-- Migration applied directly (not through Alembic)
ALTER TABLE messages
ALTER COLUMN conversation_id TYPE UUID USING conversation_id::uuid;

ALTER TABLE relationship_states
ALTER COLUMN user_id TYPE UUID USING user_id::uuid;

Best Practices

  1. Always use IF NOT EXISTS - Prevents errors on re-run
  2. Add indexes immediately - Don't defer index creation
  3. Use transactions - Wrap migrations in BEGIN/COMMIT
  4. Test locally first - Never apply untested SQL to production
  5. Track version - Always update schema_migrations
  6. Document changes - Add comments explaining why
  7. Keep SQLAlchemy models in sync - Update database.py with any schema changes
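Practice 3 matters because a half-applied migration is worse than none. The effect of wrapping the DDL and the version insert in one transaction can be demonstrated with stdlib sqlite3 (PostgreSQL DDL is also transactional, so the same pattern holds with psycopg2):

```python
import sqlite3

# Autocommit mode so BEGIN/COMMIT/ROLLBACK are managed explicitly, as in the SQL files
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE schema_migrations (version INTEGER PRIMARY KEY, name TEXT)")

def apply_migration(conn, version: int, name: str, statements: list) -> bool:
    """Run all statements and record the version atomically; roll back on any failure."""
    try:
        conn.execute("BEGIN")
        for stmt in statements:
            conn.execute(stmt)
        conn.execute(
            "INSERT INTO schema_migrations (version, name) VALUES (?, ?)",
            (version, name),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False

# Second statement is deliberately invalid: neither the table nor the version survives
ok = apply_migration(conn, 5, "add_polls", ["CREATE TABLE polls (id INTEGER)", "BROKEN SQL"])
print(ok)  # → False
print(conn.execute("SELECT COUNT(*) FROM schema_migrations").fetchone()[0])  # → 0
```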

Automated Migration Script (Future)

Consider creating a Python migration runner:

# scripts/migrate.py
import psycopg2
from app.config import DATABASE_URL

def run_migration(version: int, sql_file: str) -> None:
    conn = psycopg2.connect(DATABASE_URL)
    try:
        cur = conn.cursor()

        # Ensure the tracking table exists before checking it
        cur.execute("""
            CREATE TABLE IF NOT EXISTS schema_migrations (
                version INTEGER PRIMARY KEY,
                name VARCHAR(255) NOT NULL,
                applied_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
            )
        """)
        conn.commit()

        # Skip if already applied
        cur.execute("SELECT version FROM schema_migrations WHERE version = %s", (version,))
        if cur.fetchone():
            print(f"Migration {version} already applied")
            return

        # Run the migration and record it in the same transaction
        with open(sql_file) as f:
            cur.execute(f.read())
        cur.execute(
            "INSERT INTO schema_migrations (version, name) VALUES (%s, %s)",
            (version, sql_file.split('/')[-1])
        )
        conn.commit()
    finally:
        conn.close()
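A driver for this runner would need to turn filenames like 001_initial_schema.sql into (version, name) pairs and apply them in ascending order. A parsing sketch, assuming the NNN_name.sql convention used throughout this guide:

```python
import re
from pathlib import Path

def parse_migration(filename: str) -> tuple:
    """Extract (version, name) from an 'NNN_name.sql' migration filename."""
    m = re.fullmatch(r"(\d+)_(.+)\.sql", Path(filename).name)
    if not m:
        raise ValueError(f"not a migration file: {filename}")
    return int(m.group(1)), m.group(2)

# Sorting the parsed pairs yields application order
files = ["migrations/005_add_polls.sql", "migrations/001_initial_schema.sql"]
ordered = sorted(parse_migration(f) for f in files)
print(ordered)  # → [(1, 'initial_schema'), (5, 'add_polls')]
```

Each (version, name) pair can then be fed to run_migration in order, and rollback scripts (the *_ROLLBACK.sql files) would be filtered out before parsing.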

Summary

  • Use direct SQL migrations for predictable schema changes
  • Track versions in schema_migrations table
  • Test locally before production
  • Keep SQLAlchemy models in sync with actual schema
  • Document all changes in migration files
  • Create rollback scripts for safety

This approach sacrifices automation for reliability and explicit control over schema changes.