WebSocket Connection Loop Fix
Date: November 18, 2025
Issue: Edge client stuck in a continuous connect/disconnect loop
Status: ✅ FIXED
Problem
Edge client logs showed a repeating pattern every ~1 second:
```text
[INFO] ✅ WebSocket connected
[INFO] 🔌 WebSocket connected - real-time command delivery enabled
[WARN] WebSocket closed: code=1000, reason=New connection established
[WARN] 🔌 WebSocket disconnected - falling back to HTTP polling
[INFO] Scheduling WebSocket reconnect attempt #1 in 1000ms
```
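Every cycle in the log schedules reconnect attempt #1 after 1000ms. That is consistent with a client that resets its attempt counter as soon as a connection opens, so a fast loop never backs off (an inference: the edge client's code is not shown here). As a minimal sketch of capped exponential backoff that only resets after a connection has stayed up for a while, assuming hypothetical names and values (`ReconnectPolicy`, `BASE_DELAY_MS`, `MAX_DELAY_MS`, `STABLE_AFTER_S`) that are not the edge client's actual API:

```python
from dataclasses import dataclass

# Hypothetical tuning values, not taken from the edge client
BASE_DELAY_MS = 1000
MAX_DELAY_MS = 30_000
STABLE_AFTER_S = 30.0


@dataclass
class ReconnectPolicy:
    """Capped exponential backoff whose counter resets only after a stable connection."""

    attempt: int = 0

    def next_delay_ms(self) -> int:
        # Attempt #1 waits BASE_DELAY_MS, then 2x, 4x, ... capped at MAX_DELAY_MS
        self.attempt += 1
        return min(BASE_DELAY_MS * 2 ** (self.attempt - 1), MAX_DELAY_MS)

    def on_disconnect(self, connected_for_s: float) -> None:
        # Only a connection that survived STABLE_AFTER_S counts as healthy;
        # a ~1 s connect/disconnect cycle keeps escalating the delay instead.
        if connected_for_s >= STABLE_AFTER_S:
            self.attempt = 0


policy = ReconnectPolicy()
for _ in range(3):
    policy.on_disconnect(connected_for_s=1.0)  # ~1 s connections, as in the loop
    print(policy.next_delay_ms())  # prints 1000, then 2000, then 4000
```

With this shape, a server-side bug like the one below would still surface in the logs, but at a decaying rate rather than once per second.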
Root Cause
Duplicate WebSocket endpoint definitions in `app/api/edge_routes.py`:
- Line 109: `@router.websocket("/ws")` - uses `ws_manager.handle_connection()`
- Line 277: `@router.websocket("/ws")` - manually accepts the connection and registers it
When a client connected, both handlers were triggered because they had the same route path.
What Was Happening
- Client connects → FastAPI triggers BOTH WebSocket handlers
- Handler #1 calls `ws_manager.handle_connection()` → registers the connection for `edge_agent_id`
- Handler #2 calls `ws_manager.connect()` → sees an existing connection for the same `edge_agent_id`
- `ws_manager.connect()` (lines 56-63) closes the old connection with reason "New connection established"
- Handler #1's connection gets closed → client sees a disconnect
- Client auto-reconnects → loop repeats
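The eviction step can be reproduced in isolation with a toy manager. This is a minimal sketch that mirrors the replace-old-connection rule from the `websocket_manager.py` excerpt in the next section; `FakeWebSocket` and `ToyConnectionManager` are hypothetical stand-ins, not the project's classes. A second registration under the same `edge_agent_id` closes the first socket with code 1000 and the same reason string the client logged.

```python
import asyncio


class FakeWebSocket:
    """Hypothetical stand-in that records how it was closed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.close_code = None
        self.close_reason = None

    async def close(self, code: int = 1000, reason: str = "") -> None:
        self.close_code = code
        self.close_reason = reason


class ToyConnectionManager:
    """Keeps one connection per edge agent, replacing any previous one."""

    def __init__(self) -> None:
        self.connections: dict[str, FakeWebSocket] = {}

    async def connect(self, edge_agent_id: str, websocket: FakeWebSocket) -> None:
        old_ws = self.connections.get(edge_agent_id)
        if old_ws is not None:
            # Same rule as the real manager: evict the previous connection
            await old_ws.close(code=1000, reason="New connection established")
        self.connections[edge_agent_id] = websocket


async def demo() -> FakeWebSocket:
    manager = ToyConnectionManager()
    first = FakeWebSocket("handler-1")
    second = FakeWebSocket("handler-2")
    await manager.connect("edge-01", first)   # handler #1 registers the agent
    await manager.connect("edge-01", second)  # handler #2 registers the same agent
    return first


first = asyncio.run(demo())
print(first.close_code, first.close_reason)  # 1000 New connection established
```

The manager behaves exactly as intended for a genuinely stale connection; the loop only appears when two code paths register the same agent for one client connection.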
Evidence in Code
From `app/edge/websocket_manager.py` (lines 56-63):
```python
# If edge agent already connected, close old connection
if edge_agent_id in self.connections:
    old_ws = self.connections[edge_agent_id]
    try:
        await old_ws.close(code=1000, reason="New connection established")
    except Exception:
        # Ignore errors if the old socket is already closed
        pass
    logger.info(f"Replaced old connection for {edge_agent_id}")
```
This logic is correct for replacing stale connections, but became problematic when two handlers were racing to register the same connection.
Solution
Removed the duplicate endpoint at line 277 (kept the first one at line 109).
Changes Made
File: `app/api/edge_routes.py`
- ❌ Removed: Lines 277-378 (duplicate WebSocket endpoint)
- ✅ Kept: Lines 109-196 (proper implementation using `ws_manager.handle_connection()`)
Commit: 8019b33 - "fix: Remove duplicate WebSocket endpoint causing connection loop"
Expected Behavior After Fix
Client should connect once and stay connected:
```text
[INFO] ✅ WebSocket connected
[INFO] 🔌 WebSocket connected - real-time command delivery enabled
[INFO] HTTP polling paused (using WebSocket)
```
No more repeated disconnects unless:
- Network issues occur
- The backend redeploys
- The client intentionally disconnects
Testing
- Deploy: Pushed to the `dev` branch → Railway auto-deploys
- Monitor: Wait ~60 s for the deployment to finish
- Verify: Check edge client logs for a stable connection
- Confirm: The WebSocket should stay connected for more than 30 seconds
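The Verify and Confirm steps can be made repeatable by scanning a captured slice of edge client output (for example, the first 30+ seconds after deployment) for the close message from the original symptom. This is a minimal sketch that relies only on the log text shown above; `count_replacement_closes`, `looks_like_loop`, and the threshold of one close are hypothetical choices.

```python
import re

# Matches the close line from the symptom logs, e.g.
# "[WARN] WebSocket closed: code=1000, reason=New connection established"
CLOSE_RE = re.compile(r"WebSocket closed: code=(\d+), reason=(.*)$")


def count_replacement_closes(log_lines: list[str]) -> int:
    """Count closes caused by the server replacing the connection."""
    count = 0
    for line in log_lines:
        match = CLOSE_RE.search(line)
        if match and match.group(2).strip() == "New connection established":
            count += 1
    return count


def looks_like_loop(log_lines: list[str], max_closes: int = 1) -> bool:
    """Hypothetical threshold: more than max_closes replacement closes in the window."""
    return count_replacement_closes(log_lines) > max_closes


before = [
    "[INFO] ✅ WebSocket connected",
    "[WARN] WebSocket closed: code=1000, reason=New connection established",
    "[INFO] Scheduling WebSocket reconnect attempt #1 in 1000ms",
] * 3
after = [
    "[INFO] ✅ WebSocket connected",
    "[INFO] 🔌 WebSocket connected - real-time command delivery enabled",
    "[INFO] HTTP polling paused (using WebSocket)",
]
print(looks_like_loop(before), looks_like_loop(after))  # True False
```

Closes with other reasons (network drops, redeploys) are deliberately not counted, since the expected behavior still allows those.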
Metrics to Watch
- Before Fix: Connections lasted ~1 second (then the loop repeated)
- After Fix: Connections should last minutes or longer (until a natural disconnect)
Related Files
- `app/api/edge_routes.py` - WebSocket endpoints
- `app/edge/websocket_manager.py` - Connection manager (working correctly)
- Edge client - Auto-reconnect logic (working as designed)
Lessons Learned
- FastAPI allows duplicate routes - Registering the same path twice raises no error at startup, so the duplicate went unnoticed
- Connection loops are hard to debug - This one looked like a network issue but was caused by code duplication
- WebSocket manager's "replace old connection" logic is correct - It just needs to be called once per connection
Prevention
- Inspect `app.routes` (at startup or in CI) to detect duplicate route paths
- Add integration tests that verify a single WebSocket handler per route
- Code review checklist: Check for duplicate `@router` decorators on the same path
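The route-inspection and single-handler checks can be combined into one test. This is a minimal sketch, assuming only that each route exposes a `path` attribute (FastAPI's route objects in `app.routes` do) and, for HTTP routes, a `methods` set; WebSocket routes have no methods, so they are keyed by path alone. `find_duplicate_routes` is a hypothetical helper, and the stand-in route classes let the example run without FastAPI installed.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


def route_keys(route) -> list[tuple[str, str, Optional[str]]]:
    """One key per (route type, path, method); routes without methods get None."""
    methods = getattr(route, "methods", None) or [None]
    return [(type(route).__name__, route.path, method) for method in methods]


def find_duplicate_routes(routes) -> list[tuple[str, str, Optional[str]]]:
    """Return every key registered more than once."""
    counts = Counter(key for route in routes for key in route_keys(route))
    return sorted((key for key, n in counts.items() if n > 1), key=str)


# Hypothetical stand-ins for FastAPI's WebSocket and HTTP route objects
@dataclass
class FakeWebSocketRoute:
    path: str


@dataclass
class FakeHTTPRoute:
    path: str
    methods: set = field(default_factory=set)


routes = [
    FakeWebSocketRoute("/ws"),          # like the endpoint at line 109
    FakeHTTPRoute("/health", {"GET"}),
    FakeWebSocketRoute("/ws"),          # like the endpoint at line 277
]
print(find_duplicate_routes(routes))  # [('FakeWebSocketRoute', '/ws', None)]
```

In a real test suite the same helper would be called as `assert find_duplicate_routes(app.routes) == []` against the application object. Keying HTTP routes per method means a `GET` defined twice is caught even when one definition also accepts `POST`.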