Notice: Syft Router is in ALPHA. A major update will arrive mid-November — stay connected for updates.

Router Status Guide

Understand router availability states and how to manage your router's online presence. Learn about status monitoring, health checks, and how status changes affect request routing.

Router Status States

Routers can exist in four distinct states that reflect their operational health and availability to handle requests. Understanding these states helps you monitor and troubleshoot your router effectively.

🟢 Online (Healthy)

Your router is fully operational and performing optimally. All services are running and accepting new requests.

  • Router is running and responding
  • All services are operational
  • Accepting new requests
  • Health checks passing

🟡 Online (Degraded)

Your router is still serving requests but is experiencing performance issues or partial service failures.

  • Router is running but with issues
  • Some services may be slow or failing
  • Limited functionality available
  • Performance below normal

🔴 Offline (Unhealthy)

Your router is completely unavailable and cannot serve any requests. This state requires immediate attention.

  • Router is not responding
  • Services are down or failing
  • No requests being processed
  • Health checks failing

Unknown

The router's status cannot be determined, typically because the router is still starting up or because of network connectivity problems.

  • Status cannot be determined
  • Network connectivity issues
  • Router may be starting up

Status Monitoring

Automatic Status Detection

The SyftBox platform continuously monitors your router's health by making regular requests to its health endpoint. This automated monitoring ensures quick detection of any issues.

Your router must implement this endpoint to participate in the monitoring system:

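# Health check endpoint polled by the platform (assumes a FastAPI app, as the decorator suggests)
import time
from datetime import datetime, timezone

from fastapi import FastAPI

app = FastAPI()
START_TIME = time.monotonic()  # reference point for the uptime field

def get_uptime() -> float:
    """Return seconds since this process started."""
    return time.monotonic() - START_TIME

@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "timestamp": datetime.now(timezone.utc).isoformat(),  # timezone-aware UTC
        "services": {
            "chat": "running",
            "search": "running"
        },
        "uptime": get_uptime(),
        "version": "1.0.0"
    }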

Status Check Intervals

The monitoring system uses configurable intervals and thresholds to determine when to change a router's status. These settings balance responsiveness with system stability.

# Configuration for status monitoring
MONITORING_CONFIG = {
    "health_check_interval": 30,      # seconds between health checks
    "timeout": 10,                    # seconds to wait for a health check response
    "max_failures": 3,                # consecutive failures before marking offline
    "recovery_checks": 2,             # consecutive successes before marking online
    "degraded_threshold": 5000,       # response time (ms) above which the router is degraded
}

These configuration values determine how quickly the system detects issues and recovers from failures. Adjust them based on your router's expected performance characteristics.
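To get a feel for what these values imply, the arithmetic can be sketched as follows. This is a rough estimate rather than a description of the platform's scheduler: it assumes checks start on a fixed schedule every `health_check_interval` seconds, and that a failed check may take the full `timeout` before it is counted. The function names are hypothetical.

```python
# Rough timing estimates derived from the monitoring settings.
# Assumption: a check starts every `health_check_interval` seconds, regardless
# of how long the previous check took.

MONITORING_CONFIG = {
    "health_check_interval": 30,
    "timeout": 10,
    "max_failures": 3,
    "recovery_checks": 2,
    "degraded_threshold": 5000,
}


def offline_detection_window(config):
    """Return (fastest, slowest) seconds from a failure to an offline status."""
    interval = config["health_check_interval"]
    # Fastest: the router fails just before a check and each check fails instantly.
    fastest = (config["max_failures"] - 1) * interval
    # Slowest: the router fails just after a check and the last check times out.
    slowest = config["max_failures"] * interval + config["timeout"]
    return fastest, slowest


def recovery_window(config):
    """Return (fastest, slowest) seconds from recovery to an online status."""
    interval = config["health_check_interval"]
    # Response time of the successful checks is ignored in this estimate.
    fastest = (config["recovery_checks"] - 1) * interval
    slowest = config["recovery_checks"] * interval
    return fastest, slowest
```

Under these assumptions, the values shown above would mark a router offline between 60 and 100 seconds after it stops responding, and bring it back online 30 to 60 seconds after it recovers.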

Status Transition Logic

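Routers transition between states based on their health check responses and performance metrics. The system uses hysteresis to prevent rapid state changes that could cause instability: a router is marked offline only after several consecutive failed checks, and it returns to online only after several consecutive successful ones.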

[Unknown] → [Online] → [Degraded] → [Offline]
    ↑         ↑          ↑           ↑
Startup    Healthy    Slow/Errors   Failed
State      Response   Response      Health
                                    Checks

The transition logic ensures that temporary issues don't immediately mark a router as offline, while persistent problems are quickly detected and addressed.
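The hysteresis described here can be illustrated with a small state tracker. This is a sketch, not the platform's implementation: the `StatusTracker` class and its method are hypothetical, the thresholds mirror the monitoring configuration values, and it assumes that only a router already marked offline needs several consecutive successes before it is restored.

```python
# Hypothetical status tracker illustrating hysteresis; not the platform's code.

class StatusTracker:
    def __init__(self, max_failures=3, recovery_checks=2, degraded_threshold_ms=5000):
        self.max_failures = max_failures
        self.recovery_checks = recovery_checks
        self.degraded_threshold_ms = degraded_threshold_ms
        self.status = "unknown"
        self.failures = 0   # consecutive failed checks
        self.successes = 0  # consecutive successful checks

    def record(self, ok, response_ms=0.0):
        """Feed one health check result and return the resulting status."""
        if not ok:
            self.failures += 1
            self.successes = 0
            # A single failure does not change the status; a streak does.
            if self.failures >= self.max_failures:
                self.status = "offline"
            return self.status

        self.successes += 1
        self.failures = 0
        # Assumption: only an offline router has to pass several checks in a row.
        if self.status == "offline" and self.successes < self.recovery_checks:
            return self.status
        slow = response_ms > self.degraded_threshold_ms
        self.status = "degraded" if slow else "online"
        return self.status
```

With the default thresholds, one or two failed checks leave the status unchanged, the third consecutive failure moves the router to offline, and two consecutive successes are needed to leave offline again.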

Implications of Status Changes

When your router's status changes, it affects how requests are routed and how usage is tracked:

Status Change Effects

  • Online → Degraded: Router continues receiving requests but may be deprioritized
  • Degraded → Offline: All traffic is redirected to other available routers
  • Offline → Online: Router rejoins the active pool and begins receiving requests again

Monitoring these transitions helps you understand your router's reliability and identify patterns that might indicate underlying issues.
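One way to picture the routing effects listed above is a selection function that prefers online routers, falls back to degraded ones, and never picks an offline router. This is only an illustration: `pick_router` and the priority table are hypothetical, the real scheduler may weigh routers differently, and excluding routers in the unknown state is an assumption.

```python
# Hypothetical selection logic; the real scheduler may weigh routers differently.

PRIORITY = {"online": 0, "degraded": 1}  # offline and unknown routers are excluded


def pick_router(statuses):
    """Pick a router name from a {name: status} mapping, or None if none can serve."""
    candidates = [name for name, status in statuses.items() if status in PRIORITY]
    if not candidates:
        return None
    # Order by status priority first, then by name for a deterministic result.
    return min(candidates, key=lambda name: (PRIORITY[statuses[name]], name))
```

For example, `pick_router({"a": "degraded", "b": "online", "c": "offline"})` selects `"b"`, and a degraded router is chosen only when no online router is available.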

Monitoring Best Practices

Health Check Implementation

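Design your health checks to exercise the real functionality of each service and to report each service's result separately, as in the following example.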

Example Health Check Implementation

import time
from datetime import datetime, timezone


class HealthChecker:
    def __init__(self, config, chat_service, search_service):
        self.config = config
        # The service clients are passed in so the checks below can call them
        self.chat_service = chat_service
        self.search_service = search_service
        self.started_at = time.monotonic()

    def get_uptime(self):
        """Return seconds elapsed since this checker was created."""
        return time.monotonic() - self.started_at

    async def check_chat_service(self):
        """Check if chat service is responsive."""
        try:
            # Test actual chat functionality and time the round trip
            started = time.monotonic()
            await self.chat_service.generate_response("health check")
            return {"status": "healthy", "response_time": time.monotonic() - started}
        except Exception as e:
            return {"status": "unhealthy", "error": str(e)}

    async def check_search_service(self):
        """Check if search service is responsive."""
        try:
            # Test actual search functionality
            results = await self.search_service.search("test query", limit=1)
            return {"status": "healthy", "results_count": len(results)}
        except Exception as e:
            return {"status": "unhealthy", "error": str(e)}

    async def get_comprehensive_health(self):
        """Get detailed health status for all services."""
        chat_health = await self.check_chat_service()
        search_health = await self.check_search_service()

        overall_healthy = (
            chat_health["status"] == "healthy"
            and search_health["status"] == "healthy"
        )

        return {
            # Any failing service downgrades the overall status to "degraded"
            "status": "healthy" if overall_healthy else "degraded",
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "services": {
                "chat": chat_health,
                "search": search_health
            },
            "uptime": self.get_uptime(),
            "version": "1.0.0"
        }

Troubleshooting Status Issues

Common status issues and their solutions:

Router Stuck in "Unknown"

  • Check network connectivity
  • Verify health endpoint is responding
  • Check firewall settings
  • Review router startup logs
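To work through the first two checks in this list from the machine that should reach your router, a small probe using only the Python standard library can help. This is a sketch: `probe_health` is a hypothetical helper, the URL you pass is your own router's address, and it assumes the health endpoint returns JSON.

```python
import json
import time
import urllib.request


def probe_health(url, timeout=10.0):
    """Request a health endpoint once and report what happened."""
    started = time.monotonic()
    body, code, error = None, None, None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:
        # OSError covers connection failures, timeouts, and HTTP error responses;
        # ValueError covers malformed URLs and bodies that are not valid JSON.
        code = getattr(exc, "code", code)
        error = str(exc)
    elapsed_ms = (time.monotonic() - started) * 1000
    return {
        "reachable": error is None,
        "http_status": code,
        "elapsed_ms": elapsed_ms,
        "body": body,
        "error": error,
    }
```

Calling `probe_health("http://localhost:8000/health")` (substitute your router's address) returns a dictionary whose `error` field distinguishes a refused connection or timeout, which points at network or firewall problems, from an HTTP error or invalid JSON, which points at the endpoint itself.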

Frequent Degraded Status

  • Monitor response times
  • Check resource usage (CPU, memory)
  • Review service dependencies
  • Optimize slow queries
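When monitoring response times, it can help to compare recent measurements against the degraded threshold. The helper below is a hypothetical sketch: it assumes you have collected health check response times in milliseconds yourself and that the threshold matches the `degraded_threshold` setting.

```python
import statistics


def summarize_latencies(samples_ms, degraded_threshold_ms=5000):
    """Summarize response times and the share that exceed the degraded threshold."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    slow = [ms for ms in samples_ms if ms > degraded_threshold_ms]
    return {
        "count": len(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "slow_fraction": len(slow) / len(samples_ms),
    }
```

A low median combined with a non-zero `slow_fraction` suggests occasional slow requests, such as a slow dependency or query, rather than a router that is overloaded across the board.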

Router Going Offline

  • Check service logs for errors
  • Verify external API connections
  • Monitor system resources
  • Review configuration settings