Router Status Guide

Understand router availability states and how to manage your router's online presence. Learn about status monitoring, health checks, and how status changes affect request routing.


Router Status States

Routers can exist in four distinct states that reflect their operational health and availability to handle requests. Understanding these states helps you monitor and troubleshoot your router effectively.

🟢

Online (Healthy)

Your router is fully operational and performing optimally. All services are running and accepting new requests.

Router is running and responding
All services are operational
Accepting new requests
Health checks passing

🟡

Online (Degraded)

Your router is still serving requests but experiencing performance issues or partial service failures.

Router is running but with issues
Some services may be slow or failing
Limited functionality available
Performance below normal

🔴

Offline (Unhealthy)

Your router is completely unavailable and cannot serve any requests. Requires immediate attention.

Router is not responding
Services are down or failing
No requests being processed
Health checks failing

⚫

Unknown

The router's status cannot be determined, typically during startup or when network issues prevent health checks from reaching it.

Status cannot be determined
Network connectivity issues
Router may be starting up

Status Monitoring

Automatic Status Detection

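The SyftBox platform continuously monitors your router's health by sending periodic requests to its /health endpoint (every 30 seconds with the default configuration). This automated monitoring detects problems within a few check intervals, without any manual polling on your part.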

Your router must implement this endpoint to participate in the monitoring system:

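# Minimal /health endpoint polled by the platform monitor
import time
from datetime import datetime, timezone

from fastapi import FastAPI

app = FastAPI()
START_TIME = time.monotonic()  # recorded once when the process starts


def get_uptime() -> float:
    """Hypothetical helper: seconds since the router process started."""
    return time.monotonic() - START_TIME


@app.get("/health")
async def health_check():
    return {
        "status": "healthy",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "services": {
            "chat": "running",
            "search": "running"
        },
        "uptime": get_uptime(),
        "version": "1.0.0"
    }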
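For local testing, you can reproduce one round of monitoring by calling the endpoint yourself. The following is a minimal sketch that uses only the Python standard library. It sends a single GET with a timeout, measures the latency, and reads the reported "status" field. The function name probe_health, its return shape, and the localhost URL in the example are illustrative assumptions. They are not the SyftBox monitor's actual code.

```python
import json
import time
import urllib.request


def probe_health(url: str, timeout: float = 10.0) -> dict:
    """Send one GET to a health endpoint and time it (hypothetical helper)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
        ok = True
        reported = body.get("status") if isinstance(body, dict) else None
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx), refused connections and
        # timeouts; ValueError covers a body that is not valid UTF-8 JSON
        ok = False
        reported = None
    latency_ms = (time.perf_counter() - start) * 1000
    return {"ok": ok, "latency_ms": latency_ms, "status": reported}


# Example (assumes the router listens locally on port 8000):
# print(probe_health("http://localhost:8000/health"))
```

The default timeout matches the 10-second health check timeout described in the next section.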

Status Check Intervals

The monitoring system uses configurable intervals and thresholds to determine when to change a router's status. These settings balance responsiveness with system stability.

# Configuration for status monitoring
MONITORING_CONFIG = {
    "health_check_interval": 30,      # seconds between health checks
    "timeout": 10,                    # seconds to wait for a health check response
    "max_failures": 3,                # consecutive failures before marking offline
    "recovery_checks": 2,             # successful checks required to return online
    "degraded_threshold": 5000,       # response time (ms) above which status is degraded
}

These configuration values determine how quickly the system detects issues and recovers from failures. Adjust them based on your router's expected performance characteristics.
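Here is one plausible way to turn the timeout and degraded_threshold values into a verdict for a single check. The three outcome labels and the precedence rules (no response or over-timeout counts as a failure, a self-reported "degraded" status or slow response counts as degraded) are assumptions made for this sketch. They are not documented platform behavior.

```python
from typing import Optional

# Same values as the MONITORING_CONFIG shown above
MONITORING_CONFIG = {
    "health_check_interval": 30,
    "timeout": 10,
    "max_failures": 3,
    "recovery_checks": 2,
    "degraded_threshold": 5000,
}


def classify_check(reachable: bool, latency_ms: float,
                   reported_status: Optional[str],
                   config: dict = MONITORING_CONFIG) -> str:
    """Map one health check to "healthy", "degraded", or "failed" (assumed rules)."""
    if not reachable or latency_ms > config["timeout"] * 1000:
        return "failed"      # no answer within the timeout
    if reported_status not in ("healthy", "degraded"):
        return "failed"      # answered, but reports itself unhealthy or unknown
    if reported_status == "degraded" or latency_ms > config["degraded_threshold"]:
        return "degraded"    # still serving, but slow or partially failing
    return "healthy"


print(classify_check(True, 6200.0, "healthy"))  # slow response -> degraded
```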

Status Transition Logic

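Routers transition between states based on their health check responses and performance metrics. The system applies hysteresis to prevent rapid back-and-forth state changes that could cause instability. A router must fail several consecutive health checks (max_failures) before it is marked offline, and it must pass several checks (recovery_checks) before it is marked online again.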

[Unknown] → [Online] → [Degraded] → [Offline]
    ↑         ↑          ↑           ↑
Startup    Healthy    Slow/Errors   Failed
State      Response   Response      Health
                                    Checks

The transition logic ensures that temporary issues don't immediately mark a router as offline, while persistent problems are quickly detected and addressed.
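The hysteresis described above can be sketched as a small state machine for a single router. The class below is illustrative and does not reproduce the platform's implementation. It assumes each health check has already been reduced to one of "healthy", "degraded", or "failed". It also assumes that the recovery_checks requirement applies when leaving both the unknown and offline states, while moves between online and degraded happen immediately.

```python
class StatusTracker:
    """Hysteresis state machine for one router (illustrative sketch)."""

    def __init__(self, max_failures: int = 3, recovery_checks: int = 2):
        self.max_failures = max_failures
        self.recovery_checks = recovery_checks
        self.state = "unknown"   # no evidence yet, e.g. during startup
        self.failures = 0        # consecutive failed checks
        self.successes = 0       # consecutive passing checks

    def observe(self, outcome: str) -> str:
        """Record one check outcome and return the (possibly updated) state."""
        if outcome == "failed":
            self.failures += 1
            self.successes = 0
            if self.failures >= self.max_failures:
                self.state = "offline"
            # Below the limit, a failure is treated as temporary: state unchanged
            return self.state

        self.failures = 0
        self.successes += 1
        target = "online" if outcome == "healthy" else "degraded"
        if self.state in ("unknown", "offline"):
            # Require several good checks in a row before rejoining the pool
            if self.successes >= self.recovery_checks:
                self.state = target
        else:
            # Already serving: switch between online and degraded directly
            self.state = target
        return self.state


tracker = StatusTracker()
for outcome in ["healthy", "healthy", "failed", "failed", "failed"]:
    tracker.observe(outcome)
print(tracker.state)  # offline
```

With the default settings, a single failed check leaves an online router online, and three in a row take it offline.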

Implications of Status Changes

When your router's status changes, it affects how requests are routed and how usage is tracked:

Status Change Effects

Online → Degraded: Router continues receiving requests but may be deprioritized
Degraded → Offline: All traffic is redirected to other available routers
Offline → Online: Router rejoins the active pool and begins receiving requests again

Monitoring these transitions helps you understand your router's reliability and identify patterns that might indicate underlying issues.
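To make the routing effects concrete, here is a sketch of how a dispatcher might rank a pool of routers by status. The policy is an assumption: online routers come first, degraded routers remain as a fallback (deprioritized), and offline routers are excluded. Routers in the unknown state are also excluded here, which the list above does not specify. The function name routing_order is hypothetical.

```python
def routing_order(statuses: dict[str, str]) -> list[str]:
    """Rank routers for dispatch: online first, degraded next, others excluded."""
    priority = {"online": 0, "degraded": 1}
    eligible = [name for name, state in statuses.items() if state in priority]
    # sorted() is stable, so routers with equal priority keep their input order
    return sorted(eligible, key=lambda name: priority[statuses[name]])


pool = {"a": "degraded", "b": "online", "c": "offline", "d": "unknown", "e": "online"}
print(routing_order(pool))  # ['b', 'e', 'a']
```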

Monitoring Best Practices

Health Check Implementation

Design your health checks to be:

Fast: Complete within the timeout threshold (10 seconds default)
Reliable: Test actual service functionality, not just server availability
Informative: Return detailed status for each service component
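One way to keep a health check fast is to give each component check its own time budget, well below the monitor's 10-second timeout, and to run independent checks concurrently. The sketch below uses asyncio.wait_for and asyncio.gather for this. The helper names and the 2-second per-check budget are assumptions. The checks argument maps a service name to a zero-argument async function.

```python
import asyncio


async def run_check(name, check, timeout_s):
    """Run one async check; report a timeout or error as unhealthy instead of raising."""
    try:
        return name, await asyncio.wait_for(check(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return name, {"status": "unhealthy", "error": f"timed out after {timeout_s}s"}
    except Exception as e:
        return name, {"status": "unhealthy", "error": str(e)}


async def run_all_checks(checks, timeout_s=2.0):
    """Run independent checks concurrently, so total time is roughly the slowest one."""
    results = await asyncio.gather(
        *(run_check(name, check, timeout_s) for name, check in checks.items())
    )
    return dict(results)
```

A check that hangs therefore costs at most timeout_s and shows up as an unhealthy component, not as a health endpoint that never answers.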

Example Health Check Implementation

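import time
from datetime import datetime, timezone


class HealthChecker:
    def __init__(self, config, chat_service, search_service):
        self.config = config
        # Service clients are passed in so the checks exercise real functionality
        self.chat_service = chat_service
        self.search_service = search_service
        self.start_time = time.monotonic()

    def get_uptime(self):
        """Return seconds elapsed since the checker was created."""
        return time.monotonic() - self.start_time

    async def check_chat_service(self):
        """Check if chat service is responsive."""
        start = time.perf_counter()
        try:
            # Test actual chat functionality, not just that the server is up
            await self.chat_service.generate_response("health check")
            elapsed_ms = (time.perf_counter() - start) * 1000
            return {"status": "healthy", "response_time_ms": elapsed_ms}
        except Exception as e:
            return {"status": "unhealthy", "error": str(e)}

    async def check_search_service(self):
        """Check if search service is responsive."""
        try:
            # Test actual search functionality
            results = await self.search_service.search("test query", limit=1)
            return {"status": "healthy", "results_count": len(results)}
        except Exception as e:
            return {"status": "unhealthy", "error": str(e)}

    async def get_comprehensive_health(self):
        """Get detailed health status for all services."""
        chat_health = await self.check_chat_service()
        search_health = await self.check_search_service()

        statuses = [chat_health["status"], search_health["status"]]
        if all(s == "healthy" for s in statuses):
            overall = "healthy"
        elif any(s == "healthy" for s in statuses):
            overall = "degraded"   # partial failure: some services still work
        else:
            overall = "unhealthy"  # every service failed

        return {
            "status": overall,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "services": {
                "chat": chat_health,
                "search": search_health
            },
            "uptime": self.get_uptime(),
            "version": "1.0.0"
        }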

Troubleshooting Status Issues

Common status issues and their solutions:

Router Stuck in "Unknown"

Check network connectivity
Verify health endpoint is responding
Check firewall settings
Review router startup logs

Frequent Degraded Status

Monitor response times
Check resource usage (CPU, memory)
Review service dependencies
Optimize slow queries
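When a router keeps dropping into the degraded state, it helps to know whether the slowness is occasional or sustained. The sketch below keeps a rolling window of recent health-check latencies. It reports the share of checks above the 5000 ms degraded threshold and an approximate 95th percentile (nearest-rank method). The class name, the window size of 20, and the choice of percentile are illustrative assumptions.

```python
import math
from collections import deque


class LatencyWindow:
    """Keep the most recent health-check latencies (hypothetical helper)."""

    def __init__(self, size: int = 20, degraded_threshold_ms: float = 5000):
        self.samples = deque(maxlen=size)  # oldest samples drop off automatically
        self.threshold = degraded_threshold_ms

    def add(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def slow_fraction(self) -> float:
        """Share of recent checks slower than the degraded threshold."""
        if not self.samples:
            return 0.0
        slow = sum(1 for ms in self.samples if ms > self.threshold)
        return slow / len(self.samples)

    def p95(self) -> float:
        """Approximate 95th-percentile latency using the nearest-rank method."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(0.95 * len(ordered)))
        return ordered[rank - 1]
```

A high slow_fraction points to a sustained resource or dependency problem. A low fraction with a high p95 suggests occasional spikes, such as slow queries.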

Router Going Offline

Check service logs for errors
Verify external API connections
Monitor system resources
Review configuration settings

Next Steps

Accounting & Pricing

Learn about usage tracking during downtime and billing


Code Structure

Implementation details for health checks and monitoring
