MCP Routing Multiplexer

stonegate Documentation

One endpoint for all your MCP servers. Tool discovery, automatic routing, connection pooling, circuit breakers, crash recovery, and async dispatch. Your agents share tools — no duplicate processes, no fragile connections.

1. Overview

stonegate is a compiled Rust binary that acts as a routing multiplexer for MCP (Model Context Protocol) servers. Instead of each agent connecting directly to every MCP server it needs, agents connect to stonegate once — and stonegate routes tool calls to the correct backend, manages connections, handles failures, and provides a unified tool catalog.

Agent A ──┐                       ┌── MCP Server: filesystem
Agent B ──┼── stonegate (:3393) ──┼── MCP Server: database
Agent C ──┘       (gateway)       ├── MCP Server: web-search
                                  ├── MCP Server: stonemem
                                  └── MCP Server: custom-tools

Key Capabilities

  • Unified tool catalog — All tools from all backends appear in a single catalog. Agents see one flat namespace of available tools without knowing which server hosts which tool.
  • Automatic routing — When an agent calls a tool, stonegate routes the call to the correct backend server transparently. No routing configuration needed — the tool name is the routing key.
  • Connection pooling — Multiple agents share backend connections. One stonegate process manages all MCP server connections, eliminating duplicate processes and fragile per-agent connections.
  • Circuit breaker — If a backend server starts failing, stonegate opens the circuit — subsequent calls fail fast instead of timing out. The circuit automatically closes when the backend recovers.
  • Crash recovery — If a backend MCP server crashes, stonegate detects the failure and restarts it automatically. Agents see a brief error, then service resumes.
  • Async dispatch — Long-running tool calls can be dispatched asynchronously. The agent gets a call ID immediately and polls for the result later.
  • Tool schema caching — Backend tool schemas are cached to avoid redundant discovery calls. Cache invalidation happens on backend restart.
  • Runtime server management — Add, remove, start, stop, and restart backend servers without restarting stonegate.
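The unified catalog and name-based routing described in the list above can be pictured as a lookup table from tool name to backend. The sketch below is an illustration under stated assumptions, not stonegate's implementation: `build_catalog` and `route` are hypothetical helpers, and rejecting duplicate tool names is an assumption about how collisions might be handled.

```python
from typing import Dict, List


def build_catalog(backends: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each tool name to the backend that hosts it."""
    catalog: Dict[str, str] = {}
    for server, tools in backends.items():
        for tool in tools:
            if tool in catalog:
                # Assumption: duplicate names are rejected rather than shadowed.
                raise ValueError(
                    f"tool {tool!r} is exposed by both {catalog[tool]!r} and {server!r}"
                )
            catalog[tool] = server
    return catalog


def route(catalog: Dict[str, str], tool: str) -> str:
    """Return the backend name for a tool; raises KeyError if the tool is unknown."""
    return catalog[tool]


# Hypothetical backends and tool lists, loosely based on the examples in this document.
backends = {
    "stonemem": ["mem_search"],
    "filesystem": ["read_file", "write_file"],
    "web-search": ["search_web"],
}
catalog = build_catalog(backends)
print(route(catalog, "read_file"))  # filesystem
```

With a table like this, the agent only names the tool and the gateway resolves the backend.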

Why Use stonegate?

Without stonegate, every agent in your system needs its own connection to every MCP server it uses. With 10 agents and 5 MCP servers, that's 50 connections — 50 server processes, 50 points of failure, 50 configuration entries. With stonegate, it's 10 connections to the gateway and 5 managed backends. One place to configure, monitor, and debug.
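The arithmetic behind that comparison is a product versus a sum. The short sketch below reproduces the numbers from the paragraph above; counting exactly one gateway-to-backend connection per server is a simplifying assumption.

```python
def direct_connections(agents: int, servers: int) -> int:
    """Every agent connects to every MCP server."""
    return agents * servers


def gateway_connections(agents: int, servers: int) -> int:
    """Every agent connects to the gateway; the gateway manages each backend once."""
    return agents + servers


print(direct_connections(10, 5))   # 50
print(gateway_connections(10, 5))  # 15
```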

Works with any MCP server. stonegate doesn't care what language your backends are written in — Rust, Python, TypeScript, Go. If it speaks MCP (stdio or HTTP), stonegate can manage it.

2. Installation

macOS (Homebrew)

brew install keystoneproject/tap/stonegate

macOS (Direct Download)

# Apple Silicon
curl -L https://keystoneproject.dev/releases/stonegate/darwin-aarch64/stonegate-v1.1.0-darwin-aarch64.tar.gz | tar xz
sudo mv stonegate /usr/local/bin/

# Intel
curl -L https://keystoneproject.dev/releases/stonegate/darwin-x86_64/stonegate-v1.1.0-darwin-x86_64.tar.gz | tar xz
sudo mv stonegate /usr/local/bin/

Linux

# x86_64
curl -L https://keystoneproject.dev/releases/stonegate/linux-x86_64/stonegate-v1.1.0-linux-x86_64.tar.gz | tar xz
sudo mv stonegate /usr/local/bin/

# ARM64 (aarch64)
curl -L https://keystoneproject.dev/releases/stonegate/linux-aarch64/stonegate-v1.1.0-linux-aarch64.tar.gz | tar xz
sudo mv stonegate /usr/local/bin/

3. Quick Start

Step 1: Activate

stonegate activate --key SG-XXXX-XXXX-XXXX-XXXX
stonegate status

Step 2: Configure backends

Create a config file at ~/.stonegate/config.toml:

[server]
host = "127.0.0.1"
port = 3393

[[backends]]
name = "stonemem"
command = "stonemem"
args = ["serve"]
transport = "stdio"

[[backends]]
name = "filesystem"
command = "npx"
args = ["-y", "@anthropic/mcp-filesystem"]
transport = "stdio"

[[backends]]
name = "web-search"
command = "python"
args = ["-m", "mcp_web_search"]
transport = "stdio"

Step 3: Start

stonegate serve

stonegate starts, launches all configured backends, discovers their tools, and begins routing.

Step 4: Connect your agent

{
  "mcpServers": {
    "stonegate": {
      "url": "http://localhost:3393",
      "transport": "http"
    }
  }
}

Your agent now has access to every tool from every backend — through one connection.

Step 5: Call a tool

# Discover all available tools
curl http://localhost:3393/tools

# Call a tool (stonegate routes to the correct backend)
curl -X POST http://localhost:3393/call \
  -H "Content-Type: application/json" \
  -d '{
    "server": "stonemem",
    "tool": "mem_search",
    "arguments": {"query": "deployment schedule"}
  }'
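The same call can be issued from Python with only the standard library. This is a minimal sketch assuming the gateway address from the Quick Start config; `build_call_request` is a hypothetical helper, and the actual send is left commented out so the snippet runs without a gateway.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:3393"  # host and port from the Quick Start config


def build_call_request(server: str, tool: str, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /call request."""
    body = json.dumps(
        {"server": server, "tool": tool, "arguments": arguments}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY_URL}/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_call_request("stonemem", "mem_search", {"query": "deployment schedule"})
print(req.get_method(), req.full_url)  # POST http://localhost:3393/call

# To send it against a running gateway:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```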

4. Configuration

[server]
host = "127.0.0.1"
port = 3393
data_dir = "~/.stonegate"

[license]
key_file = "~/.stonegate/license.key"
license_server = "https://license.keystoneproject.dev"

[gateway]
max_connections = 100             # Max concurrent agent connections
request_timeout = 30000           # Per-call timeout (ms)
schema_cache_ttl = 300            # Tool schema cache TTL (seconds)

[circuit_breaker]
failure_threshold = 5             # Failures before circuit opens
reset_timeout = 30                # Seconds before half-open test
success_threshold = 2             # Successes to close circuit

[pool]
max_idle = 10                     # Max idle backend connections
idle_timeout = 300                # Idle connection timeout (seconds)

[[backends]]
name = "stonemem"                 # Unique backend name
command = "stonemem"              # Binary to launch
args = ["serve"]                  # Command arguments
transport = "stdio"               # "stdio" or "http"
# url = "http://localhost:3391"   # For HTTP transport
env = { STONEMEM_PORT = "3391" }  # Optional environment variables
auto_restart = true               # Restart on crash (default: true)
health_check = true               # Enable health checks (default: true)

[logging]
level = "info"
format = "json"
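To make the three `[circuit_breaker]` settings above concrete, here is a small state-machine sketch. It is not stonegate's implementation: the class is hypothetical, and it assumes that only consecutive failures count toward `failure_threshold` and that any failure during the half-open test re-opens the circuit.

```python
import time
from typing import Callable


class CircuitBreaker:
    """closed -> open -> half-open -> closed, driven by the three config values."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 success_threshold: int = 2,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.success_threshold = success_threshold
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self.state == "open" and self.clock() - self.opened_at >= self.reset_timeout:
            self.state = "half-open"  # let test calls through
            self.successes = 0
        return self.state != "open"

    def record_success(self) -> None:
        if self.state == "half-open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # assumption: only consecutive failures count

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._open()  # assumption: a failed test call re-opens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
print(breaker.state, breaker.allow())  # open False
now[0] = 30.0
print(breaker.allow(), breaker.state)  # True half-open
breaker.record_success()
breaker.record_success()
print(breaker.state)  # closed
```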

Adding backends at runtime

# Add a new backend without restarting stonegate
curl -X POST http://localhost:3393/server/add \
  -H "Content-Type: application/json" \
  -d '{
    "name": "custom-tools",
    "command": "/path/to/custom-mcp-server",
    "args": [],
    "transport": "stdio"
  }'

5. API Reference

Tool Operations

POST /call

Call a tool on a backend server. stonegate routes the call, manages the connection, and returns the result.

Request Body

{
  "server": "string",             // Required. Backend server name.
  "tool": "string",               // Required. Tool name.
  "arguments": {}                 // Required. Tool arguments (JSON object).
}

Response

{
  "result": { ... },              // Tool output (varies by tool)
  "server": "stonemem",
  "tool": "mem_search",
  "duration_ms": 12
}

Error Response

{
  "error": "Circuit open for server 'web-search' — backend is failing",
  "code": "CIRCUIT_OPEN"
}

GET /tools

List all tools across all backends. Returns a unified catalog.

Response

{
  "tools": [
    {
      "name": "mem_search",
      "server": "stonemem",
      "description": "Full-text search across stored entries",
      "parameters": { ... }
    },
    {
      "name": "read_file",
      "server": "filesystem",
      "description": "Read a file from the filesystem",
      "parameters": { ... }
    }
  ],
  "total": 42
}

GET /tools/{server}

List tools for a specific backend server.

GET /discover

Discover tools matching a keyword or pattern. Useful for agents that need to find the right tool.

Query Parameters

?query=search              // Keyword to match against tool names and descriptions

GET /schema/{server}/{tool}

Get the full JSON Schema for a specific tool's parameters.

Server Management

GET /server/list

List all backend servers with their status, tool count, and health.

Response

{
  "servers": [
    {
      "name": "stonemem",
      "status": "running",
      "transport": "stdio",
      "tool_count": 8,
      "uptime_seconds": 3600,
      "circuit": "closed",
      "total_calls": 1247,
      "error_rate": 0.001
    }
  ]
}

POST /server/start

Start a stopped backend server.

{ "name": "string" }

POST /server/stop

Stop a running backend server gracefully.

{ "name": "string" }

POST /server/restart

Restart a backend server. Useful after configuration changes.

{ "name": "string" }

POST /server/add

Add a new backend server at runtime without restarting stonegate.

{
  "name": "string",               // Required. Unique backend name.
  "command": "string",            // Required. Binary path.
  "args": ["string"],             // Optional. Arguments.
  "transport": "stdio",           // "stdio" or "http".
  "url": "string",                // Required for HTTP transport.
  "env": {}                       // Optional. Environment variables.
}
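The field rules in this request body can be checked on the client before the request is sent. The helper below is a hypothetical sketch that encodes only what the schema above states: `name` and `command` are required, `transport` is `"stdio"` or `"http"`, and `url` is required for HTTP transport.

```python
from typing import Dict, List, Optional


def build_add_payload(name: str, command: str, args: Optional[List[str]] = None,
                      transport: str = "stdio", url: Optional[str] = None,
                      env: Optional[Dict[str, str]] = None) -> dict:
    """Build a POST /server/add body, validating the documented field rules."""
    if not name or not command:
        raise ValueError("name and command are required")
    if transport not in ("stdio", "http"):
        raise ValueError('transport must be "stdio" or "http"')
    if transport == "http" and not url:
        raise ValueError("url is required for HTTP transport")
    payload: dict = {
        "name": name,
        "command": command,
        "args": args or [],
        "transport": transport,
    }
    if url:
        payload["url"] = url
    if env:
        payload["env"] = env
    return payload


print(build_add_payload("custom-tools", "/path/to/custom-mcp-server"))
```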

Monitoring

GET /config

Get current gateway configuration (sanitized — no secrets).

GET /log

Get recent call log entries with timing and status.

Query Parameters

?limit=50                  // Max entries (default: 50)
&server=stonemem           // Filter by backend
&status=error              // Filter: success, error, timeout

Enterprise only. Audit logging and log export require an Enterprise license.
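The query parameters above compose into an ordinary query string. A minimal sketch, assuming the default gateway address; `log_url` is a hypothetical helper and only builds the URL.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_STATUSES = ("success", "error", "timeout")  # values listed for ?status=


def log_url(base: str = "http://localhost:3393", limit: int = 50,
            server: Optional[str] = None, status: Optional[str] = None) -> str:
    """Build a GET /log URL from the documented filters."""
    params = {"limit": limit}
    if server is not None:
        params["server"] = server
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {VALID_STATUSES}")
        params["status"] = status
    return f"{base}/log?{urlencode(params)}"


print(log_url(server="stonemem", status="error"))
# http://localhost:3393/log?limit=50&server=stonemem&status=error
```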

GET /stats

Gateway statistics — total calls, error rates, latency percentiles, per-backend breakdowns.

Response

{
  "total_calls": 15234,
  "total_errors": 23,
  "error_rate": 0.0015,
  "avg_latency_ms": 45,
  "p95_latency_ms": 120,
  "p99_latency_ms": 340,
  "backends": {
    "stonemem": { "calls": 8000, "errors": 2, "avg_ms": 12 },
    "filesystem": { "calls": 5000, "errors": 5, "avg_ms": 35 },
    "web-search": { "calls": 2234, "errors": 16, "avg_ms": 890 }
  }
}

GET /health

Gateway health — version, tier, uptime, backend count, overall status.

{
  "status": "ok",
  "version": "1.1.0",
  "tier": "pro",
  "uptime_seconds": 86400,
  "backends_total": 5,
  "backends_healthy": 5,
  "circuits_open": 0
}
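A monitoring script might reduce this payload to a single verdict. The sketch below works on the sample response above; the "degraded" label and the rule behind it are assumptions for illustration, not part of the API.

```python
import json


def summarize_health(payload: dict) -> str:
    """Turn a /health payload into a one-line summary."""
    if payload["status"] != "ok":
        return "gateway not ok"
    unhealthy = payload["backends_total"] - payload["backends_healthy"]
    circuits_open = payload["circuits_open"]
    if unhealthy > 0 or circuits_open > 0:
        # Assumption: any unhealthy backend or open circuit counts as degraded.
        return f"degraded: {unhealthy} unhealthy backend(s), {circuits_open} open circuit(s)"
    return "healthy"


sample = json.loads("""
{
  "status": "ok",
  "version": "1.1.0",
  "tier": "pro",
  "uptime_seconds": 86400,
  "backends_total": 5,
  "backends_healthy": 5,
  "circuits_open": 0
}
""")
print(summarize_health(sample))  # healthy
```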

6. MCP Tool Reference

When an agent connects to stonegate as an MCP server, stonegate exposes its own gateway management tools alongside all proxied backend tools:

gate_call

Tool: gate_call
Parameters:
  - server (string, required): Backend server name
  - tool (string, required): Tool name
  - arguments (object, required): Tool arguments

Example:
  gate_call({
    server: "stonemem",
    tool: "mem_search",
    arguments: { query: "deployment schedule", limit: 5 }
  })

gate_discover

Tool: gate_discover
Parameters:
  - query (string, optional): Keyword search across tool names/descriptions

Example:
  gate_discover({ query: "file read" })
  // Returns tools matching "file" or "read" from any backend
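The "file or read" behaviour in the example can be sketched as an any-keyword match over names and descriptions. This is an illustration only: the `discover` function is hypothetical, case-insensitive substring matching is an assumption, and so is returning the full catalog when the query is empty.

```python
from typing import Dict, List


def discover(tools: List[Dict[str, str]], query: str = "") -> List[Dict[str, str]]:
    """Return tools whose name or description contains any query keyword."""
    keywords = query.lower().split()
    if not keywords:
        return list(tools)  # assumption: no query means "list everything"
    matches = []
    for tool in tools:
        haystack = f"{tool['name']} {tool['description']}".lower()
        if any(keyword in haystack for keyword in keywords):
            matches.append(tool)
    return matches


# Catalog entries taken from the GET /tools sample response.
tools = [
    {"name": "mem_search", "server": "stonemem",
     "description": "Full-text search across stored entries"},
    {"name": "read_file", "server": "filesystem",
     "description": "Read a file from the filesystem"},
]
print([t["name"] for t in discover(tools, "file read")])  # ['read_file']
```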

gate_status

Tool: gate_status
Parameters:
  - server (string, optional): Filter to a specific backend

Example:
  gate_status({ server: "web-search" })
  // Returns health, circuit state, call stats for web-search backend

gate_call_async

Tool: gate_call_async (Pro/Enterprise)
Parameters:
  - server (string, required): Backend server
  - tool (string, required): Tool name
  - arguments (object, required): Tool arguments

Returns immediately with a call_id.
Agent polls with gate_poll(call_id) or gate_await(call_id, timeout_ms).

7. Use Cases by Platform

Hermes / Claude Code

Unified Tool Access for Agent Estates

Multiple Claude Code instances share a single stonegate gateway. Instead of configuring every MCP server separately in each agent (five in the example below), each agent connects to stonegate once.

# Before stonegate (per-agent config, 5 servers x 10 agents = 50 processes):
{
  "mcpServers": {
    "filesystem": { "command": "npx", "args": ["@anthropic/mcp-filesystem"] },
    "database": { "command": "python", "args": ["-m", "mcp_postgres"] },
    "stonemem": { "command": "stonemem", "args": ["serve"] },
    "web-search": { "command": "python", "args": ["-m", "mcp_web_search"] },
    "custom": { "command": "./custom-tools" }
  }
}

# After stonegate (per-agent config, 1 connection):
{
  "mcpServers": {
    "stonegate": { "url": "http://localhost:3393" }
  }
}

# Agent calls any tool through the gateway:
gate_call({ server: "database", tool: "query", arguments: { sql: "SELECT ..." } })
gate_call({ server: "stonemem", tool: "mem_search", arguments: { query: "..." } })

Result: 10 agents share 5 backend connections managed by one stonegate process. Configuration is in one place. Backend crashes are auto-recovered.

CrewAI

Dynamic Tool Discovery for Crew Members

CrewAI crew members discover available tools at runtime instead of having them hardcoded.

# Research agent discovers what tools are available
gate_discover({ query: "search" })
// Returns:
// - web-search/search_web (web-search backend)
// - stonemem/mem_search (stonemem backend)
// - database/search_records (database backend)

# Agent picks the right tool dynamically based on the task
# For web research:
gate_call({ server: "web-search", tool: "search_web", arguments: { query: "AI market 2026" } })
# For internal knowledge:
gate_call({ server: "stonemem", tool: "mem_search", arguments: { query: "AI market analysis" } })

LangGraph

Resilient Tool Calls in Multi-Step Workflows

LangGraph workflows that call external tools through stonegate get automatic circuit breaker protection and crash recovery.

# Workflow step calls an unreliable external API via custom MCP server
# Without stonegate: timeout hangs the entire graph
# With stonegate: circuit breaker returns error instantly after 5 failures

gate_call({
  server: "external-api",
  tool: "fetch_data",
  arguments: { endpoint: "/api/v2/reports" }
})

# If external-api is failing, stonegate responds instantly:
// { "error": "Circuit open for 'external-api'", "code": "CIRCUIT_OPEN" }
// Workflow catches error and takes fallback path

OpenHands

Hot-Swapping Tool Backends

OpenHands agents can have their tool backends updated at runtime — no agent restart required.

# DevOps adds a new monitoring backend at runtime
curl -X POST http://localhost:3393/server/add \
  -H "Content-Type: application/json" \
  -d '{"name": "monitoring", "command": "python", "args": ["-m", "mcp_grafana"]}'

# Agents immediately discover the new tools
gate_discover({ query: "grafana dashboard" })
// Returns new tools from the monitoring backend

# Stop a deprecated backend
curl -X POST http://localhost:3393/server/stop \
  -H "Content-Type: application/json" \
  -d '{"name": "old-monitoring"}'

Google ADK

Multi-Tenant Tool Isolation

Multiple Google ADK deployments share a stonegate instance with per-tenant tool visibility.

# Tenant A's agent sees only their permitted backends
gate_call({
  server: "tenant-a-db",
  tool: "query",
  arguments: { sql: "SELECT * FROM orders" }
})

# Tenant B's agent sees different backends
gate_call({
  server: "tenant-b-db",
  tool: "query",
  arguments: { sql: "SELECT * FROM inventory" }
})

Haystack

Pipeline Tool Orchestration

Haystack pipeline components access diverse tools through one gateway — LLMs, databases, file systems, web search — without per-component MCP configuration.

# Retriever component uses web search
gate_call({ server: "web-search", tool: "search_web", arguments: { query: "..." } })

# Reader component uses LLM
gate_call({ server: "llm", tool: "generate", arguments: { prompt: "..." } })

# Writer component uses filesystem
gate_call({ server: "filesystem", tool: "write_file", arguments: { path: "output.md", content: "..." } })

Microsoft Agent Framework

Enterprise Tool Governance

In enterprise deployments, stonegate provides a single point of governance for all agent tool access. All calls flow through the gateway — audit logging captures who called what, when, and with what arguments.

# All 200+ agents connect to stonegate
# Enterprise tier enables audit logging:

GET /log?limit=100
// Returns:
// [
//   { "agent": "hr-1", "server": "database", "tool": "update_record",
//     "timestamp": "...", "duration_ms": 45, "status": "success" },
//   { "agent": "finance-3", "server": "payment-api", "tool": "refund",
//     "timestamp": "...", "duration_ms": 230, "status": "success" },
//   ...
// ]

# Export audit log for compliance review
GET /log?status=error&limit=1000
# Identify which agents are hitting errors and on which backends

All Platforms

Async Dispatch for Long-Running Operations

Some tool calls take minutes — data exports, complex queries, file processing. Async dispatch lets the agent continue working while waiting.

# Dispatch a long-running export asynchronously
gate_call_async({
  server: "database",
  tool: "export_table",
  arguments: { table: "transactions", format: "csv", since: "2026-01-01" }
})
// Returns immediately: { "call_id": "abc-123", "submitted_at": "..." }

# Agent does other work...

# Check if it's done
gate_poll({ call_id: "abc-123" })
// { "status": "running" }

# Or block briefly
gate_await({ call_id: "abc-123", timeout_ms: 5000 })
// { "status": "complete", "result": { ... } }
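The dispatch-then-poll pattern above can be wrapped in a small loop. In this sketch `poll` stands in for whatever function invokes `gate_poll` in your agent framework (an assumption), the status values are the ones shown in the example, and the fake responses exist only so the snippet runs on its own.

```python
import time
from typing import Callable, Optional


def wait_for_result(poll: Callable[[str], dict], call_id: str,
                    timeout_s: float = 60.0, interval_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Optional[dict]:
    """Poll until the call leaves the "running" state; return None on timeout."""
    deadline = clock() + timeout_s
    while True:
        status = poll(call_id)
        if status.get("status") != "running":
            return status
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Fake poll responses (placeholder data, not real gateway output).
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "complete", "result": {"rows": 3}},
])
result = wait_for_result(lambda call_id: next(responses), "abc-123",
                         sleep=lambda seconds: None)
print(result)  # {'status': 'complete', 'result': {'rows': 3}}
```

Polling at a fixed interval keeps the sketch short; a longer-lived agent would probably back off between polls or use `gate_await` with a bounded timeout instead.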

8. Tier Comparison

Feature                        Free   Pro ($19/mo)   Enterprise ($99/mo)
Backend servers                3      Unlimited      Unlimited
Agent connections              1      Unlimited      Unlimited
Tool discovery + routing       Yes    Yes            Yes
Schema caching                 Yes    Yes            Yes
Crash recovery                 Yes    Yes            Yes
Async dispatch                 No     Yes            Yes
Connection pooling             No     Yes            Yes
Circuit breaker                No     Yes            Yes
Runtime server add/remove      No     Yes            Yes
HTTP transport (multi-agent)   No     No             Yes
Audit logging                  No     No             Yes
Backend metrics export         No     No             Yes

9. License Management

Activation

stonegate activate --key SG-XXXX-XXXX-XXXX-XXXX
stonegate status

Heartbeat & SIGIL

stonegate uses the same licensing model as stonemem and stonemux: a daily heartbeat (telemetry only) and a SIGIL die command for license revocation.

License key prefixes

Product     Prefix   Port
stonemem    SM-      3391
stonemux    SX-      3392
stonegate   SG-      3393

10. Troubleshooting

Backend won't start

# Check backend binary exists
which stonemem

# Check stonegate logs
stonegate serve --log-level debug

# Verify backend starts independently
stonemem serve --port 3391

Circuit breaker open

# Check which backends have open circuits
curl http://localhost:3393/server/list | jq '.servers[] | select(.circuit == "open")'

# Manually restart the failing backend
curl -X POST http://localhost:3393/server/restart \
  -H "Content-Type: application/json" \
  -d '{"name": "web-search"}'

# After reset_timeout (default 30s) the circuit goes half-open; it closes again once success_threshold calls succeed
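On the client side, an open circuit shows up as the error body documented under POST /call, so a caller can branch on the `code` field. A minimal sketch: `GatewayError`, `unwrap`, and `call_with_fallback` are hypothetical names, and the `"UNKNOWN"` default code is an assumption for error bodies that carry no code.

```python
from typing import Any, Callable


class GatewayError(Exception):
    """Raised when a gateway response carries an error body."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def unwrap(response: dict) -> Any:
    """Return the tool result, or raise GatewayError for an error response."""
    if "error" in response:
        raise GatewayError(response.get("code", "UNKNOWN"), response["error"])
    return response["result"]


def call_with_fallback(call: Callable[[], dict], fallback: Callable[[], Any]) -> Any:
    """Use the fallback only when the gateway reports an open circuit."""
    try:
        return unwrap(call())
    except GatewayError as exc:
        if exc.code == "CIRCUIT_OPEN":
            return fallback()
        raise


# Simulated gateway response; a real caller would issue POST /call here.
open_circuit = {"error": "Circuit open for server 'web-search'", "code": "CIRCUIT_OPEN"}
print(call_with_fallback(lambda: open_circuit, lambda: "cached results"))  # cached results
```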

Tool not found

# List all known tools
curl http://localhost:3393/tools | jq '.tools[].name'

# Check if the backend is running
curl http://localhost:3393/server/list | jq '.servers[] | select(.name == "myserver")'

# If the backend was just added, restart to refresh tool catalog
curl -X POST http://localhost:3393/server/restart \
  -H "Content-Type: application/json" \
  -d '{"name": "myserver"}'

High latency

# Check per-backend latency
curl http://localhost:3393/stats | jq '.backends'

# If a specific backend is slow, check its health directly
curl http://localhost:3391/health  # stonemem
curl http://localhost:3392/health  # stonemux

# Enable connection pooling (Pro) to reduce connection overhead
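When the `jq` output is long, it can help to rank backends by latency. The sketch below applies that to the `backends` object from the sample GET /stats response; `rank_by_latency` is a hypothetical helper, and the per-backend error rate is derived here from `errors / calls`.

```python
import json
from typing import Dict, List, Tuple

# "backends" object copied from the sample GET /stats response.
stats = json.loads("""
{
  "backends": {
    "stonemem": { "calls": 8000, "errors": 2, "avg_ms": 12 },
    "filesystem": { "calls": 5000, "errors": 5, "avg_ms": 35 },
    "web-search": { "calls": 2234, "errors": 16, "avg_ms": 890 }
  }
}
""")


def rank_by_latency(backends: Dict[str, dict]) -> List[Tuple[str, dict]]:
    """Sort backends from slowest to fastest by average latency."""
    return sorted(backends.items(), key=lambda item: item[1]["avg_ms"], reverse=True)


ranked = rank_by_latency(stats["backends"])
for name, data in ranked:
    error_rate = data["errors"] / data["calls"]
    print(f"{name}: avg {data['avg_ms']} ms, error rate {error_rate:.2%}")
```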

Free tier limit reached (3 backends)

# Check current backend count
curl http://localhost:3393/server/list | jq '.servers | length'

# Remove an unused backend or upgrade to Pro for unlimited backends

Getting Help