Why Your Agents Need a Mesh
Single agents hit a ceiling fast. A mesh network lets them discover each other, share context, and collaborate — without you writing the glue code.
By Alex Frey
Most developers start with a single agent. One LLM pipeline that handles a task end-to-end. It works — until you need a second one.
The moment you add another agent, the questions start. How does the code-review agent tell the deployment agent to roll back? How does the monitoring agent alert the on-call agent? You end up writing webhooks, queues, shared databases, polling loops. Every new connection is more glue code to maintain.
Here's what that looks like in practice — and why a mesh makes it disappear.
The discovery problem
Say you have two agents: one that reviews code and one that deploys it. Without a mesh, you hard-wire them together:
# code_reviewer.py — the hard way
import requests

DEPLOY_URL = "http://192.168.1.42:8080"  # brittle
API_KEY = "sk-deploy-..."  # one more secret to manage

def request_rollback(commit_sha):
    requests.post(
        f"{DEPLOY_URL}/rollback",
        json={"sha": commit_sha},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
This works until someone moves the deploy agent to a different machine. Or runs it on a laptop behind NAT. Or adds a third agent that also needs to trigger rollbacks. Every new connection means more URLs, more auth tokens, more code that has nothing to do with what your agent actually does.
With a mesh, the same thing is a single call:
# code_reviewer.py — with tailbus
from tailbus import AsyncAgent

reviewer = AsyncAgent("code-review")

async def request_rollback(commit_sha):
    await reviewer.open_session("deploy", {
        "action": "rollback",
        "sha": commit_sha,
    })
No URL. No auth token. No knowledge of where @deploy is running. The mesh handles discovery, routing, and delivery. If @deploy moves to a different machine, nothing changes. If a third agent needs to talk to it, same deal — one line.
Context flows naturally
Once agents can discover each other, context starts flowing without you orchestrating it.
A monitoring agent detects an anomaly and messages @triage. The triage agent investigates and messages @deploy with a rollback request. Each agent does what it's good at, passing structured context to the next:
import asyncio

from tailbus import AsyncAgent

monitor = AsyncAgent("monitor")

@monitor.on_message
async def handle(msg):
    if msg.payload.get("type") == "anomaly":
        # Hand off to triage — don't try to handle it ourselves
        await monitor.open_session("triage", {
            "alert": msg.payload["alert"],
            "severity": msg.payload["severity"],
            "trace_id": msg.trace_id,
        })
        await monitor.resolve(msg.session, "escalated")

asyncio.run(monitor.run_forever())
This is different from a pipeline. Pipelines are linear and predefined. A mesh lets agents collaborate dynamically based on what's actually happening — @triage might resolve the issue itself, or pull in @deploy, or escalate to @oncall. The monitor doesn't need to know.
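The routing-by-handle idea can be sketched with a toy in-memory model. This is plain Python, not tailbus — the `registry`, `agent`, and `open_session` names here are illustrative stand-ins — but it shows the core move: agents register a handler under a handle, messages are looked up by name, and @triage decides at runtime whether to pull in @deploy.

```python
import asyncio

# Toy in-memory "mesh": handlers registered by handle, routed by name.
# An illustration of the idea only, not the tailbus implementation.
registry = {}

def agent(handle):
    """Register an async handler under a handle."""
    def decorator(fn):
        registry[handle] = fn
        return fn
    return decorator

async def open_session(handle, payload):
    # Discovery is just a name lookup: no URLs, no per-pair credentials.
    return await registry[handle](payload)

@agent("triage")
async def triage(payload):
    # Dynamic routing: triage decides at runtime who handles the alert.
    if payload["severity"] == "critical":
        return await open_session("deploy", {"action": "rollback"})
    return "resolved-locally"

@agent("deploy")
async def deploy(payload):
    return f"done:{payload['action']}"

print(asyncio.run(open_session("triage", {"severity": "critical"})))  # done:rollback
print(asyncio.run(open_session("triage", {"severity": "low"})))       # resolved-locally
```

Adding a third handler to `registry` requires no changes to the existing agents — which is the property the real mesh provides across machines.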
Why not just use HTTP?
It's tempting to expose each agent as a REST API. Here's why that breaks down:
Networking. Every agent needs a known address. Move an agent to a new machine, and every client needs updating. Run agents on developer laptops behind NAT, and you need a tunnel or relay. Tailbus handles peer discovery and NAT traversal automatically — agents find each other by handle, not by IP.
Auth. Every pair of agents needs shared credentials. Three agents means three pairs of API keys. Ten agents means forty-five. With a mesh, identity is built in — agents authenticate once when joining, and the mesh handles the rest.
Retries and failures. HTTP gives you a status code. You build the retry logic, the circuit breakers, the timeout handling. A mesh gives you sessions with built-in delivery semantics.
Topology changes. Add a new agent to an HTTP system and you're updating configs across every service that needs to talk to it. Add an agent to a mesh and it's instantly discoverable by handle — zero config changes anywhere else.
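The credential math above generalizes: point-to-point auth needs one shared secret per pair of agents, n(n−1)/2 in total, while a mesh needs one identity per agent. A quick sketch (the function name is illustrative):

```python
def pairwise_credentials(n: int) -> int:
    """Shared secrets needed when every pair of agents holds its own key."""
    return n * (n - 1) // 2

# Point-to-point auth grows quadratically; mesh identities grow linearly.
for n in (3, 10, 50):
    print(f"{n} agents: {pairwise_credentials(n)} key pairs vs. {n} mesh identities")
```

At three agents the difference is negligible; at fifty it's 1,225 secrets versus 50.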
The mesh isn't doing anything magic. It's handling the same problems you'd solve with HTTP — but solving them once, at the infrastructure level, instead of re-solving them in every agent.
Try it
Install the daemon and start two agents in separate terminals:
# Terminal 1 — start the daemon
tailbusd
# Terminal 2 — ping.py
import asyncio

from tailbus import AsyncAgent

ping = AsyncAgent("ping")

@ping.on_message
async def handle(msg):
    print(f"Got: {msg.payload}")
    await ping.resolve(msg.session, "pong")

asyncio.run(ping.run_forever())
# Terminal 3 — send a message
tailbus fire ping '{"hello": "world"}'
That's discovery, routing, and delivery — working in under a minute. From here, add more agents. They find each other by name. No URLs, no config files, no glue code.
The mesh grows with you — from two agents on a laptop to hundreds across your infrastructure.