
OpenClaw + Hermes Agent Dual-Stack: A Practical Playbook

Run OpenClaw + Hermes Agent together: architecture, message routing, Skills split, memory isolation, and real monthly cost numbers for dual-stack.

OpenClaw Guides

Tutorial Authors

Who this OpenClaw + Hermes Agent dual-stack playbook is for

This playbook will pay off for you if:

  • You're running a team-scale deployment that needs both multi-channel reach and heavy background automation
  • You already use OpenClaw and have recently started eyeing Hermes Agent's self-evolution features
  • You have actual DevOps capacity and are willing to operate two systems
  • You can articulate what "breadth" and "depth" mean concretely in your workload

This playbook will waste your time if:

  • You're a solo user with one or two channels
  • Your workload is homogeneous (all Q&A, or all research)
  • You don't have monitoring or DevOps chops
  • You're brand new to AI agents and haven't shipped either system yet

That's not condescension; it's a cost-efficiency warning. Dual-stack isn't "better" — it's "a better fit for a specific shape of problem." If you're not in that shape, section 8 will tell you what to do instead.

1. OpenClaw vs Hermes Agent: why this isn't an either/or

Our Hermes vs OpenClaw comparison lands on a simple split:

  • OpenClaw = breadth (20+ channels, a ready-made Skills marketplace, central-gateway architecture)
  • Hermes Agent = depth (persistent memory, self-generating Skills, sandboxed execution, learning loops)

These two dimensions are orthogonal, not a trade-off. Choosing OpenClaw doesn't make you "shallower," and choosing Hermes doesn't make you "narrower." They're solving different layers of the same problem:

  • OpenClaw solves "how does my agent get accessed" — channels, message protocols, human-to-bot UX
  • Hermes solves "how does my agent get stronger" — memory, learning, isolation

If you need both — "an agent that lives in every channel we use" and "an agent that gets better at the hard work over time" — picking one leaves a visible gap. The point of dual-stack is letting each system do only what it's actually good at: OpenClaw handles I/O, Hermes is the brain.

2. The canonical OpenClaw + Hermes dual-stack architecture

Core idea in one sentence: OpenClaw is the unified entry point for all messages; it handles light tasks itself and forwards heavy tasks to Hermes; when Hermes finishes, OpenClaw replies to the user through the original channel.

       ┌────────────────────────────────────────────────┐
       │         Messaging Channels (20+)               │
       │   WhatsApp / Telegram / Slack / Feishu / ...    │
       └──────────────────┬─────────────────────────────┘
                          │  incoming message
                          ▼
       ┌────────────────────────────────────────────────┐
       │         OpenClaw (Central Gateway)             │
       │  • Channel protocol adaptation                  │
       │  • Short-term session memory (last N msgs)      │
       │  • Upfront intent-classification Skill          │
       │  • Light tasks: Q&A, lookups, translation       │
       └──────────────────┬─────────────────────────────┘
                          │  forward-to-hermes
                          │  (user_id + task description)
                          ▼
       ┌────────────────────────────────────────────────┐
       │         Hermes Agent (Deep Execution)          │
       │  • Long-term memory (FTS5 + summaries + profile)│
       │  • Self-generating / self-improving Skills      │
       │  • Sandboxed execution                          │
       │    (Docker / SSH / Modal / Daytona)             │
       │  • Heavy tasks: research, review, reporting     │
       └──────────────────┬─────────────────────────────┘
                          │  result callback
                          ▼
              OpenClaw replies via the original channel

Three non-negotiable design principles:

  1. One messaging entry point. Users only send messages through OpenClaw's channels — they never talk to Hermes directly. This keeps channel credentials in exactly one place and avoids a world where both systems are fighting over which one owns "the bot."
  2. Forwarding is one-way. OpenClaw hands a task to Hermes; Hermes runs it in its own loop and delivers the result back via a callback. The two systems never read each other's databases.
  3. Short-term memory belongs to OpenClaw, long-term knowledge belongs to Hermes. This boundary must be absolutely crisp, and it's the single biggest source of bugs when it isn't. We'll dig into this in section 6.
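The forward-and-callback contract implied by principles 2 and 3 can be sketched in a few lines. Everything here is illustrative — the field names (`user_id`, `task_id`, `answer`) and the shape of the payload are assumptions, not the real wire format of either system:

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class HermesTask:
    """What OpenClaw forwards to Hermes: identity + task, nothing more.
    Field names are illustrative, not a documented schema."""
    user_id: str
    message: str
    channel: str  # so the callback knows where to reply
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)

@dataclass
class HermesResult:
    """What Hermes sends back: the task_id and the final answer only —
    no reasoning trace, no memory fragments (see section 6)."""
    task_id: str
    answer: str

def on_result(task: HermesTask, result: HermesResult) -> str:
    # OpenClaw's entire job on callback: relay to the original channel.
    assert result.task_id == task.task_id
    return f"[{task.channel}] reply to {task.user_id}: {result.answer}"
```

The point of the narrow types is the boundary: neither side ever sees the other's storage, only these two small messages.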

3. Dual-stack deployment sizing: three tiers

Three canonical configurations depending on team size and budget.

Tier A — Entry level: single VPS, both systems on one box

  • Hardware: 1 × 4-core / 8GB VPS (≥40GB SSD), roughly $15–25/month
  • Hermes backend: local (shares the filesystem and process space with OpenClaw)
  • Fits: ≤5-person team or a heavy individual user, trying dual-stack before committing
  • Limits: OpenClaw and Hermes will compete for CPU; when Hermes is grinding on a heavy task, channel latency degrades
  • Deployment tip: manage both processes with systemd, and set a CPU quota (e.g. 50%) on the Hermes unit so it can't starve OpenClaw
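As a concrete example, the quota can go in a systemd drop-in on the Hermes unit. The unit name `hermes.service` is an assumption — use whatever name your install created. Note that systemd's `CPUQuota` is measured against a single core, so on a 4-core box `200%` caps Hermes at two cores, i.e. 50% of the machine:

```ini
# /etc/systemd/system/hermes.service.d/cpu.conf
# CPUQuota is relative to one core: 200% = two of four cores (half the box).
[Service]
CPUQuota=200%
# Optional: bound memory too, so a runaway task can't take the box down.
MemoryMax=5G
```

Apply it with `systemctl daemon-reload` followed by a restart of the Hermes unit.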

Tier B — Production: two VPSes, separated

  • Hardware:
    • Gateway box: 2-core / 4GB, dedicated to OpenClaw, ~$10–15/month
    • Worker box: 4-core / 8GB or 8-core / 16GB, dedicated to Hermes, ~$20–35/month
  • Hermes backend: SSH (gateway dispatches tasks over SSH) or Docker (Hermes containerized on the worker)
  • Fits: 10–30-person teams, steady workload, needs resource isolation
  • Wins: gateway latency stays flat regardless of task load; you can scale the worker independently
  • Monthly total: roughly $30–50/month (infra only, excluding API costs)

Tier C — Enterprise: OpenClaw on-prem + Hermes on serverless

  • Hardware:
    • OpenClaw: 2–4 core / 4–8GB VPS on-prem, ~$15–25/month
    • Hermes: Modal or Daytona as the serverless backend, pay-per-use
  • Hermes backend: Modal / Daytona, which auto-sleep when idle and bill per second of execution
  • Fits: teams with bursty workloads, businesses that need cost elasticity, anyone with compliance isolation requirements
  • Wins: heavy tasks run in a fully isolated environment with crisp security boundaries; idle cost is near zero
  • Monthly total: roughly $20–40/month infra + metered serverless charges
  • Downside: cold-start latency (a few seconds on first task) — not a fit for latency-sensitive sync tasks

Picking a tier is easy: bursty workload with budget to spare → C; stable moderate load → B; testing the waters → A.

4. Message routing: OpenClaw vs Hermes task handoff

The central question of dual-stack: when OpenClaw receives a message, how does it decide whether to handle it locally or forward it to Hermes?

Three common approaches. You probably want to combine them.

Approach A — Keyword triggers

Dead simple. Define a set of trigger phrases in OpenClaw's config; if the message matches, forward:

```json
// ~/.openclaw/openclaw.json (illustrative config — check current docs for field names)
{
  "routing": {
    "forwardToHermes": {
      "enabled": true,
      "triggers": {
        "keywords": [
          "research", "analyze", "deep dive", "investigate",
          "summarize this week", "generate report", "draft a plan"
        ]
      },
      "endpoint": "http://localhost:7788/hermes/task"
    }
  }
}
```

Pros: zero API cost, fully transparent rules. Cons: false positives — "can you analyze this quick question for me" is probably not a deep-research request.

Approach B — Upfront intent-classification Skill

Use a cheap model (Claude Haiku is ideal) to classify the message as simple or deep:

user message → OpenClaw
   ↓
upfront classifier Skill (Haiku, ~100 tokens)
   ↓
returns: { "intent": "deep", "confidence": 0.87 }
   ↓
confidence > 0.7 AND intent = deep → forward to Hermes
otherwise → OpenClaw handles locally

Pros: much higher accuracy, understands natural language intent. Cons: every message costs one extra API call. Haiku is cheap, but it adds up — see the model selection guide for cost context.
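The threshold logic in the flow above is small enough to sketch. Here `classify_intent` is a stand-in for the actual cheap-model call — the stub below fakes its verdict for illustration; the `{"intent", "confidence"}` shape is taken from the diagram, not from any real API:

```python
from typing import Callable

def route(message: str,
          classify_intent: Callable[[str], dict],
          threshold: float = 0.7) -> str:
    """Return 'hermes' or 'local' from the classifier's verdict,
    mirroring the 'confidence > 0.7 AND intent = deep' rule above."""
    verdict = classify_intent(message)
    if verdict["intent"] == "deep" and verdict["confidence"] > threshold:
        return "hermes"
    return "local"

def stub_classifier(message: str) -> dict:
    # Stand-in for the ~100-token LLM call; in production this is the
    # one extra API call per message the cons paragraph warns about.
    deep_markers = ("research", "report", "deep dive")
    is_deep = any(m in message.lower() for m in deep_markers)
    return {"intent": "deep" if is_deep else "simple",
            "confidence": 0.9 if is_deep else 0.8}
```

Swapping `stub_classifier` for a real model call changes nothing about the routing logic, which is the point: keep the threshold in your code, not in the prompt.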

Approach C — Explicit command prefixes

Let users opt into deep processing explicitly:

@hermes turn this month's customer feedback into an analysis report
@deep dig into this open-source project's architecture
/research what's new in Next.js 15 RSC

Pros: fully user-controlled, zero false positives. Cons: depends on user education; new users won't know to use it.

A + C as the baseline, add B when scale justifies it.

Most teams do fine with A+C: power users who know they want depth use @hermes, and regular users saying "can you research this" get caught by the keyword trigger. Once your scale makes false-positive cost meaningful, layer in the Approach-B classifier as a safety net.
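The A+C baseline fits in one function: explicit prefixes win outright, keyword triggers catch the rest, everything else stays local. The prefixes and keywords mirror the examples above; the function name is mine:

```python
import re

PREFIXES = ("@hermes", "@deep", "/research")          # Approach C
KEYWORDS = ("research", "analyze", "deep dive",
            "generate report", "summarize this week")  # Approach A

def route_baseline(message: str) -> str:
    text = message.strip().lower()
    if text.startswith(PREFIXES):
        # Explicit opt-in: zero false positives, checked first.
        return "hermes"
    # Word boundaries keep e.g. "reanalyzed" from tripping "analyze".
    if any(re.search(rf"\b{re.escape(k)}\b", text) for k in KEYWORDS):
        return "hermes"
    return "local"
```

Note this deliberately keeps Approach A's known false positive ("can you analyze this quick question" still forwards) — that's the gap the Approach-B classifier closes later.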

5. Skill ownership: OpenClaw Skills vs Hermes Skills

The single biggest footgun of dual-stack is writing the same Skill twice — once as a ClawHub install in OpenClaw, once as a self-generated Skill in Hermes. Suddenly both systems can answer the same question, and they give different answers.

Four rules for assigning ownership:

  1. Deterministic, stable-I/O, one-shot tools → OpenClaw. Translation, weather lookups, outbound notifications, calendar queries — anything where the answer is the same every time. Use a ClawHub Skill and move on.
  2. Anything that needs to learn or accumulate across sessions → Hermes. Code review (it needs to learn your codebase style), knowledge base curation, long-horizon writing-style training — workloads where "the more it runs, the better it gets." Let Hermes grow the Skill itself.
  3. Channel-bound capabilities must stay in OpenClaw. Feishu bot APIs, WeCom app integration, Slack App primitives — Hermes doesn't have these adapters at all, so this is non-negotiable.
  4. Sandbox-required or high-privilege tasks must go to Hermes. Running untrusted code, handling sensitive files, executing system commands — OpenClaw runs in-process with no sandbox, so these tasks are safer on Hermes.

Some typical examples:

| Task | Owner | Why |
| --- | --- | --- |
| Translate a block of text | OpenClaw | Deterministic, one-shot |
| Weather lookup | OpenClaw | Stateless query |
| Send a Feishu notification | OpenClaw | Channel-bound |
| Read a Git repo and do a code review | Hermes | Needs sandbox + learns the codebase |
| Generate monthly customer feedback report | Hermes | Needs long-term memory + multi-step reasoning |
| Curate a knowledge base on a topic | Hermes | Cross-session accumulation |
| Post a weekly digest on a schedule | OpenClaw triggers + Hermes generates | Scheduling on OpenClaw, content on Hermes |

That last row is the canonical "hybrid task" — and it's exactly where dual-stack architecture earns its keep.
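The shape of that hybrid task is worth making concrete: OpenClaw owns *when*, Hermes owns *what*. A minimal sketch, where both callables are stand-ins for the real integrations (the channel name and task fields are invented for illustration):

```python
def weekly_digest_job(forward_to_hermes, reply_in_channel):
    """OpenClaw's side of the hybrid task: fire on schedule, forward,
    relay the result. Content generation lives entirely on Hermes."""
    task = {"user_id": "team-digest",
            "message": "generate this week's digest"}
    # Shown blocking for simplicity; the real setup uses the async
    # callback from section 2 (forward, then reply when Hermes is done).
    result = forward_to_hermes(task)
    reply_in_channel("#general", result)
```

The division of labor is the same as every other row in the table — the only novelty is that the "incoming message" is a cron tick instead of a user.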

6. Memory isolation: the hardest part of dual-stack

This is the biggest footgun of dual-stack deployments, and the place most people mess it up first.

What goes wrong without isolation

If you don't enforce memory boundaries, you'll see exactly this sequence:

  • User asks OpenClaw in Feishu: "where's that product analysis you did for me last week?"
  • OpenClaw checks its own session memory — nothing (the task ran on Hermes)
  • OpenClaw then queries Hermes's memory, finds a summary
  • OpenClaw writes the Hermes summary into its own conversation history
  • Next time Hermes processes a new task, it reads OpenClaw's copy of the summary, treats it as authoritative, and starts reasoning from a derivative
  • Over time the two systems circular-reference each other and the context drifts further and further from reality

Root cause: both systems have memory, but nobody owns "the source of truth."

The correct setup

Four rules:

  1. OpenClaw stores only short-term session memory. Set historyLimit to 5–10. Its scope is "the last few turns of this conversation." It is not storing user profiles, long-term preferences, or any kind of knowledge base.
  2. Hermes owns all long-term memory. User preferences, project knowledge, past task results, self-generated Skills — all of it. This is Hermes's home court; don't try to keep a parallel copy in OpenClaw.
  3. Forward minimal payloads. When OpenClaw hands a task to Hermes, the payload contains only user_id + current message + channel context (e.g. @-mention info). Do not stuff OpenClaw's full session history into the request — let Hermes pull the relevant context from its own long-term store.
  4. Don't pollute OpenClaw on the callback. When Hermes returns a result, OpenClaw's job is exactly one thing: relay it to the channel. Don't copy Hermes's reasoning trace or quoted memory fragments into OpenClaw's database. If the user later asks "where's that report," OpenClaw should call the Hermes API to fetch it — not keep a local copy.

One-liner: OpenClaw is a stateless messenger; Hermes is the only memory authority. They're linked by user_id, but they never share storage.
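Rule 3 is the one that's easiest to violate by accident, so here's the minimal-payload discipline as code. The session dict and its keys are hypothetical — the point is what *doesn't* cross the boundary:

```python
def build_forward_payload(session: dict) -> dict:
    """Copy exactly three things out of OpenClaw's short-term session
    state. session['history'] stays behind on purpose (rule 3):
    Hermes pulls context from its own long-term store."""
    return {
        "user_id": session["user_id"],
        "message": session["current_message"],
        "channel": session["channel_ctx"],  # e.g. @-mention info, thread id
    }
```

If you ever find yourself adding a `history` field to this payload, you're one step away from the circular-reference drift described above.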

See the OpenClaw config baseline for historyLimit guidance — in a dual-stack setup, push it even lower than the defaults.

7. Cost comparison: OpenClaw + Hermes dual-stack vs single-stack

Scenario: 10-person team, ~200 messages/day, three channels (Feishu + Slack + WhatsApp), ~20% of messages need deep processing.

Rough monthly cost (±30% in either direction, depending on workload shape):

| Setup | Infrastructure | API calls | Maintenance load | Total (excl. headcount) |
| --- | --- | --- | --- | --- |
| Single-stack OpenClaw | $15–25 | $50–80 | Medium (upgrades + Skills maintenance) | $65–105/month |
| Single-stack Hermes | $10–15 | $35–60 | Low (self-optimizing) | $45–75/month, but hard channel-coverage gap |
| Dual-stack (Tier B) | $30–50 | $45–70 | High (two systems) | $75–120/month |

A few things worth noting:

  • Dual-stack is not the cheapest. The cheapest is single-stack Hermes — but only if you can live with its 6-channel limit. For teams that need Feishu/DingTalk/WeCom/QQ, that's a hard blocker.
  • Dual-stack API costs can undercut single-stack OpenClaw. Hermes's learning loop reduces token consumption on repeat tasks (second-time execution doesn't need to reason from scratch), so long-run API costs on the heavy half of your workload drop by 20–40%.
  • The real dual-stack overhead is infra + maintenance time. If you count ops hours as real money, dual-stack doesn't always win on spreadsheet cost.
  • Dual-stack buys capability complementarity, not savings. The extra ~$20–30/month buys you "all channels covered and long-term memory / self-evolution." No single-stack option gives you both.
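The totals in the table are plain range addition, which makes them easy to sanity-check against your own numbers (the figures below are copied straight from the table; nothing new is estimated):

```python
def monthly_range(infra: tuple, api: tuple) -> tuple:
    """Sum component (low, high) USD/month ranges into a total range."""
    return (infra[0] + api[0], infra[1] + api[1])

# Tier-B dual-stack row: $30–50 infra + $45–70 API
dual_stack_b = monthly_range(infra=(30, 50), api=(45, 70))
# → (75, 120), matching the "$75–120/month" cell in the table
```

Plug in your own infra quotes and metered API history the same way before trusting anyone's spreadsheet, including this one.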

Want to push costs down further? The model selection & cost optimization guide covers per-channel model routing and prompt caching — both strategies apply equally to dual-stack.

8. When you should not run OpenClaw + Hermes dual-stack

Honest discouragement, because the ROI on dual-stack is narrower than it looks. Any one of these and you should walk away:

  1. You're a solo user with 1–2 channels and a few dozen messages a day. The ops overhead will eat any value you get from it.
  2. Your workload is homogeneous. If 90%+ is Q&A or 90%+ is deep research, pick the matching single-stack and ship.
  3. You don't have DevOps capacity. Dual-stack assumes you can use systemd, read logs, and debug the occasional IPC hiccup.
  4. You don't have monitoring. Two systems means double the failure surface. Without monitoring, you won't know which side is broken when things go sideways.
  5. You don't yet know OpenClaw or Hermes well. Get one working in production first. When you actually hit a ceiling on that system, then consider adding the other.
  6. You're chasing "cheaper." Dual-stack is usually more expensive than a single stack on paper. Its value is complementarity, not savings.

Any one of those → default to single-stack OpenClaw. Its channel coverage is the largest safety zone; it won't be wrong. When you hit a concrete "deep-task bottleneck" that you can actually name, that's the moment to bring Hermes in.

Next steps

If you've read this and you're sure dual-stack fits your situation:

If you're still undecided:

One last reminder: dual-stack isn't a technical flex — it's a response to a workload that genuinely requires both breadth and depth. If your problem only lives in one of those dimensions, don't force it.