Hermes Agent Skills Advanced Guide: SKILL.md, GEPA Self-Evolution & Skill Bundles (2026)
If your Hermes Agent still relies on one-shot prompts, you are paying full context cost every session while procedural knowledge never compounds. This guide is for advanced users and team leads building evolvable skill libraries. It covers the agentskills.io SKILL.md format, Progressive Disclosure tiers, Skill Bundles YAML, conditional activation rules, GEPA + DSPy self-evolution ($2–10 per run), Skill Tap publishing, a hosting decision matrix, a five-step Mac cloud Runbook, and five FAQ answers, so your Gateway keeps learning after you close the lid.
Table of contents
- Pain points: why Skills need a dedicated deep dive
- 1. Why Hermes Skills deserve their own guide
- 2. Skills ≠ Memory ≠ Prompts
- 3. SKILL.md format & Progressive Disclosure
- 4. Skill Bundles: one command, full workflow
- 5. Conditional Activation (4 rules)
- 6. Skills Hub & open-source repos
- 7. Publishing your Skill Tap
- 8. GEPA + DSPy self-evolution
- 9. Plugin-bundled skills
- 10. Authoring tips & skill_manage
- 11. Blog workflow case study
- 12. Hosting decision matrix
- 13. Five-step Runbook
- 14. Citeable technical facts
- 15. FAQ
- 16. Resources
- 17. Conclusion
Pain points: why Skills need a dedicated deep dive
- Prompts do not survive sessions or repos. Deployment checklists and PR templates live in chat history. Every new engineer re-pastes the same twelve steps. Skills move procedural knowledge into Git where review applies—but only if you understand SKILL.md routing and Progressive Disclosure.
- Loading everything burns context. Dumping all instructions into system prompts costs tokens on every turn. Hermes Skills load on demand via Level 0 descriptions (~3K tokens for the full catalog), yet most teams never tune descriptions or split references—so the wrong Skill fires or none fires at all.
- Evolution and uptime are disconnected. GEPA can improve SKILL.md text from execution traces, but if your Gateway sleeps on a laptop or runs on Linux without native macOS tooling, Skill scripts fail silently and evolution data never accumulates. See our Hermes three-layer memory and always-on hardware guide for why uptime compounds Skill value.
1. Why Hermes Skills deserve their own guide
In early 2026, Nous Research open-sourced Hermes Agent. Within two months it surpassed 160k GitHub stars—one of the fastest-growing AI agent projects. The thesis is not a bigger model; it is the agent that grows with you. Skills are the procedural memory layer that makes that growth real: standardized, evolvable, cross-session documents—not disposable prompts.
This post skips install basics. We go straight into SKILL.md authoring, Bundles, conditional activation, community Taps, and GEPA self-evolution—the mechanics that separate a demo Agent from a production skill library.
2. Skills ≠ Memory ≠ Prompts
| Dimension | Prompt | Memory | Skill |
|---|---|---|---|
| Persistence | Current conversation | Cross-session, permanent | Cross-session, permanent |
| Load timing | Always in context | Injected each session | On demand (key difference) |
| Token cost | Every turn | Small and stable | Zero until activated |
| Content type | Any intent | User preferences / facts | Procedural steps (how to do X) |
| Maintained by | User manually | Agent automatically | User + Agent |
| Shareability | Hard to share | Private | Publishable as community Tap |
Mnemonic: Prompt = sticky note (single use). Memory = notebook (always nearby). Skill = SOP manual (pulled when needed).
3. SKILL.md format & Progressive Disclosure
All Hermes Skills follow the agentskills.io open standard—portable across Hermes, Claude Code, and Cursor.
Directory layout under ~/.hermes/skills/:
Progressive Disclosure — three loading tiers
| Level | Content | When loaded | Token cost |
|---|---|---|---|
| Level 0 | name + description | Session start (all skills) | ~3K total catalog |
| Level 1 | Full SKILL.md body | /skill-name or LLM match | Depends on file length |
| Level 2 | references/, scripts/ | During execution | Per file, on demand |
Write descriptions for when, not what. The LLM routes on Level 0 text alone. Vague descriptions cause misfires; precise trigger phrases save tokens downstream.
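To make the Level 0 tier concrete, here is a minimal sketch of how a discovery catalog could be built from SKILL.md files. It assumes each file opens with a `---`-delimited front-matter block containing flat, single-line `name:` and `description:` fields, and it uses a rough 4-characters-per-token heuristic. The function names and the heuristic are illustrative assumptions, not Hermes's actual loader or tokenizer.

```python
import re

# Hypothetical example skill, used only for illustration.
EXAMPLE = """---
name: pr-review
description: Use when reviewing a pull request. Do NOT use for writing new code.
---
# Steps
1. Fetch the diff.
2. Check tests.
"""

def level0_entry(skill_md: str) -> dict:
    """Extract only name + description from a SKILL.md string (Level 0)."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", skill_md, re.DOTALL)
    if match is None:
        raise ValueError("no front-matter block found")
    fields = {}
    for line in match.group(1).splitlines():
        # Only flat `key: value` lines are handled; nested YAML is ignored.
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip().strip("\"'")
    return {"name": fields["name"], "description": fields["description"]}

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; the 4 chars/token ratio is an assumption."""
    return max(1, round(len(text) / chars_per_token))

def catalog_cost(skill_files: list) -> int:
    """Estimated Level 0 cost: names and descriptions only, bodies excluded."""
    entries = [level0_entry(s) for s in skill_files]
    return sum(estimate_tokens(e["name"] + " " + e["description"]) for e in entries)
```

Because only the front matter is read, a Skill with a very long body costs the same at Level 0 as a short one. This is why tightening the description matters more for routing than trimming the body.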
4. Skill Bundles: one command, full workflow
Skill Bundles (2026) pack multiple skills into a single slash command. File location: ~/.hermes/skill-bundles/<slug>.yaml.
Research session bundle
Priority rules: Bundle beats a same-named Skill; missing skills are skipped with a warning; Bundles do not modify system prompts (prompt-cache friendly).
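The priority rules above can be sketched as a small resolver. This is an illustration under stated assumptions: Bundles are modeled as a plain dict from slug to a list of skill names, and `resolve_command` is a hypothetical helper, not part of the Hermes codebase.

```python
import warnings

def resolve_command(slug, bundles, skills):
    """Return the ordered skill names a slash command should load.

    bundles: dict mapping bundle slug -> list of skill names (assumed shape)
    skills:  set of installed skill names
    """
    if slug in bundles:
        # A Bundle wins over a Skill with the same name.
        resolved = []
        for name in bundles[slug]:
            if name in skills:
                resolved.append(name)
            else:
                # Missing members are skipped with a warning, not an error.
                warnings.warn(f"bundle '{slug}': skill '{name}' not installed, skipping")
        return resolved
    if slug in skills:
        return [slug]
    raise KeyError(f"no bundle or skill named '{slug}'")
```

With a bundle `research` listing three skills, one of which is not installed, the resolver returns the two installed skills in order and emits a single warning, even if a standalone skill named `research` also exists.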
CLI quick create:
5. Conditional Activation — four rules
Skills can auto-hide or show based on available toolsets. Configure under metadata.hermes:
| Field | Behavior |
|---|---|
| requires_toolsets | Hide skill when listed toolsets are missing |
| requires_tools | Hide skill when listed tools are missing |
| fallback_for_toolsets | Hide skill when listed toolsets exist (fallback only) |
| fallback_for_tools | Hide skill when listed tools exist (fallback only) |
Classic pattern: duckduckgo-search sets fallback_for_tools: [web_search]—when Firecrawl/Brave keys activate paid search, the free fallback disappears automatically, saving tokens.
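The four rules reduce to a single visibility check. The sketch below assumes that a `requires_*` rule needs every listed item to be present and that a `fallback_for_*` rule hides the skill as soon as any listed item is present. The table does not spell out that all-versus-any interpretation, and `is_visible` is a hypothetical name.

```python
def is_visible(hermes_meta, toolsets, tools):
    """Decide whether a skill should appear in the catalog.

    hermes_meta: the skill's metadata.hermes mapping (fields are optional)
    toolsets, tools: names currently available to the agent
    """
    toolsets, tools = set(toolsets), set(tools)
    # requires_*: hide when any listed item is missing (assumed: all needed).
    if not set(hermes_meta.get("requires_toolsets", [])) <= toolsets:
        return False
    if not set(hermes_meta.get("requires_tools", [])) <= tools:
        return False
    # fallback_for_*: hide when a listed item exists (assumed: any suffices).
    if set(hermes_meta.get("fallback_for_toolsets", [])) & toolsets:
        return False
    if set(hermes_meta.get("fallback_for_tools", [])) & tools:
        return False
    return True
```

For the duckduckgo-search pattern, `is_visible({"fallback_for_tools": ["web_search"]}, [], [])` keeps the free fallback visible, and adding `web_search` to the available tools hides it.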
6. Skills Hub & open-source repos
| Repository | Description | Highlight |
|---|---|---|
| ChuckSRQ/awesome-hermes-skills | Production-grade curated skills | Deep Research, MLOps, Apple integration |
| amanning3390/hermeshub | Community registry with security scan | Prompt-injection detection per skill |
| kevinnft/ai-agent-skills | 191 skills, 28 categories | Cross Agent: Hermes / Claude / Cursor |
| NousResearch/hermes-agent | Official repo | Authoritative built-in skills |
Validate format compliance: skills-ref validate ./my-skill.
7. Publishing your Skill Tap
Version-control ~/.hermes/skills/ in Git for cross-device sync. After pulling, run hermes skills reset to rebuild the built-in skills.
8. GEPA + DSPy self-evolution
GEPA (Genetic-Pareto Prompt Evolution)—ICLR 2026 Oral—lives in hermes-agent-self-evolution. It improves SKILL.md text from execution traces without touching model weights. Cost: $2–10 per optimization run (API only, no GPU).
Five-stage pipeline
- Trace collection — SQLite stores full reasoning traces (tool calls, branches, errors).
- Reflective failure analysis — LLM generates actionable side information, not just "failed."
- Targeted mutation — 10–20 SKILL.md variants per failure root cause.
- Multi-objective Pareto evaluation — Optimize success rate × token efficiency × speed.
- Human PR review — Best variant opens a PR; ship after approval.
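The multi-objective step can be illustrated with a generic Pareto filter over the three objectives named above. This is a textbook non-dominated filter written for illustration. The `Variant` fields and the sample numbers are invented, and the actual selection logic in GEPA may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    name: str
    success_rate: float  # higher is better
    tokens: int          # lower is better
    seconds: float       # lower is better

def dominates(a: Variant, b: Variant) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    no_worse = (a.success_rate >= b.success_rate
                and a.tokens <= b.tokens
                and a.seconds <= b.seconds)
    strictly_better = (a.success_rate > b.success_rate
                       or a.tokens < b.tokens
                       or a.seconds < b.seconds)
    return no_worse and strictly_better

def pareto_front(variants):
    """Keep every variant that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]

# Invented sample data for illustration only.
candidates = [
    Variant("baseline", 0.70, 1200, 9.0),
    Variant("v1", 0.85, 1100, 8.0),
    Variant("v2", 0.90, 1500, 8.5),
    Variant("v3", 0.80, 1300, 9.5),
]
front = pareto_front(candidates)  # v1 and v2 survive
```

Here v1 beats both the baseline and v3 on every objective, while v2 trades more tokens for a higher success rate, so neither v1 nor v2 dominates the other. Choosing between them is the kind of judgment the human PR review stage is there to make.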
Four guardrails (all must pass)
- Full test suite: pytest tests/ -q at 100%
- Size limits: Skills ≤ 15KB; tool descriptions ≤ 500 chars
- Prompt-cache compatibility: no mid-session invalidation
- Semantic preservation: core purpose unchanged
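Of the four guardrails, the size limits are the easiest to check locally before a run. The sketch below assumes 15KB means 15 × 1024 bytes of UTF-8 and that the 500-character limit counts Unicode code points. Both interpretations, and the `size_violations` helper itself, are assumptions for illustration.

```python
MAX_SKILL_BYTES = 15 * 1024   # assumed meaning of "15KB"
MAX_TOOL_DESC_CHARS = 500

def size_violations(skill_text: str, tool_descriptions: dict) -> list:
    """Return human-readable size problems; an empty list means pass."""
    problems = []
    size = len(skill_text.encode("utf-8"))
    if size > MAX_SKILL_BYTES:
        problems.append(f"skill is {size} bytes (limit {MAX_SKILL_BYTES})")
    for tool, desc in tool_descriptions.items():
        if len(desc) > MAX_TOOL_DESC_CHARS:
            problems.append(
                f"tool '{tool}' description is {len(desc)} chars "
                f"(limit {MAX_TOOL_DESC_CHARS})"
            )
    return problems
```

Measuring encoded bytes rather than characters matters for non-ASCII content: under this assumption, 6,000 Chinese characters already occupy 18,000 bytes and would exceed the limit.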
Evolution roadmap
| Phase | Target | Engine | Status |
|---|---|---|---|
| Phase 1 | SKILL.md files | DSPy + GEPA | ✅ Shipped |
| Phase 2 | Tool descriptions | DSPy + GEPA | Planned |
| Phase 3 | System prompt fragments | DSPy + GEPA | Planned |
| Phase 4 | Tool implementation code | Darwinian Evolver | Planned |
| Phase 5 | Fully automated loop | Pipeline | Planned |
9. Plugin-bundled skills
Plugins namespace skills as plugin:skill—hidden from default skills_list, opt-in only, with sibling awareness:
In plugin.yaml:
10. Authoring tips & skill_manage
Description precision: Wrong: "Helps with code." Right: "Use when reviewing a pull request… Do NOT use for writing new code."
A Pitfalls section separates good Skills from great ones: document specific failure modes, root causes, and fixes (rate limits, selector brittleness, token overflow on large diffs).
Size guidance: keep SKILL.md under 500 lines; at 500–1000 lines, split detail into references/; a Skill over 15KB fails the GEPA size guardrail and cannot be evolved.
Agents can maintain skills programmatically:
11. Blog workflow case study
The seo-keyword-research skill uses requires_toolsets: [web] and outputs a keyword matrix (3–5 primary + 10–15 long-tail per language) before any outline work begins—exactly the workflow behind multi-language VPSMAC blog production.
12. Hosting decision matrix: where Skills actually run
| Host | 7×24 uptime | GEPA trace collection | Native macOS / Xcode | Best fit |
|---|---|---|---|---|
| Local MacBook | ❌ Lid close drops Gateway | ❌ Gaps in session DB | ✅ | Authoring, short tests |
| Linux VPS | ✅ systemd | ✅ CLI-only skills | ❌ | Text agents, no Apple toolchain |
| VPSMAC Mac cloud | ✅ launchd | ✅ Continuous traces | ✅ Bare-metal SSH | Hermes Gateway + GEPA loop |
13. Five-step Runbook: production Skill library on Mac cloud
Step 1 — Audit and install base skills. Run hermes skills tap add for team Taps; validate with skills-ref validate. Document which Bundles map to which workflows.
Step 2 — Author or patch SKILL.md. Write Level 0 descriptions with trigger phrases; split references/ for anything over 500 lines. Enable agent_writes_require_approval in production.
Step 3 — Create Bundles and conditional rules. hermes bundles create blog-workflow --skills …; set fallback_for_tools for free/paid tool switching.
Step 4 — Deploy Gateway on VPSMAC Mac node. Sync ~/.hermes/skills/ and skill-bundles/ via Git; install Hermes with launchd KeepAlive. Confirm Cron and IM channels stay connected 7×24.
Step 5 — Enable GEPA evolution loop. Point HERMES_AGENT_PATH at the synced directory; run evolve_skill --eval-source sessiondb weekly; review PRs before merge. Backup ~/.hermes before instance changes.
14. Citeable technical facts (2026-06)
- Hermes Agent: 160k+ GitHub stars within two months of launch; closed learning loop via Skills + Honcho user model.
- Level 0 skill catalog: ~3K tokens for full discovery tier regardless of individual Skill body size.
- GEPA optimization: $2–10 per run, API-only; Skills must stay ≤ 15KB to pass guardrails.
- kevinnft/ai-agent-skills: 191 skills across 28 categories, installable on Hermes, Claude Code, and Cursor.
15. FAQ
Skills vs MCP? Skills teach procedure; MCP supplies tools. Use both—Skills orchestrate MCP calls.
Skill changed but Agent uses old version? Edits take effect only in a new session (/reset) or when installed with --now, which invalidates the prompt cache.
Are GEPA-evolved Skills safe? Four guardrails + human PR review; still inspect every diff.
Reuse in Claude Code? Copy to ~/.claude/skills/ or use cross-platform install scripts.
Chinese content token cost? ~1–1.5 tokens per character; keep description in English for routing accuracy.
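The token estimate for Chinese content is simple arithmetic. The sketch below applies the 1–1.5 tokens-per-character range to characters in the CJK Unified Ideographs block only. It ignores ASCII and punctuation, and `cjk_token_range` is a hypothetical helper, not a real tokenizer.

```python
import math

def cjk_token_range(text: str, low: float = 1.0, high: float = 1.5):
    """Return (min, max) estimated tokens for the CJK characters in text."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return math.floor(cjk * low), math.ceil(cjk * high)

# Six Chinese characters -> between 6 and 9 tokens under this estimate.
estimate = cjk_token_range("审查拉取请求")
```

An English description of similar meaning, such as "review pull requests", is typically only a few tokens, which is one more reason to keep the routing description in English.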
16. Resources
- Hermes Agent docs · Skills system · agentskills.io spec
- hermes-agent-self-evolution · gepa-ai/gepa · stanfordnlp/dspy
- awesome-hermes-skills · hermeshub · ai-agent-skills
17. Conclusion: Skills compound only when the Gateway keeps running
Laptop authoring, Docker on cheap VPS, and WSL2 can all host Hermes—but each leaves gaps: sleep interrupts trace collection, Linux lacks native Apple tooling for signing and Metal-backed scripts, and local hardware ties GEPA data to one machine with no clean backup story. Skill Bundles and conditional activation save tokens; GEPA turns failures into better SKILL.md text—but only if session databases grow continuously on stable hardware.
For teams treating Skills as infrastructure—not chat tricks—renting a VPSMAC Apple Silicon Mac cloud node delivers launchd 7×24 uptime, Git-synced ~/.hermes, and monthly RAM upgrades without buying new silicon. Ship the skill library on bare-metal macOS; let GEPA iterate while you review PRs—not while you chase uptime.