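Which orchestration pattern should a multi-agent system start with?

Start with a sequential pipeline, validate the core value, and only then add parallel fan-out or a hierarchical supervisor as needed. Production systems usually work best with 3 to 8 agents; beyond that, coordination overhead tends to outweigh the gains.

How should you choose between LangGraph, CrewAI, and AutoGen?

Choose LangGraph when you need production-grade reliability, complex state management, and fine-grained Human-in-the-Loop control. Choose CrewAI for rapid prototyping and role-based content pipelines. Choose AutoGen when you are on the Microsoft/Azure stack and need multi-round, debate-style collaboration.

What problems do MCP and A2A each solve?

MCP (the vertical layer) standardizes how an agent reaches external tools, databases, and APIs. A2A (the horizontal layer) standardizes task delegation, capability discovery, and state synchronization between agents. The two are complementary, and both are now hosted by the Linux Foundation (MCP under its Agentic AI Foundation).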

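Multi-Agent Collaboration Architecture in Practice: From Design Patterns to Production (2026)

Written for AI engineers and architects, this article walks through six orchestration patterns for multi-agent collaboration (with code), a LangGraph/CrewAI/AutoGen selection matrix, the MCP + A2A two-layer protocol stack, production engineering with PostgresSaver, and observability informed by the MAST failure data. It also covers the 6x speedup from Google's Agent Bake-Off, AdaptOrch's 12-23% topology gain, failure-distribution statistics, a five-step runbook, and a selection decision tree.

Introduction: Why a Single Agent Is No Longer Enough

From 2024 to 2025, AI agents moved from the lab into production. Many teams quickly found that pushing every task through one LLM agent breaks down at scale. The problem is not the model but the architecture: a single agent has structural bottlenecks in context window, specialization, concurrency, and fault tolerance. Multi-agent collaboration architectures exist to address exactly these problems.

According to a 2026 MLflow report, Google's internal Agent Bake-Off showed that moving to a distributed multi-agent architecture cut processing time from 1 hour to 10 minutes, a 6x speedup. AdaptOrch (a 2026 academic paper) goes further: in multi-agent systems, the choice of orchestration topology affects performance more than the choice of underlying model, and on benchmarks such as SWE-bench the right topology yields a 12-23% improvement.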

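Core Pain Points: Four Structural Bottlenecks of a Single Agent

Context window bottleneck. Intermediate results from a complex task fill the context and sharply degrade later reasoning. Retrieval, analysis, generation, and review all compete for the same window, leaving little room for effective reasoning.
Diluted specialization. One agent has to retrieve information, write code, and review decisions, so it does all of them and excels at none. You cannot pick the best model or a dedicated toolchain per subtask.
Inefficient serial execution. Every subtask runs in order, so total time is the sum of all steps with no concurrency. Independent subtasks (multi-source research, multi-dimensional risk assessment) spend their time waiting.
Single point of failure (SPOF). When that one agent fails (model timeout, failed tool call, or cascading hallucination), the whole flow stops, with no partial degradation and no isolated retries.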

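1. Core Concepts of Multi-Agent Collaboration Systems

1.1 Basic Definition

A Multi-Agent System (MAS) is a system of several independent AI agents that cooperate, through explicit communication protocols and an orchestration mechanism, on complex tasks that a single agent cannot complete efficiently.

Each agent typically has the following characteristics:

Characteristic	Description
Single role	Owns one clearly defined subtask (retrieval, reasoning, generation, verification, and so on)
Tool access	Has the specific toolset its own task requires
State isolation	Maintains its own context and memory without polluting other agents
Replaceability	Can be upgraded or swapped independently without affecting the rest of the system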

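1.2 Three Control Topologies

Centralized                    Decentralized                  Hierarchical

     [Orchestrator]              A ←→ B ←→ C                  [Top Orchestrator]
    /      |      \                 ↕       ↕                   /           \
   [A]    [B]    [C]              D ←→ E ←→ F            [Team-1 Lead]  [Team-2 Lead]
                                                           /    \           /    \
                                                          [a] [b]       [c]  [d]

Centralized:    pros: auditable, controllable; cons: central bottleneck
Decentralized:  pros: highly resilient, low latency; cons: hard to debug, non-deterministic
Hierarchical:   pros: balances the two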

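2. The Six Orchestration Design Patterns in Detail

These six patterns cover more than 95% of production multi-agent scenarios.

2.1 Pattern 1: Sequential Pipeline

Core idea: the output of Agent A is the direct input of Agent B, and execution is strictly linear.

[User input] → [Retrieval Agent] → [Analysis Agent] → [Writer Agent] → [Review Agent] → [Output]

When to use it: steps have strict dependencies, the flow is fixed, and no dynamic routing is needed. Typical cases are article-writing pipelines and code-review flows.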

from langgraph.graph import StateGraph, START, END
from typing import TypedDict

class PipelineState(TypedDict):
    query: str
    retrieved_docs: str
    analysis: str
    final_report: str

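# Assumed to exist elsewhere: search_knowledge_base(query) -> str,
# and llm, a chat model exposing .invoke() whose result has a .content attribute.
def retrieval_agent(state: PipelineState):
    docs = search_knowledge_base(state["query"])
    return {"retrieved_docs": docs}

def analysis_agent(state: PipelineState):
    result = llm.invoke(f"Analyze the following content: {state['retrieved_docs']}")
    return {"analysis": result.content}

def writer_agent(state: PipelineState):
    report = llm.invoke(f"Write a report based on this analysis: {state['analysis']}")
    return {"final_report": report.content}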

builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_node("writer", writer_agent)
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", "writer")
builder.add_edge("writer", END)
pipeline = builder.compile()

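Pros	Cons
Simple to implement, easy to debug	Total time = sum of all step times
Predictable behavior	One failed step blocks everything
Well suited to compliance audits	Cannot handle dynamic branching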

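2.2 Pattern 2: Parallel Fan-out / Fan-in

Core idea: several agents work on independent subtasks at the same time, and a join node merges the results. Total time is roughly max(T1, T2, ..., Tn) instead of T1 + T2 + ... + Tn.

                    ┌──→ [Research Agent A] ──┐
[Supervisor] ───────├──→ [Research Agent B] ──┼──→ [Synthesizer] → [Output]
                    └──→ [Research Agent C] ──┘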

from langgraph.types import Send
from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
import operator

class ResearchState(TypedDict):
    query: str
    research_results: Annotated[list, operator.add]
    final_synthesis: str

def supervisor(state: ResearchState):
    subtasks = [
        {"query": state["query"], "source": "academic"},
        {"query": state["query"], "source": "industry"},
        {"query": state["query"], "source": "news"},
    ]
    return [Send("research_worker", task) for task in subtasks]

def research_worker(state: dict):
    result = search_by_source(state["query"], state["source"])
    return {"research_results": [result]}

def synthesizer(state: ResearchState):
    combined = "\n".join(state["research_results"])
    synthesis = llm.invoke(f"Synthesize the following research results: {combined}")
    return {"final_synthesis": synthesis.content}

builder = StateGraph(ResearchState)
builder.add_node("research_worker", research_worker)
builder.add_node("synthesizer", synthesizer)
builder.add_conditional_edges(START, supervisor, ["research_worker"])
builder.add_edge("research_worker", "synthesizer")
builder.add_edge("synthesizer", END)
graph = builder.compile()

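Key technical point: when the conditional edge returns a list of Send objects, LangGraph schedules one research_worker invocation per Send in the same super-step, so the workers run concurrently. With the Annotated[list, operator.add] reducer, each branch's update is appended to the shared list automatically, with no hand-written locks or synchronization.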
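To make the timing claim and the reducer behaviour concrete, here is a framework-free sketch using asyncio. The source names and delays are made up for illustration, and `merge` only mimics what an `operator.add` reducer does with list-valued updates; it is not LangGraph's internal implementation.

```python
import asyncio
import operator
import time
from functools import reduce

# Hypothetical per-source latencies in seconds (illustrative values only).
DELAYS = {"academic": 0.1, "industry": 0.2, "news": 0.1}

async def research_worker(source: str) -> dict:
    """Simulate one worker; returns a partial state update like a graph node."""
    await asyncio.sleep(DELAYS[source])
    return {"research_results": [f"{source}: done"]}

def merge(updates: list) -> list:
    """Fan-in: concatenate list-valued updates, as an operator.add reducer would."""
    return reduce(operator.add, (u["research_results"] for u in updates), [])

async def fan_out():
    start = time.perf_counter()
    # gather() runs the workers concurrently and preserves argument order.
    updates = await asyncio.gather(*(research_worker(s) for s in DELAYS))
    return merge(updates), time.perf_counter() - start

results, elapsed = asyncio.run(fan_out())
```

Here `elapsed` should land near the slowest worker's delay rather than near the sum of all three, which is the whole point of the pattern.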

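2.3 Pattern 3: Hierarchical Supervisor-Worker

Core idea: a supervisor agent handles intent recognition, task decomposition, and routing decisions, assigns subtasks to specialized worker agents, and the results are aggregated at the end.

              [User request]
                    ↓
           [Supervisor Agent]  ← task planning + routing decisions
          /         |         \
[Code Agent] [Search Agent] [Data Agent]
          \         |         /
          [Synthesizer Agent]
                    ↓
              [Final output]

Two-tier routing optimization (keyword fast path + LLM routing):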

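KEYWORD_ROUTING = {
    "code": "code_agent",
    "search": "search_agent",
    "lookup": "search_agent",
    "data": "data_agent",
}
VALID_AGENTS = {"code_agent", "search_agent", "data_agent"}
DEFAULT_AGENT = "search_agent"  # assumption: fallback when the LLM reply is not a known agent

def supervisor_with_fast_path(state):
    query = state["query"].lower()
    # Tier 1: keyword fast path (no LLM call, sub-millisecond)
    for keyword, agent_name in KEYWORD_ROUTING.items():
        if keyword in query:
            return {"next": agent_name}
    # Tier 2: LLM routing for complex or ambiguous intent
    routing_prompt = f"""
    User request: {state['query']}
    Available agents: code_agent, search_agent, data_agent
    Return the single most suitable agent name and nothing else.
    """
    decision = llm.invoke(routing_prompt).content.strip()
    # Never route on unvalidated model output
    return {"next": decision if decision in VALID_AGENTS else DEFAULT_AGENT}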

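2.4 Pattern 4: Swarm / Network

Core idea: agents hand tasks to each other peer-to-peer with no central coordinator, and collaboration stops on a termination rule (round count, consensus, or timeout). It fits multi-round negotiation such as code review and proposal evaluation. Non-determinism is high, so use it with caution in production.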

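import autogen  # AutoGen 0.2-style API

# Assumption: the model name and key are placeholders; supply your own config.
llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_API_KEY"}]}

reviewer_1 = autogen.AssistantAgent(
    name="SecurityReviewer",
    system_message="You are a security expert focused on vulnerabilities in code.",
    llm_config=llm_config,
)
reviewer_2 = autogen.AssistantAgent(
    name="PerformanceReviewer",
    system_message="You are a performance expert focused on efficiency and resource usage.",
    llm_config=llm_config,
)
human_proxy = autogen.UserProxyAgent(
    name="CodeAuthor",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=2,
    code_execution_config=False,  # this proxy only relays messages, it runs no code
    # content can be None, so guard before the substring test
    is_termination_msg=lambda x: "APPROVED" in (x.get("content") or ""),
)
groupchat = autogen.GroupChat(
    agents=[human_proxy, reviewer_1, reviewer_2],
    messages=[],
    max_round=6  # hard stop to prevent endless loops
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)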
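The three termination rules named above (round count, consensus, timeout) can also be written as one framework-independent check. The following is a minimal sketch; `StopPolicy`, `stop_reason`, and the message shape are hypothetical names chosen for illustration, not part of AutoGen.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StopPolicy:
    max_rounds: int = 6
    timeout_s: float = 120.0
    consensus_token: str = "APPROVED"

def stop_reason(messages, reviewers, elapsed_s, policy=StopPolicy()):
    """Return why the swarm should stop, or None to keep going.

    messages:  list of {"name": str, "content": str or None}, in arrival order.
    reviewers: set of agent names whose approval counts toward consensus.
    """
    if elapsed_s >= policy.timeout_s:
        return "timeout"
    if len(messages) >= policy.max_rounds:
        return "max_rounds"
    # Consensus: every reviewer's most recent message contains the token.
    latest = {}
    for msg in messages:
        if msg["name"] in reviewers:
            latest[msg["name"]] = msg.get("content") or ""
    if reviewers and all(policy.consensus_token in latest.get(r, "") for r in reviewers):
        return "consensus"
    return None
```

Checking the hard limits before consensus is a deliberate choice in this sketch: a run that has already exceeded its time or round budget is reported as such even if the last message happens to approve.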

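2.5 Pattern 5: Blackboard

Core idea: all agents share one structured workspace (the blackboard). Each agent reads and writes it on its own initiative once its preconditions are met, with no explicit scheduling. It fits long-running asynchronous tasks (hours or even days) and collaboration between heterogeneous services.

                    ┌───────────────────────────────┐
                    │   Blackboard (shared state)   │
                    │  task_status: "research_done" │
                    │  research_data: {...}         │
                    │  analysis_result: null        │
                    └─────┬──────────────────┬──────┘
                          ↑ write            ↓ read (once the precondition holds)
                   [Research Agent]     [Analysis Agent]
                   (writes on finish)   (runs after seeing research_done)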
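2.6 Pattern 6: Hybrid

Core idea: combine several patterns in one system, usually a supervisor plus pipelines, to balance control and efficiency.

[User request]
    ↓
[Intent Agent] (router)
    ├──→ [Simple query] → answer directly (no multi-agent flow)
    └──→ [Complex report generation]
              ↓
         [Supervisor] (hierarchical supervisor)
        /                    \
[Parallel research fan-out]  [Quality assurance pipeline]
 ↙     ↓     ↘                      ↓
[A]   [B]   [C]        [Review] → [Human review] → [Publish]
 ↘     ↓     ↙
  [Synthesizer]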
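As a rough illustration of how the pieces in this diagram compose, the sketch below routes simple queries straight to an answer and sends complex ones through a fan-out followed by a sequential quality pipeline. Everything here is a stand-in: `classify_intent` is a toy keyword check, the researchers and QA steps are plain functions, and a thread pool replaces real concurrent agents.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_intent(query: str) -> str:
    """Toy router (assumption): report-style requests count as complex."""
    return "complex" if "report" in query.lower() else "simple"

def run_hybrid(query, researchers, qa_steps):
    """Route, then either answer directly or run fan-out followed by a QA pipeline."""
    if classify_intent(query) == "simple":
        return {"path": ["intent", "direct"], "output": f"direct answer to: {query}"}
    # Parallel research fan-out; map() keeps results in submission order.
    with ThreadPoolExecutor(max_workers=len(researchers)) as pool:
        findings = list(pool.map(lambda r: r(query), researchers))
    draft = " | ".join(findings)  # synthesizer stand-in
    # Sequential quality pipeline: each step transforms the draft in turn.
    path = ["intent", "supervisor", "fan_out", "synthesizer"]
    for name, step in qa_steps:
        draft = step(draft)
        path.append(name)
    return {"path": path, "output": draft}

researchers = [lambda q: f"A({q})", lambda q: f"B({q})", lambda q: f"C({q})"]
qa_steps = [
    ("review", str.strip),
    ("human_review", lambda d: d),  # placeholder for a real approval gate
    ("publish", lambda d: f"[published] {d}"),
]
simple = run_hybrid("What is MCP?", researchers, qa_steps)
complex_ = run_hybrid("Write a market report", researchers, qa_steps)
```

The `path` list is only there to make the chosen route visible; in a real system that role belongs to tracing.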

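3. LangGraph vs CrewAI vs AutoGen

Dimension	LangGraph	CrewAI	AutoGen (Microsoft)
Architecture paradigm	State-machine graph	Role-based team	Conversational multi-agent
Languages	Python / JS/TS	Python	Python / .NET
Learning curve	Steep	Gentle	Moderate
State management	Native	Build it yourself	Limited
Human-in-the-Loop	Native	Build it yourself	Supported
Observability	LangSmith (commercial)	Limited	Azure Monitor
Production readiness	⭐⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Rapid prototyping	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
Azure integration	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐⭐
Best fit	Complex stateful workflows	Role-based content pipelines	Conversational collaboration

Selection Advice

Choose LangGraph when you need production-grade reliability (compliance, finance, healthcare), complex state management with persistence, fine-grained Human-in-the-Loop control, and precise expression of conditional branches and loops.
Choose CrewAI when you want to validate an idea quickly (a prototype in 1-2 days), when the team finds it natural to think of agents as roles, and for role-based work such as content generation and research reports.
Choose AutoGen when you are on the Microsoft/Azure stack, need multi-round debate and iterative reasoning between agents, or are doing research and quick experiments with different conversation patterns.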

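4. The MCP + A2A Two-Layer Protocol Stack

By 2026, communication in multi-agent systems has settled on two complementary protocol layers, both now hosted by the Linux Foundation (MCP under its Agentic AI Foundation).

┌──────────────────────────────────────────────────────┐
│                  Multi-agent system                  │
│    Agent-1 ←──── A2A protocol ────→ Agent-2          │
│       │                             │                │
│      MCP                           MCP               │
│       ↓                             ↓                │
│  [tools / DBs / APIs]        [tools / DBs / APIs]    │
└──────────────────────────────────────────────────────┘
MCP (vertical): Agent ↔ tools / external systems
A2A (horizontal): Agent ↔ Agent

4.1 MCP (Model Context Protocol)

A standard protocol for tool access, originated by Anthropic and now governed under the Linux Foundation, that unifies the interface through which agents reach external tools, databases, and APIs.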

from mcp.server import Server
from mcp.types import Tool, TextContent

app = Server("data-agent-mcp")

@app.list_tools()
async def list_tools():
    return [
        Tool(
            name="query_customer_db",
            description="Query the customer database by ID, name, or email",
            inputSchema={
                "type": "object",
                "properties": {
                    "field": {"type": "string", "enum": ["id", "name", "email"]},
                    "value": {"type": "string"}
                },
                "required": ["field", "value"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    if name == "query_customer_db":
        # db is assumed to be a data-access object defined elsewhere
        result = db.query(arguments["field"], arguments["value"])
        return [TextContent(type="text", text=str(result))]
    # Fail loudly instead of silently returning None for an unknown tool
    raise ValueError(f"Unknown tool: {name}")

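4.2 A2A (Agent-to-Agent Protocol)

Initiated by Google and open-sourced in April 2025, with v1.0 released in early 2026 and 50+ partners including Atlassian, Salesforce, and SAP. It standardizes task delegation, capability discovery, and state synchronization between agents.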

// /.well-known/agent.json
{
  "name": "ResearchAgent",
  "version": "1.0",
  "description": "Specialized information retrieval and summarization agent",
  "url": "https://research-agent.internal/a2a",
  "capabilities": { "streaming": true, "async": true },
  "skills": [
    {
      "id": "web_research",
      "name": "Web research",
      "description": "Retrieve and summarize up-to-date information from the internet",
      "tags": ["research", "summarization", "web"]
    }
  ]
}

import httpx

async def discover_and_delegate(agent_url: str, task: str):
    # httpx.get/httpx.post are synchronous; awaitable requests go through AsyncClient
    async with httpx.AsyncClient(timeout=30.0) as client:
        card_response = await client.get(f"{agent_url}/.well-known/agent.json")
        card_response.raise_for_status()
        agent_card = card_response.json()
        available_skills = [s["id"] for s in agent_card["skills"]]
        if "web_research" not in available_skills:
            raise ValueError(f"Agent {agent_card['name']} does not support the web_research skill")
        payload = {
            "jsonrpc": "2.0",
            "method": "message/send",
            "id": "task-001",
            "params": {
                "message": {
                    "role": "user",
                    "parts": [{"type": "text", "text": task}]
                }
            }
        }
        response = await client.post(agent_card["url"], json=payload)
        response.raise_for_status()
        return response.json()

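5. Production-Grade Engineering Practices

5.1 State Persistence and Resumable Runs (PostgresSaver)

from langgraph.checkpoint.postgres import PostgresSaver

with PostgresSaver.from_conn_string("postgresql://user:pass@localhost/agentdb") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables; required on first use
    graph = builder.compile(checkpointer=checkpointer)
    config = {"configurable": {"thread_id": "user-session-12345"}}
    # State is keyed by thread_id, so a restarted process can resume from the last checkpoint
    result = graph.invoke({"query": "Analyze the Q2 earnings report"}, config)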

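5.2 Human-in-the-Loop (interrupt HITL)

from langgraph.types import interrupt

# interrupt() needs a checkpointer on the compiled graph. On resume the node re-runs
# from the top, so plan_action (assumed helper) should be deterministic or idempotent.
# Resume with: graph.invoke(Command(resume={"approved": True}), config)
def high_risk_action_agent(state):
    proposed_action = plan_action(state)
    human_decision = interrupt({
        "proposed_action": proposed_action,
        "risk_level": "HIGH",
        "message": "This action will modify the production database. Please confirm."
    })
    if human_decision["approved"]:
        return execute_action(proposed_action)
    else:
        return {"status": "cancelled", "reason": human_decision.get("reason")}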

5.3 Circuit Breaker and Retries (CircuitBreaker)

import time
from functools import wraps

class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = "CLOSED"
        self.last_failure_time = None

    def __call__(self, func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            if self.state == "OPEN":
                # monotonic() is immune to wall-clock adjustments
                if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                    self.state = "HALF_OPEN"  # let one trial call through
                else:
                    raise CircuitOpenError("Circuit breaker OPEN - agent temporarily unavailable")
            try:
                result = await func(*args, **kwargs)
            except Exception:
                self.failure_count += 1
                self.last_failure_time = time.monotonic()
                # A failed trial call in HALF_OPEN reopens the breaker immediately
                if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                    self.state = "OPEN"
                raise
            # Any success closes the breaker and clears the consecutive-failure count
            self.state = "CLOSED"
            self.failure_count = 0
            return result
        return wrapper

# agent_client is assumed to be an async client defined elsewhere
@CircuitBreaker(failure_threshold=3, recovery_timeout=30)
async def call_external_agent(task):
    return await agent_client.send(task)
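The section title also mentions retries, which the breaker alone does not provide. A minimal sketch of retrying an async call with exponential backoff and full jitter follows; `retry_with_backoff` and its parameters are hypothetical, and the set of retryable exceptions is an assumption that should be narrowed to whatever counts as transient in your client.

```python
import asyncio
import random

async def retry_with_backoff(func, *args, attempts=3, base_delay=0.5, max_delay=8.0,
                             retry_on=(ConnectionError, TimeoutError),
                             sleep=asyncio.sleep, **kwargs):
    """Call an async function, retrying transient errors with backoff and jitter.

    The wait before retry k is drawn uniformly from [0, min(max_delay, base_delay * 2**(k-1))].
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            await sleep(random.uniform(0, ceiling))
```

When combining this with a breaker, it is worth deciding which wraps which. If the retry loop sits outside the breaker, every failed attempt counts toward the failure threshold and retries are rejected once the breaker opens, which is usually the desired behaviour.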

5.4 Token Budget Control (TokenBudgetManager)

class BudgetExceededException(Exception):
    """Raised when a request would exceed the remaining token budget."""

class TokenBudgetManager:
    def __init__(self, total_budget: int = 100_000):
        self.total_budget = total_budget
        self.used_tokens = 0
        self.agent_usage = {}

    def check_budget(self, agent_name: str, estimated_tokens: int) -> bool:
        remaining = self.total_budget - self.used_tokens
        if estimated_tokens > remaining:
            raise BudgetExceededException(
                f"Agent {agent_name} requested {estimated_tokens} tokens, "
                f"but only {remaining} tokens remain in the budget"
            )
        return True

    def record_usage(self, agent_name: str, actual_tokens: int):
        self.used_tokens += actual_tokens
        self.agent_usage[agent_name] = self.agent_usage.get(agent_name, 0) + actual_tokens

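6. Observability: Making the Black Box Transparent

According to the MAST research team's analysis of 1,642 execution traces, failures in multi-agent systems break down as follows:

Failure type	Share	Description
System design issues	41.77%	Repeated steps, wrong tool selection, context overflow, missing termination conditions
Inter-agent misalignment	36.94%	Context lost at handoff; one agent's hallucination becomes the next agent's "fact"
Task verification failures	21.30%	Premature termination, incomplete verification

More worrying: 57% of organizations already run agents in production, but only 8% have implemented LLM observability. Many errors come back as HTTP 200, so the dashboard stays green while the system is actually producing wrong results.

6.1 Distributed Tracing with OpenTelemetry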

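from typing import Optional
from opentelemetry import trace
import uuid

tracer = trace.get_tracer("multi-agent-system")

# agent_registry (used below) is assumed to map agent names to objects with a .run(task) method
def traced_agent_call(agent_name: str, task: dict, correlation_id: Optional[str] = None):
    # Reuse the caller's correlation ID so one request can be followed across agents
    correlation_id = correlation_id or str(uuid.uuid4())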
    with tracer.start_as_current_span(f"agent.{agent_name}") as span:
        span.set_attribute("agent.name", agent_name)
        span.set_attribute("correlation.id", correlation_id)
        span.set_attribute("task.type", task.get("type", "unknown"))
        try:
            result = agent_registry[agent_name].run(task)
            span.set_attribute("agent.tokens_used", result.get("tokens", 0))
            span.set_attribute("agent.status", "success")
            return result
        except Exception as e:
            span.set_attribute("agent.status", "error")
            span.set_attribute("error.message", str(e))
            raise

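6.2 Key Monitoring Metrics

MONITORING_METRICS = {
    "task_success_rate": "End-to-end task completion rate (target: >85%)",
    "e2e_latency_p95": "P95 end-to-end latency (target: <30s)",
    "total_cost_per_task": "Average token cost per task",
    "agent_error_rate": "Per-agent error rate (target: <5%)",
    "agent_retry_count": "Retry count (high retries = investigate)",
    "tool_call_budget_usage": "Tool calls used / tool-call budget",
    "output_quality_score": "Output quality score",
    "goal_alignment_score": "Goal alignment score",
    "hallucination_rate": "Rate of detected hallucinations",
}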
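The first three metrics can be computed directly from per-task run records. The sketch below assumes each record carries `ok`, `latency_s`, and `tokens` fields (hypothetical names) and uses the nearest-rank definition of P95; other percentile definitions give slightly different values on small samples.

```python
def p95(values):
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n) in sorted order."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), avoids float rounding
    return ordered[rank - 1]

def summarize(runs):
    """Aggregate per-task records into the headline metrics."""
    total = len(runs)
    return {
        "task_success_rate": sum(1 for r in runs if r["ok"]) / total,
        "e2e_latency_p95": p95([r["latency_s"] for r in runs]),
        "total_cost_per_task": sum(r["tokens"] for r in runs) / total,
    }

# Illustrative data: 20 runs with latencies 1..20 s, the two slowest failed.
runs = [{"ok": i <= 18, "latency_s": float(i), "tokens": 1000} for i in range(1, 21)]
summary = summarize(runs)
meets_targets = summary["task_success_rate"] > 0.85 and summary["e2e_latency_p95"] < 30
```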

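6.3 Automated Evaluation with LLM-as-a-Judge

import json

def evaluate_agent_output(original_task: str, agent_output: str) -> dict:
    evaluation_prompt = f"""
    You are a strict quality reviewer. Evaluate the output of the following AI agent.
    Original task: {original_task}
    Agent output: {agent_output}
    Score each of these dimensions from 1 to 5:
    1. Completeness  2. Accuracy  3. Relevance
    Also state whether the output contains hallucinations.
    Return JSON only:
    {{"completeness": x, "accuracy": x, "relevance": x,
      "hallucination_detected": true/false, "comments": "..."}}
    """
    evaluation = llm.invoke(evaluation_prompt)
    # Raises json.JSONDecodeError if the model wraps the JSON in extra text
    return json.loads(evaluation.content)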

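7. Common Pitfalls and How to Avoid Them

❌ Pitfall 1: Context Pollution

Symptom: Agent A hallucinates, its wrong result is passed to Agents B and C, and the whole system builds on a false premise while every HTTP status code is 200.

import jsonschema

class LowConfidenceError(ValueError):
    pass

class MissingFieldsError(ValueError):
    pass

def validate_agent_output(output: dict, schema: dict) -> bool:
    jsonschema.validate(output, schema)  # raises ValidationError on structural problems
    confidence = output.get("confidence_score", 1.0)  # a missing score is treated as confident
    if confidence < 0.7:
        raise LowConfidenceError(f"Agent output confidence too low: {confidence}")
    # The schema check covers presence; this also rejects null or empty-string values
    required_fields = schema.get("required", [])
    missing = [f for f in required_fields if output.get(f) in (None, "")]
    if missing:
        raise MissingFieldsError(f"Output is missing required fields: {missing}")
    return True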

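❌ Pitfall 2: Infinite Loops and Runaway Cost

Symptom: an agent falls into a retry loop or keeps calling tools, and token spend climbs to a hundred times the expected amount within minutes.

MAX_ITERATIONS = 10
MAX_TOOL_CALLS_PER_AGENT = 20
MAX_TOTAL_TOKENS = 50_000

graph = builder.compile(
    checkpointer=checkpointer,
    interrupt_before=["high_cost_tool"]  # pause for approval before the expensive node
)
# The constants do nothing until enforced. recursion_limit caps graph super-steps per run;
# the tool-call and token ceilings need counters of your own.
config = {"configurable": {"thread_id": "user-session-12345"}, "recursion_limit": MAX_ITERATIONS}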
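One way to give these ceilings teeth outside any particular framework is a small counter object that every loop iteration, tool call, and LLM response reports to. This is a sketch under that assumption; `RunLimits`, `RunLimitExceeded`, and `run_agent_loop` are hypothetical names.

```python
class RunLimitExceeded(RuntimeError):
    """Raised when a run crosses one of its hard ceilings."""

class RunLimits:
    def __init__(self, max_iterations=10, max_tool_calls_per_agent=20, max_total_tokens=50_000):
        self.max_iterations = max_iterations
        self.max_tool_calls_per_agent = max_tool_calls_per_agent
        self.max_total_tokens = max_total_tokens
        self.iterations = 0
        self.tokens = 0
        self.tool_calls = {}

    def _check(self, label, value, ceiling):
        if value > ceiling:
            raise RunLimitExceeded(f"{label}: {value} exceeds limit {ceiling}")

    def on_iteration(self):
        self.iterations += 1
        self._check("iterations", self.iterations, self.max_iterations)

    def on_tool_call(self, agent_name):
        self.tool_calls[agent_name] = self.tool_calls.get(agent_name, 0) + 1
        self._check(f"tool calls by {agent_name}",
                    self.tool_calls[agent_name], self.max_tool_calls_per_agent)

    def on_tokens(self, count):
        self.tokens += count
        self._check("total tokens", self.tokens, self.max_total_tokens)

def run_agent_loop(step, limits):
    """Drive step(limits) until it returns True (done) or a limit trips."""
    while True:
        limits.on_iteration()
        if step(limits):
            return limits.iterations
```

Raising an exception rather than returning a flag means a forgotten check cannot silently let the loop continue.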

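❌ Pitfall 3: Over-Engineering

Symptom: using multiple agents for the sake of it, splitting a simple two-step LLM chain into 8 agents and making debugging dramatically harder.

Start with a sequential pipeline. Add agents only when there is concrete evidence for it: a concurrency requirement, a context limit being exceeded, or a specialty that needs to be upgraded independently. Production systems usually work best with 3 to 8 agents.

❌ Pitfall 4: The Gap Between Demo and Production (ProductionGuardrails)

Symptom: the internal demo looks great, but once live the system fails frequently on edge-case input from real users.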

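class InputTooLongError(ValueError):
    pass

class PromptInjectionError(ValueError):
    pass

class HarmfulContentError(ValueError):
    pass

class ProductionGuardrails:
    def __init__(self, pii_filter, content_classifier):
        # Assumed collaborators: pii_filter.redact(str) -> str,
        # content_classifier.is_harmful(str) -> bool
        self.pii_filter = pii_filter
        self.content_classifier = content_classifier

    def validate_input(self, user_input: str) -> str:
        if len(user_input) > 10000:
            raise InputTooLongError("Input exceeds the 10,000-character limit")
        # A phrase list only catches the crudest attempts; treat it as a first filter
        injection_patterns = ["ignore previous instructions", "forget everything"]
        for pattern in injection_patterns:
            if pattern in user_input.lower():
                raise PromptInjectionError("Potential prompt injection detected")
        return user_input.strip()

    def validate_output(self, output: str) -> str:
        output = self.pii_filter.redact(output)
        if self.content_classifier.is_harmful(output):
            raise HarmfulContentError("Output contains harmful content")
        return output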

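8. Selection Decision Tree

Does your task have clear, linearly dependent steps?
├─ Yes → Can the subtasks run concurrently?
│        ├─ No  → [Sequential pipeline]
│        └─ Yes → [Parallel fan-out + sequential pipeline hybrid]
│
└─ No  → Does one agent hold decision authority?
         ├─ Yes → Is the system large enough to need sub-teams?
         │        ├─ No  → [Supervisor-Worker hierarchy]
         │        └─ Yes → [Hierarchical (supervisors of supervisors)]
         │
         └─ No  → Is the task long-running and asynchronous?
                  ├─ Yes → [Blackboard]
                  └─ No  → Are there 5 agents or fewer?
                           ├─ Yes → [Swarm (remember to set termination conditions)]
                           └─ No  → [Consider re-splitting into a hierarchy]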
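The tree translates directly into a function, which can be handy as a checklist in design reviews. This is a literal encoding of the branches above; the function name, parameter names, and return strings are chosen here for illustration.

```python
def choose_pattern(*, linear_dependencies: bool, parallelizable: bool = False,
                   has_decision_authority: bool = False, needs_sub_teams: bool = False,
                   long_running_async: bool = False, agent_count: int = 0) -> str:
    """Walk the selection decision tree and return the suggested pattern."""
    if linear_dependencies:
        if parallelizable:
            return "parallel fan-out + sequential pipeline (hybrid)"
        return "sequential pipeline"
    if has_decision_authority:
        if needs_sub_teams:
            return "hierarchical (supervisors of supervisors)"
        return "supervisor-worker"
    if long_running_async:
        return "blackboard"
    if agent_count <= 5:
        return "swarm (set termination conditions)"
    return "re-split into a hierarchy"

suggestion = choose_pattern(linear_dependencies=False, long_running_async=True)
```

Keyword-only arguments keep call sites readable, since a row of positional booleans would be easy to get wrong.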

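9. Summary and 2026 Outlook

Key Takeaways

Orchestration topology > model choice: AdaptOrch reports that in multi-agent systems, how you organize collaboration between agents matters more than which underlying model you pick.
Start simple: validate the core value with a sequential pipeline first, and add concurrency and hierarchy only when a concrete need appears.
MCP + A2A are the emerging standard: the two protocols have become industry consensus and are worth adopting directly in new projects.
Observability is not optional: 57% of organizations run agents in production but only 8% have implemented observability, and that gap is where incidents breed.
3 to 8 production agents works best: beyond that, coordination overhead tends to outweigh the gains, and a hierarchy should be considered.

Trends to Watch in 2026

Federated orchestration: multiple teams maintain their own sub-orchestrators and share learned routing policies
Multimodal multi-agent systems: mixed collaboration between vision, audio, and text agents is maturing
Adaptive topology selection: the system picks the best orchestration pattern dynamically from task characteristics (the AdaptOrch direction)
EU AI Act compliance: EU regulation requires record-keeping and traceable decisions for high-risk systems, which makes auditability of agent systems a hard requirement there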

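Five-Step Runbook: Taking a Multi-Agent System to Production

Step 1: Assess the task topology

Use the selection decision tree to determine whether the task has linear dependencies, can run concurrently, or needs a decision authority or long-running asynchrony. Then settle on a sequential pipeline, parallel fan-out, hierarchical supervisor, blackboard, or hybrid pattern.

Step 2: Pick a framework and build the skeleton

Choose between LangGraph (production-grade state machine), CrewAI (fast role-based prototypes), and AutoGen (conversational collaboration) according to your reliability needs. Implement supervisor routing and worker nodes, with the keyword fast path checked first.

Step 3: Wire in the MCP and A2A protocol layers

Expose tool capabilities (databases, APIs, file systems) through an MCP server. Use A2A Agent Cards for cross-agent task delegation and capability discovery, with the orchestrator sending tasks over JSON-RPC 2.0.

Step 4: Harden for production

Configure PostgresSaver for resumable runs, interrupt() for Human-in-the-Loop review, CircuitBreaker for fault isolation, and TokenBudgetManager for budget control. Set MAX_ITERATIONS and MAX_TOOL_CALLS as hard ceilings.

Step 5: Deploy observability and guardrails

Add OpenTelemetry distributed tracing (with a correlation_id carried through the call chain), the core metrics (task_success_rate, e2e_latency_p95, hallucination_rate), and LLM-as-Judge quality evaluation. Enable ProductionGuardrails for input and output validation.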

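Quotable Technical Points (2026)

Performance gains: Google's Agent Bake-Off showed a distributed multi-agent architecture cutting processing time from 1 hour to 10 minutes (a 6x speedup). AdaptOrch reports that the right orchestration topology yields a 12-23% benchmark improvement, a larger effect than the choice of underlying model.
Failure distribution (MAST, 1,642 traces): system design issues 41.77%, inter-agent misalignment 36.94%, task verification failures 21.30%. 57% of organizations already run agents in production, and only 8% have implemented LLM observability.
Protocol standards: MCP (vertical, Agent↔tools) and A2A (horizontal, Agent↔Agent) are both hosted by the Linux Foundation, MCP under its Agentic AI Foundation (AAIF). A2A v1.0 has 50+ partners including Atlassian, Salesforce, and SAP.
Production parameters: recommended agent count 3-8; default token budget 100,000 per task; circuit-breaker failure_threshold 3-5; P95 end-to-end latency target <30s; task success rate target >85%.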

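Closing Notes and Resources

Multi-agent collaboration is not as simple as splitting one large model into several small ones: the choice of orchestration topology often matters more than the choice of model. Start with a sequential pipeline and add parallel fan-out and hierarchical supervisors as needed. Build a standardized communication layer with MCP + A2A. Hold the production line with PostgresSaver, circuit breakers, and token budgets. Use OpenTelemetry and LLM-as-Judge to make the black box transparent.

Resources:

LangGraph documentation: langchain-ai.github.io/langgraph
CrewAI documentation: docs.crewai.com
MCP specification: modelcontextprotocol.io
A2A protocol: google.github.io/A2A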
