Introduction to A2A - agent-to-agent over gRPC
The Agent-to-Agent protocol gives multi-agent systems the same transport story REST gave web services. A walkthrough with Go.
By Harrison Itotia · Apr 9, 2026 · 11 min read
Once you've built one agent, you eventually want two. A research agent that gathers facts; a writing agent that turns them into a draft. A scheduling agent and a meeting-prep agent. A specialist that summarises documents and a generalist that decides when to call it.
The naive answer is "just put both agents in the same process and let one call the other's Go function." That works until you want them on different machines, in different repos, written by different teams, or upgradable independently. At that point you need a protocol: a stable wire contract that says "here's how one agent asks another to do something, and here's how it gets the result back."
A2A - the Agent-to-Agent protocol - is that contract. It's gRPC-first (with a JSON-RPC fallback for browsers), protobuf-defined, and explicitly designed to model the things agents do that REST doesn't: long-running tasks, streaming progress, mid-flight human-in-the-loop, file artifacts, multi-turn conversations.
Why a new protocol
Couldn't you just use REST? Or plain gRPC? You can - and for "agent A wants the current time from agent B," reaching for A2A is overkill. But once any of these is true, you'll find yourself reinventing pieces of A2A:
- The work takes more than a request-response cycle's worth of seconds.
- The caller wants progress updates, not just a final answer.
- The work might pause to ask the user something.
- The result includes more than text - a generated PDF, a chart, a structured object.
- The "conversation" continues across multiple back-and-forth turns.
Each one of those, REST handles by kludge: webhooks, polling, server-sent events bolted on, base64 blobs in JSON. A2A models them as first-class concepts in the proto.
The four messages
You only need to learn four things to read A2A traffic:
- Message - what the caller sends, and what the agent sends back for short interactions. A list of Parts (text, file, structured data).
- Task - server-side state for work that's bigger than a single message. Has a status (working, input-required, completed, failed) and accumulates artifacts.
- TaskStatusUpdateEvent - pushed during streaming. "I'm thinking," "I'm calling tool X," "I need you to confirm Y."
- TaskArtifactUpdateEvent - pushed during streaming. The actual output - a chunk of text, a PDF, a JSON object - possibly built up across many events.
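To make the four concepts concrete, here is a minimal sketch of how a caller might fold a sequence of them into a final result. The Go types below are hypothetical, simplified stand-ins rather than the types of any real A2A SDK, and the Append flag on artifact updates is an assumption about how chunks are combined.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the four A2A concepts.
// Real SDK types carry more fields; these only show the shape.

type Message struct{ Text string }

type Task struct {
	ID    string
	State string // e.g. "working", "input-required", "completed", "failed"
}

type TaskStatusUpdateEvent struct {
	TaskID string
	State  string
}

type TaskArtifactUpdateEvent struct {
	TaskID     string
	ArtifactID string
	Text       string
	Append     bool // assumed flag: false replaces the artifact, true appends to it
}

// Summary is what a reader of the traffic knows after one exchange.
type Summary struct {
	Reply     string            // set when the agent answered with a plain Message
	State     string            // last task state seen
	Artifacts map[string]string // artifact ID -> assembled text
}

// Fold walks the events in order and builds a Summary.
func Fold(events []any) Summary {
	s := Summary{Artifacts: map[string]string{}}
	for _, ev := range events {
		switch e := ev.(type) {
		case Message:
			s.Reply = e.Text
		case Task:
			s.State = e.State
		case TaskStatusUpdateEvent:
			s.State = e.State
		case TaskArtifactUpdateEvent:
			if e.Append {
				s.Artifacts[e.ArtifactID] += e.Text
			} else {
				s.Artifacts[e.ArtifactID] = e.Text
			}
		}
	}
	return s
}

func main() {
	s := Fold([]any{
		Task{ID: "t1", State: "working"},
		TaskArtifactUpdateEvent{TaskID: "t1", ArtifactID: "a1", Text: "Hello, "},
		TaskArtifactUpdateEvent{TaskID: "t1", ArtifactID: "a1", Text: "world", Append: true},
		TaskStatusUpdateEvent{TaskID: "t1", State: "completed"},
	})
	fmt.Println(s.State, s.Artifacts["a1"])
}
```

A short interaction would produce a single Message and leave State empty; a long-running one produces a Task followed by status and artifact updates, as in main.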
The proto
Here's roughly what the A2A service definition looks like (simplified). The actual upstream proto has more methods and more fields, but if you understand these you can read any A2A trace:
```proto
syntax = "proto3";

package a2a.v1;

// Needed for google.protobuf.Empty and google.protobuf.Struct below.
import "google/protobuf/empty.proto";
import "google/protobuf/struct.proto";

// Request/response, Role, FilePart, TaskStatus, Artifact and the two
// update-event messages are omitted from this simplified listing.

service AgentService {
  // Unary: short, synchronous interactions. Caller gets back a single
  // Message (no Task created) for trivial requests like "hello".
  rpc SendMessage(SendMessageRequest) returns (SendMessageResponse) {}

  // Streaming: the agent emits many events while it works. THIS is the
  // method 90% of A2A traffic uses - long-running tasks, tool calls,
  // artifact chunks all flow through this stream.
  rpc SendMessageStream(SendMessageRequest) returns (stream Event) {}

  // Resume an in-flight task - used after human-in-the-loop, or when
  // a transport drops and the caller wants to reattach.
  rpc TasksResubscribe(TaskQueryRequest) returns (stream Event) {}

  // Catalog endpoint. Hosts read this to know what skills the agent
  // exposes. AgentCard is also published at GET /.well-known/agent-card.json.
  rpc GetAgentCard(google.protobuf.Empty) returns (AgentCard) {}
}

message Message {
  string message_id = 1;
  Role role = 2;            // USER, AGENT
  repeated Part parts = 3;  // text + files + structured data
  string context_id = 4;    // multi-turn conversation key
  string task_id = 5;       // attached task, if any
}

message Part {
  oneof data {
    string text = 1;
    FilePart file = 2;
    google.protobuf.Struct data = 3;  // structured JSON payload
  }
}

message Task {
  string id = 1;
  string context_id = 2;
  TaskStatus status = 3;
  repeated Artifact artifacts = 4;
}

message Event {
  oneof event {
    Message message = 1;
    Task task = 2;
    TaskStatusUpdateEvent status_update = 3;
    TaskArtifactUpdateEvent artifact_update = 4;
  }
}
```

A few details that matter when you start writing one:
- context_id is the conversation; task_id is one piece of work within it. A conversation can span many tasks.
- Part is a oneof, not just text. Files and structured data ride alongside. That's how an agent returns "here's a markdown summary AND a generated PDF AND the raw JSON" in a single response.
- Status transitions are explicit: SUBMITTED → WORKING → INPUT_REQUIRED ↔ WORKING → COMPLETED | FAILED | CANCELLED. The INPUT_REQUIRED state is what powers human-in-the-loop - the agent pauses, the caller (or a UI) supplies the answer, the task resumes.
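The status transitions above can be sketched as a small lookup table. This is a strict reading of the arrows as drawn, not the upstream rules: a real implementation may allow more (for example, cancelling a task that is still SUBMITTED). The type and function names are illustrative.

```go
package main

import "fmt"

// TaskState mirrors the states named in the transition diagram.
type TaskState string

const (
	Submitted     TaskState = "SUBMITTED"
	Working       TaskState = "WORKING"
	InputRequired TaskState = "INPUT_REQUIRED"
	Completed     TaskState = "COMPLETED"
	Failed        TaskState = "FAILED"
	Cancelled     TaskState = "CANCELLED"
)

// allowed encodes only the arrows drawn in the diagram (an assumption:
// the upstream spec may permit additional transitions).
var allowed = map[TaskState][]TaskState{
	Submitted:     {Working},
	Working:       {InputRequired, Completed, Failed, Cancelled},
	InputRequired: {Working},
	// Completed, Failed and Cancelled have no outgoing arrows.
}

// CanTransition reports whether moving from one state to another is allowed.
func CanTransition(from, to TaskState) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// IsTerminal reports whether a state has no outgoing transitions.
func IsTerminal(s TaskState) bool {
	return len(allowed[s]) == 0
}

func main() {
	// A human-in-the-loop run: pause for input once, then finish.
	path := []TaskState{Submitted, Working, InputRequired, Working, Completed}
	for i := 1; i < len(path); i++ {
		fmt.Printf("%s -> %s allowed: %v\n", path[i-1], path[i], CanTransition(path[i-1], path[i]))
	}
	fmt.Println("COMPLETED -> WORKING allowed:", CanTransition(Completed, Working))
}
```

A server can run a check like this before emitting a status update, and a client can use IsTerminal to decide when to stop reading the stream.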
Implementing the server side in Go
Here's a minimal Go agent that implements SendMessageStream. The interesting part is that the agent code is regular Go iterating over a model's response and emitting events - A2A's job is just to standardise the wire format of those events:
```go
package main

import (
	"context"
	"log"
	"net/http"
	"strings"

	"github.com/google/uuid"
	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
	"google.golang.org/grpc"
	// Plus the A2A SDK packages (a2a, a2asrv, a2agrpc) and your model
	// client (llm); their import paths depend on the SDK you use.
)

type agentExecutor struct{}

func (a *agentExecutor) Execute(
	ctx context.Context,
	req *a2a.Request,
	q a2asrv.EventQueue,
) error {
	// 1. Tell the caller we've accepted the work and started a task.
	taskID := uuid.New().String()
	if err := q.Write(ctx, &a2a.Task{
		Id:        taskID,
		Status:    a2a.TaskStatus{State: a2a.TaskStateSubmitted},
		ContextId: req.Message.ContextId,
	}); err != nil {
		return err
	}

	// 2. Hand the user message to the LLM. As tokens stream back, push
	//    them as artifact updates so the UI fills incrementally rather
	//    than waiting for the whole response.
	artifactID := uuid.New().String()
	first := true
	for chunk, err := range llm.Stream(ctx, req.Message) {
		if err != nil {
			// Best effort: report the failure, then return the model error.
			_ = q.Write(ctx, &a2a.TaskStatusUpdateEvent{
				TaskId: taskID,
				Status: a2a.TaskStatus{State: a2a.TaskStateFailed},
			})
			return err
		}
		if err := q.Write(ctx, &a2a.TaskArtifactUpdateEvent{
			TaskId: taskID,
			Artifact: a2a.Artifact{
				ArtifactId: artifactID,
				Parts:      []a2a.Part{{Text: chunk.Text}},
			},
			Append: !first, // first chunk replaces; subsequent chunks append
		}); err != nil {
			return err
		}
		first = false
	}

	// 3. Terminal state. The caller's stream closes after this.
	return q.Write(ctx, &a2a.TaskStatusUpdateEvent{
		TaskId: taskID,
		Status: a2a.TaskStatus{State: a2a.TaskStateCompleted},
	})
}

func main() {
	handler := a2asrv.NewHandler(&agentExecutor{})

	grpcSrv := grpc.NewServer()
	a2agrpc.RegisterAgentServiceServer(grpcSrv, handler.GRPC())

	// Same handler, exposed twice: gRPC for service-to-service, JSON-RPC
	// over HTTP for browsers. Browsers can't speak gRPC directly without
	// grpc-web; JSON-RPC sidesteps that with plain fetch + SSE.
	mux := http.NewServeMux()
	mux.Handle("POST /jsonrpc", a2asrv.NewJSONRPCHandler(handler))
	mux.Handle("GET /.well-known/agent-card.json",
		a2asrv.NewStaticAgentCardHandler(buildAgentCard())) // buildAgentCard is defined elsewhere

	// Send gRPC calls (HTTP/2 with an application/grpc content type) to
	// the gRPC server and everything else to the mux.
	root := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.ProtoMajor == 2 && strings.HasPrefix(r.Header.Get("Content-Type"), "application/grpc") {
			grpcSrv.ServeHTTP(w, r)
			return
		}
		mux.ServeHTTP(w, r)
	})

	// h2c lets the same listener carry both HTTP/2 (for gRPC) and the
	// HTTP/1.1 fetch from the SPA.
	srv := &http.Server{
		Addr:    ":8080",
		Handler: h2c.NewHandler(root, &http2.Server{}),
	}
	log.Fatal(srv.ListenAndServe())
}
```

The AgentCard - discovery and capability
Every A2A agent publishes an AgentCard at /.well-known/agent-card.json. It's the agent's resume: name, description, supported transports, what skills it can do, what input/output modes it accepts. Other agents (and humans) read it to know whether this agent is worth talking to.
```json
{
  "name": "research-agent",
  "description": "Specialist that gathers and summarises web sources for a topic.",
  "version": "1.2.0",
  "preferredTransport": "JSONRPC",
  "url": "https://research.example.com/jsonrpc",
  "additionalInterfaces": [
    { "transport": "GRPC", "url": "research.example.com:443" }
  ],
  "capabilities": {
    "streaming": true,
    "pushNotifications": false
  },
  "skills": [
    {
      "id": "ResearchTopic",
      "name": "Research a topic",
      "description": "Given a topic, return a structured summary with citations.",
      "tags": ["research", "summarisation"]
    }
  ],
  "defaultInputModes": ["text/plain"],
  "defaultOutputModes": ["text/plain", "application/json"]
}
```

This is what makes agent ecosystems possible. You don't have to hardcode another agent's URL and method names - you publish a card, and any A2A client (your own host, a different team's host, a tool that crawls cards) can discover what you do.
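As a rough illustration of the client side of discovery, the sketch below decodes a card and picks an endpoint for a wanted transport. The struct covers only a subset of the fields shown in the example card, and the helper names (ParseAgentCard, EndpointFor, HasSkill) are made up for this example, not taken from an SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Interface is one entry of additionalInterfaces in the card.
type Interface struct {
	Transport string `json:"transport"`
	URL       string `json:"url"`
}

// Skill is one entry of skills in the card.
type Skill struct {
	ID   string   `json:"id"`
	Name string   `json:"name"`
	Tags []string `json:"tags"`
}

// AgentCard holds only the fields this sketch needs.
type AgentCard struct {
	Name                 string      `json:"name"`
	PreferredTransport   string      `json:"preferredTransport"`
	URL                  string      `json:"url"`
	AdditionalInterfaces []Interface `json:"additionalInterfaces"`
	Skills               []Skill     `json:"skills"`
}

// ParseAgentCard decodes the JSON body of an agent card.
func ParseAgentCard(data []byte) (*AgentCard, error) {
	var c AgentCard
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("decode agent card: %w", err)
	}
	return &c, nil
}

// EndpointFor returns the URL for a transport, checking the preferred
// transport first and then the additional interfaces.
func (c *AgentCard) EndpointFor(transport string) (string, bool) {
	if c.PreferredTransport == transport {
		return c.URL, true
	}
	for _, i := range c.AdditionalInterfaces {
		if i.Transport == transport {
			return i.URL, true
		}
	}
	return "", false
}

// HasSkill reports whether the card advertises a skill with the given ID.
func (c *AgentCard) HasSkill(id string) bool {
	for _, s := range c.Skills {
		if s.ID == id {
			return true
		}
	}
	return false
}

func main() {
	raw := []byte(`{
	  "name": "research-agent",
	  "preferredTransport": "JSONRPC",
	  "url": "https://research.example.com/jsonrpc",
	  "additionalInterfaces": [{"transport": "GRPC", "url": "research.example.com:443"}],
	  "skills": [{"id": "ResearchTopic", "name": "Research a topic"}]
	}`)
	card, err := ParseAgentCard(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if url, ok := card.EndpointFor("GRPC"); ok && card.HasSkill("ResearchTopic") {
		fmt.Println("dial", url)
	}
}
```

In a real client the bytes would come from an HTTP GET of /.well-known/agent-card.json on the agent's host; the card is inlined here so the example runs without a network.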
Where MCP and A2A meet
MCP and A2A solve different problems and compose nicely:
- MCP exposes tools to a model. "Here's a function the LLM can call." It's how you give one agent capabilities.
- A2A exposes agents to other agents. "Here's another autonomous worker you can collaborate with." It's how you compose agents into systems.
In practice you build them in layers: each agent owns a set of MCP tools (its capabilities) and presents itself to the world via A2A (its public contract). When agent A wants something done, it doesn't import agent B's code - it sends a Message, watches the stream, and treats B as a black-box collaborator. Same way services have always communicated, just with the semantics that LLMs needed.