Agent Messaging: The Definitive Implementation Guide for A2A Systems
Effective agent messaging is the foundation of any production‑grade multi‑agent system. While agent communication describes the process of exchanging information, agent messaging is the concrete mechanism – the structured bytes that travel from one agent to another. This guide covers every practical detail you need to design, transmit, validate, and process messages between AI agents using A2A (Agent‑to‑Agent) protocols.
You will learn the message lifecycle, payload design, serialization choices, validation strategies, reliability patterns, and operational best practices – all without diving into theoretical distributed systems or high‑level architectural patterns.
What Is Agent Messaging
Agent messaging is the mechanism through which AI agents exchange structured, self‑contained units of information – called messages – during communication and collaboration workflows. A message is a discrete packet that carries a request, response, event, or notification from a sender agent to one or more receiver agents.
Unlike raw data streaming or shared memory, agent messaging is:
- Structured – follows a predefined schema (JSON, Protobuf, etc.)
- Self‑describing – contains metadata (ID, type, timestamp, sender, receiver)
- Transport‑agnostic – can travel over HTTP, message queues, WebSockets, or file‑based channels
- Stateless – each message carries enough context to be processed independently (though agents may maintain external state)
In an A2A system, every interaction between agents is expressed as one or more messages. No side‑channel communication – everything goes through the messaging layer.
Why Agent Messaging Matters
Without a robust agent messaging implementation, agent coordination becomes fragile, unobservable, and non‑deterministic.
| Requirement | Why Messaging Is Essential |
|---|---|
| Communication reliability | Messages can be retried, acknowledged, and persisted, so delivery can survive temporary agent failures. |
| Task execution | A request message encodes exactly what the sender wants (operation, parameters, deadline). The response message carries the result or error. |
| Context exchange | Messages carry conversation IDs, task provenance, and partial results so each agent understands the bigger picture. |
| Workflow coordination | Sequence of messages (request → acknowledge → progress events → final response) enables deterministic multi‑step workflows. |
Practical example: A Research Agent asks a Data Agent to “find all active customers.” Without structured messaging, the Data Agent cannot validate the request, the Research Agent cannot correlate the response, and neither can log or debug the exchange.
Messaging vs Communication
These terms are often used interchangeably, but in A2A implementation they have distinct meanings.
| Aspect | Agent Communication | Agent Messaging |
|---|---|---|
| Scope | End‑to‑end information exchange process | Concrete message implementation |
| Focus | Semantics, protocols, interaction patterns | Formats, serialization, validation, delivery |
| Questions answered | What is the agent trying to achieve? What pattern (request/response, event, publish/subscribe)? | How is the data structured? How is it encoded? How is it delivered reliably? |
| Examples | “Agent A asks Agent B for a vector search.” | “Message contains message_id, type: request, payload: {operation: vector_search, embedding: [...]}” |
Rule of thumb: Communication is why and what; messaging is how (the actual bytes on the wire).
For the rest of this article, we focus exclusively on the messaging layer – the implementation details that make communication possible.
Agent Message Lifecycle
A message travels through eight distinct stages. Production systems must implement each stage correctly.
Detailed breakdown:
- Message Creation – Sender builds a message object with unique ID, sender/receiver identifiers, timestamp, type, payload, and metadata.
- Validation (Sender) – Sender validates the message against a schema (fail fast to avoid wasting network round trips).
- Serialization – The message object is converted into bytes (JSON, Protobuf, etc.).
- Transmission – Bytes are sent over the chosen transport (HTTP POST, AMQP, Kafka produce, etc.).
- Reception – Receiver accepts the bytes and deserializes back into a message object.
- Validation (Receiver) – Receiver validates the message again (never trust the sender).
- Processing – Based on message type and payload, the receiver executes business logic and may generate a response message.
- Logging – Both sender and receiver log the message outcome (with redaction of sensitive fields).
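The stages above can be sketched end to end in a few lines. This is a minimal illustration, not a production implementation: the helper names (`create_message`, `validate`, `serialize`, `deserialize`) are hypothetical, `uuid4` stands in for the UUIDv7 the guide recommends, and the transmission step is reduced to passing bytes between two variables.

```python
import json
import uuid
from datetime import datetime, timezone

REQUIRED_FIELDS = ("message_id", "sender", "receiver", "timestamp", "type", "payload")

def create_message(sender: str, receiver: str, msg_type: str, payload: dict) -> dict:
    """Creation: build the envelope (uuid4 is a stand-in for UUIDv7)."""
    return {
        "message_id": str(uuid.uuid4()),
        "sender": sender,
        "receiver": receiver,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": msg_type,
        "payload": payload,
        "metadata": {},
    }

def validate(message: dict) -> None:
    """Validation: a minimal required-field check, run on both sides."""
    missing = [field for field in REQUIRED_FIELDS if field not in message]
    if missing:
        raise ValueError(f"missing fields: {missing}")

def serialize(message: dict) -> bytes:
    """Serialization: message object to bytes."""
    return json.dumps(message).encode("utf-8")

def deserialize(raw: bytes) -> dict:
    """Reception: bytes back to a message object."""
    return json.loads(raw.decode("utf-8"))

# Sender side: create, validate, serialize.
outgoing = create_message("agent/research/v1", "agent/data/v1", "request", {"operation": "ping"})
validate(outgoing)
wire_bytes = serialize(outgoing)  # a real transport would send these bytes

# Receiver side: deserialize, validate again, then process.
incoming = deserialize(wire_bytes)
validate(incoming)
print(incoming["payload"]["operation"])
```

Running the same `validate` function on both sides keeps the sender and receiver checks from drifting apart.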
Core Message Components
Every agent message must contain a standard set of fields. These are the minimum viable components for A2A messaging.
| Component | Description | Required | Example |
|---|---|---|---|
| Message ID | Unique identifier (UUIDv7 recommended) | Yes | 0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b |
| Sender | Logical name or address of originating agent | Yes | agent/research/v1 |
| Receiver | Logical name or address of target agent | Yes | agent/data/v1 |
| Timestamp | ISO 8601 UTC timestamp (creation time) | Yes | 2025-06-10T14:30:00.123Z |
| Message Type | Classifies the message semantics (request, response, event, notification, error) | Yes | request |
| Payload | The actual data (operation, parameters, result, error details) | Yes | {"operation": "query", "parameters": {...}} |
| Metadata | Auxiliary information (priority, TTL, retry count, trace ID) | Recommended | {"priority": 5, "ttl_ms": 30000} |
Additional optional fields (often placed inside metadata or a separate context object):
- correlation_id – for linking a response to its request (often the same as the original request’s message_id)
- reply_to – address where the receiver should send responses (queue name, callback URL)
- expires_at – absolute expiration time (if different from timestamp + ttl)
Agent Message Structure
Below is a canonical JSON representation of an agent message. This structure works for all message types.
{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
  "sender": "agent/research/alpha",
  "receiver": "agent/data/v1",
  "timestamp": "2025-06-10T14:30:00.123Z",
  "type": "request",
  "payload": {
    "operation": "vector_search",
    "parameters": {
      "embedding": [0.12, -0.34, 0.56],
      "top_k": 10
    }
  },
  "metadata": {
    "priority": 5,
    "ttl_ms": 30000,
    "retry_count": 0,
    "trace_id": "trace_abc123"
  }
}
Field‑by‑field explanation:
- message_id – UUID version 7 (time‑ordered, sortable). Use UUIDv7 instead of v4 to simplify debugging and log correlation.
- sender – Logical identifier, not necessarily a network address. The receiver resolves it to a security principal for auth.
- receiver – Logical identifier. Can be a specific agent (data-agent-01) or a role (data-agent), which a router resolves.
- timestamp – When the message was created (not sent). Helps with ordering and expiration.
- type – Controls how the receiver processes the message. See next section.
- payload – The core content. Structure depends on type and operation.
- metadata – Not for business logic. Used for transport and observability. Never modified by application code.
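Not every Python version ships a UUIDv7 generator in its standard library, so the sketch below builds one by hand from the UUIDv7 bit layout (48‑bit millisecond timestamp, version nibble, 12 random bits, variant bits, 62 random bits). The function name `uuid7` is an assumption for illustration; treat this as a sketch for understanding the format rather than a vetted generator.

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Build a UUIDv7: 48-bit Unix ms timestamp followed by random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (ts_ms & ((1 << 48) - 1)) << 80        # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand_b                                # bits 61..0: rand_b
    return uuid.UUID(int=value)

first = uuid7()
second = uuid7()
print(first, second)
```

Because the timestamp occupies the most significant bits, IDs generated in different milliseconds sort in creation order, which is what makes them convenient for log correlation.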
Response Message Example
{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2c",
  "sender": "agent/data/v1",
  "receiver": "agent/research/alpha",
  "timestamp": "2025-06-10T14:30:00.456Z",
  "type": "response",
  "payload": {
    "result": {
      "documents": [
        {"id": "doc_1", "score": 0.95},
        {"id": "doc_2", "score": 0.87}
      ]
    },
    "error": null
  },
  "metadata": {
    "correlation_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
    "processing_time_ms": 333
  }
}
Message Types
The type field is a discriminator. It tells the receiver how to interpret the payload and what behaviour to expect.
| Type | Direction | Expects Response? | Idempotent? | Use Case |
|---|---|---|---|---|
| request | Sender → Receiver | Yes (one response) | Should be (design operation‑wise) | Ask another agent to perform work |
| response | Receiver → Sender | No | N/A | Return result or error for a request |
| event | Any → Any | No | Usually yes | Notify about state change, progress, or log |
| notification | Any → Any | No (but transport ACK) | N/A | One‑way alert (e.g., “cache warmed up”) |
| error | Any → Any | No | N/A | Specialised error message (can be used instead of response with error) |
Request Message Example
{
  "type": "request",
  "payload": {
    "operation": "train_model",
    "parameters": {
      "dataset_uri": "s3://ml-bucket/training.parquet",
      "algorithm": "random_forest"
    }
  }
}
Event Message Example
{
  "type": "event",
  "payload": {
    "event_type": "progress",
    "data": {
      "step": 5,
      "total_steps": 10,
      "percent": 50
    }
  }
}
Note on error messages: While you can embed an error inside a response payload (as shown earlier), sometimes you need to send an error that is not tied to a specific request (e.g., a background job failed). For that, use type: error with a payload containing code, message, and details.
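Since type acts as a discriminator, a receiver can route on it with a simple lookup table. The sketch below is one possible shape, assuming messages are plain dicts; the handler names and the choice of INVALID_MESSAGE_SCHEMA for an unknown type are illustrative assumptions.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[dict], Optional[dict]]

def handle_request(message: dict) -> dict:
    # A request expects exactly one response, correlated via the request's message_id.
    return {
        "type": "response",
        "payload": {"result": "ok", "error": None},
        "metadata": {"correlation_id": message["message_id"]},
    }

def handle_one_way(message: dict) -> None:
    # Events and notifications are one-way: process them, send nothing back.
    return None

HANDLERS: Dict[str, Handler] = {
    "request": handle_request,
    "event": handle_one_way,
    "notification": handle_one_way,
}

def dispatch(message: dict) -> Optional[dict]:
    """Route a message to its handler based on the type discriminator."""
    handler = HANDLERS.get(message.get("type"))
    if handler is None:
        # Unknown type: answer with a standalone error message.
        return {
            "type": "error",
            "payload": {
                "code": "INVALID_MESSAGE_SCHEMA",
                "message": f"unknown message type: {message.get('type')!r}",
                "details": {},
            },
        }
    return handler(message)

reply = dispatch({"message_id": "m-1", "type": "request", "payload": {"operation": "ping"}})
print(reply["metadata"]["correlation_id"])
```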
Message Payload Design
The payload is where your application‑specific data lives. Well‑designed payloads make agents robust and evolvable.
Structured Payloads
Always use a discriminator field inside the payload, such as operation or action. This allows the receiver to route to the correct handler without parsing other fields.
// Good – self‑describing
{
  "payload": {
    "operation": "query",
    "table": "customers",
    "filter": {"status": "active"}
  }
}
// Bad – ambiguous
{
  "payload": {
    "query": "SELECT * FROM customers WHERE status = 'active'"
  }
}
Schema Design for Payloads
Define a JSON Schema for each operation value. This enables automatic validation and documentation.
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "QueryOperation",
  "type": "object",
  "required": ["operation", "table"],
  "properties": {
    "operation": { "const": "query" },
    "table": { "type": "string" },
    "filter": { "type": "object" },
    "limit": { "type": "integer", "minimum": 1, "maximum": 1000 }
  }
}
Including Context in Payload
For messages that belong to a multi‑step workflow, include a context object inside the payload (not to be confused with the outer metadata).
{
  "payload": {
    "operation": "process_fraud_check",
    "context": {
      "conversation_id": "conv_xyz",
      "user_id": "user_123",
      "previous_decision": "approved"
    },
    "transaction_data": { ... }
  }
}
Metadata Handling
The outer metadata field is only for infrastructure‑level concerns:
- priority – for queue prioritisation
- ttl_ms – time‑to‑live in milliseconds
- retry_count – number of previous delivery attempts
- delivery_mode – persistent or non‑persistent
Never place business data inside metadata. That belongs in payload.
Message Serialization
Serialization converts your message object into bytes for transmission. Choose based on your performance, schema evolution, and human‑readability needs.
| Format | Human‑readable | Schema | Binary | Size | Speed | Best for |
|---|---|---|---|---|---|---|
| JSON | Yes | Optional (JSON Schema) | No | Medium | Medium | Debugging, HTTP APIs, mixed environments |
| YAML | Yes (verbose) | Optional | No | Larger | Slow | Configuration, not recommended for runtime messaging |
| Protocol Buffers (Protobuf) | No (requires .proto) | Mandatory | Yes | Small | Very fast | High‑throughput internal agents, gRPC |
| MessagePack | No | Optional (but can use JSON Schema) | Yes | Small | Fast | Lightweight alternative to Protobuf with dynamic schemas |
Recommendation: Start with JSON + JSON Schema for development and debugging. When you hit performance or bandwidth limits, migrate to Protobuf for internal agent communication. Never use YAML for runtime messaging – parsing is slow and ambiguous (no strict types).
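For the JSON starting point, a small pair of wire helpers can make the encoding explicit and enforce a size budget at the same time. This is a sketch under assumptions: the names `to_wire` and `from_wire` are hypothetical, and the 256 KB limit simply mirrors the payload size guidance given elsewhere in this guide.

```python
import json

MAX_MESSAGE_BYTES = 256 * 1024  # assumed budget, matching the guide's 256 KB guidance

def to_wire(message: dict) -> bytes:
    """Serialize to compact UTF-8 JSON and reject oversized messages."""
    raw = json.dumps(message, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError(f"message is {len(raw)} bytes, limit is {MAX_MESSAGE_BYTES}")
    return raw

def from_wire(raw: bytes) -> dict:
    """Deserialize UTF-8 JSON bytes back into a message dict."""
    return json.loads(raw)

message = {"type": "request", "payload": {"operation": "query", "table": "customers"}}
raw = to_wire(message)
print(len(raw), "bytes on the wire")
```

Compact separators drop the whitespace that `json.dumps` inserts by default, which is a cheap saving before considering a binary format.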
Example: Protobuf Definition
syntax = "proto3";

message AgentMessage {
  string message_id = 1;
  string sender = 2;
  string receiver = 3;
  string timestamp = 4;
  string type = 5;
  bytes payload = 6;  // Encoded as separate Protobuf or JSON
  map<string, string> metadata = 7;
}
For Protobuf, you typically define separate message types for each payload shape (e.g., QueryRequest, QueryResponse) and use a oneof field.
Message Validation
Validate every message twice: on the sender side (fail fast) and on the receiver side (defence in depth).
Schema Validation
Use JSON Schema (or Protobuf’s native schema validation) to verify the message envelope and payload.
Python example using jsonschema:
import jsonschema
from jsonschema import validate

class InvalidMessageError(Exception):
    """Raised when a message fails envelope validation."""

ENVELOPE_SCHEMA = {
    "type": "object",
    "required": ["message_id", "sender", "receiver", "timestamp", "type", "payload"],
    "properties": {
        # UUIDv7 pattern: the version nibble must be 7
        "message_id": {"type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$"},
        "sender": {"type": "string"},
        "receiver": {"type": "string"},
        # "format" is only enforced when a format checker is passed to validate()
        "timestamp": {"type": "string", "format": "date-time"},
        "type": {"enum": ["request", "response", "event", "notification", "error"]},
        "payload": {"type": "object"},
        "metadata": {"type": "object"}
    }
}

def validate_envelope(raw_dict):
    """Return None if the envelope is valid; raise InvalidMessageError otherwise."""
    try:
        validate(instance=raw_dict, schema=ENVELOPE_SCHEMA)
    except jsonschema.ValidationError as e:
        raise InvalidMessageError(f"Envelope validation failed: {e.message}") from e
Required Fields Validation
Beyond schema validation, check for business‑required fields that depend on type or payload.operation.
def validate_request_payload(payload: dict):
    if "operation" not in payload:
        raise MissingFieldError("payload.operation is required for request messages")
    if payload["operation"] == "query" and "table" not in payload:
        raise MissingFieldError("payload.table is required for query operation")
Payload Validation
For each operation, validate parameter types, ranges, and existence of referenced resources.
def validate_query_operation(payload):
    table = payload.get("table")
    if not isinstance(table, str) or len(table) < 1:
        raise InvalidParameterError("table must be a non‑empty string")
    if "limit" in payload:
        if not isinstance(payload["limit"], int) or payload["limit"] < 1 or payload["limit"] > 1000:
            raise InvalidParameterError("limit must be integer between 1 and 1000")
Type Checking
Use language‑native type hints or a library like Pydantic (Python) or Zod (TypeScript) to enforce types at deserialisation time.
Pydantic example:
from datetime import datetime
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, Field

class AgentMessage(BaseModel):
    message_id: UUID
    sender: str
    receiver: str
    timestamp: datetime
    # Literal already restricts the allowed values, so no separate validator is needed
    type: Literal["request", "response", "event", "notification", "error"]
    payload: dict
    metadata: dict = Field(default_factory=dict)
Reliable Message Delivery
Messages can be lost, delayed, or duplicated. Implement these patterns to get at‑least‑once delivery; combined with duplicate detection, that gives effectively exactly‑once processing.
Retries
Attach a retry_count to metadata. The sender automatically retries on transient failures (network timeouts, 5xx responses).
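import asyncio
import random

# transport, TransientError, and PermanentFailure are assumed to be defined elsewhere
async def send_with_retry(message, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries + 1):
        try:
            message.metadata["retry_count"] = attempt
            return await transport.send(message)
        except TransientError as e:
            if attempt == max_retries:
                raise PermanentFailure(e) from e
            # Exponential backoff (1s, 2s, 4s, ...) plus a small random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)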
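To see what the backoff formula above produces, the delay schedule can be computed on its own. This is a small sketch: `backoff_delays` is a hypothetical helper, and the `rng` parameter exists only so the jitter can be switched off for a deterministic illustration.

```python
import random
from typing import Callable, List

def backoff_delays(
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_jitter: float = 0.1,
    rng: Callable[[], float] = random.random,
) -> List[float]:
    """Delays slept before each retry: base_delay * 2**attempt plus jitter."""
    return [base_delay * (2 ** attempt) + rng() * max_jitter for attempt in range(max_retries)]

# With jitter disabled the schedule is purely exponential.
print(backoff_delays(rng=lambda: 0.0))
```

With the defaults and no jitter, three retries wait 1, 2, and 4 seconds, so a message is abandoned roughly 7 seconds after the first failure.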
Acknowledgements
For queue‑based messaging, use consumer acknowledgements. The message is removed from the queue only after the agent successfully processes it and explicitly acknowledges.
# RabbitMQ / Pika example
def on_message(channel, method, properties, body):
try:
process_message(body)
channel.basic_ack(delivery_tag=method.delivery_tag)
except Exception:
channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
Duplicate Detection
Because of retries or broker redeliveries, agents may receive the same message twice. Store processed message_ids with a TTL (at least as long as the maximum retry window).
Redis implementation:
import logging

import redis

log = logging.getLogger(__name__)
r = redis.Redis()

def is_duplicate(message_id: str, ttl_seconds: int = 3600) -> bool:
    # SET with nx=True and ex=... creates the key and its TTL in one atomic call,
    # so a crash cannot leave a key without an expiry (unlike SETNX + EXPIRE)
    was_set = r.set(f"processed:{message_id}", "1", nx=True, ex=ttl_seconds)
    return not was_set

def process_message(message):
    if is_duplicate(message.message_id):
        log.info(f"Skipping duplicate message {message.message_id}")
        return
    # actual processing
Timeout Handling
Set a per‑message TTL (time‑to‑live). If the message sits in a queue longer than the TTL, the receiver should discard it.
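from datetime import datetime, timezone

def is_expired(message):
    # Assumes message.timestamp is a timezone-aware datetime in UTC
    if "ttl_ms" in message.metadata:
        age_ms = (datetime.now(timezone.utc) - message.timestamp).total_seconds() * 1000
        return age_ms > message.metadata["ttl_ms"]
    return False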
Message Routing
Routing determines which agent instance receives a given message based on the receiver field.
Direct Routing
The sender knows the exact network address of the receiver (e.g., http://data-agent-1.internal:8080). Simple but fragile – not recommended for production.
Registry‑Based Routing
A service registry (e.g., Consul, etcd, or a simple Redis map) maps logical agent names to current network addresses.
class AgentRegistry:
    def __init__(self):
        self._registry = {}  # logical_name -> address

    def register(self, logical_name: str, address: str):
        self._registry[logical_name] = address

    def resolve(self, logical_name: str) -> str:
        if logical_name not in self._registry:
            raise UnknownAgentError(f"No agent registered as {logical_name}")
        return self._registry[logical_name]
Dynamic Routing
For advanced scenarios, use a message router that inspects the message and decides the target based on content (e.g., payload.table routes to different database agents).
def route_message(message: AgentMessage) -> str:
    if message.type != "request":
        return message.receiver  # fallback
    operation = message.payload.get("operation")
    if operation == "query":
        # Fall back to "" so a missing table does not raise AttributeError
        table = message.payload.get("table") or ""
        if table.startswith("user_"):
            return "agent/user-db"
        return "agent/analytics-db"
    return message.receiver
Implementation note: Keep routing logic out of individual agents. Implement a separate router agent or use a smart message broker (e.g., RabbitMQ topic exchanges with bindings).
Error Messages
Errors deserve their own message structure, even when embedded inside a response.
Standard Error Structure
{
  "type": "response",
  "payload": {
    "result": null,
    "error": {
      "code": "RATE_LIMIT_EXCEEDED",
      "message": "Too many requests per second",
      "details": {
        "limit": 100,
        "window_seconds": 60,
        "retry_after_ms": 5000
      },
      "retryable": true
    }
  }
}
Error Codes (Recommended Categories)
| Category | Example Codes |
|---|---|
| Client errors (4xx equivalent) | INVALID_MESSAGE_SCHEMA, MISSING_REQUIRED_FIELD, UNSUPPORTED_OPERATION, UNAUTHORIZED |
| Server errors (5xx equivalent) | INTERNAL_ERROR, TIMEOUT, DEPENDENCY_FAILURE |
| Capacity errors | RATE_LIMIT_EXCEEDED, AGENT_OVERLOADED, QUEUE_FULL |
| Business errors | RESOURCE_NOT_FOUND, CONFLICT, PRECONDITION_FAILED |
Retryable vs Non‑Retryable
Include a retryable boolean in the error object.
- Retryable (true) – network timeouts, 5xx, rate limits, agent overloaded.
- Non‑retryable (false) – schema violations, authentication failures, unknown operation, resource not found.
The sender’s retry logic checks this flag before attempting another send.
if error.get("retryable"):
    schedule_retry(message)
else:
    log.error("Non‑retryable error, failing immediately")
    raise NonRetryableError(error["code"])
Message Security
Secure messaging is mandatory when agents run in different trust domains (e.g., separate microservices, external agents). Use this checklist.
Security Checklist for Agent Messaging
- Authentication – Every message must prove the sender’s identity.
  - mTLS (mutual TLS) for HTTP/2 and gRPC
  - JWT (JSON Web Token) with short expiration and signature verification
  - Pre‑shared symmetric key (HMAC) for lightweight internal systems
- Authorization – Verify that the sender is allowed to send this message type and/or operation.
  - Implement policy checks (e.g., Open Policy Agent) before processing.
  - Example: sender == "agent/research" is allowed to send operation: query but not operation: delete.
- Integrity Verification – Ensure message was not tampered with in transit.
  - TLS already provides integrity for the channel.
  - For end‑to‑end integrity (beyond TLS), add a signature field over the canonicalised message.
- Confidentiality – Protect sensitive payload fields (PII, API keys, credentials).
  - Use TLS for transport encryption.
  - For fields that must not be visible to intermediaries, encrypt at the field level using envelope encryption.
- Sensitive Data Handling – Never log full messages that contain secrets or PII.
  - Redact fields named password, secret, authorization, credit_card, ssn before logging.
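The redaction item in the checklist can be sketched as a recursive walk over the message before it reaches the logger. This is an illustrative sketch only: the `redact` function and the "[REDACTED]" placeholder are assumptions, and matching on exact key names will miss secrets stored under other names.

```python
from typing import Any

SENSITIVE_KEYS = {"password", "secret", "authorization", "credit_card", "ssn"}

def redact(value: Any) -> Any:
    """Return a copy with values of sensitive keys replaced, at any nesting depth."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value

message = {
    "type": "request",
    "payload": {
        "operation": "login",
        "parameters": {"user": "alice", "password": "hunter2"},
        "cards": [{"credit_card": "4111111111111111"}],
    },
}
safe = redact(message)
print(safe["payload"]["parameters"])
```

The function builds a new structure instead of mutating its input, so the original message can still be processed after the redacted copy is logged.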
Adding a signature:
import hmac, hashlib, json

def sign_message(message: dict, secret: bytes) -> str:
    # Sort keys to ensure deterministic serialisation
    canonical = json.dumps(message, sort_keys=True, separators=(',', ':'))
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()

def verify_signature(message: dict, signature: str, secret: bytes) -> bool:
    # Recompute the signature over everything except the signature field itself
    expected = sign_message({k: v for k, v in message.items() if k != "signature"}, secret)
    return hmac.compare_digest(expected, signature)
Message Monitoring
You cannot debug messaging failures without observability. Export the following metrics from every agent and router.
| Metric | Type | Labels | Alert When |
|---|---|---|---|
| agent_messages_sent_total | Counter | type, sender, receiver | – |
| agent_messages_received_total | Counter | type, sender, receiver | – |
| agent_message_size_bytes | Histogram | type | p99 > 256 KB |
| agent_message_delivery_latency_seconds | Histogram | sender, receiver | p95 > 1s (sync) / > 5s (async) |
| agent_message_validation_errors_total | Counter | error_code | Any increase > 1% |
| agent_message_duplicates_detected_total | Counter | sender | Any (indicates retry issues) |
| agent_message_retries_total | Counter | reason | > 10% of messages |
Logging recommendation: Emit a structured log for each message received and processed. Include message_id, type, sender, receiver, duration_ms, status (success/error). Redact sensitive fields.
{
  "event": "message_processed",
  "message_id": "0194f0a2-...",
  "type": "request",
  "operation": "query",
  "sender": "research-agent",
  "receiver": "data-agent",
  "duration_ms": 42,
  "status": "success"
}
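A log line of this shape can be produced by copying only envelope fields and the operation name, so that payload contents never reach the log. The helper below is a hypothetical sketch; `build_log_record` and its parameters are not part of any library.

```python
import json
import time

def build_log_record(message: dict, status: str, started_at: float) -> str:
    """Build one structured log line from envelope fields only."""
    payload = message.get("payload", {})
    record = {
        "event": "message_processed",
        "message_id": message["message_id"],
        "type": message["type"],
        "operation": payload.get("operation"),  # the operation name, never the parameters
        "sender": message["sender"],
        "receiver": message["receiver"],
        "duration_ms": int((time.monotonic() - started_at) * 1000),
        "status": status,
    }
    return json.dumps(record)

started = time.monotonic()
message = {
    "message_id": "m-1",
    "type": "request",
    "sender": "research-agent",
    "receiver": "data-agent",
    "payload": {"operation": "query", "parameters": {"password": "hunter2"}},
}
line = build_log_record(message, "success", started)
print(line)
```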
Message Testing
Test messaging behaviour at all levels. Use real serialization and validation code, not mocks, where possible.
Unit Testing
Test message creation, validation logic, and serialization/deserialization in isolation.
import pytest

def test_message_validation():
    valid_msg = {
        "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
        "sender": "test",
        "receiver": "test",
        "timestamp": "2025-06-10T14:30:00Z",
        "type": "request",
        "payload": {"operation": "ping"}
    }
    # validate_envelope returns None on success and raises on failure
    validate_envelope(valid_msg)

    invalid_msg = {**valid_msg, "type": "invalid"}
    with pytest.raises(InvalidMessageError):
        validate_envelope(invalid_msg)
Schema Validation Testing
Test that your JSON Schema rejects malformed payloads.
def test_query_payload_schema():
    schema = load_schema("query_operation.json")
    valid = {"operation": "query", "table": "users", "limit": 10}
    assert is_valid(valid, schema)
    invalid = {"operation": "query", "limit": -1}
    assert not is_valid(invalid, schema)
Integration Testing
Spin up a real receiver agent and sender client, send a message, and assert the response.
async def test_request_response_integration():
    receiver = DataAgent()
    await receiver.start("localhost", 8888)
    try:
        client = AgentClient("http://localhost:8888")
        response = await client.send_request("data-agent", {"operation": "ping"})
        assert response.type == "response"
        assert response.payload["result"] == "pong"
    finally:
        # Stop the agent even if an assertion fails
        await receiver.stop()
End‑to‑End Testing
Test a full workflow that involves multiple agents, a message broker, and external dependencies (using test containers).
from testcontainers.core.container import DockerContainer

def test_full_messaging_workflow():
    with DockerContainer("rabbitmq:3") as broker:
        with DockerContainer("data-agent") as data_agent:
            with DockerContainer("research-agent") as research:
                # get_container_host_ip() returns the host the container is reachable on
                client = ResearchAgentClient(research.get_container_host_ip())
                result = client.run_analysis("customer_churn")
                assert result.status == "completed"
                assert "churn_report.pdf" in result.artifacts
Agent Messaging Best Practices
Adopt these 12 implementation guidelines for production‑grade agent messaging.
1. Use unique message IDs – UUIDv7 is preferred. Never reuse IDs.
2. Keep payloads small – Under 256 KB. For larger data, return a URI.
3. Validate every message twice – Sender (fail fast) + receiver (defence in depth).
4. Log every message (with redaction) – Essential for debugging and auditing.
5. Use structured schemas – JSON Schema or Protobuf. Avoid ad‑hoc formats.
6. Version your message formats – Include a version field in the envelope. Never remove required fields; add optional ones.
7. Set explicit timeouts and TTLs – No message should live forever in a queue.
8. Implement idempotency – Store processed message_ids to safely retry.
9. Make retry policies exponential with jitter – Prevent retry storms.
10. Separate metadata from payload – Infrastructure fields in metadata, business data in payload.
11. Monitor all the metrics – Throughput, latency, errors, duplicates, retries.
12. Test schema changes – Use backward‑compatible evolution (add only, never remove/rename).
Common Messaging Mistakes
| Mistake | Consequence | Solution |
|---|---|---|
| Missing or non‑unique message IDs | Cannot deduplicate, cannot correlate responses, logs useless. | Generate UUIDv7 for every message. |
| Large payloads (>1 MB) | Network congestion, queue overflow, high latency. | Use reference (URI) to blob storage. |
| No schema validation | Malformed messages cause cryptic runtime crashes. | Enforce JSON Schema or Protobuf. |
| No retry mechanism | Transient failures become permanent. | Implement exponential backoff with max retries. |
| Inconsistent message formats | Receivers need custom parsers per sender. | Adopt a single envelope format across all agents. |
| Putting business data in metadata | Confusion, metadata size limits (e.g., Kafka headers). | Use payload for business data. |
| No message logging | Impossible to debug failures. | Log each message (redacted) with correlation ID. |
| Ignoring duplicate messages | Double processing, data corruption. | Store processed IDs with TTL. |
| Blocking on all responses | Poor throughput, cascading failures. | Use async patterns with timeouts. |
Case Study: Research Agent ↔ Data Agent Messaging
Scenario: A Research Agent needs to retrieve a list of high‑value customers from a Data Agent. The Data Agent exposes a query interface. The system uses JSON over HTTPS with registry‑based routing.
Step 1 – Message Structure Definition
Both agents share a common JSON Schema for the envelope and a schema for the query operation.
Envelope schema (excerpt):
{
  "message_id": {"type": "string", "format": "uuid"},
  "sender": {"type": "string"},
  "receiver": {"type": "string"},
  "type": {"enum": ["request", "response"]}
}
Query request payload schema:
{
  "operation": {"const": "query"},
  "table": {"type": "string", "enum": ["customers", "transactions"]},
  "filter": {"type": "object"},
  "limit": {"type": "integer", "minimum": 1, "maximum": 1000}
}
Step 2 – Request Flow
Research Agent constructs a message:
{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
  "sender": "agent/research/v1",
  "receiver": "agent/data/v1",
  "timestamp": "2025-06-10T14:30:00.123Z",
  "type": "request",
  "payload": {
    "operation": "query",
    "table": "customers",
    "filter": {"lifetime_value": {"$gte": 10000}},
    "limit": 50
  },
  "metadata": {
    "correlation_id": "conv_456",
    "ttl_ms": 5000,
    "priority": 3
  }
}
The sender validates the envelope and payload schema. It then serialises to JSON and POSTs to https://data-agent.internal/v1/message (resolved from registry).
Step 3 – Response Flow
Data Agent receives the request, validates again, and processes. It returns:
{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2c",
  "sender": "agent/data/v1",
  "receiver": "agent/research/v1",
  "timestamp": "2025-06-10T14:30:00.456Z",
  "type": "response",
  "payload": {
    "result": {
      "customers": [
        {"id": "cust_001", "name": "Acme Corp", "lifetime_value": 150000},
        {"id": "cust_002", "name": "Beta LLC", "lifetime_value": 120000}
      ],
      "count": 2
    },
    "error": null
  },
  "metadata": {
    "correlation_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
    "processing_time_ms": 333
  }
}
Step 4 – Error Handling (Simulated)
If the Data Agent cannot connect to its database, it returns a retryable error:
{
  "type": "response",
  "payload": {
    "result": null,
    "error": {
      "code": "DATABASE_TIMEOUT",
      "message": "Connection to database timed out after 10 seconds",
      "retryable": true
    }
  }
}
Research Agent’s retry logic catches this, increments retry_count, and re‑sends the same request after a 1‑second delay that doubles on each subsequent attempt (exponential backoff). After 3 failures, it sends a notification to an operator.
Step 5 – Monitoring
Prometheus metrics from this interaction:
agent_messages_sent_total{type="request", sender="research", receiver="data"} 1
agent_messages_received_total{type="request", sender="research", receiver="data"} 1
agent_messages_sent_total{type="response", sender="data", receiver="research"} 1
agent_message_delivery_latency_seconds{sender="research", receiver="data"} 0.333
agent_message_validation_errors_total{error_code="none"} 0
Logs:
{"event": "message_sent", "message_id": "0194f0a2-...", "type": "request", "receiver": "data"}
{"event": "message_received", "message_id": "0194f0a2-...", "type": "request", "sender": "research"}
{"event": "message_processed", "message_id": "0194f0a2-...", "type": "request", "status": "success", "duration_ms": 333}
{"event": "message_sent", "message_id": "0194f0a2-...", "type": "response", "receiver": "research"}
FAQ
1. What is agent messaging?
Agent messaging is the structured, self‑describing mechanism through which AI agents exchange requests, responses, events, and notifications. It defines the format, validation, delivery, and observability of individual messages.
2. How is agent messaging different from agent communication?
Communication is the high‑level process (why and what agents exchange). Messaging is the concrete implementation (the actual bytes, structure, and delivery guarantees). Think of communication as the conversation and messaging as the sentences.
3. What fields are mandatory in every agent message?
message_id, sender, receiver, timestamp, type, and payload. metadata is strongly recommended but not strictly required.
4. Should messages be versioned?
Yes. Include a version field in the envelope (e.g., "version": "1.0"). When making breaking changes, increment the major version and support both versions during migration. Never remove required fields – add optional ones.
5. How do I choose between JSON and Protobuf?
Start with JSON + JSON Schema for flexibility and debugging. Move to Protobuf when you need smaller message size, faster parsing, or strict schema evolution in high‑throughput internal systems.
6. How do agents validate incoming messages?
They validate: (1) envelope schema (fields, types, required), (2) message type specific rules (e.g., request must have payload.operation), (3) payload schema based on operation, (4) authentication and authorisation.
7. What is the best way to correlate a response with its request?
The receiver copies the request’s message_id into the response’s metadata.correlation_id. The sender maintains a map from message_id to a future/promise/callback.
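The map from message_id to a future can be sketched with asyncio. This is a minimal illustration under assumptions: the `PendingRequests` class is hypothetical, messages are plain dicts, and the response is injected directly rather than arriving over a transport.

```python
import asyncio
from typing import Dict

class PendingRequests:
    """Tracks in-flight requests so responses can be matched by correlation_id."""

    def __init__(self) -> None:
        self._pending: Dict[str, asyncio.Future] = {}

    def register(self, message_id: str) -> asyncio.Future:
        # Must be called from within a running event loop.
        future = asyncio.get_running_loop().create_future()
        self._pending[message_id] = future
        return future

    def resolve(self, response: dict) -> bool:
        """Complete the matching future; return False for unknown or late responses."""
        correlation_id = response.get("metadata", {}).get("correlation_id")
        future = self._pending.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(response)
        return True

async def demo() -> dict:
    pending = PendingRequests()
    future = pending.register("req-1")
    # Simulate the response arriving from the receiver.
    pending.resolve({
        "type": "response",
        "payload": {"result": "pong", "error": None},
        "metadata": {"correlation_id": "req-1"},
    })
    # Always bound the wait so a lost response cannot block forever.
    return await asyncio.wait_for(future, timeout=1.0)

result = asyncio.run(demo())
print(result["payload"]["result"])
```

Removing the entry from the map when it is resolved means a duplicate response for the same request is reported as unmatched instead of being delivered twice.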
8. How do I handle duplicate messages?
Store processed message_ids in a fast key‑value store (Redis) with a TTL slightly longer than your maximum retry window (e.g., 1 hour). Before processing, check if the ID already exists.
9. What is a good TTL for messages?
Depends on your workflow. For interactive requests, 5–30 seconds. For background batch jobs, 5–60 minutes. Never set TTL to infinite – always have an expiry.
10. How do I test agent messaging without real agents?
Unit test message creation and validation with the real serializers and a fake transport. Integration test with an in‑memory channel or a lightweight message broker (e.g., Redis). For end‑to‑end, use test containers.
11. Should I use synchronous or asynchronous messaging?
Use synchronous (request‑response over HTTP) for low‑latency, interactive tasks. Use asynchronous (queues, event streams) for long‑running operations, high‑volume workloads, or when you need durability.
12. How do I secure messages between agents?
Use mTLS for transport, JWT or API keys for authentication, and field‑level encryption for sensitive data. Add message signatures for end‑to‑end integrity. Always authorise based on sender identity.
13. What metrics should I collect for agent messaging?
Sent/received counts, message size distribution, end‑to‑end latency, validation error rate, duplicate rate, retry rate, and success rate per message type.
14. How large can a message be?
Aim for < 256 KB. If you need more, design a two‑step pattern: first message contains a reference (URI), the receiver fetches the large data from object storage or a shared volume.
15. Can agents from different teams use different message formats?
Only if they agree on a common envelope (at least message_id, sender, receiver, type). The payload can be team‑specific, but the receiving agent must know how to deserialize it. Better to standardise fully.
16. How do I evolve a message schema without breaking existing agents?
Follow the robustness principle: be conservative in what you send, liberal in what you accept. Never remove or rename fields. Only add optional fields with default values. Use a version field and keep old handlers for at least one major version.
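A receiver that follows this advice reads only the fields it knows, ignores the rest, and fills in defaults for optional fields. The sketch below assumes a query payload like the one used in this guide; the function name, the default values, and the set of supported versions are illustrative assumptions.

```python
import copy

SUPPORTED_MAJOR_VERSIONS = {1}
# Hypothetical defaults for optional fields added after the first schema version.
OPTIONAL_DEFAULTS = {"limit": 100, "filter": {}}

def read_query_payload(message: dict) -> dict:
    """Tolerant reader: keep known fields, ignore unknown ones, default the optional ones."""
    version = str(message.get("version", "1.0"))
    major = int(version.split(".")[0])
    if major not in SUPPORTED_MAJOR_VERSIONS:
        raise ValueError(f"unsupported message version: {version}")

    payload = message["payload"]
    known = {"operation": payload["operation"], "table": payload["table"]}
    for name, default in OPTIONAL_DEFAULTS.items():
        # deepcopy so callers cannot mutate the shared default objects
        known[name] = payload.get(name, copy.deepcopy(default))
    return known

newer_sender = {
    "version": "1.2",
    "payload": {"operation": "query", "table": "customers", "sort_by": "name"},
}
print(read_query_payload(newer_sender))
```

Here a sender on a newer minor version adds `sort_by`, which an older receiver simply ignores, while a major version bump is rejected explicitly rather than misread.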
17. What happens if a message cannot be delivered after all retries?
Send it to a dead‑letter queue (DLQ) with the original metadata. Alert an operator. The DLQ can be replayed after fixing the issue.
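The hand‑off to a dead‑letter queue can be sketched as follows. Everything here is a simplified assumption: the `send` callable stands in for a real transport, an in‑memory list stands in for the DLQ, and the backoff sleeps between attempts are omitted for brevity.

```python
from typing import Callable, List

class TransientError(Exception):
    """A failure that may succeed on retry (timeout, 5xx, overload)."""

def deliver_or_dead_letter(
    message: dict,
    send: Callable[[dict], None],
    dead_letters: List[dict],
    max_retries: int = 3,
) -> bool:
    """Try to send; after the final failed attempt, park the message in the DLQ."""
    metadata = message.setdefault("metadata", {})
    last_error = ""
    for attempt in range(max_retries + 1):
        metadata["retry_count"] = attempt
        try:
            send(message)
            return True
        except TransientError as exc:
            last_error = str(exc)
    # Keep the original message (including metadata) so it can be replayed later.
    dead_letters.append({"message": message, "reason": last_error})
    return False

def always_failing_send(message: dict) -> None:
    raise TransientError("receiver unavailable")

dlq: List[dict] = []
delivered = deliver_or_dead_letter({"message_id": "m-1", "payload": {}}, always_failing_send, dlq)
print(delivered, len(dlq))
```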
Internal Linking Recommendations
Deepen your understanding of agent messaging by exploring these related implementation guides in the AgentDevPro Handbook:
- /guides/a2a/ – A2A protocol fundamentals
- /guides/a2a/agent-communication/ – The broader process of information exchange between agents
- /guides/a2a/agent-collaboration/ – How agents work together using messages
- /guides/agent-workflows/ – Orchestrating multi‑step tasks with message sequences
- /guides/agent-tools/ – How agents expose and consume tools via messaging
- /guides/agent-memory/ – Sharing semantic memory across agents using messages
- /guides/mcp/client/ – Model Context Protocol client integration for tool‑augmented agents
This article is part of the AgentDevPro Handbook – practical, engineering‑focused guides for building production AI agent systems.