
Agent Messaging: The Definitive Implementation Guide for A2A Systems

Effective agent messaging is the foundation of any production‑grade multi‑agent system. While agent communication describes the process of exchanging information, agent messaging is the concrete mechanism – the structured bytes that travel from one agent to another. This guide covers every practical detail you need to design, transmit, validate, and process messages between AI agents using A2A (Agent‑to‑Agent) protocols.

You will learn message lifecycle, payload design, serialization choices, validation strategies, reliability patterns, and operational best practices – all without diving into theoretical distributed systems or high‑level architectural patterns.

What Is Agent Messaging

Agent messaging is the mechanism through which AI agents exchange structured, self‑contained units of information – called messages – during communication and collaboration workflows. A message is a discrete packet that carries a request, response, event, or notification from a sender agent to one or more receiver agents.

Unlike raw data streaming or shared memory, agent messaging is:

  • Structured – follows a predefined schema (JSON, Protobuf, etc.)
  • Self‑describing – contains metadata (ID, type, timestamp, sender, receiver)
  • Transport‑agnostic – can travel over HTTP, message queues, WebSockets, or file‑based channels
  • Stateless – each message carries enough context to be processed independently (though agents may maintain external state)

In an A2A system, every interaction between agents is expressed as one or more messages. No side‑channel communication – everything goes through the messaging layer.

Why Agent Messaging Matters

Without a robust agent messaging implementation, agent coordination becomes fragile, unobservable, and non‑deterministic.

Requirement | Why Messaging Is Essential
Communication reliability | Messages can be retried, acknowledged, and persisted. Guarantees delivery even during temporary agent failures.
Task execution | A request message encodes exactly what the sender wants (operation, parameters, deadline). The response message carries the result or error.
Context exchange | Messages carry conversation IDs, task provenance, and partial results so each agent understands the bigger picture.
Workflow coordination | Sequence of messages (request → acknowledge → progress events → final response) enables deterministic multi‑step workflows.

Practical example: A Research Agent asks a Data Agent to “find all active customers.” Without structured messaging, the Data Agent cannot validate the request, the Research Agent cannot correlate the response, and neither can log or debug the exchange.

Messaging vs Communication

These terms are often used interchangeably, but in A2A implementation they have distinct meanings.

Aspect | Agent Communication | Agent Messaging
Scope | End‑to‑end information exchange process | Concrete message implementation
Focus | Semantics, protocols, interaction patterns | Formats, serialization, validation, delivery
Questions answered | What is the agent trying to achieve? What pattern (request/response, event, publish/subscribe)? | How is the data structured? How is it encoded? How is it delivered reliably?
Examples | “Agent A asks Agent B for a vector search.” | “Message contains message_id, type: request, payload: {operation: vector_search, embedding: [...]}.”

Rule of thumb: Communication is why and what; messaging is how (the actual bytes on the wire).

For the rest of this article, we focus exclusively on the messaging layer – the implementation details that make communication possible.

Agent Message Lifecycle

A message travels through eight distinct stages. Production systems must implement each stage correctly.

Detailed breakdown:

  1. Message Creation – Sender builds a message object with unique ID, sender/receiver identifiers, timestamp, type, payload, and metadata.
  2. Validation (Sender) – Sender validates the message against a schema (fail fast to avoid wasting network round trips).
  3. Serialization – The message object is converted into bytes (JSON, Protobuf, etc.).
  4. Transmission – Bytes are sent over the chosen transport (HTTP POST, AMQP, Kafka produce, etc.).
  5. Reception – Receiver accepts the bytes and deserializes back into a message object.
  6. Validation (Receiver) – Receiver validates the message again (never trust the sender).
  7. Processing – Based on message type and payload, the receiver executes business logic and may generate a response message.
  8. Logging – Both sender and receiver log the message outcome (with redaction of sensitive fields).
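The stages above can be walked through in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (create_message, validate, serialize, deserialize, process) are illustrative, validation is reduced to a required-field check, transmission is simulated by handing the bytes over in memory, and uuid4 stands in for the UUIDv7 generator recommended later in this guide.

```python
import json
import uuid
from datetime import datetime, timezone

REQUIRED = ("message_id", "sender", "receiver", "timestamp", "type", "payload")

def create_message(sender, receiver, msg_type, payload):
    """Stage 1: build the message object."""
    return {
        "message_id": str(uuid.uuid4()),  # stand-in for a UUIDv7 generator
        "sender": sender,
        "receiver": receiver,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": msg_type,
        "payload": payload,
        "metadata": {},
    }

def validate(message):
    """Stages 2 and 6: required-field check (stand-in for schema validation)."""
    missing = [field for field in REQUIRED if field not in message]
    if missing:
        raise ValueError(f"missing fields: {missing}")

def serialize(message) -> bytes:
    """Stage 3: message object to bytes."""
    return json.dumps(message, separators=(",", ":")).encode("utf-8")

def deserialize(raw: bytes) -> dict:
    """Stage 5: bytes back to a message object."""
    return json.loads(raw.decode("utf-8"))

def process(message):
    """Stage 7: hypothetical handler that answers a ping request."""
    if message["payload"].get("operation") == "ping":
        return create_message(message["receiver"], message["sender"],
                              "response", {"result": "pong", "error": None})
    return None

outgoing = create_message("agent/research/v1", "agent/data/v1",
                          "request", {"operation": "ping"})
validate(outgoing)                  # sender-side validation
wire_bytes = serialize(outgoing)
incoming = deserialize(wire_bytes)  # stage 4 (transmission) simulated in memory
validate(incoming)                  # receiver-side validation
reply = process(incoming)
```

Stage 8 (logging) is left out here; in a real agent both sides would log the outcome with sensitive fields redacted.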

Core Message Components

Every agent message must contain a standard set of fields. These are the minimum viable components for A2A messaging.

Component | Description | Required | Example
Message ID | Unique identifier (UUIDv7 recommended) | Yes | 0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b
Sender | Logical name or address of originating agent | Yes | agent/research/v1
Receiver | Logical name or address of target agent | Yes | agent/data/v1
Timestamp | ISO 8601 UTC timestamp (creation time) | Yes | 2025-06-10T14:30:00.123Z
Message Type | Classifies the message semantics (request, response, event, notification, error) | Yes | request
Payload | The actual data (operation, parameters, result, error details) | Yes | {"operation": "query", "parameters": {...}}
Metadata | Auxiliary information (priority, TTL, retry count, trace ID) | Recommended | {"priority": 5, "ttl_ms": 30000}

Additional optional fields (often placed inside metadata or a separate context object):

  • correlation_id – for linking a response to its request (often the same as the original request’s message_id)
  • reply_to – address where the receiver should send responses (queue name, callback URL)
  • expires_at – absolute expiration time (if different from timestamp + ttl_ms)

Agent Message Structure

Below is a canonical JSON representation of an agent message. This structure works for all message types.

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
  "sender": "agent/research/alpha",
  "receiver": "agent/data/v1",
  "timestamp": "2025-06-10T14:30:00.123Z",
  "type": "request",
  "payload": {
    "operation": "vector_search",
    "parameters": {
      "embedding": [0.12, -0.34, 0.56],
      "top_k": 10
    }
  },
  "metadata": {
    "priority": 5,
    "ttl_ms": 30000,
    "retry_count": 0,
    "trace_id": "trace_abc123"
  }
}

Field‑by‑field explanation:

  • message_id – UUID version 7 (time‑ordered, sortable). Use UUIDv7 instead of v4 to simplify debugging and log correlation.
  • sender – Logical identifier, not necessarily a network address. The receiver resolves it to a security principal for auth.
  • receiver – Logical identifier. Can be a specific agent (data-agent-01) or a role (data-agent), which a router resolves.
  • timestamp – When the message was created (not sent). Helps with ordering and expiration.
  • type – Controls how the receiver processes the message. See next section.
  • payload – The core content. Structure depends on type and operation.
  • metadata – Not for business logic. Used for transport and observability. Never modified by application code.
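Many Python versions in use do not ship a UUIDv7 generator, so the sketch below builds one by hand from the RFC 9562 bit layout (48‑bit Unix millisecond timestamp, version nibble 7, 12 random bits, the 2‑bit variant, 62 random bits). The function name uuid7 and the now_ms parameter are assumptions made for illustration; a maintained library is a better choice in production.

```python
import os
import time
import uuid

def uuid7(now_ms=None) -> uuid.UUID:
    """Build a UUIDv7: timestamp-prefixed, so IDs sort by creation time."""
    ts_ms = time.time_ns() // 1_000_000 if now_ms is None else now_ms
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (
        ((ts_ms & ((1 << 48) - 1)) << 80)  # bits 127..80: Unix time in ms
        | (0x7 << 76)                      # bits 79..76: version 7
        | (rand_a << 64)                   # bits 75..64: random
        | (0b10 << 62)                     # bits 63..62: RFC variant
        | rand_b                           # bits 61..0: random
    )
    return uuid.UUID(int=value)

message_id = str(uuid7())
```

Because the timestamp occupies the most significant bits, sorting message IDs lexicographically also sorts them by creation time at millisecond granularity, which is what makes log correlation easier than with UUIDv4.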

Response Message Example

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2c",
  "sender": "agent/data/v1",
  "receiver": "agent/research/alpha",
  "timestamp": "2025-06-10T14:30:00.456Z",
  "type": "response",
  "payload": {
    "result": {
      "documents": [
        {"id": "doc_1", "score": 0.95},
        {"id": "doc_2", "score": 0.87}
      ]
    },
    "error": null
  },
  "metadata": {
    "correlation_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
    "processing_time_ms": 333
  }
}
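The correlation_id in the response above is what lets the sender match it to a pending request. One way to do that, sketched here with asyncio, is a map from request message_id to a future. The class name PendingRequests and its methods are hypothetical, and the demo feeds the response in directly instead of receiving it from a transport.

```python
import asyncio

class PendingRequests:
    """Tracks in-flight requests by message_id until a response arrives."""

    def __init__(self):
        self._pending = {}  # message_id -> asyncio.Future

    def register(self, message_id: str) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        self._pending[message_id] = future
        return future

    def resolve(self, response: dict) -> bool:
        """Complete the matching future; return False for unknown or late responses."""
        correlation_id = response.get("metadata", {}).get("correlation_id")
        future = self._pending.pop(correlation_id, None)
        if future is None or future.done():
            return False
        future.set_result(response)
        return True

async def demo():
    pending = PendingRequests()
    future = pending.register("req-1")
    response = {
        "type": "response",
        "payload": {"result": "ok", "error": None},
        "metadata": {"correlation_id": "req-1"},
    }
    matched = pending.resolve(response)
    repeated = pending.resolve(response)  # same response again: nothing pending
    result = await asyncio.wait_for(future, timeout=1.0)
    return matched, repeated, result["payload"]["result"]

demo_result = asyncio.run(demo())
```

Wrapping the await in asyncio.wait_for gives each request a deadline, so a response that never arrives does not leave the caller waiting indefinitely.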

Message Types

The type field is a discriminator. It tells the receiver how to interpret the payload and what behaviour to expect.

Type | Direction | Expects Response? | Idempotent? | Use Case
request | Sender → Receiver | Yes (one response) | Should be (design operation‑wise) | Ask another agent to perform work
response | Receiver → Sender | No | N/A | Return result or error for a request
event | Any → Any | No | Usually yes | Notify about state change, progress, or log
notification | Any → Any | No (but transport ACK) | N/A | One‑way alert (e.g., “cache warmed up”)
error | Any → Any | No | N/A | Specialised error message (can be used instead of response with error)

Request Message Example

{
  "type": "request",
  "payload": {
    "operation": "train_model",
    "parameters": {
      "dataset_uri": "s3://ml-bucket/training.parquet",
      "algorithm": "random_forest"
    }
  }
}

Event Message Example

{
  "type": "event",
  "payload": {
    "event_type": "progress",
    "data": {
      "step": 5,
      "total_steps": 10,
      "percent": 50
    }
  }
}

Note on error messages: While you can embed an error inside a response payload (as shown earlier), sometimes you need to send an error that is not tied to a specific request (e.g., a background job failed). For that, use type: error with a payload containing code, message, and details.
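A standalone error message of this kind could be built as follows. This is a sketch only: the helper name build_error_message, the example job name, and the use of uuid4 as a stand-in for UUIDv7 are assumptions; the envelope fields and the code/message/details payload follow the structure described in this guide.

```python
import uuid
from datetime import datetime, timezone

def build_error_message(sender, receiver, code, message, details=None):
    """Build a type: error message that is not tied to any request."""
    return {
        "message_id": str(uuid.uuid4()),  # stand-in for a UUIDv7 generator
        "sender": sender,
        "receiver": receiver,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": "error",
        "payload": {
            "code": code,
            "message": message,
            "details": details or {},
        },
        "metadata": {},  # no correlation_id: there is no originating request
    }

error_msg = build_error_message(
    "agent/data/v1",
    "agent/research/alpha",
    "DEPENDENCY_FAILURE",
    "Background index rebuild failed",  # hypothetical background job
    {"job": "index_rebuild"},
)
```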

Message Payload Design

The payload is where your application‑specific data lives. Well‑designed payloads make agents robust and evolvable.

Structured Payloads

Always use a discriminator field inside the payload, such as operation or action. This allows the receiver to route to the correct handler without parsing other fields.

// Good – self‑describing
{
  "payload": {
    "operation": "query",
    "table": "customers",
    "filter": {"status": "active"}
  }
}

// Bad – ambiguous
{
  "payload": {
    "query": "SELECT * FROM customers WHERE status = 'active'"
  }
}
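With an operation discriminator in place, the receiver can route a payload to its handler through a simple lookup. The registry below is one possible sketch; the names HANDLERS, handles, and dispatch, and the two example handlers, are illustrative and not part of any A2A library.

```python
HANDLERS = {}  # operation name -> handler function

def handles(operation):
    """Decorator that registers a handler for one operation value."""
    def decorator(func):
        HANDLERS[operation] = func
        return func
    return decorator

@handles("query")
def handle_query(payload):
    # Hypothetical handler: echoes what it would query
    return {"table": payload["table"], "filter": payload.get("filter", {})}

@handles("ping")
def handle_ping(payload):
    return "pong"

def dispatch(payload):
    """Route on payload.operation without inspecting any other field."""
    operation = payload.get("operation")
    handler = HANDLERS.get(operation)
    if handler is None:
        raise ValueError(f"UNSUPPORTED_OPERATION: {operation!r}")
    return handler(payload)

result = dispatch({"operation": "query", "table": "customers",
                   "filter": {"status": "active"}})
```

The "bad" payload above would fail at dispatch because it has no operation field, which is the intended fail-fast behaviour.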

Schema Design for Payloads

Define a JSON Schema for each operation value. This enables automatic validation and documentation.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "QueryOperation",
  "type": "object",
  "required": ["operation", "table"],
  "properties": {
    "operation": { "const": "query" },
    "table": { "type": "string" },
    "filter": { "type": "object" },
    "limit": { "type": "integer", "minimum": 1, "maximum": 1000 }
  }
}

Including Context in Payload

For messages that belong to a multi‑step workflow, include a context object inside the payload (not to be confused with the outer metadata).

{
  "payload": {
    "operation": "process_fraud_check",
    "context": {
      "conversation_id": "conv_xyz",
      "user_id": "user_123",
      "previous_decision": "approved"
    },
    "transaction_data": { ... }
  }
}

Metadata Handling

The outer metadata field is only for infrastructure‑level concerns:

  • priority – for queue prioritisation
  • ttl_ms – time‑to‑live in milliseconds
  • retry_count – number of previous delivery attempts
  • delivery_mode – persistent or non‑persistent

Never place business data inside metadata. That belongs in payload.

Message Serialization

Serialization converts your message object into bytes for transmission. Choose based on your performance, schema evolution, and human‑readability needs.

Format | Human‑readable | Schema | Binary | Size | Speed | Best for
JSON | Yes | Optional (JSON Schema) | No | Medium | Medium | Debugging, HTTP APIs, mixed environments
YAML | Yes (verbose) | Optional | No | Larger | Slow | Configuration, not recommended for runtime messaging
Protocol Buffers (Protobuf) | No (requires .proto) | Mandatory | Yes | Small | Very fast | High‑throughput internal agents, gRPC
MessagePack | No | Optional (but can use JSON Schema) | Yes | Small | Fast | Lightweight alternative to Protobuf with dynamic schemas

Recommendation: Start with JSON + JSON Schema for development and debugging. When you hit performance or bandwidth limits, migrate to Protobuf for internal agent communication. Never use YAML for runtime messaging – parsing is slow and its implicit typing is ambiguous (for example, some parsers read an unquoted no as a boolean rather than a string).
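For the JSON starting point, serialization can be a thin wrapper around the standard library. The sketch below uses compact separators and enforces the 256 KB size guideline given in this guide's best practices; the constant MAX_MESSAGE_BYTES and the function names are assumptions for illustration.

```python
import json

MAX_MESSAGE_BYTES = 256 * 1024  # size guideline from this guide's best practices

def serialize(message: dict) -> bytes:
    """Encode a message as compact UTF-8 JSON and enforce the size limit."""
    raw = json.dumps(message, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError(f"message is {len(raw)} bytes, limit is {MAX_MESSAGE_BYTES}")
    return raw

def deserialize(raw: bytes) -> dict:
    """Decode bytes and make sure the top-level value is a JSON object."""
    message = json.loads(raw.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("top-level JSON value must be an object")
    return message

sample = {"type": "request", "payload": {"operation": "ping"}}
wire = serialize(sample)
round_tripped = deserialize(wire)
```

Compact separators drop the whitespace that json.dumps inserts by default, which trims a few percent from typical messages at no cost to readability in logs.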

Example: Protobuf Definition

syntax = "proto3";

message AgentMessage {
  string message_id = 1;
  string sender = 2;
  string receiver = 3;
  string timestamp = 4;
  string type = 5;
  bytes payload = 6; // Encoded as separate Protobuf or JSON
  map<string, string> metadata = 7;
}

For Protobuf, you typically define separate message types for each payload shape (e.g., QueryRequest, QueryResponse) and use a oneof field.

Message Validation

Validate every message twice: on the sender side (fail fast) and on the receiver side (defence in depth).

Schema Validation

Use JSON Schema (or Protobuf’s native schema validation) to verify the message envelope and payload.

Python example using jsonschema:

import jsonschema
from jsonschema import validate

class InvalidMessageError(Exception):
    """Raised when a message fails envelope validation."""

ENVELOPE_SCHEMA = {
    "type": "object",
    "required": ["message_id", "sender", "receiver", "timestamp", "type", "payload"],
    "properties": {
        # Pattern accepts only UUID version 7 (third group starts with 7)
        "message_id": {"type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$"},
        "sender": {"type": "string"},
        "receiver": {"type": "string"},
        # Note: "format" is only enforced when a format checker is passed to validate()
        "timestamp": {"type": "string", "format": "date-time"},
        "type": {"enum": ["request", "response", "event", "notification", "error"]},
        "payload": {"type": "object"},
        "metadata": {"type": "object"}
    }
}

def validate_envelope(raw_dict) -> bool:
    """Return True for a valid envelope; raise InvalidMessageError otherwise."""
    try:
        validate(instance=raw_dict, schema=ENVELOPE_SCHEMA)
    except jsonschema.ValidationError as e:
        raise InvalidMessageError(f"Envelope validation failed: {e.message}") from e
    return True

Required Fields Validation

Beyond schema validation, check for business‑required fields that depend on type or payload.operation.

class MissingFieldError(Exception):
    """Raised when a field required by the message type or operation is absent."""

def validate_request_payload(payload: dict):
    if "operation" not in payload:
        raise MissingFieldError("payload.operation is required for request messages")
    if payload["operation"] == "query" and "table" not in payload:
        raise MissingFieldError("payload.table is required for query operation")

Payload Validation

For each operation, validate parameter types, ranges, and existence of referenced resources.

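class InvalidParameterError(Exception):
    """Raised when a payload parameter has the wrong type or range."""

def validate_query_operation(payload):
    table = payload.get("table")
    if not isinstance(table, str) or len(table) < 1:
        raise InvalidParameterError("table must be a non-empty string")
    if "limit" in payload:
        limit = payload["limit"]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 1000:
            raise InvalidParameterError("limit must be integer between 1 and 1000")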

Type Checking

Use language‑native type hints or a library like Pydantic (Python) or Zod (TypeScript) to enforce types at deserialisation time.

Pydantic example:

from datetime import datetime
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, Field

class AgentMessage(BaseModel):
    message_id: UUID
    sender: str
    receiver: str
    timestamp: datetime
    # Literal already rejects any value outside this set,
    # so no separate validator is needed for the type field
    type: Literal["request", "response", "event", "notification", "error"]
    payload: dict
    metadata: dict = Field(default_factory=dict)

Reliable Message Delivery

Messages can be lost, delayed, or duplicated. Implement these patterns for at‑least‑once delivery, and combine them with idempotent processing to get effectively exactly‑once semantics.

Retries

Attach a retry_count to metadata. The sender automatically retries on transient failures (network timeouts, 5xx responses).

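import asyncio
import random

# transport, TransientError, and PermanentFailure are assumed to come from your transport layer

async def send_with_retry(message, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries + 1):
        try:
            message.metadata["retry_count"] = attempt
            return await transport.send(message)
        except TransientError as e:
            if attempt == max_retries:
                raise PermanentFailure(e) from e
            # Exponential backoff (1s, 2s, 4s, ...) plus up to 100 ms of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)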

Acknowledgements

For queue‑based messaging, use consumer acknowledgements. The message is removed from the queue only after the agent successfully processes it and explicitly acknowledges.

# RabbitMQ / Pika example
def on_message(channel, method, properties, body):
try:
process_message(body)
channel.basic_ack(delivery_tag=method.delivery_tag)
except Exception:
channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

Duplicate Detection

Because of retries or broker redeliveries, agents may receive the same message twice. Store processed message_ids with a TTL (at least as long as the maximum retry window).

Redis implementation:

import logging

import redis

log = logging.getLogger(__name__)
r = redis.Redis()

def is_duplicate(message_id: str, ttl_seconds: int = 3600) -> bool:
    # SET with NX and EX writes the key and its TTL in one atomic command,
    # so a crash cannot leave a key without an expiry (unlike SETNX then EXPIRE)
    was_set = r.set(f"processed:{message_id}", "1", nx=True, ex=ttl_seconds)
    return not was_set

def process_message(message):
    if is_duplicate(message.message_id):
        log.info(f"Skipping duplicate message {message.message_id}")
        return
    # actual processing
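For unit tests or single-process agents, the same check can be sketched without Redis. The class below is an in-memory stand-in, not a replacement for a shared store: the name InMemoryDeduplicator and the injectable clock parameter are assumptions, and state is lost when the process restarts.

```python
import time

class InMemoryDeduplicator:
    """Remembers processed message IDs for ttl_seconds (single process only)."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so tests can control time
        self._seen = {}       # message_id -> expiry time

    def is_duplicate(self, message_id: str) -> bool:
        now = self._clock()
        # Drop entries whose TTL has elapsed
        expired = [mid for mid, expiry in self._seen.items() if expiry <= now]
        for mid in expired:
            del self._seen[mid]
        if message_id in self._seen:
            return True
        self._seen[message_id] = now + self._ttl
        return False

# Usage with a fake clock
fake_time = [0.0]
dedup = InMemoryDeduplicator(ttl_seconds=10, clock=lambda: fake_time[0])
first = dedup.is_duplicate("msg-1")       # first sighting
second = dedup.is_duplicate("msg-1")      # redelivery inside the TTL
fake_time[0] = 11.0
after_ttl = dedup.is_duplicate("msg-1")   # TTL elapsed, treated as new
```

Note that both this sketch and the Redis version mark a message as seen before it is processed. If processing can fail, consider recording the ID only after success, or clearing it on failure, so that a retry is not silently skipped.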

Timeout Handling

Set a per‑message TTL (time‑to‑live). If the message sits in a queue longer than the TTL, the receiver should discard it.

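from datetime import datetime, timezone

def is_expired(message) -> bool:
    # Assumes message.timestamp is a timezone-aware datetime in UTC
    if "ttl_ms" in message.metadata:
        age_ms = (datetime.now(timezone.utc) - message.timestamp).total_seconds() * 1000
        return age_ms > message.metadata["ttl_ms"]
    return False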

Message Routing

Routing determines which agent instance receives a given message based on the receiver field.

Direct Routing

The sender knows the exact network address of the receiver (e.g., http://data-agent-1.internal:8080). Simple but fragile – not recommended for production.

Registry‑Based Routing

A service registry (e.g., Consul, etcd, or a simple Redis map) maps logical agent names to current network addresses.

class UnknownAgentError(Exception):
    """Raised when no address is registered for a logical agent name."""

class AgentRegistry:
    def __init__(self):
        self._registry = {}  # logical_name -> address

    def register(self, logical_name: str, address: str):
        self._registry[logical_name] = address

    def resolve(self, logical_name: str) -> str:
        if logical_name not in self._registry:
            raise UnknownAgentError(f"No agent registered as {logical_name}")
        return self._registry[logical_name]

Dynamic Routing

For advanced scenarios, use a message router that inspects the message and decides the target based on content (e.g., payload.table routes to different database agents).

def route_message(message: AgentMessage) -> str:
    if message.type != "request":
        return message.receiver  # fallback
    operation = message.payload.get("operation")
    if operation == "query":
        # Fall back to "" so a missing table does not raise AttributeError
        table = message.payload.get("table") or ""
        if table.startswith("user_"):
            return "agent/user-db"
        return "agent/analytics-db"
    return message.receiver

Implementation note: Keep routing logic out of individual agents. Implement a separate router agent or use a smart message broker (e.g., RabbitMQ topic exchanges with bindings).

Error Messages

Errors deserve their own message structure, even when embedded inside a response.

Standard Error Structure

{
  "type": "response",
  "payload": {
    "result": null,
    "error": {
      "code": "RATE_LIMIT_EXCEEDED",
      "message": "Too many requests per second",
      "details": {
        "limit": 100,
        "window_seconds": 60,
        "retry_after_ms": 5000
      },
      "retryable": true
    }
  }
}

Category | Example Codes
Client errors (4xx equivalent) | INVALID_MESSAGE_SCHEMA, MISSING_REQUIRED_FIELD, UNSUPPORTED_OPERATION, UNAUTHORIZED
Server errors (5xx equivalent) | INTERNAL_ERROR, TIMEOUT, DEPENDENCY_FAILURE
Capacity errors | RATE_LIMIT_EXCEEDED, AGENT_OVERLOADED, QUEUE_FULL
Business errors | RESOURCE_NOT_FOUND, CONFLICT, PRECONDITION_FAILED

Retryable vs Non‑Retryable

Include a retryable boolean in the error object.

  • Retryable (true) – network timeouts, 5xx, rate limits, agent overloaded.
  • Non‑retryable (false) – schema violations, authentication failures, unknown operation, resource not found.

The sender’s retry logic checks this flag before attempting another send.

if error.get("retryable"):
    schedule_retry(message)
else:
    log.error("Non-retryable error, failing immediately")
    raise NonRetryableError(error["code"])
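When an error is retryable, the sender still has to decide how long to wait. The sketch below combines exponential backoff with the retry_after_ms hint from the standard error structure, whichever is longer. The function name next_retry_delay_ms, the 30-second cap, and the 10% jitter are assumptions chosen for illustration.

```python
import random

def next_retry_delay_ms(error: dict, attempt: int,
                        base_delay_ms: int = 1000,
                        max_delay_ms: int = 30000):
    """Return the delay before the next attempt, or None if retrying is pointless."""
    if not error.get("retryable", False):
        return None
    # Exponential backoff, capped so late attempts do not wait unboundedly
    backoff = min(base_delay_ms * (2 ** attempt), max_delay_ms)
    # Respect the receiver's hint when it asks for a longer pause
    hinted = (error.get("details") or {}).get("retry_after_ms", 0)
    jitter = random.uniform(0, 0.1 * backoff)
    return max(backoff, hinted) + jitter

rate_limited = {
    "code": "RATE_LIMIT_EXCEEDED",
    "retryable": True,
    "details": {"retry_after_ms": 5000},
}
delay_first = next_retry_delay_ms(rate_limited, attempt=0)
delay_fourth = next_retry_delay_ms(rate_limited, attempt=3)
```

On the first attempt the receiver's 5000 ms hint dominates the 1000 ms backoff; by the fourth attempt the 8000 ms backoff exceeds the hint and takes over.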

Message Security

Secure messaging is mandatory when agents run in different trust domains (e.g., separate microservices, external agents). Use this checklist.

Security Checklist for Agent Messaging

  • Authentication – Every message must prove the sender’s identity.

    • mTLS (mutual TLS) for HTTP/2 and gRPC
    • JWT (JSON Web Token) with short expiration and signature verification
    • Pre‑shared symmetric key (HMAC) for lightweight internal systems
  • Authorization – Verify that the sender is allowed to send this message type and/or operation.

    • Implement policy checks (e.g., Open Policy Agent) before processing.
    • Example: sender == "agent/research" is allowed to send operation: query but not operation: delete.
  • Integrity Verification – Ensure message was not tampered with in transit.

    • TLS already provides integrity for the channel.
    • For end‑to‑end integrity (beyond TLS), add a signature field over the canonicalised message.
  • Confidentiality – Protect sensitive payload fields (PII, API keys, credentials).

    • Use TLS for transport encryption.
    • For fields that must not be visible to intermediaries, encrypt at the field level using envelope encryption.
  • Sensitive Data Handling – Never log full messages that contain secrets or PII.

    • Redact fields named password, secret, authorization, credit_card, ssn before logging.
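The redaction rule in the last checklist item can be sketched as a recursive copy that masks sensitive keys before a message reaches the logger. The function name redact and the "[REDACTED]" placeholder are assumptions; the key list is the one given in the checklist.

```python
SENSITIVE_KEYS = {"password", "secret", "authorization", "credit_card", "ssn"}

def redact(value):
    """Return a copy with sensitive keys masked; the input is left unchanged."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value

message = {
    "payload": {
        "user": "user_123",
        "password": "hunter2",
        "cards": [{"credit_card": "4111111111111111", "label": "primary"}],
    },
    "metadata": {"Authorization": "Bearer abc"},
}
safe_to_log = redact(message)
```

Matching is by exact key name (case-insensitive), so a field such as user_password would pass through unmasked; extend the set or switch to substring matching if your payloads use such names.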

Adding a signature:

import hmac, hashlib, json

def sign_message(message: dict, secret: bytes) -> str:
    # Sort keys to ensure deterministic serialisation
    canonical = json.dumps(message, sort_keys=True, separators=(',', ':'))
    return hmac.new(secret, canonical.encode(), hashlib.sha256).hexdigest()

def verify_signature(message: dict, signature: str, secret: bytes) -> bool:
    expected = sign_message({k: v for k, v in message.items() if k != "signature"}, secret)
    return hmac.compare_digest(expected, signature)

Message Monitoring

You cannot debug messaging failures without observability. Export the following metrics from every agent and router.

Metric | Type | Labels | Alert When
agent_messages_sent_total | Counter | type, sender, receiver | –
agent_messages_received_total | Counter | type, sender, receiver | –
agent_message_size_bytes | Histogram | type | p99 > 256 KB
agent_message_delivery_latency_seconds | Histogram | sender, receiver | p95 > 1s (sync) / > 5s (async)
agent_message_validation_errors_total | Counter | error_code | Any increase > 1%
agent_message_duplicates_detected_total | Counter | sender | Any (indicates retry issues)
agent_message_retries_total | Counter | reason | > 10% of messages

Logging recommendation: Emit a structured log for each message received and processed. Include message_id, type, sender, receiver, duration_ms, status (success/error). Redact sensitive fields.

{
  "event": "message_processed",
  "message_id": "0194f0a2-...",
  "type": "request",
  "operation": "query",
  "sender": "research-agent",
  "receiver": "data-agent",
  "duration_ms": 42,
  "status": "success"
}

Message Testing

Test messaging behaviour at all levels. Use real serialization and validation code, not mocks, where possible.

Unit Testing

Test message creation, validation logic, and serialization/deserialization in isolation.

import pytest

def test_message_validation():
    valid_msg = {
        "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
        "sender": "test",
        "receiver": "test",
        "timestamp": "2025-06-10T14:30:00Z",
        "type": "request",
        "payload": {"operation": "ping"}
    }
    validate_envelope(valid_msg)  # a valid envelope must not raise

    invalid_msg = {**valid_msg, "type": "invalid"}
    with pytest.raises(InvalidMessageError):
        validate_envelope(invalid_msg)

Schema Validation Testing

Test that your JSON Schema rejects malformed payloads.

def test_query_payload_schema():
    schema = load_schema("query_operation.json")
    valid = {"operation": "query", "table": "users", "limit": 10}
    assert is_valid(valid, schema)

    invalid = {"operation": "query", "limit": -1}
    assert not is_valid(invalid, schema)

Integration Testing

Spin up a real receiver agent and sender client, send a message, and assert the response.

async def test_request_response_integration():
    receiver = DataAgent()
    await receiver.start("localhost", 8888)
    try:
        client = AgentClient("http://localhost:8888")
        response = await client.send_request("data-agent", {"operation": "ping"})

        assert response.type == "response"
        assert response.payload["result"] == "pong"
    finally:
        # Stop the receiver even when an assertion fails
        await receiver.stop()

End‑to‑End Testing

Test a full workflow that involves multiple agents, a message broker, and external dependencies (using test containers).

# DockerContainer and ResearchAgentClient stand for your test-container helper
# and agent client; adjust the calls to the API of the library you use
def test_full_messaging_workflow():
    with DockerContainer("rabbitmq:3") as broker:
        with DockerContainer("data-agent") as data_agent:
            with DockerContainer("research-agent") as research:
                client = ResearchAgentClient(research.get_host())
                result = client.run_analysis("customer_churn")
                assert result.status == "completed"
                assert "churn_report.pdf" in result.artifacts

Agent Messaging Best Practices

Adopt these 12 implementation guidelines for production‑grade agent messaging.

  1. Use unique message IDs – UUIDv7 is preferred. Never reuse IDs.

  2. Keep payloads small – Under 256 KB. For larger data, return a URI.

  3. Validate every message twice – Sender (fail fast) + receiver (defence in depth).

  4. Log every message (with redaction) – Essential for debugging and auditing.

  5. Use structured schemas – JSON Schema or Protobuf. Avoid ad‑hoc formats.

  6. Version your message formats – Include a version field in the envelope. Never remove required fields; add optional ones.

  7. Set explicit timeouts and TTLs – No message should live forever in a queue.

  8. Implement idempotency – Store processed message_ids to safely retry.

  9. Make retry policies exponential with jitter – Prevent retry storms.

  10. Separate metadata from payload – Infrastructure fields in metadata, business data in payload.

  11. Monitor all the metrics – Throughput, latency, errors, duplicates, retries.

  12. Test schema changes – Use backward‑compatible evolution (add only, never remove/rename).
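Guidelines 6 and 12 (versioning and add-only evolution) can be illustrated with a tolerant reader. In this sketch the version field format ("major.minor"), the default of "1.0" for unversioned messages, and the default limit of 100 are all assumptions made for the example.

```python
def is_supported(message: dict, supported_majors=(1,)) -> bool:
    """Accept any minor version of a supported major version."""
    version = str(message.get("version", "1.0"))  # assumed default for old senders
    major = int(version.split(".", 1)[0])
    return major in supported_majors

def read_query_payload(payload: dict) -> dict:
    """Tolerant reader: ignore unknown fields, default newer optional ones."""
    return {
        "operation": payload["operation"],     # required since the first version
        "table": payload["table"],             # required since the first version
        "filter": payload.get("filter", {}),   # optional
        "limit": payload.get("limit", 100),    # optional field added later (hypothetical default)
    }

old_sender = {"operation": "query", "table": "customers"}
new_sender = {"operation": "query", "table": "customers", "limit": 50,
              "order_by": "lifetime_value"}  # field this reader does not know

parsed_old = read_query_payload(old_sender)
parsed_new = read_query_payload(new_sender)
```

Because the reader never fails on an unknown field and never requires a field that older senders omit, both old and new senders keep working through a minor-version change.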

Common Messaging Mistakes

Mistake | Consequence | Solution
Missing or non‑unique message IDs | Cannot deduplicate, cannot correlate responses, logs useless. | Generate UUIDv7 for every message.
Large payloads (>1 MB) | Network congestion, queue overflow, high latency. | Use reference (URI) to blob storage.
No schema validation | Malformed messages cause cryptic runtime crashes. | Enforce JSON Schema or Protobuf.
No retry mechanism | Transient failures become permanent. | Implement exponential backoff with max retries.
Inconsistent message formats | Receivers need custom parsers per sender. | Adopt a single envelope format across all agents.
Putting business data in metadata | Confusion, metadata size limits (e.g., Kafka headers). | Use payload for business data.
No message logging | Impossible to debug failures. | Log each message (redacted) with correlation ID.
Ignoring duplicate messages | Double processing, data corruption. | Store processed IDs with TTL.
Blocking on all responses | Poor throughput, cascading failures. | Use async patterns with timeouts.

Case Study: Research Agent ↔ Data Agent Messaging

Scenario: A Research Agent needs to retrieve a list of high‑value customers from a Data Agent. The Data Agent exposes a query interface. The system uses JSON over HTTPS with registry‑based routing.

Step 1 – Message Structure Definition

Both agents share a common JSON Schema for the envelope and a schema for the query operation.

Envelope schema (excerpt):

{
  "message_id": {"type": "string", "format": "uuid"},
  "sender": {"type": "string"},
  "receiver": {"type": "string"},
  "type": {"enum": ["request", "response"]}
}

Query request payload schema:

{
  "operation": {"const": "query"},
  "table": {"type": "string", "enum": ["customers", "transactions"]},
  "filter": {"type": "object"},
  "limit": {"type": "integer", "minimum": 1, "maximum": 1000}
}

Step 2 – Request Flow

Research Agent constructs a message:

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
  "sender": "agent/research/v1",
  "receiver": "agent/data/v1",
  "timestamp": "2025-06-10T14:30:00.123Z",
  "type": "request",
  "payload": {
    "operation": "query",
    "table": "customers",
    "filter": {"lifetime_value": {"$gte": 10000}},
    "limit": 50
  },
  "metadata": {
    "correlation_id": "conv_456",
    "ttl_ms": 5000,
    "priority": 3
  }
}

The sender validates the envelope and payload schema. It then serialises to JSON and POSTs to https://data-agent.internal/v1/message (resolved from registry).

Step 3 – Response Flow

Data Agent receives the request, validates again, and processes. It returns:

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2c",
  "sender": "agent/data/v1",
  "receiver": "agent/research/v1",
  "timestamp": "2025-06-10T14:30:00.456Z",
  "type": "response",
  "payload": {
    "result": {
      "customers": [
        {"id": "cust_001", "name": "Acme Corp", "lifetime_value": 150000},
        {"id": "cust_002", "name": "Beta LLC", "lifetime_value": 120000}
      ],
      "count": 2
    },
    "error": null
  },
  "metadata": {
    "correlation_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
    "processing_time_ms": 333
  }
}

Step 4 – Error Handling (Simulated)

If the Data Agent cannot connect to its database, it returns a retryable error:

{
  "type": "response",
  "payload": {
    "result": null,
    "error": {
      "code": "DATABASE_TIMEOUT",
      "message": "Connection to database timed out after 10 seconds",
      "retryable": true
    }
  }
}

Research Agent’s retry logic catches this, increments retry_count, and re‑sends the same request after a 1‑second delay, doubling the delay on each subsequent attempt (exponential backoff). After 3 failures, it sends a notification to an operator.

Step 5 – Monitoring

Prometheus metrics from this interaction:

agent_messages_sent_total{type="request", sender="research", receiver="data"} 1
agent_messages_received_total{type="request", sender="research", receiver="data"} 1
agent_messages_sent_total{type="response", sender="data", receiver="research"} 1
agent_message_delivery_latency_seconds{sender="research", receiver="data"} 0.333
agent_message_validation_errors_total{error_code="none"} 0

Logs:

{"event": "message_sent", "message_id": "0194f0a2-...", "type": "request", "receiver": "data"}
{"event": "message_received", "message_id": "0194f0a2-...", "type": "request", "sender": "research"}
{"event": "message_processed", "message_id": "0194f0a2-...", "type": "request", "status": "success", "duration_ms": 333}
{"event": "message_sent", "message_id": "0194f0a2-...", "type": "response", "receiver": "research"}

FAQ

1. What is agent messaging?
Agent messaging is the structured, self‑describing mechanism through which AI agents exchange requests, responses, events, and notifications. It defines the format, validation, delivery, and observability of individual messages.

2. How is agent messaging different from agent communication?
Communication is the high‑level process (why and what agents exchange). Messaging is the concrete implementation (the actual bytes, structure, and delivery guarantees). Think of communication as the conversation and messaging as the sentences.

3. What fields are mandatory in every agent message?
message_id, sender, receiver, timestamp, type, and payload. metadata is strongly recommended but not strictly required.

4. Should messages be versioned?
Yes. Include a version field in the envelope (e.g., "version": "1.0"). When making breaking changes, increment the major version and support both versions during migration. Never remove required fields – add optional ones.

5. How do I choose between JSON and Protobuf?
Start with JSON + JSON Schema for flexibility and debugging. Move to Protobuf when you need smaller message size, faster parsing, or strict schema evolution in high‑throughput internal systems.

6. How do agents validate incoming messages?
They validate: (1) envelope schema (fields, types, required), (2) message type specific rules (e.g., request must have payload.operation), (3) payload schema based on operation, (4) authentication and authorisation.

7. What is the best way to correlate a response with its request?
The receiver copies the request’s message_id into the response’s metadata.correlation_id. The sender maintains a map from message_id to a future/promise/callback.

8. How do I handle duplicate messages?
Store processed message_ids in a fast key‑value store (Redis) with a TTL slightly longer than your maximum retry window (e.g., 1 hour). Before processing, check if the ID already exists.

9. What is a good TTL for messages?
Depends on your workflow. For interactive requests, 5–30 seconds. For background batch jobs, 5–60 minutes. Never set TTL to infinite – always have an expiry.

10. How do I test agent messaging without real agents?
Unit test message creation and validation with the real serialization and validation code, without any network. Integration test with an in‑memory channel or a lightweight message broker (e.g., Redis). For end‑to‑end, use test containers.

11. Should I use synchronous or asynchronous messaging?
Use synchronous (request‑response over HTTP) for low‑latency, interactive tasks. Use asynchronous (queues, event streams) for long‑running operations, high‑volume workloads, or when you need durability.

12. How do I secure messages between agents?
Use mTLS for transport, JWT or API keys for authentication, and field‑level encryption for sensitive data. Add message signatures for end‑to‑end integrity. Always authorise based on sender identity.

13. What metrics should I collect for agent messaging?
Sent/received counts, message size distribution, end‑to‑end latency, validation error rate, duplicate rate, retry rate, and success rate per message type.

14. How large can a message be?
Aim for < 256 KB. If you need more, design a two‑step pattern: first message contains a reference (URI), the receiver fetches the large data from object storage or a shared volume.

15. Can agents from different teams use different message formats?
Only if they agree on a common envelope (at least message_id, sender, receiver, type). The payload can be team‑specific, but the receiving agent must know how to deserialize it. Better to standardise fully.

16. How do I evolve a message schema without breaking existing agents?
Follow the robustness principle: be conservative in what you send, liberal in what you accept. Never remove or rename fields. Only add optional fields with default values. Use a version field and keep old handlers for at least one major version.

17. What happens if a message cannot be delivered after all retries?
Send it to a dead‑letter queue (DLQ) with the original metadata. Alert an operator. The DLQ can be replayed after fixing the issue.

Related Guides

Deepen your understanding of agent messaging by exploring these related implementation guides in the AgentDevPro Handbook:

  • /guides/a2a/ – A2A protocol fundamentals
  • /guides/a2a/agent-communication/ – The broader process of information exchange between agents
  • /guides/a2a/agent-collaboration/ – How agents work together using messages
  • /guides/agent-workflows/ – Orchestrating multi‑step tasks with message sequences
  • /guides/agent-tools/ – How agents expose and consume tools via messaging
  • /guides/agent-memory/ – Sharing semantic memory across agents using messages
  • /guides/mcp/client/ – Model Context Protocol client integration for tool‑augmented agents

This article is part of the AgentDevPro Handbook – practical, engineering‑focused guides for building production AI agent systems.