Agent Messaging: The Definitive Implementation Guide for A2A Systems

Effective agent messaging is the foundation of any production‑grade multi‑agent system. While agent communication describes the process of exchanging information, agent messaging is the concrete mechanism – the structured bytes that travel from one agent to another. This guide covers the practical details you need to design, transmit, validate, and process messages between AI agents using A2A (Agent‑to‑Agent) protocols.

You will learn about the message lifecycle, payload design, serialization choices, validation strategies, reliability patterns, and operational best practices – all without diving into distributed systems theory or high‑level architectural patterns.

What Is Agent Messaging

Agent messaging is the mechanism through which AI agents exchange structured, self‑contained units of information – called messages – during communication and collaboration workflows. A message is a discrete packet that carries a request, response, event, or notification from a sender agent to one or more receiver agents.

Unlike raw data streaming or shared memory, agent messaging is:

Structured – follows a predefined schema (JSON, Protobuf, etc.)
Self‑describing – contains metadata (ID, type, timestamp, sender, receiver)
Transport‑agnostic – can travel over HTTP, message queues, WebSockets, or file‑based channels
Stateless – each message carries enough context to be processed independently (though agents may maintain external state)

In an A2A system, every interaction between agents is expressed as one or more messages. There is no side‑channel communication: everything goes through the messaging layer.

Why Agent Messaging Matters

Without a robust agent messaging implementation, agent coordination becomes fragile, unobservable, and non‑deterministic.

Requirement	Why Messaging Is Essential
Communication reliability	Messages can be retried, acknowledged, and persisted, so delivery can survive temporary agent failures.
Task execution	A request message encodes exactly what the sender wants (operation, parameters, deadline). The response message carries the result or error.
Context exchange	Messages carry conversation IDs, task provenance, and partial results so each agent understands the bigger picture.
Workflow coordination	Sequence of messages (request → acknowledge → progress events → final response) enables deterministic multi‑step workflows.

Practical example: A Research Agent asks a Data Agent to “find all active customers.” Without structured messaging, the Data Agent cannot validate the request, the Research Agent cannot correlate the response, and neither can log or debug the exchange.

Messaging vs Communication

These terms are often used interchangeably, but in A2A implementation they have distinct meanings.

Aspect	Agent Communication	Agent Messaging
Scope	End‑to‑end information exchange process	Concrete message implementation
Focus	Semantics, protocols, interaction patterns	Formats, serialization, validation, delivery
Questions answered	What is the agent trying to achieve? What pattern (request/response, event, publish/subscribe)?	How is the data structured? How is it encoded? How is it delivered reliably?
Examples	“Agent A asks Agent B for a vector search.”	“Message contains `messageId`, `type: request`, `payload: {operation: vector_search, embedding: [...]}`”

Rule of thumb: Communication is why and what; messaging is how (the actual bytes on the wire).

For the rest of this article, we focus exclusively on the messaging layer – the implementation details that make communication possible.

Agent Message Lifecycle

A message travels through eight distinct stages. Production systems must implement each stage correctly.

Detailed breakdown:

Message Creation – Sender builds a message object with unique ID, sender/receiver identifiers, timestamp, type, payload, and metadata.
Validation (Sender) – Sender validates the message against a schema (fail fast to avoid wasting network round trips).
Serialization – The message object is converted into bytes (JSON, Protobuf, etc.).
Transmission – Bytes are sent over the chosen transport (HTTP POST, AMQP, Kafka produce, etc.).
Reception – Receiver accepts the bytes and deserializes back into a message object.
Validation (Receiver) – Receiver validates the message again (never trust the sender).
Processing – Based on message type and payload, the receiver executes business logic and may generate a response message.
Logging – Both sender and receiver log the message outcome (with redaction of sensitive fields).
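The eight stages can be traced in a single process. The following is a minimal illustration under stated assumptions, not a production pipeline: a `queue.Queue` stands in for the real transport, `validate` only checks that the required envelope fields are present, and the `create`, `send`, and `receive` helpers are hypothetical names.

```python
import json
import logging
import queue
import uuid
from datetime import datetime, timezone

log = logging.getLogger("messaging")
REQUIRED = ("message_id", "sender", "receiver", "timestamp", "type", "payload")


def validate(msg: dict) -> None:
    # Placeholder check; a real system would apply the full envelope and payload schemas
    missing = [f for f in REQUIRED if f not in msg]
    if missing:
        raise ValueError(f"missing fields: {missing}")


def create(sender: str, receiver: str, msg_type: str, payload: dict) -> dict:
    # Stage 1 (creation): unique ID, addressing, creation timestamp, type, payload, metadata
    return {
        "message_id": str(uuid.uuid4()),  # UUIDv7 is preferred; uuid4 keeps this stdlib-only
        "sender": sender,
        "receiver": receiver,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": msg_type,
        "payload": payload,
        "metadata": {},
    }


def send(channel: queue.Queue, msg: dict) -> None:
    validate(msg)                               # Stage 2: sender-side validation (fail fast)
    wire = json.dumps(msg).encode("utf-8")      # Stage 3: serialization
    channel.put(wire)                           # Stage 4: transmission (in-memory stand-in)
    log.info("sent %s", msg["message_id"])      # Stage 8: logging on the sender side


def receive(channel: queue.Queue, handle):
    msg = json.loads(channel.get().decode("utf-8"))  # Stage 5: reception and deserialization
    validate(msg)                                     # Stage 6: receiver-side validation
    result = handle(msg)                              # Stage 7: processing
    log.info("processed %s", msg["message_id"])      # Stage 8: logging on the receiver side
    return result
```

Keeping `send` and `receive` as the only places that serialize and validate means every message passes through both validation stages by construction.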

Core Message Components

Every agent message must contain a standard set of fields. These are the minimum viable components for A2A messaging.

Component	Description	Required	Example
Message ID	Unique identifier (UUIDv7 recommended)	Yes	`0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b`
Sender	Logical name or address of originating agent	Yes	`agent/research/v1`
Receiver	Logical name or address of target agent	Yes	`agent/data/v1`
Timestamp	ISO 8601 UTC timestamp (creation time)	Yes	`2025-06-10T14:30:00.123Z`
Message Type	Classifies the message semantics (`request`, `response`, `event`, `notification`, `error`)	Yes	`request`
Payload	The actual data (operation, parameters, result, error details)	Yes	`{"operation": "query", "parameters": {...}}`
Metadata	Auxiliary information (priority, TTL, retry count, trace ID)	Recommended	`{"priority": 5, "ttl_ms": 30000}`

Additional optional fields (often placed inside metadata or a separate context object):

correlation_id – links a response to its request (usually the original request’s message_id)
reply_to – address where the receiver should send responses (queue name, callback URL)
expires_at – absolute expiration time (if different from timestamp + ttl_ms)

Agent Message Structure

Below is a canonical JSON representation of an agent message. This structure works for all message types.

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
  "sender": "agent/research/alpha",
  "receiver": "agent/data/v1",
  "timestamp": "2025-06-10T14:30:00.123Z",
  "type": "request",
  "payload": {
    "operation": "vector_search",
    "parameters": {
      "embedding": [0.12, -0.34, 0.56],
      "top_k": 10
    }
  },
  "metadata": {
    "priority": 5,
    "ttl_ms": 30000,
    "retry_count": 0,
    "trace_id": "trace_abc123"
  }
}

Field‑by‑field explanation:

message_id – UUID version 7 (time‑ordered, sortable). Use UUIDv7 instead of v4 to simplify debugging and log correlation.
sender – Logical identifier, not necessarily a network address. The receiver resolves it to a security principal for auth.
receiver – Logical identifier. Can be a specific agent (data-agent-01) or a role (data-agent), which a router resolves.
timestamp – When the message was created (not sent). Helps with ordering and expiration.
type – Controls how the receiver processes the message. See next section.
payload – The core content. Structure depends on type and operation.
metadata – Not for business logic. Used for transport and observability, and written only by the messaging infrastructure (for example, retry logic sets retry_count), never by business code.
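Python's standard library only added `uuid.uuid7()` in Python 3.14, so older runtimes need their own generator. A minimal sketch following the RFC 9562 layout (48‑bit Unix millisecond timestamp, 4‑bit version, 2‑bit variant, remaining bits random); the `uuid7` function here is a hypothetical helper. It omits the optional monotonic counter, so two IDs created within the same millisecond are not guaranteed to sort in creation order.

```python
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Generate a time-ordered UUIDv7 (RFC 9562 bit layout, no monotonic counter)."""
    unix_ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 of them used
    rand_a = rand >> 68                             # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)                 # low 62 random bits
    value = (unix_ts_ms & ((1 << 48) - 1)) << 80    # bits 127..80: timestamp
    value |= 0x7 << 76                              # bits 79..76: version 7
    value |= rand_a << 64                           # bits 75..64: rand_a
    value |= 0b10 << 62                             # bits 63..62: RFC variant
    value |= rand_b                                 # bits 61..0: rand_b
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, IDs generated in different milliseconds sort by creation time, both as `UUID` objects and as strings.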

Response Message Example

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2c",
  "sender": "agent/data/v1",
  "receiver": "agent/research/alpha",
  "timestamp": "2025-06-10T14:30:00.456Z",
  "type": "response",
  "payload": {
    "result": {
      "documents": [
        {"id": "doc_1", "score": 0.95},
        {"id": "doc_2", "score": 0.87}
      ]
    },
    "error": null
  },
  "metadata": {
    "correlation_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
    "processing_time_ms": 333
  }
}

Message Types

The type field is a discriminator. It tells the receiver how to interpret the payload and what behaviour to expect.

Type	Direction	Expects Response?	Idempotent?	Use Case
`request`	Sender → Receiver	Yes (one `response`)	Should be (design each operation to be idempotent)	Ask another agent to perform work
`response`	Receiver → Sender	No	N/A	Return result or error for a request
`event`	Any → Any	No	Usually yes	Notify about state change, progress, or log
`notification`	Any → Any	No (transport‑level ACK only)	N/A	One‑way alert (e.g., “cache warmed up”)
`error`	Any → Any	No	N/A	Specialised error message (can be used instead of `response` with error)

Request Message Example

{
  "type": "request",
  "payload": {
    "operation": "train_model",
    "parameters": {
      "dataset_uri": "s3://ml-bucket/training.parquet",
      "algorithm": "random_forest"
    }
  }
}

Event Message Example

{
  "type": "event",
  "payload": {
    "event_type": "progress",
    "data": {
      "step": 5,
      "total_steps": 10,
      "percent": 50
    }
  }
}

Note on error messages: While you can embed an error inside a response payload (as shown earlier), sometimes you need to send an error that is not tied to a specific request (e.g., a background job failed). For that, use type: error with a payload containing code, message, and details.
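A minimal sketch of building such a standalone error message, assuming the envelope fields used throughout this guide; `make_error_message` and its parameters are illustrative, not part of any A2A library. The `retryable` flag mirrors the error structure described in the Error Messages section.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_error_message(sender: str, receiver: str, code: str, message: str,
                       details: Optional[dict] = None, retryable: bool = False) -> dict:
    """Build a `type: error` message that is not tied to any specific request."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "message_id": str(uuid.uuid4()),  # swap in a UUIDv7 generator in production
        "sender": sender,
        "receiver": receiver,
        "timestamp": now.replace("+00:00", "Z"),  # e.g. 2025-06-10T14:30:00.123Z
        "type": "error",
        "payload": {
            "code": code,
            "message": message,
            "details": details or {},
            "retryable": retryable,
        },
        "metadata": {},
    }
```

For example, a batch agent could report a failed nightly job with `make_error_message("agent/batch/v1", "agent/ops/v1", "INTERNAL_ERROR", "Nightly job failed")`.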

Message Payload Design

The payload is where your application‑specific data lives. Well‑designed payloads make agents robust and evolvable.

Structured Payloads

Always use a discriminator field inside the payload, such as operation or action. This allows the receiver to route to the correct handler without parsing other fields.

// Good – self‑describing
{
  "payload": {
    "operation": "query",
    "table": "customers",
    "filter": {"status": "active"}
  }
}

// Bad – ambiguous
{
  "payload": {
    "query": "SELECT * FROM customers WHERE status = 'active'"
  }
}
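With an `operation` discriminator, the receiver can dispatch through a lookup table instead of guessing from other fields. A minimal sketch; the `handler` decorator, the `HANDLERS` table, and the body of `handle_query` are hypothetical placeholders for your own handlers.

```python
from typing import Callable

# Hypothetical registry: payload.operation value -> handler function
HANDLERS: dict[str, Callable[[dict], dict]] = {}


def handler(operation: str):
    """Register a function as the handler for one `payload.operation` value."""
    def register(fn):
        HANDLERS[operation] = fn
        return fn
    return register


@handler("query")
def handle_query(payload: dict) -> dict:
    # Placeholder: a real Data Agent would run the query against its store
    return {"table": payload["table"], "filter": payload.get("filter", {})}


def dispatch(payload: dict) -> dict:
    """Route a payload to its handler and wrap the outcome as a response payload."""
    operation = payload.get("operation")
    fn = HANDLERS.get(operation)
    if fn is None:
        # Missing or unknown discriminator: reject instead of guessing
        return {"result": None,
                "error": {"code": "UNSUPPORTED_OPERATION",
                          "message": f"unknown operation: {operation!r}",
                          "retryable": False}}
    return {"result": fn(payload), "error": None}
```

The "bad" payload above has no `operation` key, so `dispatch` rejects it with `UNSUPPORTED_OPERATION` rather than trying to interpret raw SQL.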

Schema Design for Payloads

Define a JSON Schema for each operation value. This enables automatic validation and documentation.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "QueryOperation",
  "type": "object",
  "required": ["operation", "table"],
  "properties": {
    "operation": { "const": "query" },
    "table": { "type": "string" },
    "filter": { "type": "object" },
    "limit": { "type": "integer", "minimum": 1, "maximum": 1000 }
  }
}

Including Context in Payload

For messages that belong to a multi‑step workflow, include a context object inside the payload (not to be confused with the outer metadata).

{
  "payload": {
    "operation": "process_fraud_check",
    "context": {
      "conversation_id": "conv_xyz",
      "user_id": "user_123",
      "previous_decision": "approved"
    },
    "transaction_data": { ... }
  }
}

Metadata Handling

The outer metadata field is only for infrastructure‑level concerns:

priority – for queue prioritisation
ttl_ms – time‑to‑live in milliseconds
retry_count – number of previous delivery attempts
delivery_mode – persistent or non‑persistent

Never place business data inside metadata. That belongs in payload.
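One way to enforce this rule is an allowlist check on both sender and receiver. A sketch, assuming the allowlist consists of the infrastructure keys named in this guide (`priority`, `ttl_ms`, `retry_count`, `delivery_mode`, `trace_id`, `correlation_id`, `processing_time_ms`) and that `delivery_mode` is spelled `persistent` or `non-persistent`; `check_metadata` is a hypothetical helper, so adjust the set to your own envelope.

```python
# Assumed allowlist: the infrastructure-level keys used in this guide
ALLOWED_METADATA_KEYS = frozenset({
    "priority", "ttl_ms", "retry_count", "delivery_mode",
    "trace_id", "correlation_id", "processing_time_ms",
})


def check_metadata(metadata: dict) -> None:
    """Raise ValueError if metadata carries anything besides infrastructure fields."""
    unknown = sorted(set(metadata) - ALLOWED_METADATA_KEYS)
    if unknown:
        raise ValueError(
            f"unexpected metadata keys (business data belongs in payload): {unknown}")
    mode = metadata.get("delivery_mode")
    if mode is not None and mode not in ("persistent", "non-persistent"):
        raise ValueError("delivery_mode must be 'persistent' or 'non-persistent'")
```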

Message Serialization

Serialization converts your message object into bytes for transmission. Choose based on your performance, schema evolution, and human‑readability needs.

Format	Human‑readable	Schema	Binary	Size	Speed	Best for
JSON	Yes	Optional (JSON Schema)	No	Medium	Medium	Debugging, HTTP APIs, mixed environments
YAML	Yes (verbose)	Optional	No	Larger	Slow	Configuration, not recommended for runtime messaging
Protocol Buffers (Protobuf)	No (requires `.proto`)	Mandatory	Yes	Small	Very fast	High‑throughput internal agents, gRPC
MessagePack	No	Optional (but can use JSON Schema)	Yes	Small	Fast	Lightweight alternative to Protobuf with dynamic schemas

Recommendation: Start with JSON + JSON Schema for development and debugging. When you hit performance or bandwidth limits, migrate to Protobuf for internal agent communication. Avoid YAML for runtime messaging: parsing is slow, and its implicit typing is ambiguous (a YAML 1.1 parser reads an unquoted `no` as the boolean false).
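A minimal sketch of JSON encoding and decoding for the envelope, assuming messages are plain dicts that may contain timezone‑aware `datetime` and `UUID` values; the `to_wire`/`from_wire` names are hypothetical, and the 256 KB ceiling is the payload guideline used elsewhere in this guide.

```python
import json
import uuid
from datetime import datetime, timezone

MAX_MESSAGE_BYTES = 256 * 1024  # size guideline from this guide


def _default(obj):
    # json.dumps cannot encode these types natively.
    # Pass timezone-aware datetimes: astimezone() treats naive ones as local time.
    if isinstance(obj, datetime):
        iso = obj.astimezone(timezone.utc).isoformat(timespec="milliseconds")
        return iso.replace("+00:00", "Z")
    if isinstance(obj, uuid.UUID):
        return str(obj)
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def to_wire(message: dict) -> bytes:
    # Compact separators drop whitespace; UTF-8 keeps non-ASCII text unescaped
    data = json.dumps(message, default=_default, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
    if len(data) > MAX_MESSAGE_BYTES:
        raise ValueError(f"message is {len(data)} bytes; send a URI reference instead")
    return data


def from_wire(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))
```

Note that `from_wire` returns the timestamp and ID as strings; converting them back to typed values is the job of the validation layer (for example, a Pydantic model).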

Example: Protobuf Definition

syntax = "proto3";

import "google/protobuf/timestamp.proto";

message AgentMessage {
  string message_id = 1;
  string sender = 2;
  string receiver = 3;
  google.protobuf.Timestamp timestamp = 4;  // creation time (UTC)
  string type = 5;
  bytes payload = 6;  // opaque: a separately encoded Protobuf or JSON payload
  map<string, string> metadata = 7;
}

The bytes payload field keeps the envelope generic but opaque to the schema. For stronger typing, define a separate message type for each payload shape (e.g., QueryRequest, QueryResponse) and carry them in a oneof field.

Message Validation

Validate every message twice: on the sender side (fail fast) and on the receiver side (defence in depth).

Schema Validation

Use JSON Schema (or Protobuf’s native schema validation) to verify the message envelope and payload.

Python example using jsonschema:

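import jsonschema


class InvalidMessageError(ValueError):
    """Raised when a message does not match the envelope schema."""


ENVELOPE_SCHEMA = {
    "type": "object",
    "required": ["message_id", "sender", "receiver", "timestamp", "type", "payload"],
    "properties": {
        "message_id": {"type": "string", "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-7[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$"},
        "sender": {"type": "string"},
        "receiver": {"type": "string"},
        "timestamp": {"type": "string", "format": "date-time"},
        "type": {"enum": ["request", "response", "event", "notification", "error"]},
        "payload": {"type": "object"},
        "metadata": {"type": "object"}
    }
}

def validate_envelope(raw_dict: dict) -> bool:
    """Return True if the envelope is valid; raise InvalidMessageError otherwise."""
    try:
        # Without a FormatChecker, "format" keywords such as date-time are not enforced
        # (date-time checks additionally require the rfc3339-validator package)
        jsonschema.validate(instance=raw_dict, schema=ENVELOPE_SCHEMA,
                            format_checker=jsonschema.FormatChecker())
    except jsonschema.ValidationError as e:
        raise InvalidMessageError(f"Envelope validation failed: {e.message}") from e
    return True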

Required Fields Validation

Beyond schema validation, check for business‑required fields that depend on type or payload.operation.

class MissingFieldError(ValueError):
    """Raised when a field required by the message type or operation is absent."""

def validate_request_payload(payload: dict) -> None:
    if "operation" not in payload:
        raise MissingFieldError("payload.operation is required for request messages")
    if payload["operation"] == "query" and "table" not in payload:
        raise MissingFieldError("payload.table is required for query operation")

Payload Validation

For each operation, validate parameter types, ranges, and existence of referenced resources.

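class InvalidParameterError(ValueError):
    """Raised when a payload parameter has the wrong type or range."""

def validate_query_operation(payload: dict) -> None:
    table = payload.get("table")
    if not isinstance(table, str) or not table:
        raise InvalidParameterError("table must be a non-empty string")
    if "limit" in payload:
        limit = payload["limit"]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 1000:
            raise InvalidParameterError("limit must be an integer between 1 and 1000")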

Type Checking

Use language‑native type hints or a library like Pydantic (Python) or Zod (TypeScript) to enforce types at deserialisation time.

Pydantic example:

from datetime import datetime
from typing import Literal
from uuid import UUID

from pydantic import BaseModel, Field, field_validator  # Pydantic v2 API


class AgentMessage(BaseModel):
    message_id: UUID
    sender: str
    receiver: str
    timestamp: datetime
    # Literal already rejects any other value, so no extra validator is needed for `type`
    type: Literal["request", "response", "event", "notification", "error"]
    payload: dict
    metadata: dict = Field(default_factory=dict)

    @field_validator("timestamp")
    @classmethod
    def require_timezone(cls, v: datetime) -> datetime:
        # Naive datetimes would break TTL arithmetic against a UTC clock
        if v.tzinfo is None:
            raise ValueError("timestamp must include a UTC offset")
        return v

Reliable Message Delivery

Messages can be lost, delayed, or duplicated. The patterns below give at‑least‑once delivery; combining at‑least‑once delivery with duplicate detection gives effectively exactly‑once processing.

Retries

Attach a retry_count to metadata. The sender automatically retries on transient failures (network timeouts, 5xx responses).

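import asyncio
import random

class TransientError(Exception):  # timeouts, 5xx responses: worth retrying
    pass

class PermanentFailure(Exception):  # raised once all retries are exhausted
    pass

async def send_with_retry(transport, message, max_retries=3, base_delay=1.0):
    # `transport` is any object with an async send(message) method
    for attempt in range(max_retries + 1):
        try:
            message.metadata["retry_count"] = attempt
            return await transport.send(message)
        except TransientError as e:
            if attempt == max_retries:
                raise PermanentFailure(f"gave up after {attempt + 1} attempts") from e
            # Exponential backoff (1 s, 2 s, 4 s, ...) plus up to 100 ms of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)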

Acknowledgements

For queue‑based messaging, use consumer acknowledgements. The message is removed from the queue only after the agent successfully processes it and explicitly acknowledges.

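# RabbitMQ / Pika example (consumer registered with auto_ack=False)
def on_message(channel, method, properties, body):
    try:
        process_message(body)
    except TransientError:
        # Temporary problem: put the message back on the queue for another attempt
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    except Exception:
        # Permanent problem (e.g., schema violation): requeueing would loop forever;
        # reject it so a configured dead-letter exchange can capture it
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=False)
    else:
        # Acknowledge only after processing succeeded
        channel.basic_ack(delivery_tag=method.delivery_tag)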

Duplicate Detection

Because of retries or broker redeliveries, agents may receive the same message twice. Store processed message_ids with a TTL (at least as long as the maximum retry window).

Redis implementation:

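import logging

import redis

log = logging.getLogger(__name__)
r = redis.Redis()

def is_duplicate(message_id: str, ttl_seconds: int = 3600) -> bool:
    # SET with nx=True and ex=... is atomic: the key and its TTL are created together,
    # so a crash cannot leave a key that never expires (unlike SETNX followed by EXPIRE)
    first_time = r.set(f"processed:{message_id}", "1", nx=True, ex=ttl_seconds)
    return not first_time

def process_message(message):
    if is_duplicate(str(message.message_id)):
        log.info("Skipping duplicate message %s", message.message_id)
        return
    # actual processing
    # Note: the ID is recorded before processing. If processing fails with a retryable
    # error, delete the key so the redelivered message is not skipped as a duplicate.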

Timeout Handling

Set a per‑message TTL (time‑to‑live). If the message sits in a queue longer than the TTL, the receiver should discard it.

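from datetime import datetime, timezone

def is_expired(message) -> bool:
    ttl_ms = message.metadata.get("ttl_ms")
    if ttl_ms is None:
        return False
    # message.timestamp must be a timezone-aware datetime (the creation time)
    age_ms = (datetime.now(timezone.utc) - message.timestamp).total_seconds() * 1000
    return age_ms > ttl_ms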

Message Routing

Routing determines which agent instance receives a given message based on the receiver field.

Direct Routing

The sender knows the exact network address of the receiver (e.g., http://data-agent-1.internal:8080). Simple but fragile – not recommended for production.

Registry‑Based Routing

A service registry (e.g., Consul, etcd, or a simple Redis map) maps logical agent names to current network addresses.

class AgentRegistry:
    def __init__(self):
        self._registry = {}  # logical_name -> address

    def register(self, logical_name: str, address: str):
        self._registry[logical_name] = address

    def resolve(self, logical_name: str) -> str:
        if logical_name not in self._registry:
            raise UnknownAgentError(f"No agent registered as {logical_name}")
        return self._registry[logical_name]
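The registry above maps each logical name to a single address. When a role such as `agent/data/v1` is served by several instances, a router can keep a list of addresses per name and rotate through it. A sketch assuming simple round‑robin selection with no health checks; `InstanceRegistry` and its methods are hypothetical, and a production registry (Consul, etcd) would also expire instances that stop heartbeating.

```python
import itertools
import threading


class UnknownAgentError(LookupError):
    pass


class InstanceRegistry:
    """Maps a logical agent name to several addresses and picks one round-robin."""

    def __init__(self):
        self._instances: dict[str, list[str]] = {}
        self._counters: dict[str, itertools.count] = {}
        self._lock = threading.Lock()  # registration and lookup may run on different threads

    def register(self, logical_name: str, address: str) -> None:
        with self._lock:
            addresses = self._instances.setdefault(logical_name, [])
            if address not in addresses:
                addresses.append(address)
            self._counters.setdefault(logical_name, itertools.count())

    def deregister(self, logical_name: str, address: str) -> None:
        with self._lock:
            addresses = self._instances.get(logical_name, [])
            if address in addresses:
                addresses.remove(address)

    def resolve(self, logical_name: str) -> str:
        with self._lock:
            addresses = self._instances.get(logical_name)
            if not addresses:
                raise UnknownAgentError(f"No agent registered as {logical_name}")
            # The ever-increasing counter modulo the list length cycles through instances
            return addresses[next(self._counters[logical_name]) % len(addresses)]
```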

Dynamic Routing

For advanced scenarios, use a message router that inspects the message and decides the target based on content (e.g., payload.table routes to different database agents).

def route_message(message: AgentMessage) -> str:
    # Only requests are content-routed; everything else goes to its declared receiver
    if message.type != "request":
        return message.receiver
    operation = message.payload.get("operation")
    if operation == "query":
        # Guard against a missing table so .startswith() never runs on None
        table = message.payload.get("table") or ""
        if table.startswith("user_"):
            return "agent/user-db"
        return "agent/analytics-db"
    return message.receiver

Implementation note: Keep routing logic out of individual agents. Implement a separate router agent or use a smart message broker (e.g., RabbitMQ topic exchanges with bindings).

Error Messages

Errors deserve their own message structure, even when embedded inside a response.

Standard Error Structure

{
  "type": "response",
  "payload": {
    "result": null,
    "error": {
      "code": "RATE_LIMIT_EXCEEDED",
      "message": "Rate limit exceeded: at most 100 requests per 60 seconds",
      "details": {
        "limit": 100,
        "window_seconds": 60,
        "retry_after_ms": 5000
      },
      "retryable": true
    }
  }
}

Error Codes (Recommended Categories)

Category	Example Codes
Client errors (4xx equivalent)	`INVALID_MESSAGE_SCHEMA`, `MISSING_REQUIRED_FIELD`, `UNSUPPORTED_OPERATION`, `UNAUTHORIZED`
Server errors (5xx equivalent)	`INTERNAL_ERROR`, `TIMEOUT`, `DEPENDENCY_FAILURE`
Capacity errors	`RATE_LIMIT_EXCEEDED`, `AGENT_OVERLOADED`, `QUEUE_FULL`
Business errors	`RESOURCE_NOT_FOUND`, `CONFLICT`, `PRECONDITION_FAILED`

Retryable vs Non‑Retryable

Include a retryable boolean in the error object.

Retryable (true) – network timeouts, 5xx, rate limits, agent overloaded.
Non‑retryable (false) – schema violations, authentication failures, unknown operation, resource not found.

The sender’s retry logic checks this flag before attempting another send.

if error.get("retryable"):
    schedule_retry(message)
else:
    log.error("Non‑retryable error, failing immediately")
    raise NonRetryableError(error["code"])
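When a retryable error carries a `retry_after_ms` hint (as in the rate‑limit example above), the sender can use it as a floor for its backoff delay. A minimal sketch; `next_retry_delay` and its defaults are illustrative, and it uses "full jitter" (a random delay up to the exponential cap), which is one reasonable policy among several.

```python
import random
from typing import Optional


def next_retry_delay(error: dict, attempt: int, base_delay: float = 1.0,
                     max_delay: float = 60.0) -> Optional[float]:
    """Seconds to wait before the next retry, or None if the error is not retryable."""
    if not error.get("retryable"):
        return None
    # Full jitter: pick a random delay between 0 and the capped exponential bound
    backoff = random.uniform(0, min(max_delay, base_delay * (2 ** attempt)))
    # Never retry sooner than the receiver asked for
    hint_s = (error.get("details") or {}).get("retry_after_ms", 0) / 1000
    return max(backoff, hint_s)
```

For the rate‑limit error above on the first retry, the exponential bound is 1 s, so the 5 s hint wins and the sender waits 5 s.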

Message Security

Secure messaging is mandatory when agents run in different trust domains (e.g., separate microservices, external agents). Use this checklist.

Security Checklist for Agent Messaging

Authentication – Every message must prove the sender’s identity.
- mTLS (mutual TLS) for HTTP/2 and gRPC
- JWT (JSON Web Token) with short expiration and signature verification
- Pre‑shared symmetric key (HMAC) for lightweight internal systems
Authorization – Verify that the sender is allowed to send this message type and/or operation.
- Implement policy checks (e.g., Open Policy Agent) before processing.
- Example: sender == "agent/research" is allowed to send operation: query but not operation: delete.
Integrity Verification – Ensure message was not tampered with in transit.
- TLS already provides integrity for the channel.
- For end‑to‑end integrity (beyond TLS), add a signature field over the canonicalised message.
Confidentiality – Protect sensitive payload fields (PII, API keys, credentials).
- Use TLS for transport encryption.
- For fields that must not be visible to intermediaries, encrypt at the field level using envelope encryption.
Sensitive Data Handling – Never log full messages that contain secrets or PII.
- Redact fields named password, secret, authorization, credit_card, ssn before logging.
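A minimal redaction sketch for the last checklist item, assuming sensitive fields are identified by key name (the five names listed above, matched case‑insensitively) and may appear at any nesting depth; `redact` is a hypothetical helper, and it does not detect secrets embedded in free text, which would need pattern matching.

```python
SENSITIVE_KEYS = {"password", "secret", "authorization", "credit_card", "ssn"}


def redact(value, mask: str = "[REDACTED]"):
    """Return a copy of `value` with sensitive keys masked at any nesting depth."""
    if isinstance(value, dict):
        return {k: mask if str(k).lower() in SENSITIVE_KEYS else redact(v, mask)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, mask) for v in value]
    return value  # scalars are returned unchanged
```

Because `redact` builds a new structure, the original message is left intact for processing while only the copy is logged.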

Adding a signature:

import hashlib
import hmac
import json

def _canonical(message: dict) -> bytes:
    # Exclude the signature itself so signing and verifying see the same bytes;
    # sorted keys and no whitespace make the encoding deterministic
    unsigned = {k: v for k, v in message.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()

def sign_message(message: dict, secret: bytes) -> str:
    return hmac.new(secret, _canonical(message), hashlib.sha256).hexdigest()

def verify_signature(message: dict, signature: str, secret: bytes) -> bool:
    # compare_digest avoids leaking timing information about the expected value
    return hmac.compare_digest(sign_message(message, secret), signature)

Message Monitoring

You cannot debug messaging failures without observability. Export the following metrics from every agent and router.

Metric	Type	Labels	Alert When
`agent_messages_sent_total`	Counter	`type`, `sender`, `receiver`	–
`agent_messages_received_total`	Counter	`type`, `sender`, `receiver`	–
`agent_message_size_bytes`	Histogram	`type`	p99 > 256 KB
`agent_message_delivery_latency_seconds`	Histogram	`sender`, `receiver`	p95 > 1s (sync) / > 5s (async)
`agent_message_validation_errors_total`	Counter	`error_code`	> 1% of received messages
`agent_message_duplicates_detected_total`	Counter	`sender`	Any (indicates retry issues)
`agent_message_retries_total`	Counter	`reason`	> 10% of sent messages

Logging recommendation: Emit a structured log for each message received and processed. Include message_id, type, sender, receiver, duration_ms, status (success/error). Redact sensitive fields.

{
  "event": "message_processed",
  "message_id": "0194f0a2-...",
  "type": "request",
  "operation": "query",
  "sender": "research-agent",
  "receiver": "data-agent",
  "duration_ms": 42,
  "status": "success"
}
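One way to emit such a line with the standard library, assuming the message is a dict in the envelope format shown earlier and the processing function is passed in by the caller; `process_and_log` is a hypothetical helper. The log line carries only envelope fields and the operation name, not the payload, so no further redaction is needed here.

```python
import json
import logging
import time

log = logging.getLogger("agent.messaging")


def process_and_log(message: dict, handle) -> None:
    """Run `handle(message)` and emit one structured log line describing the outcome."""
    start = time.perf_counter()
    status = "success"
    try:
        handle(message)
    except Exception:
        status = "error"
        raise  # the log line is still written by the finally block
    finally:
        record = {
            "event": "message_processed",
            "message_id": message.get("message_id"),
            "type": message.get("type"),
            "operation": (message.get("payload") or {}).get("operation"),
            "sender": message.get("sender"),
            "receiver": message.get("receiver"),
            "duration_ms": round((time.perf_counter() - start) * 1000),
            "status": status,
        }
        log.info(json.dumps(record))
```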

Message Testing

Test messaging behaviour at all levels. Use real serialization and validation code, not mocks, where possible.

Unit Testing

Test message creation, validation logic, and serialization/deserialization in isolation.

import pytest

def test_message_validation():
    valid_msg = {
        "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
        "sender": "test",
        "receiver": "test",
        "timestamp": "2025-06-10T14:30:00Z",
        "type": "request",
        "payload": {"operation": "ping"}
    }
    # validate_envelope raises InvalidMessageError on failure; reaching the next line means success
    validate_envelope(valid_msg)

    invalid_msg = {**valid_msg, "type": "invalid"}
    with pytest.raises(InvalidMessageError):
        validate_envelope(invalid_msg)

Schema Validation Testing

Test that your JSON Schema rejects malformed payloads.

def test_query_payload_schema():
    schema = load_schema("query_operation.json")
    valid = {"operation": "query", "table": "users", "limit": 10}
    assert is_valid(valid, schema)

    invalid = {"operation": "query", "limit": -1}
    assert not is_valid(invalid, schema)

Integration Testing

Spin up a real receiver agent and sender client, send a message, and assert the response.

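# DataAgent and AgentClient are your own agent and client classes;
# async tests need a plugin such as pytest-asyncio
async def test_request_response_integration():
    receiver = DataAgent()
    await receiver.start("localhost", 8888)
    try:
        client = AgentClient("http://localhost:8888")
        response = await client.send_request("data-agent", {"operation": "ping"})

        assert response.type == "response"
        assert response.payload["result"] == "pong"
    finally:
        # Stop the agent even when an assertion fails, so the port is released
        await receiver.stop()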

End‑to‑End Testing

Test a full workflow that involves multiple agents, a message broker, and external dependencies (using test containers).

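from testcontainers.core.container import DockerContainer

def test_full_messaging_workflow():
    # Image names and port 8080 are placeholders; a shared Docker network and the
    # agents' broker connection settings are omitted for brevity
    with DockerContainer("rabbitmq:3") as broker:
        with DockerContainer("data-agent") as data_agent:
            with DockerContainer("research-agent").with_exposed_ports(8080) as research:
                host = research.get_container_host_ip()
                port = research.get_exposed_port(8080)
                client = ResearchAgentClient(f"http://{host}:{port}")
                result = client.run_analysis("customer_churn")
                assert result.status == "completed"
                assert "churn_report.pdf" in result.artifacts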

Agent Messaging Best Practices

Adopt these 12 implementation guidelines for production‑grade agent messaging.

Use unique message IDs – UUIDv7 is preferred. Never reuse IDs.
Keep payloads small – Under 256 KB. For larger data, return a URI.
Validate every message twice – Sender (fail fast) + receiver (defence in depth).
Log every message (with redaction) – Essential for debugging and auditing.
Use structured schemas – JSON Schema or Protobuf. Avoid ad‑hoc formats.
Version your message formats – Include a version field in the envelope. Never remove required fields; add optional ones.
Set explicit timeouts and TTLs – No message should live forever in a queue.
Implement idempotency – Store processed message_ids to safely retry.
Make retry policies exponential with jitter – Prevent retry storms.
Separate metadata from payload – Infrastructure fields in metadata, business data in payload.
Monitor the core metrics – Throughput, latency, errors, duplicates, retries.
Test schema changes – Use backward‑compatible evolution (add only, never remove/rename).
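A minimal sketch of a receiver‑side version gate for the versioning guideline, assuming a `"version": "MAJOR.MINOR"` field in the envelope, that only a major‑version bump is breaking, and that messages without a version field are treated as `1.0`; `SUPPORTED_MAJORS` and `check_version` are hypothetical.

```python
SUPPORTED_MAJORS = {1, 2}  # e.g., accept both majors during a 1.x -> 2.x migration


class UnsupportedVersionError(ValueError):
    pass


def check_version(message: dict, default: str = "1.0") -> tuple[int, int]:
    """Parse the envelope version and reject major versions this receiver cannot handle."""
    raw = str(message.get("version", default))
    major_s, _, minor_s = raw.partition(".")
    try:
        major, minor = int(major_s), int(minor_s or 0)
    except ValueError:
        raise UnsupportedVersionError(f"malformed version: {raw!r}") from None
    if major not in SUPPORTED_MAJORS:
        raise UnsupportedVersionError(f"unsupported major version {major}")
    # Minor bumps only add optional fields, so a newer minor is still accepted
    return major, minor
```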

Common Messaging Mistakes

Mistake	Consequence	Solution
Missing or non‑unique message IDs	Cannot deduplicate, cannot correlate responses, logs useless.	Generate UUIDv7 for every message.
Large payloads (>1 MB)	Network congestion, queue overflow, high latency.	Use reference (URI) to blob storage.
No schema validation	Malformed messages cause cryptic runtime crashes.	Enforce JSON Schema or Protobuf.
No retry mechanism	Transient failures become permanent.	Implement exponential backoff with max retries.
Inconsistent message formats	Receivers need custom parsers per sender.	Adopt a single envelope format across all agents.
Putting business data in metadata	Confusion, metadata size limits (e.g., Kafka headers).	Use payload for business data.
No message logging	Impossible to debug failures.	Log each message (redacted) with correlation ID.
Ignoring duplicate messages	Double processing, data corruption.	Store processed IDs with TTL.
Blocking on all responses	Poor throughput, cascading failures.	Use async patterns with timeouts.

Case Study: Research Agent ↔ Data Agent Messaging

Scenario: A Research Agent needs to retrieve a list of high‑value customers from a Data Agent. The Data Agent exposes a query interface. The system uses JSON over HTTPS with registry‑based routing.

Step 1 – Message Structure Definition

Both agents share a common JSON Schema for the envelope and a schema for the query operation.

Envelope schema (excerpt):

{
  "message_id": {"type": "string", "format": "uuid"},
  "sender": {"type": "string"},
  "receiver": {"type": "string"},
  "type": {"enum": ["request", "response"]}
}

Query request payload schema:

{
  "operation": {"const": "query"},
  "table": {"type": "string", "enum": ["customers", "transactions"]},
  "filter": {"type": "object"},
  "limit": {"type": "integer", "minimum": 1, "maximum": 1000}
}

Step 2 – Request Flow

Research Agent constructs a message:

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
  "sender": "agent/research/v1",
  "receiver": "agent/data/v1",
  "timestamp": "2025-06-10T14:30:00.123Z",
  "type": "request",
  "payload": {
    "operation": "query",
    "table": "customers",
    "filter": {"lifetime_value": {"$gte": 10000}},
    "limit": 50
  },
  "metadata": {
    "ttl_ms": 5000,
    "priority": 3
  }
}

The sender validates the envelope and payload schema. It then serialises to JSON and POSTs to https://data-agent.internal/v1/message (resolved from registry).

Step 3 – Response Flow

Data Agent receives the request, validates again, and processes. It returns:

{
  "message_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2c",
  "sender": "agent/data/v1",
  "receiver": "agent/research/v1",
  "timestamp": "2025-06-10T14:30:00.456Z",
  "type": "response",
  "payload": {
    "result": {
      "customers": [
        {"id": "cust_001", "name": "Acme Corp", "lifetime_value": 150000},
        {"id": "cust_002", "name": "Beta LLC", "lifetime_value": 120000}
      ],
      "count": 2
    },
    "error": null
  },
  "metadata": {
    "correlation_id": "0194f0a2-9e8c-7a3b-8b2a-1c3d5e7f9a2b",
    "processing_time_ms": 333
  }
}

Step 4 – Error Handling (Simulated)

If the Data Agent cannot connect to its database, it returns a retryable error:

{
  "type": "response",
  "payload": {
    "result": null,
    "error": {
      "code": "DATABASE_TIMEOUT",
      "message": "Connection to database timed out after 10 seconds",
      "retryable": true
    }
  }
}

Research Agent’s retry logic sees retryable: true, increments retry_count, and re‑sends the request with exponential backoff starting at 1 second (1 s, 2 s, 4 s). If the third retry also fails, it sends a notification to an operator.

Step 5 – Monitoring

Prometheus metrics from this interaction:

agent_messages_sent_total{type="request", sender="research", receiver="data"} 1
agent_messages_received_total{type="request", sender="research", receiver="data"} 1
agent_messages_sent_total{type="response", sender="data", receiver="research"} 1
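agent_message_delivery_latency_seconds_count{sender="research", receiver="data"} 1
agent_message_delivery_latency_seconds_sum{sender="research", receiver="data"} 0.333
# No agent_message_validation_errors_total series appears: no validation errors occurred.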

Logs:

{"event": "message_sent", "message_id": "0194f0a2-...", "type": "request", "receiver": "data"}
{"event": "message_received", "message_id": "0194f0a2-...", "type": "request", "sender": "research"}
{"event": "message_processed", "message_id": "0194f0a2-...", "type": "request", "status": "success", "duration_ms": 333}
{"event": "message_sent", "message_id": "0194f0a2-...", "type": "response", "receiver": "research"}

FAQ

1. What is agent messaging?
Agent messaging is the structured, self‑describing mechanism through which AI agents exchange requests, responses, events, and notifications. It defines the format, validation, delivery, and observability of individual messages.

2. How is agent messaging different from agent communication?
Communication is the high‑level process (why and what agents exchange). Messaging is the concrete implementation (the actual bytes, structure, and delivery guarantees). Think of communication as the conversation and messaging as the sentences.

3. What fields are mandatory in every agent message?
message_id, sender, receiver, timestamp, type, and payload. metadata is strongly recommended but not strictly required.

4. Should messages be versioned?
Yes. Include a version field in the envelope (e.g., "version": "1.0"). When making breaking changes, increment the major version and support both versions during migration. Never remove required fields – add optional ones.

5. How do I choose between JSON and Protobuf?
Start with JSON + JSON Schema for flexibility and debugging. Move to Protobuf when you need smaller message size, faster parsing, or strict schema evolution in high‑throughput internal systems.

6. How do agents validate incoming messages?
They validate: (1) the envelope schema (required fields and their types), (2) message‑type‑specific rules (e.g., a request must have payload.operation), (3) the payload schema for that operation, and (4) authentication and authorisation.
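The four layers can be sketched in plain Python as below. This is a hand‑rolled stand‑in for a JSON Schema validator, not a definitive implementation. The operation name `fetch_dataset`, its `dataset_id` key, and the sender permission table are hypothetical. The sketch also assumes authentication already happened at the transport layer (e.g., mTLS), so only authorisation is checked here.

```python
from typing import Any, Dict, List, Set

# Mandatory envelope fields from the FAQ, with the Python types assumed here
# (timestamp is assumed to be an ISO 8601 string).
REQUIRED_FIELDS: Dict[str, type] = {
    "message_id": str,
    "sender": str,
    "receiver": str,
    "timestamp": str,
    "type": str,
    "payload": dict,
}
VALID_TYPES = {"request", "response", "event", "notification"}

# Hypothetical rules: operation name -> payload keys it requires.
PAYLOAD_RULES: Dict[str, Set[str]] = {"fetch_dataset": {"dataset_id"}}

# Hypothetical policy: sender identity -> operations it may invoke.
ALLOWED_OPERATIONS: Dict[str, Set[str]] = {"research": {"fetch_dataset"}}


def validate(msg: Dict[str, Any]) -> List[str]:
    """Return validation errors; an empty list means the message passed."""
    # (1) Envelope: every required field present with the expected type.
    errors = [
        f"missing or invalid field: {name}"
        for name, expected in REQUIRED_FIELDS.items()
        if not isinstance(msg.get(name), expected)
    ]
    if errors:
        return errors
    if msg["type"] not in VALID_TYPES:
        return [f"unknown message type: {msg['type']}"]
    if msg["type"] != "request":
        return []  # this sketch only adds extra rules for requests

    # (2) Type-specific rule: a request must name an operation.
    operation = msg["payload"].get("operation")
    if not isinstance(operation, str):
        return ["request payload.operation is required"]

    # (3) Payload schema selected by the operation.
    required = PAYLOAD_RULES.get(operation)
    if required is None:
        return [f"unknown operation: {operation}"]
    missing = required - set(msg["payload"])
    if missing:
        return [f"payload missing keys: {sorted(missing)}"]

    # (4) Authorisation against the (already authenticated) sender identity.
    if operation not in ALLOWED_OPERATIONS.get(msg["sender"], set()):
        return [f"{msg['sender']} may not call {operation}"]
    return []
```

Each layer returns early, so a malformed envelope never reaches the more expensive payload and policy checks.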

7. What is the best way to correlate a response with its request?
The receiver copies the request’s message_id into the response’s metadata.correlation_id. The sender maintains a map from message_id to a future/promise/callback.
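A minimal asyncio sketch of the sender side, assuming responses carry `metadata.correlation_id` as described above. `Correlator` is a hypothetical helper, and the in‑process "receiver" in `demo` stands in for a real transport:

```python
import asyncio
import uuid
from typing import Any, Dict


class Correlator:
    """Maps outstanding request message_ids to futures (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: Dict[str, asyncio.Future] = {}

    def register(self, message_id: str) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._pending[message_id] = fut
        return fut

    def on_response(self, response: Dict[str, Any]) -> bool:
        """Resolve the future whose request ID equals metadata.correlation_id."""
        cid = response.get("metadata", {}).get("correlation_id")
        fut = self._pending.pop(cid, None)
        if fut is None or fut.done():
            return False  # unknown, late, or duplicate response
        fut.set_result(response)
        return True

    async def wait(self, message_id: str, fut: asyncio.Future,
                   timeout: float) -> Dict[str, Any]:
        try:
            return await asyncio.wait_for(fut, timeout)
        finally:
            self._pending.pop(message_id, None)  # no leaked entries on timeout


async def demo() -> Dict[str, Any]:
    corr = Correlator()
    request = {"message_id": str(uuid.uuid4()), "type": "request",
               "payload": {"operation": "fetch_dataset"}}
    fut = corr.register(request["message_id"])
    # Simulated receiver: copies the request's message_id into correlation_id.
    response = {"message_id": str(uuid.uuid4()), "type": "response",
                "metadata": {"correlation_id": request["message_id"]},
                "payload": {"status": "ok"}}
    asyncio.get_running_loop().call_soon(corr.on_response, response)
    return await corr.wait(request["message_id"], fut, timeout=1.0)


result = asyncio.run(demo())
```

Removing the entry in `finally` matters: without it, every timed‑out request would stay in the map forever.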

8. How do I handle duplicate messages?
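Store processed message_ids in a fast key‑value store (Redis) with a TTL slightly longer than your maximum retry window (e.g., 1 hour). Record each ID with a single atomic set‑if‑absent operation (Redis SET with the NX and EX options) before processing, and treat the message as a duplicate if the ID was already present. A separate "check, then write" sequence lets two concurrent deliveries of the same message both slip through.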
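A minimal sketch of this check. `DedupStore` is a hypothetical in‑memory stand‑in for Redis: it records an ID in one locked check‑and‑set step, the same idea as Redis `SET <id> 1 NX EX <ttl>`. With redis-py, the equivalent call is `r.set(message_id, 1, nx=True, ex=3600)`, which returns a falsy value when the key already exists. The injectable clock exists only to make the TTL testable.

```python
import threading
import time
from typing import Any, Callable, Dict


class DedupStore:
    """Hypothetical in-memory stand-in for Redis `SET <id> 1 NX EX <ttl>`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}
        self._lock = threading.Lock()

    def mark_if_new(self, message_id: str) -> bool:
        """Atomically record the ID; False means it was seen within the TTL."""
        with self._lock:
            now = self._clock()
            expires_at = self._expires_at.get(message_id)
            if expires_at is not None and expires_at > now:
                return False  # duplicate
            self._expires_at[message_id] = now + self._ttl
            return True

    def forget(self, message_id: str) -> None:
        """Drop an ID (Redis: DEL) so a retry after a failed attempt runs."""
        with self._lock:
            self._expires_at.pop(message_id, None)


def process_once(store: DedupStore, msg: Dict[str, Any],
                 handler: Callable[[Dict[str, Any]], Any]) -> Any:
    if not store.mark_if_new(msg["message_id"]):
        return None  # duplicate: skip side effects (or replay a cached response)
    try:
        return handler(msg)
    except Exception:
        store.forget(msg["message_id"])  # let the sender's retry through
        raise
```

Marking before processing closes the race, but it also means a crash mid‑handler would swallow the retry; the `forget` call on failure is one way to avoid that.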

9. What is a good TTL for messages?
Depends on your workflow. For interactive requests, 5–30 seconds. For background batch jobs, 5–60 minutes. Never set TTL to infinite – always have an expiry.
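A minimal receiver‑side expiry check. It assumes an ISO 8601 `timestamp` with an explicit UTC offset and a hypothetical `metadata.ttl_seconds` field, with a default of 30 seconds taken from the interactive range above. It rejects non‑positive or infinite TTLs, in line with the "always have an expiry" rule.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

DEFAULT_TTL_SECONDS = 30  # assumed default for interactive requests


def is_expired(msg: Dict[str, Any], now: Optional[datetime] = None) -> bool:
    """True if the message is older than its TTL (hypothetical metadata.ttl_seconds)."""
    # Offsets written as "+00:00" parse on all Python 3 versions;
    # a trailing "Z" only parses with fromisoformat on Python 3.11+.
    sent_at = datetime.fromisoformat(msg["timestamp"])
    if sent_at.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    ttl = msg.get("metadata", {}).get("ttl_seconds", DEFAULT_TTL_SECONDS)
    if not math.isfinite(ttl) or ttl <= 0:
        raise ValueError("ttl_seconds must be a positive, finite number")
    if now is None:
        now = datetime.now(timezone.utc)
    return now - sent_at > timedelta(seconds=ttl)
```

Expired messages are usually dropped (or dead‑lettered) without processing, because the sender has already given up waiting for the result.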

10. How do I test agent messaging without real agents?
Unit test message creation and validation with fake serializers. Integration test with an in‑memory channel or a lightweight message broker (e.g., Redis). For end‑to‑end tests, run the agents and their broker in disposable containers (for example, with Testcontainers).
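A minimal sketch of an in‑memory channel for integration tests. It assumes routing by the envelope's `receiver` field. `InMemoryChannel`, `data_agent`, and the message contents are hypothetical test doubles, not part of any broker's API.

```python
import asyncio
from typing import Any, Dict


class InMemoryChannel:
    """Hypothetical broker test double: one asyncio.Queue per agent name."""

    def __init__(self) -> None:
        self._queues: Dict[str, asyncio.Queue] = {}

    def _queue(self, agent: str) -> asyncio.Queue:
        if agent not in self._queues:
            self._queues[agent] = asyncio.Queue()
        return self._queues[agent]

    async def send(self, msg: Dict[str, Any]) -> None:
        await self._queue(msg["receiver"]).put(msg)  # route by envelope receiver

    async def receive(self, agent: str, timeout: float = 1.0) -> Dict[str, Any]:
        return await asyncio.wait_for(self._queue(agent).get(), timeout)


async def data_agent(channel: InMemoryChannel) -> None:
    """Fake data agent: answers one request, correlating by message_id."""
    req = await channel.receive("data")
    await channel.send({
        "message_id": "resp-1", "sender": "data", "receiver": req["sender"],
        "type": "response", "metadata": {"correlation_id": req["message_id"]},
        "payload": {"rows": 3},
    })


async def round_trip() -> Dict[str, Any]:
    channel = InMemoryChannel()
    agent = asyncio.create_task(data_agent(channel))
    await channel.send({
        "message_id": "req-1", "sender": "research", "receiver": "data",
        "type": "request", "payload": {"operation": "fetch_dataset"},
    })
    response = await channel.receive("research")
    await agent
    return response


response = asyncio.run(round_trip())
```

Because the channel exposes the same send/receive shape a real broker client would, the agents under test need no changes when the test later switches to Redis or a containerised broker.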

11. Should I use synchronous or asynchronous messaging?
Use synchronous (request‑response over HTTP) for low‑latency, interactive tasks. Use asynchronous (queues, event streams) for long‑running operations, high‑volume workloads, or when you need durability.

12. How do I secure messages between agents?
Use mTLS for transport, JWT or API keys for authentication, and field‑level encryption for sensitive data. Add message signatures for end‑to‑end integrity. Always authorise based on sender identity.
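A minimal sketch of message signing with an HMAC‑SHA256 over a canonical JSON encoding, using only the standard library. The top‑level `signature` field is a hypothetical placement. `sort_keys` canonicalisation is a simplification; RFC 8785 (JSON Canonicalization Scheme) defines a complete one. HMAC uses a shared secret, so it proves the message came from a key holder, not from one specific agent; per‑agent non‑repudiation needs asymmetric signatures (e.g., Ed25519).

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def canonical_bytes(msg: Dict[str, Any]) -> bytes:
    """Deterministic JSON encoding of everything except the signature field."""
    body = {k: v for k, v in msg.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign(msg: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    sig = hmac.new(key, canonical_bytes(msg), hashlib.sha256).hexdigest()
    return {**msg, "signature": sig}


def verify(msg: Dict[str, Any], key: bytes) -> bool:
    expected = hmac.new(key, canonical_bytes(msg), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(expected, msg.get("signature", ""))
```

Verification should run before authorisation, so a tampered sender field is rejected before it is trusted.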

13. What metrics should I collect for agent messaging?
Sent/received counts, message size distribution, end‑to‑end latency, validation error rate, duplicate rate, retry rate, and success rate per message type.

14. How large can a message be?
Aim to keep each message under 256 KB. If a payload is larger, use a two‑step (claim‑check) pattern: the message carries a reference (URI), and the receiver fetches the large data from object storage or a shared volume.
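A minimal sketch of the two‑step pattern. The dictionary `OBJECT_STORE`, the `mem://` URI scheme, the `payloads/` key prefix, and the `payload_ref` field are all hypothetical stand‑ins for a real object store and reference format.

```python
import json
from typing import Any, Dict

MAX_MESSAGE_BYTES = 256 * 1024  # the size budget suggested above

# Hypothetical stand-in for object storage or a shared volume.
OBJECT_STORE: Dict[str, bytes] = {}


def prepare(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Sender side: inline small payloads, offload large ones behind a reference."""
    if len(json.dumps(msg).encode("utf-8")) <= MAX_MESSAGE_BYTES:
        return msg
    key = f"payloads/{msg['message_id']}"
    OBJECT_STORE[key] = json.dumps(msg["payload"]).encode("utf-8")
    return {**msg, "payload": {"payload_ref": f"mem://{key}"}}


def resolve(msg: Dict[str, Any]) -> Dict[str, Any]:
    """Receiver side: fetch the payload if the message carries a reference."""
    ref = msg["payload"].get("payload_ref")
    if ref is None:
        return msg
    key = ref.removeprefix("mem://")  # str.removeprefix needs Python 3.9+
    return {**msg, "payload": json.loads(OBJECT_STORE[key])}
```

Keying the stored object by `message_id` makes the upload idempotent, so a retried send overwrites the same object instead of creating a new one.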

15. Can agents from different teams use different message formats?
Only if they agree on a common envelope (at least message_id, sender, receiver, type). The payload can be team‑specific, but the receiving agent must know how to deserialize it. Better to standardise fully.

16. How do I evolve a message schema without breaking existing agents?
Follow the robustness principle: be conservative in what you send, liberal in what you accept. Never remove or rename fields. Only add optional fields with default values. Use a version field and keep old handlers for at least one major version.
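A minimal sketch of a tolerant reader and a conservative writer for one payload type. `FetchRequest` and its `limit` and `output_format` fields are hypothetical. The version in which each optional field was added is also an illustrative assumption.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict


@dataclass
class FetchRequest:
    """Hypothetical payload schema; every field added after 1.0 has a default."""
    dataset_id: str               # required since 1.0, never removed or renamed
    limit: int = 100              # optional, assumed added in 1.1
    output_format: str = "json"   # optional, assumed added in 1.2


def parse_fetch_request(payload: Dict[str, Any]) -> FetchRequest:
    """Liberal reader: drop unknown keys, let defaults fill missing optional ones."""
    known = {f.name for f in fields(FetchRequest)}
    return FetchRequest(**{k: v for k, v in payload.items() if k in known})


def to_payload(req: FetchRequest) -> Dict[str, Any]:
    """Conservative writer: emit exactly the fields this version defines."""
    return asdict(req)
```

An older sender's payload still parses because the new fields have defaults, and a newer sender's extra fields are ignored rather than rejected. A missing required field still fails loudly with a `TypeError`.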

17. What happens if a message cannot be delivered after all retries?
Send it to a dead‑letter queue (DLQ) with the original metadata. Alert an operator. The DLQ can be replayed after fixing the issue.
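A minimal sketch of retry‑then‑dead‑letter delivery, using exponential backoff with full jitter. `TransientError` and the list used as a DLQ are hypothetical stand‑ins. Only errors marked as transient are retried; any other exception propagates immediately, matching the retryable versus non‑retryable split. The `sleep` parameter is injectable so tests do not actually wait.

```python
import random
import time
from typing import Any, Callable, Dict, List


class TransientError(Exception):
    """Hypothetical marker for retryable delivery failures."""


def deliver_with_retries(
    msg: Dict[str, Any],
    send: Callable[[Dict[str, Any]], None],
    dead_letters: List[Dict[str, Any]],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Deliver msg; after max_attempts transient failures, park it in the DLQ."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            send(msg)
            return True
        except TransientError as exc:
            last_error = repr(exc)
            if attempt < max_attempts:
                # Exponential backoff with full jitter: 0..base * 2^(attempt-1).
                sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    # Keep the original envelope and metadata so the message can be replayed.
    dead_letters.append({"original": msg, "attempts": max_attempts,
                         "last_error": last_error})
    # An operator alert would be raised here (not shown).
    return False
```

Storing the untouched original, rather than a re‑serialised copy, is what makes a later replay behave exactly like the first delivery.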

Internal Linking Recommendations

Deepen your understanding of agent messaging by exploring these related implementation guides in the AgentDevPro Handbook:

/guides/a2a/ – A2A protocol fundamentals
/guides/a2a/agent-communication/ – The broader process of information exchange between agents
/guides/a2a/agent-collaboration/ – How agents work together using messages
/guides/agent-workflows/ – Orchestrating multi‑step tasks with message sequences
/guides/agent-tools/ – How agents expose and consume tools via messaging
/guides/agent-memory/ – Sharing semantic memory across agents using messages
/guides/mcp/client/ – Model Context Protocol client integration for tool‑augmented agents

This article is part of the AgentDevPro Handbook – practical, engineering‑focused guides for building production AI agent systems.
