Intermediate
kafka · events · pub-sub · reactive · streaming

Event-Driven Agent Architecture

Rather than having agents check for new work on a schedule, this pattern wires agents directly to the events that should trigger them. Agents spend zero compute while they wait and respond the moment something relevant happens.

๐Ÿ“Why asking "is there anything to do?" is the wrong model

From polling loops to reactive agents

The simplest way to build a reactive agent is a polling loop: wake up, check if there is new work, process it if there is, sleep briefly, repeat. This works. But it is wasteful in a predictable way: if events happen once a minute and the agent checks every second, it is idle 98% of the time while burning compute and tokens for nothing. With fifty agents polling simultaneously, the waste multiplies across every service each agent checks.

Event-driven architecture flips the model entirely. Instead of agents reaching out to ask "is there anything for me?", the system reaches in and wakes them when there actually is. Events from producers (an order was placed, a file was uploaded, an error was logged) flow into a durable, ordered log. Each agent subscribes to the events it cares about and stays completely dormant otherwise. The moment a matching event arrives, the broker wakes the agent, the agent handles it, acknowledges that it is done, and returns to idle.

This consistently cuts response latency by 70–90% versus polling (the agent reacts immediately rather than at the next poll interval) and reduces compute spend by nearly half. As a side effect, the event log gives you a complete, replayable history of everything that happened, which is invaluable for debugging non-deterministic agent behavior.
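The contrast between the two models can be sketched in a few lines of Python. This is an in-memory illustration only; names like `PushBroker` and `poll_for_work` are made up for this sketch, not a real library API.

```python
import time

def poll_for_work(check, interval_s, max_checks):
    """Polling loop: repeatedly ask 'is there anything to do?', sleeping in between."""
    checks = 0
    for _ in range(max_checks):
        checks += 1
        work = check()
        if work is not None:
            return work, checks   # found work, but only at the next poll interval
        time.sleep(interval_s)
    return None, checks

class PushBroker:
    """Event-driven: the system wakes subscribed handlers the moment an event arrives."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:   # handler runs exactly when work exists
            handler(event)

# Polling wastes checks that find nothing:
queue = iter([None, None, "job"])
work, checks = poll_for_work(lambda: next(queue), interval_s=0, max_checks=5)

# Push wakes the agent once, with the event in hand:
handled = []
broker = PushBroker()
broker.subscribe(handled.append)
broker.publish({"type": "order_placed", "id": 1})
```

The polling version burned two empty checks before finding the job; the push version did exactly one unit of work.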

A durable, ordered log at the center of the system

The Event Broker

The broker is the backbone of this architecture. Think of it as a database designed specifically for time-ordered messages. Producers write events to named topics. Consumers read from those topics at their own pace, independently, without needing to coordinate with each other or with the producers. The key property is durability. If an agent crashes while processing an event, the event is not lost. The agent restarts, picks up from where it left off, and processes the event again. The broker guarantees delivery: events do not disappear because a consumer was temporarily offline.
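The core broker behaviors described above (a durable append-only log, and consumers that read independently at their own pace) can be sketched as a minimal in-memory model. `DurableLog` is a hypothetical name for this sketch, not a Kafka API.

```python
class DurableLog:
    """Append-only topic log; each consumer group tracks its own read position."""
    def __init__(self):
        self.events = []    # events persist regardless of consumer crashes
        self.offsets = {}   # committed offset per consumer group

    def append(self, event):
        self.events.append(event)

    def read(self, group, max_events=10):
        # each group reads from its own last committed offset
        start = self.offsets.get(group, 0)
        return self.events[start:start + max_events]

    def commit(self, group, offset):
        self.offsets[group] = offset

log = DurableLog()
log.append("order_placed")
log.append("file_uploaded")

batch_a = log.read("agent-a")     # agent-a sees both events
log.commit("agent-a", 2)          # bookmark its progress
batch_b = log.read("agent-b")     # agent-b is independent: still starts at 0
after_commit = log.read("agent-a")  # nothing new for agent-a
```

Because offsets are per-group, a crashed consumer simply re-reads from its last commit; the events themselves never disappear.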


Partitioned Log

Events in a topic are split across partitions. Within each partition, events are strictly ordered by offset, i.e. the order in which they were appended. Partitioning allows many consumers to read the same topic in parallel without losing ordering guarantees within each partition.
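A common way to assign events to partitions is a stable hash of the event key, so all events for one key land in the same partition and stay ordered relative to each other. A minimal sketch:

```python
import zlib

def partition_for(key, num_partitions):
    """Stable hash of the event key picks the partition: the same key always
    maps to the same partition, preserving per-key ordering."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Route a small stream of (key, payload) events into 3 partitions.
partitions = [[] for _ in range(3)]
for key, payload in [("user-1", "login"), ("user-2", "login"), ("user-1", "logout")]:
    partitions[partition_for(key, 3)].append((key, payload))
```

Consumers can now process the three partitions in parallel, yet every event for `user-1` is read in the order it was written.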


Schema Registry

A central registry that defines what each type of event looks like: its fields, types, and version. Producers and consumers agree on this contract through versioned schemas, so teams can evolve their events independently without accidentally breaking each other.
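The contract idea can be illustrated with a tiny registry that stores versioned field definitions and validates events against them. This is a conceptual sketch, not the Confluent Schema Registry API.

```python
class SchemaRegistry:
    """Maps (event_type, version) to a dict of required field names and types."""
    def __init__(self):
        self.schemas = {}

    def register(self, event_type, version, fields):
        self.schemas[(event_type, version)] = fields

    def validate(self, event):
        # an event is valid if every required field exists with the right type
        fields = self.schemas[(event["type"], event["version"])]
        return all(
            isinstance(event["payload"].get(name), ftype)
            for name, ftype in fields.items()
        )

registry = SchemaRegistry()
registry.register("order_placed", 1, {"order_id": str, "amount": float})

ok = registry.validate(
    {"type": "order_placed", "version": 1, "payload": {"order_id": "o1", "amount": 9.5}}
)
bad = registry.validate(
    {"type": "order_placed", "version": 1, "payload": {"order_id": "o1", "amount": "9.5"}}
)
```

Because the schema is versioned, a producer can register `("order_placed", 2)` with new fields while consumers of version 1 keep working unchanged.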


DLQ + Retries

When an agent fails to process an event, the broker retries it a configurable number of times. After repeated failures, the event is moved to a Dead Letter Queue, a separate topic for events that need human investigation. Other events keep flowing uninterrupted.
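The retry-then-park behavior is straightforward to sketch. `process_with_retries` and the handlers below are illustrative names for this example, not broker APIs:

```python
def process_with_retries(event, handler, max_retries, dlq):
    """Try `handler` up to max_retries + 1 times; on repeated failure, park the
    event in the dead letter queue instead of blocking the stream."""
    for _ in range(max_retries + 1):
        try:
            return handler(event)
        except Exception:
            continue
    dlq.append(event)   # parked for human investigation; other events keep flowing
    return None

dlq = []
attempts = {"n": 0}

def flaky_handler(event):
    attempts["n"] += 1
    if attempts["n"] < 3:            # fails twice, succeeds on the third try
        raise RuntimeError("transient failure")
    return "processed"

def always_fails(event):
    raise RuntimeError("permanent failure")

result = process_with_retries("evt-1", flaky_handler, max_retries=3, dlq=dlq)
parked = process_with_retries("evt-2", always_fails, max_retries=2, dlq=dlq)
```

The transient failure recovers within the retry budget; the permanent one ends up in the DLQ without stopping anything else.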


Replay

Because the log is durable and ordered, you can rewind to any past point and replay events through a new or updated version of the agent. This is the best debugging tool available for systems with non-deterministic behavior.
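Replay falls out of the log's durability almost for free: pick a past offset and feed the stored events through a new handler. A minimal sketch:

```python
def replay(log_events, handler, from_offset=0):
    """Re-run historical events from a chosen offset through a (possibly new)
    version of an agent's handler, collecting its outputs."""
    return [handler(event) for event in log_events[from_offset:]]

# Re-processing the full history with an updated handler:
history = [1, 2, 3]
full = replay(history, lambda e: e * 2)

# Or rewinding only to a past point:
partial = replay(history, lambda e: e * 2, from_offset=1)
```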

Enriching events before agents see them

Stream Processing

Not every event is ready to act on the moment it arrives. An agent responding to a fraud signal might need to know that three suspicious events from the same user happened within a five-minute window; a single event does not tell that story. A stream processor sits between the raw event log and the agents, transforming, joining, and aggregating events before forwarding them downstream. Think of it as the layer that does the heavy lifting of pattern detection and data assembly, so that downstream agents receive clean, enriched, actionable signals rather than raw event firehose data.
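The fraud example above can be expressed as a small sliding-window check over timestamped events. This is a simplified batch sketch of what a stream processor would do incrementally:

```python
from collections import defaultdict

def suspicious_burst(events, window_s=300, threshold=3):
    """True if `threshold` events from one user all fall inside any
    `window_s`-second window. `events` are (timestamp, user_id) tuples."""
    by_user = defaultdict(list)
    for ts, user in sorted(events):        # timestamps per user, in order
        by_user[user].append(ts)
    for times in by_user.values():
        # slide over each run of `threshold` consecutive events
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window_s:
                return True
    return False

# Three events from user u1 within 250 seconds: a burst.
burst = suspicious_burst([(0, "u1"), (100, "u1"), (250, "u1")])
# Three events spread over 800 seconds: not a burst.
quiet = suspicious_burst([(0, "u1"), (400, "u1"), (800, "u1")])
```

A real stream processor keeps this state incrementally and emits a single enriched "burst detected" event downstream, so the agent never has to reconstruct the window itself.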


Flink / Kafka Streams

Distributed frameworks that maintain stateful computations across a stream of events. They can answer questions like "how many times did this user fail login in the last 10 minutes?" without a separate database query.


Event Joins

Correlates events from different topics into a single enriched event. For example, joining an "order placed" event with the matching "inventory check" event to produce a single event containing both signals.
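The order/inventory example can be sketched as a key-based join over two batches of events. The field names here are illustrative; a stream processor would do this continuously rather than over fixed lists:

```python
def join_events(orders, inventory_checks):
    """Correlate events from two topics by order_id, producing one enriched
    event that carries signals from both sides."""
    checks = {c["order_id"]: c for c in inventory_checks}
    return [
        {**order, "in_stock": checks[order["order_id"]]["in_stock"]}
        for order in orders
        if order["order_id"] in checks   # unmatched orders wait for their pair
    ]

orders = [{"order_id": "o1", "amount": 20.0}, {"order_id": "o2", "amount": 5.0}]
inventory = [{"order_id": "o1", "in_stock": True}]

enriched = join_events(orders, inventory)
```

Only `o1` has a matching inventory check, so only `o1` is emitted downstream; `o2` would be held until its check arrives.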


Windowed Aggregates

Groups events into tumbling (non-overlapping), sliding (overlapping), or session-based time windows to detect rate-based patterns, anomalies, or threshold breaches.
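The simplest of these, a tumbling window, assigns each event to exactly one fixed-size bucket keyed by its timestamp. A minimal sketch of a per-window count:

```python
def tumbling_counts(events, window_s):
    """Count events per non-overlapping window of `window_s` seconds.
    `events` are (timestamp, payload) tuples; returns {window_start: count}."""
    counts = {}
    for ts, _ in events:
        start = (ts // window_s) * window_s   # each event lands in exactly one bucket
        counts[start] = counts.get(start, 0) + 1
    return counts

# Five events with 10-second tumbling windows: two land in [0, 10), three in [10, 20).
counts = tumbling_counts([(1, "a"), (2, "b"), (11, "c"), (12, "d"), (13, "e")], window_s=10)
```

A sliding window would instead let events belong to several overlapping buckets, trading more state for finer-grained detection.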


Exactly-Once Semantics

Checkpointed state ensures that even if the stream processor crashes mid-computation, each input event is counted exactly once in the output. Prevents agents from being triggered twice for the same event.
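One common way to approximate this guarantee in application code is deduplication by event id: redelivered events after a crash are recognized and not counted twice. This is a simplification of what Flink or Kafka Streams do with transactional checkpoints, shown only to make the "counted exactly once" property concrete:

```python
class ExactlyOnceCounter:
    """Counter whose output ignores redeliveries: each event id affects the
    count at most once, even if the broker delivers it again after a crash."""
    def __init__(self):
        self.seen = set()   # checkpointed alongside the count in a real system
        self.count = 0

    def process(self, event_id):
        if event_id in self.seen:   # redelivery: already reflected in the count
            return self.count
        self.seen.add(event_id)
        self.count += 1
        return self.count

counter = ExactlyOnceCounter()
counter.process("e1")
counter.process("e2")
final = counter.process("e1")   # simulated redelivery after a crash
```

Despite three deliveries, the output reflects exactly two distinct events, so a downstream agent is never triggered twice for the same one.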

Dormant until a matching event fires, then instantly active

Reactive Agent Consumers

The agents in this pattern are designed around a single principle: do nothing until specifically asked to act. Each agent has a declared trigger, the exact type of event that should wake it. Everything else in the topic flows past without consuming resources. When its event arrives, the agent wakes up, runs its logic, and acknowledges that it has processed the event by writing a commit back to the broker. This commit is like a bookmark: "I have processed up to this point, so next time start from the next event." If the agent crashes before committing, it restarts from the last committed position, not from zero, which is what makes reliable processing possible.
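The trigger-plus-bookmark behavior can be sketched with a consumer over a list-backed log. `ReactiveConsumer` is a hypothetical name for this illustration, not a framework class:

```python
class ReactiveConsumer:
    """Wakes only for its declared trigger type and bookmarks progress by
    committing an offset after each event it passes."""
    def __init__(self, log, trigger_type):
        self.log = log
        self.trigger_type = trigger_type
        self.committed = 0       # last committed offset: the "bookmark"
        self.handled = []

    def run_once(self):
        # resume from the last committed offset, never from zero
        for i in range(self.committed, len(self.log)):
            event = self.log[i]
            if event["type"] == self.trigger_type:   # trigger contract
                self.handled.append(event["id"])
            self.committed = i + 1                   # offset commit

log = [
    {"type": "order_placed", "id": 1},
    {"type": "file_uploaded", "id": 2},
    {"type": "order_placed", "id": 3},
]
agent = ReactiveConsumer(log, trigger_type="order_placed")
agent.run_once()                          # handles events 1 and 3, skips 2

log.append({"type": "order_placed", "id": 4})
agent.run_once()                          # resumes at offset 3: handles only event 4
```

If the process died between the two runs, a restarted consumer with the same committed offset would pick up at event 4, exactly as described above.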


Trigger Contract

A formal declaration of exactly which event types wake this agent. Other events in the same topic pass by without consuming any agent resources; the agent is truly dormant between triggers.


Cold Spin-Up

On trigger, the agent process or container starts up, handles the event, and shuts down afterward. No memory or CPU is consumed between events; cost is proportional to actual work done.


Offset Commit

After successfully processing an event, the agent writes its current position (offset) back to the broker. This is the broker's record of how far the agent has read; it is what enables crash recovery without reprocessing everything from the start.


Actor Semantics

Frameworks like AutoGen and LangGraph model agents as actors: isolated units that communicate only through messages (events). This makes concurrent operation safe because agents never share mutable state directly.
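The actor idea itself is simple to sketch with a mailbox queue and a worker thread. This generic sketch is not the AutoGen or LangGraph API; it only shows why message-passing makes concurrency safe:

```python
import queue
import threading

class Actor:
    """Isolated unit: private state, reachable only through its mailbox.
    No other thread touches `state` directly, so no locks are needed."""
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.state = []
        self.handler = handler

    def send(self, msg):
        self.mailbox.put(msg)

    def run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:            # poison pill: stop the actor
                break
            self.handler(self.state, msg)

# The handler appends each message to the actor's private state.
actor = Actor(lambda state, msg: state.append(msg))
worker = threading.Thread(target=actor.run)
worker.start()

actor.send("hello")
actor.send("world")
actor.send(None)     # shut down
worker.join()
```

Any number of producer threads can call `send` concurrently; the actor processes messages one at a time, so its state is never mutated from two places at once.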

Signs this is the right architecture for your situation

When to Use This Pattern

- Multiple independent agents need to react to the same upstream event without coordinating
- You have polling loops that run constantly but rarely find new work, wasting compute and adding latency
- Workflows pause for extended periods waiting for a human, an approval, or an external system
- You need a complete, replayable history of every action the system has taken, for debugging or compliance
โš–๏ธWhat you gain โ€” and what it costs

Trade-offs

Benefit: Agents respond immediately when triggered, with 70–90% lower latency than polling.
Cost: Running a Kafka or Flink cluster adds infrastructure your team needs to operate and maintain.

Benefit: Zero compute spend between events; agents truly do nothing while dormant.
Cost: Distributed async flows are harder to debug than a polling loop you can step through in a debugger.

Benefit: Producers and consumers evolve independently; no tight coupling between teams.
Cost: Eventual consistency means agents may act on a slightly stale view of the world.

Benefit: The event log is a free, automatic audit trail of everything that happened.
Cost: Changing event schemas requires coordination across every team that produces or consumes them.