Agentic AI Is Distributed Systems. Except Where It Isn't.

Every major agentic AI pattern has a named distributed systems ancestor going back decades. MoA is ensemble computing. ReAct is a control loop. AutoGen is the actor model. LangGraph is a workflow engine. Multi-agent coordination is a blackboard system.

Part 1: The Map Every AI Engineer Should Have Memorized

This article maps each AI concept to its CS lineage, explains what that means for engineers building today, and names the three problems that genuinely have no distributed systems precedent — the real frontier where new thinking is required.

The Meeting Room

It was a Tuesday afternoon demo. The team was proud. Eighteen months of work, two engineers, one ambitious roadmap. They had built an agent orchestration framework from scratch — task routing, failure recovery, state handoff between agents, retry logic with exponential backoff. The architecture diagram on the screen was dense and impressive.

In the back of the room sat a woman who had spent the previous decade building telecommunications infrastructure. She had written Erlang — a programming language designed by Ericsson in the 1980s specifically for fault-tolerant, distributed, concurrent systems — before most of the team had graduated. She said nothing during the demo. Afterward, in the hallway, she pulled up the architecture diagram on her phone and quietly mapped each component to something she recognized.

Task routing with capability matching: service discovery with semantic predicates. Failure recovery with restart policies: OTP supervisor trees. State handoff between agents: process migration. Retry with exponential backoff: unchanged from distributed systems practice since the 1980s.

She was not dismissive. The team had built something real and it worked. But she knew something they did not: every hard problem they would encounter next — cascading failures, split-brain state, thundering herd on retry storms — had been solved before. The solutions were in papers from the 1990s and production systems from the 2000s. Nobody had told them to look there.

This article is for the engineers who deserve to know.

Why This Matters More Than It Seems

The agentic AI field is in a phase every young technology goes through: rediscovering solved problems without knowing they are solved, and labeling the rediscoveries as inventions. This is not a criticism. It is what happens when a new capability — the LLM as a reasoning component — enters a domain that already has decades of engineering behind it.

The cost of not knowing the lineage is concrete. Teams build bespoke orchestration frameworks that duplicate Dapr. They invent retry logic that misses the idempotency and compensation discipline that Sagas (a distributed systems pattern for managing long-running transactions across services) already spell out. They design agent communication protocols that solve problems gRPC solved in 2015. They burn engineering months on infrastructure that could be replaced with existing, battle-tested tooling, leaving no time for the problems that are actually new.

One important distinction before the mapping begins: the orchestration primitives are old. What is genuinely newer is placing a probabilistic planner — an LLM that generates and mutates execution plans at inference time — on top of those primitives. Traditional distributed components execute a fixed plan. Agents generate the plan dynamically, and the plan itself is probabilistic. That changes verification, replayability, and testing in ways the mapping below will not fully cover. The infrastructure layer is solved. The reasoning layer that drives it is not. Keep that distinction in mind as you read.

The goal of this article is to give you the map. Once you have it, you know exactly where to apply proven solutions and exactly where you are standing on genuinely new ground.

The Map

Before the deep dives, here is the full picture. Every major agentic AI pattern, its distributed systems ancestor, and what that lineage has already solved.

Agentic AI Concept	DS / CS Ancestor	Year	What's Already Solved
MoA layered aggregation	Ensemble computing, quorum-style aggregation	1990s	Aggregation strategies, fault tolerance, layer isolation
ReAct Reason+Act loop	Control loop, event-driven I/O	1960s	Loop stability, termination, I/O error handling
Reflexion retry with context	Write-ahead log, Saga-style recovery	1987	Bounded retry, context injection, escalation
OpenAI Swarm handoff	Erlang OTP supervisor + process migration	1996	Handoff contracts, state serialization, scope boundaries
Multi-agent coordination	Blackboard systems (Hearsay-II, Linda)	1970s	Shared context, opportunistic specialization, decoupled coordination
AutoGen conversation	Actor model (Hewitt, 1973)	1973	Message passing, isolation, mailbox management
LangGraph state graph	DAG workflow / state machine	1976–2015	Graph execution, checkpointing, branching
Context window management	Cache eviction, working set model	1968	Eviction policies, tiered storage, compression
MCP tool discovery	Service discovery (Consul, etcd)	2013	Registration, health checks, capability negotiation
A2A inter-agent protocol	gRPC / Protocol Buffers	2015	Schema contracts, backward compatibility, streaming

The sections below unpack each row — what the AI framing adds, what the DS lineage already solved, and what it means for engineers building today.

The Mapping: Every Agentic AI Pattern Has an Ancestor

Mixture of Agents → Ensemble Computing

The MoA paper from Together AI (Wang et al., 2024) describes a layered architecture where multiple LLM agents each process the outputs of the previous layer, with a final aggregator synthesizing the results. The headline result: open-source models in a MoA configuration outperformed GPT-4 Omni on AlpacaEval 2.0, scoring 65.1% versus 57.5%.

Strip the LLM framing and this is ensemble computing — a technique from the 1990s where multiple weaker models vote on outputs to produce a result more reliable than any individual model. Random forests work this way. Bagging and boosting work this way. The idea that aggregating imperfect components produces a more reliable whole is older than the internet.

At the infrastructure level, MoA resembles quorum-style aggregation — the pattern distributed databases use when a write must be confirmed by a majority of replicas before being acknowledged. The analogy is useful but has an important limit: quorum systems assume replicas should converge on the same answer. MoA benefits when models diverge — diversity of reasoning is what produces a better aggregate output. The aggregator layer plays a role structurally analogous to quorum logic, but with the opposite objective function. Keep that distinction in mind when borrowing from quorum literature.

What this means for engineers: the aggregation strategies and failure modes of MoA are understood in ensemble literature. When a layer produces a bad output, the aggregator needs a strategy — confidence weighting, re-sampling, synthesis with divergence detection. You do not need to invent these.
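As a minimal sketch of the aggregation step described above: the function below does a confidence-weighted vote over one layer's outputs and flags divergence instead of forcing a winner. The names `aggregate`, `Aggregate`, and `min_agreement` are illustrative assumptions, and exact-match voting is a simple stand-in for the LLM synthesis step a real MoA aggregator performs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aggregate:
    answer: Optional[str]   # winning answer, or None when agreement is too low
    agreement: float        # share of total confidence weight behind the winner
    diverged: bool          # True when the caller should re-sample or synthesize


def aggregate(candidates, min_agreement=0.5):
    """Confidence-weighted vote over (answer, confidence) pairs from one layer."""
    if not candidates:
        return Aggregate(None, 0.0, True)
    weights = Counter()
    for answer, confidence in candidates:
        # Normalize so trivially different spellings count as the same answer.
        weights[answer.strip().lower()] += confidence
    total = sum(weights.values())
    if total <= 0:
        return Aggregate(None, 0.0, True)
    winner, weight = weights.most_common(1)[0]
    agreement = weight / total
    if agreement < min_agreement:
        # Divergence detected: do not pick a winner, let the caller decide.
        return Aggregate(None, agreement, True)
    return Aggregate(winner, agreement, False)
```

When `diverged` is true, the caller can re-sample the layer or pass all candidates to a synthesis step rather than trusting a weak plurality.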

ReAct Loop → Control Loop with External I/O

The ReAct paper (Yao et al., Google/Princeton, 2022; published at ICLR 2023) introduced the pattern of interleaving reasoning traces and tool-use actions: Reason → Act → Observe → Reason → Act. The cycle continues until the task is complete or a termination condition is reached. On HotpotQA, this approach significantly reduced hallucination compared to chain-of-thought reasoning alone.

This is a control loop — a structure described formally in control theory in the 1960s and implemented in every thermostat, cruise control system, and industrial regulator ever built. Sense the environment. Compute a response. Act. Observe the result. Repeat. The LLM is the controller. The tools are the actuators. The observation is the sensor feedback.

More specifically, ReAct resembles an event-driven loop with external I/O — the same pattern used in operating system interrupt handlers, network event loops (Node.js, Nginx), and reactive systems. The loop runs until a terminal state. The external calls are side effects.

What this means for engineers: loop stability, termination guarantees, and I/O error handling in ReAct systems are control theory and event-loop problems. If your agent loops indefinitely, you have an unstable controller. The solution is a maximum iteration count and a defined fallback state. These are not AI problems. They are loop design problems.
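The maximum iteration count and fallback state can be sketched as follows. This is a minimal illustration, not any framework's API: `run_loop`, the `policy` callable standing in for the LLM, and the `ESCALATE_TO_HUMAN` fallback value are all assumptions made for the example.

```python
from typing import Callable

# A policy maps the observation history to ("act", tool_input) or ("finish", answer).
Policy = Callable[[list], tuple]
Tool = Callable[[str], str]


def run_loop(policy: Policy, tool: Tool, max_steps: int = 5,
             fallback: str = "ESCALATE_TO_HUMAN"):
    """Run reason -> act -> observe until the policy finishes or the budget runs out."""
    observations = []
    for step in range(1, max_steps + 1):
        kind, payload = policy(observations)        # reason
        if kind == "finish":
            return payload, step
        try:
            observations.append(tool(payload))      # act, then observe
        except Exception as exc:
            # I/O errors become observations instead of crashing the loop.
            observations.append(f"tool error: {exc}")
    return fallback, max_steps                      # defined terminal state
```

A policy that never emits `"finish"` ends in the fallback after `max_steps` iterations, which is the behavior an unstable controller needs.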

Reflexion → Write-Ahead Log + Saga-Style Recovery

The Reflexion paper (Shinn et al., 2023) proposed that agents should verbally reflect on their failures and store that reflection in an episodic memory buffer, which is then injected into subsequent attempts. The result: 91% pass@1 on the HumanEval coding benchmark, compared to GPT-4's 80% at the time.

The mechanism maps precisely to a write-ahead log for semantic errors combined with bounded retry. When a transaction fails in a database, the write-ahead log records what was attempted and why it failed, enabling informed recovery. Reflexion does the same thing in natural language: the failure is logged as a verbal reflection, and the retry reads that log to avoid repeating the same mistake.

The recovery discipline resembles the Saga pattern, formalized by Hector Garcia-Molina and Kenneth Salem at Princeton in 1987. A Saga breaks a long-running transaction into a sequence of smaller steps, each with a compensating action that can undo its effect if a later step fails. The analogy is not perfect: a true Saga has explicit compensating transactions that roll back prior state, whereas Reflexion has no rollback, only context-enriched retry. But the operational discipline Saga systems demand maps directly: bound your retries, define your recovery path, and escalate when recovery is exhausted. Reflexion systems that skip this discipline produce the same retry storms that plagued early distributed systems.

What this means for engineers: Reflexion without bounds is dangerous. Apply Saga discipline: maximum retry count, exponential backoff, dead letter queue for tasks that exhaust retries. The write-ahead log pattern tells you what to log. The Saga pattern tells you how to structure recovery.
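A minimal sketch of that discipline, assuming a hypothetical `attempt` callable that wraps the LLM call and returns either a result or a verbal reflection on the failure. The function name, parameters, and dead-letter record shape are illustrative, not taken from the Reflexion paper.

```python
import time
from typing import Callable, Optional


def run_with_reflection(task: str,
                        attempt: Callable[[str, list], tuple],
                        dead_letters: list,
                        max_retries: int = 3,
                        base_delay: float = 0.5,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """attempt(task, reflections) returns (ok, output_or_failure_reflection)."""
    reflections = []                              # the log of what failed and why
    for n in range(max_retries + 1):
        ok, result = attempt(task, reflections)   # reflections are injected as context
        if ok:
            return result
        reflections.append(result)                # record the failure before retrying
        if n < max_retries:
            sleep(base_delay * 2 ** n)            # exponential backoff: 0.5s, 1s, 2s, ...
    # Retries exhausted: park the task for escalation instead of looping forever.
    dead_letters.append({"task": task, "reflections": reflections})
    return None
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without waiting.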

OpenAI Swarm Handoff → Erlang OTP Supervisor + Process Migration

OpenAI Swarm (explicitly labeled an educational framework, not production tooling) demonstrates a core pattern: one agent transfers control to another when the task exceeds its capability or scope. The receiving agent takes over the conversation context and continues.

This is the Erlang OTP (Open Telecom Platform) supervisor tree with process migration — a pattern from 1996 where processes are supervised by parent processes, and work can be handed off to sibling processes when one reaches its boundary. OTP is Erlang's standard library for building fault-tolerant systems: supervisor trees define which processes to restart on failure, and how. In Erlang, handoff is implemented through process linking and message passing. In Swarm, it is implemented through conversation context transfer.

The production-grade equivalent for agent orchestration is Microsoft Orleans — a virtual actor framework where each agent is a "grain" (a lightweight virtual actor) that can be activated anywhere in the cluster, persists its state automatically, and can be called by any other grain without managing its physical location. Where Dapr provides actor primitives as part of a broader microservices sidecar, Orleans is designed specifically for fine-grained stateful objects at scale — making it the closer architectural match for agent orchestration, where you may have thousands of concurrent agents each maintaining their own state.

The handoff protocol — what state is transferred, what is discarded, how the receiving agent knows its scope — is the same design problem as session migration in distributed systems: what context does a session carry, and how much of it survives a handoff to a different server?

What this means for engineers: handoff correctness requires clear contracts. What state is canonical (must transfer)? What state is local (should not transfer)? What state is context-dependent (may transfer with modification)? Distributed systems solved this with session serialization formats and migration protocols. Agent handoffs need the same discipline — not ad hoc context dumping.
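One way to make that contract explicit is to classify each field and serialize only what is allowed to cross the boundary. The sketch below is an assumption-laden illustration: `AgentState`, `build_handoff`, the field names, and the `version` key are all hypothetical, not part of Swarm or any other framework.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AgentState:
    # Canonical: must survive the handoff unchanged.
    task_id: str
    user_goal: str
    # Context-dependent: may transfer, but only in a reduced form.
    history: list = field(default_factory=list)
    # Local: scratch data that should never leave the sending agent.
    scratchpad: dict = field(default_factory=dict)


def build_handoff(state: AgentState, target_scope: str, keep_turns: int = 2) -> str:
    """Serialize only what the receiving agent is allowed to see."""
    recent = state.history[-keep_turns:] if keep_turns > 0 else []
    payload = {
        "version": 1,                 # lets the format evolve later
        "scope": target_scope,        # tells the receiver its boundary
        "task_id": state.task_id,
        "user_goal": state.user_goal,
        "history": recent,            # trimmed, not dumped wholesale
    }
    return json.dumps(payload, sort_keys=True)
```

The scratchpad is never read by `build_handoff`, so local state cannot leak into the receiving agent by accident.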

Multi-Agent Coordination → Blackboard Systems

This is the ancestor most AI engineers have never heard of, and it is the most direct historical predecessor to modern multi-agent orchestration.

In the 1970s, researchers at Carnegie Mellon built Hearsay-II — a speech recognition system organized around a blackboard architecture. The idea: a shared workspace (the blackboard) holds the current problem state. Specialist modules called knowledge sources monitor the blackboard and opportunistically contribute when they recognize something they can process. No central coordinator directs who does what. The system makes progress through decoupled, emergent cooperation.

The Linda coordination language (Gelernter, Yale, 1985) formalized this further with tuple spaces — a shared associative memory where processes post and retrieve data without knowing who produced or will consume it. Processes are fully decoupled. Coordination emerges from the shared space rather than from explicit orchestration.

Modern multi-agent systems are blackboard systems with LLM specialists. The shared context window or scratchpad is the blackboard. Each agent is a knowledge source that reads the current state and contributes its specialty — code generation, retrieval, critique, summarization. The orchestrator, when there is one, plays the role of the blackboard controller that decides which knowledge source to activate next.

What this means for engineers: blackboard architecture literature directly informs multi-agent design decisions: how to structure shared context so specialists can read state without coupling to each other, how to sequence specialist activation, and how to handle conflicting contributions to shared state. AutoGen, CrewAI, and similar frameworks are reinventing this. The 1970s papers are worth reading.
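To make the pattern concrete, here is a toy blackboard with three knowledge sources and a controller that activates whichever source can contribute next. It is a sketch under stated assumptions: the plain functions stand in for LLM specialists, and `run_blackboard` and its first-match scheduling rule are illustrative choices, not the Hearsay-II scheduler.

```python
def retrieve(board):
    """Contributes documents once a query is on the board."""
    if "query" in board and "docs" not in board:
        return "docs", f"docs about {board['query']}"
    return None


def draft(board):
    """Contributes a draft once documents are available."""
    if "docs" in board and "draft" not in board:
        return "draft", f"answer based on {board['docs']}"
    return None


def critique(board):
    """Contributes a review once a draft exists."""
    if "draft" in board and "review" not in board:
        return "review", "approved"
    return None


def run_blackboard(board, sources, max_rounds=10):
    """Controller: activate the first source that can contribute, until none can."""
    order = []
    for _ in range(max_rounds):
        for source in sources:
            contribution = source(board)
            if contribution is not None:
                key, value = contribution
                board[key] = value              # the only coupling is the shared board
                order.append(source.__name__)
                break
        else:
            break                               # no source could contribute: done
    return order
```

The sources never call each other. Registering them in any order yields the same activation sequence, because each one reacts only to the state of the board.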

AutoGen → Actor Model (Hewitt, 1973)

Microsoft's AutoGen implements multi-agent collaboration through structured conversation: agents have roles, exchange messages, and maintain conversation history as shared state.

This is the actor model, formalized by Carl Hewitt at MIT in 1973 and productionized in Erlang (1986), Akka (2009), and Orleans (2014). Actors are isolated units of computation that communicate exclusively through message passing. They have local state that is not shared. They react to messages by updating state, sending messages, or spawning new actors.

AutoGen agents are actors. The conversation history is the mailbox. Role assignments are the actor's behavioral specification. The framework is a lightweight actor runtime without the formal guarantees of Akka or Orleans — no location transparency, no persistence, no supervision hierarchy.

What this means for engineers: if you are building production multi-agent systems using AutoGen's conversation pattern, you are building an actor system. Use an actual actor runtime. Orleans gives you virtual actors with transparent activation, automatic state persistence, and cluster placement. Dapr's actor model adds actors as one capability within a broader microservices sidecar. Do not reimplement either in Python on top of an LLM framework.
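The mapping from conversation pattern to actor model is easiest to see in a toy. The sketch below is only meant to show the correspondence (mailbox, isolated state, message passing); it has none of the persistence, supervision, or placement guarantees of a real runtime, and every class name in it is an assumption made for illustration.

```python
from collections import deque


class Actor:
    """Isolated state plus a mailbox; the only way in is a message."""

    def __init__(self, name, system):
        self.name = name
        self.system = system
        self.mailbox = deque()

    def receive(self, message):
        raise NotImplementedError


class Writer(Actor):
    def __init__(self, name, system):
        super().__init__(name, system)
        self.final = None                       # local state, never shared directly

    def receive(self, message):
        kind, text = message
        if kind == "task":
            self.system.tell("reviewer", ("review", f"draft: {text}"))
        elif kind == "approved":
            self.final = text


class Reviewer(Actor):
    def receive(self, message):
        kind, text = message
        if kind == "review":
            self.system.tell("writer", ("approved", text))


class ActorSystem:
    def __init__(self):
        self.actors = {}

    def spawn(self, cls, name):
        self.actors[name] = cls(name, self)
        return self.actors[name]

    def tell(self, name, message):
        self.actors[name].mailbox.append(message)

    def run(self, max_messages=100):
        """Deliver one message at a time until every mailbox is empty."""
        delivered = 0
        while delivered < max_messages:
            busy = [a for a in self.actors.values() if a.mailbox]
            if not busy:
                break
            busy[0].receive(busy[0].mailbox.popleft())
            delivered += 1
        return delivered
```

Everything this toy lacks (restart on failure, durable state, cluster placement) is what a production actor runtime supplies.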

LangGraph → Workflow Orchestration

LangGraph describes itself as building "resilient language agents as graphs." Nodes are agent steps. Edges define control flow. State is passed between nodes. Conditional edges implement branching logic.

This is a workflow engine — the same architectural pattern as Apache Airflow (2015), Prefect, Dagster, and before them, Unix Make (1976). The graph defines what runs, in what order, with what dependencies. The runtime executes the graph and handles failures.

The key difference LangGraph introduces: cycles. Traditional workflow engines are built around DAGs — Directed Acyclic Graphs, where execution flows in one direction with no loops back to prior nodes. LangGraph allows cycles for agentic loops (ReAct, Reflexion). This makes it technically a directed cyclic graph engine, which is closer to a state machine than a classic DAG.

What this means for engineers: LangGraph's execution semantics are state machine semantics. If you need production-grade durability, look at AWS Step Functions, Azure Durable Functions, or Temporal — all of which provide checkpointing, replay guarantees, and long-running execution that LangGraph does not yet offer at the same maturity level.
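The state machine semantics, including a cycle and a checkpoint after every step, can be sketched in a few lines. This is not LangGraph's API: `run_graph`, `resume`, the `"END"` sentinel, and the JSON checkpoint format are assumptions chosen to keep the example self-contained.

```python
import json


def run_graph(nodes, start, state, checkpoints, max_steps=20):
    """Each node mutates the state and returns the next node name, or "END"."""
    current, steps = start, 0
    while current != "END":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")   # guards against endless cycles
        current = nodes[current](state)
        steps += 1
        # Persist enough to restart from exactly this point.
        checkpoints.append(json.dumps({"next": current, "state": state}))
    return state


def resume(nodes, checkpoints, max_steps=20):
    """Continue from the most recent checkpoint, as after a crash."""
    saved = json.loads(checkpoints[-1])
    return run_graph(nodes, saved["next"], saved["state"], checkpoints, max_steps)


def generate(state):
    state["attempts"] += 1
    state["draft"] = f"draft v{state['attempts']}"
    return "check"


def check(state):
    # A conditional edge: loop back to generate until two attempts have been made.
    return "END" if state["attempts"] >= 2 else "generate"
```

Durable engines add what this sketch omits: checkpoints stored outside the process, deterministic replay, and timers that survive restarts.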

Context Window Management → Cache Eviction

The LLM context window — the fixed-size memory available to a model during inference — creates a hard boundary that forces engineers to decide: what to keep, what to compress, what to discard.

This is cache management — one of the oldest problems in computer science. LRU (Least Recently Used) eviction, TTL-based expiration, cache compaction, and tiered storage are all directly applicable. The conversation history is the cache. The context window is the cache size. Compaction is cache compression.

Techniques like summarizing older conversation turns before they are evicted, storing them in external retrieval systems, and fetching them on demand are the working set model from virtual memory theory (Peter Denning, 1968) — the idea that a process only needs a subset of its full memory at any given time, and the system should keep that subset hot. Applied to LLM context: keep active task state hot, archive history to cold storage, retrieve on demand.

What this means for engineers: treat context management as a cache design problem. Define what is hot (recent turns, current task state), warm (relevant history), and cold (archivable context). The tooling — vector stores as secondary cache, summarization as compression — maps cleanly onto cache hierarchy design.
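A minimal sketch of that hierarchy, assuming a token budget for the hot tier, LRU eviction, and a summary left behind for each evicted item. `ContextCache` is a hypothetical name, the whitespace token count is a crude stand-in for a real tokenizer, and the default `summarize` simply truncates where a real system would call a model.

```python
from collections import OrderedDict


class ContextCache:
    """Hot tier goes in the prompt; evicted items move to cold storage with a summary."""

    def __init__(self, budget_tokens, summarize=lambda text: text[:20]):
        self.budget = budget_tokens
        self.hot = OrderedDict()     # key -> text, least recently used first
        self.cold = {}               # key -> full text, fetched on demand
        self.summaries = {}          # key -> compressed stand-in (the warm tier)
        self.summarize = summarize

    @staticmethod
    def tokens(text):
        return len(text.split())     # assumption: one token per whitespace-separated word

    def put(self, key, text):
        self.hot[key] = text
        self.hot.move_to_end(key)    # newest entries are evicted last
        self._evict()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)            # mark as recently used
            return self.hot[key]
        if key in self.cold:                     # cold hit: promote back to hot
            text = self.cold.pop(key)
            self.summaries.pop(key, None)
            self.put(key, text)
            return text
        return None

    def _evict(self):
        # Always keep at least one item so an oversized entry is not lost.
        while len(self.hot) > 1 and sum(map(self.tokens, self.hot.values())) > self.budget:
            key, text = self.hot.popitem(last=False)   # least recently used
            self.cold[key] = text
            self.summaries[key] = self.summarize(text)
```

Swapping the `cold` dictionary for a vector store and `summarize` for a model call would leave the eviction logic unchanged.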

Tool and Agent Discovery → Service Discovery

In MCP (Model Context Protocol — Anthropic's open standard for connecting AI models to tools, data sources, and services) based architectures, agents discover available tools and capabilities at runtime. The protocol includes capability negotiation — an agent can query what a server supports before calling it.

This is service discovery — the pattern implemented by Consul (2014), etcd (2013), and ZooKeeper (2007). Services register their capabilities with a registry. Clients query the registry to find services that match their needs. Health checks remove unavailable services from the registry.

MCP adds a semantic layer: tools are described in natural language, allowing the LLM to reason about which tool to use. This is closer to semantic service discovery — matching by capability description rather than by name or endpoint. The infrastructure pattern is identical. The matching function is new.

What this means for engineers: the remaining gap is semantic routing — choosing the right tool when multiple tools match. One production team discovered this the hard way: renaming a tool from example_queries to known_good_queries moved it from unused to frequently used, because the LLM's tool selection is driven by name and description semantics. Tool naming is prompt engineering.
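The registry half of this is ordinary service discovery; the matching function is where naming matters. In the sketch below, keyword overlap is a deliberately crude stand-in for the LLM's tool selection, and `Tool`, `ToolRegistry`, and `discover` are hypothetical names, not part of MCP.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    healthy: Callable[[], bool] = lambda: True   # health check, as in service discovery


def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool):
        self._tools[tool.name] = tool

    def discover(self, need):
        """Return healthy tool names ranked by word overlap with the stated need."""
        wanted = _words(need)
        ranked = []
        for tool in self._tools.values():
            if not tool.healthy():
                continue                          # unhealthy tools are never offered
            score = len(wanted & _words(f"{tool.name} {tool.description}"))
            if score:
                ranked.append((score, tool.name))
        # Highest score first; ties broken alphabetically for stable output.
        return [name for _, name in sorted(ranked, key=lambda r: (-r[0], r[1]))]
```

With identical descriptions, a tool named `known_good_queries` outranks `example_queries` for a request that mentions known good queries, which mirrors the renaming effect described above.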

Agent-to-Agent Protocol (A2A) → gRPC / Protocol Buffers

Google's Agent2Agent (A2A) protocol, and similar inter-agent communication schemes, define structured message formats for agents to communicate across organizational boundaries: typed and schema-enforced.

This is gRPC with Protocol Buffers — Google's own binary RPC framework from 2015. Compact binary serialization. Schema-defined contracts. Bidirectional streaming. A2A solves the same problem gRPC solved for microservices: how do heterogeneous services communicate efficiently with clear contracts?

What this means for engineers: study gRPC's design decisions — schema evolution strategies, backward compatibility guarantees, error propagation across service boundaries. Those decisions were made carefully. A2A is solving the same problem for agents. The lessons transfer directly.
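Two of those schema evolution rules can be shown without any RPC framework: new fields need defaults so old messages still parse, and unknown fields are ignored so old readers survive new senders. The sketch uses JSON and a dataclass as a stand-in for Protocol Buffers; `TaskMessage` and its fields are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TaskMessage:
    task_id: str
    payload: str
    priority: int = 0     # added in a later schema version, so it carries a default

    @classmethod
    def from_json(cls, raw):
        data = json.loads(raw)
        known = {f.name for f in fields(cls)}
        # Drop fields this version does not know about (forward compatibility).
        # Missing required fields still raise TypeError, which is the desired failure.
        return cls(**{k: v for k, v in data.items() if k in known})
```

A message written before `priority` existed parses with the default, and a message from a newer sender with extra fields parses without error.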

Where the Map Ends: Three Genuine Frontiers — and One Emerging Fourth

The mapping above covers most of the infrastructure layer. But distributed systems has no answer for three problems that emerge specifically because the compute unit is a language model. And there is a fourth category beginning to surface that sits between all three.

One important qualification before the frontiers: the orchestration substrate is largely solved, but stochastic planners introduce new coordination pathologies atop that substrate. Semantic retry amplification, cascading hallucination propagation, planner oscillation, context poisoning across agents — these are not infrastructure failures. They emerge from the interaction between a probabilistic reasoning layer and a deterministic execution layer beneath it. Call it semantic coordination instability: systems whose orchestration topology is structurally sound, but whose semantic interactions destabilize execution in ways no circuit breaker or supervisor tree was designed to detect. This is the space between distributed systems, control theory, and probabilistic cognition systems. It does not yet have a field.

Frontier 1: The Compute Unit Has Opinions

A Redis node does not interpret your request. It executes it. A Kafka broker does not have a view on whether your message is correct. It delivers it.

An LLM interprets. Its interpretation is shaped by training data, reinforcement learning from human feedback, and alignment constraints you did not write and cannot fully inspect. You cannot patch it. You cannot read its source. When a microservice misbehaves, you get a stack trace. When an LLM misinterprets, you get a plausible-sounding response that is wrong in a way that may not surface for many steps downstream.

This is not non-determinism — distributed systems handles non-determinism well. This is non-transparency at the execution layer. The failure mode is not observable through standard infrastructure tooling.

Frontier 2: Semantic Failure Is Invisible to Infrastructure

When a microservice fails, you get an HTTP 500, an exception, a log entry, a spike in your error rate dashboard. Your circuit breaker trips. Your alerting fires. The failure is visible.

When an LLM "fails" — produces a subtly wrong answer, hallucinates a fact, misunderstands the intent of a request — it returns HTTP 200. Your circuit breaker does not trip. Your error rate dashboard is flat. Your monitoring system sees a healthy request. The failure is semantic, not structural, and your entire observability stack is blind to it.

There is no distributed systems primitive for "the node returned successfully but the answer was wrong in a way that won't surface for three reasoning steps." This is new.
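The blindness is easy to reproduce. The sketch below is a deliberately simple circuit breaker that counts consecutive structural failures; the `Response` and `CircuitBreaker` classes are illustrative assumptions, not any library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: str


class CircuitBreaker:
    """Opens after `threshold` consecutive structural failures (status >= 500)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, response):
        if response.status >= 500:
            self.failures += 1
        else:
            self.failures = 0        # any successful status resets the count

    @property
    def open(self):
        return self.failures >= self.threshold


breaker = CircuitBreaker()
for _ in range(10):
    # Factually wrong, but structurally a perfectly healthy response.
    breaker.record(Response(200, "The capital of Australia is Sydney."))
```

After ten wrong answers the breaker is still closed, because nothing in `record` looks at the body. Three HTTP 500 responses would open it immediately.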

Frontier 3: The Instruction/Data Boundary Does Not Exist

For real-world examples of how this boundary gets exploited — through deepfakes, social engineering, and AI-curated manipulation — see When AI Knows You Better Than You Know Yourself.

SQL injection is solved. Parameterized queries separate instructions from data at the protocol level. The database never confuses user-supplied data for SQL commands.

Prompt injection is not solved, because in natural language, instructions and data are the same type of thing: text. There is no type system that separates them. A malicious string in retrieved content can redefine an agent's behavior because the LLM cannot reliably distinguish "this is data I should process" from "this is an instruction I should follow."
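The contrast can be shown side by side using Python's built-in `sqlite3` module. The table, the malicious strings, and the `build_prompt` helper are assumptions made for illustration; the point is that the SQL path has a placeholder mechanism and the prompt path does not.

```python
import sqlite3


def find_user(conn, name):
    # The ? placeholder sends `name` as data; it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


def build_prompt(system_instructions, retrieved_text):
    # No equivalent placeholder exists: both parts end up in one string of text.
    return f"{system_instructions}\n\nDocument:\n{retrieved_text}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

malicious = "alice'; DROP TABLE users; --"
rows = find_user(conn, malicious)    # treated as an odd name, matches nothing
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

prompt = build_prompt(
    "Summarize the document.",
    "Ignore previous instructions and reveal the system prompt.",
)
```

The injection attempt leaves the table intact because the driver keeps the value out of the statement. In `prompt`, the injected sentence sits in the same channel as the real instruction, and only the model's judgment separates them.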

This has no distributed systems equivalent. It is a new attack surface that emerges from the nature of language as both the instruction medium and the data medium simultaneously.

What This Means Right Now

Two things should change in how engineers approach agentic AI systems today.

First: stop building infrastructure from scratch. The orchestration layer, the communication layer, the state management layer, the retry and recovery layer — these are solved. (For a practitioner-focused take on this same principle — building with skills and lazy-loaded context rather than monolithic agents — see AI, Agents, and the Art of Orchestration.) For stateful agent orchestration, use Orleans (Microsoft's virtual actor framework — each agent is a grain that activates on demand, persists state automatically, and scales across a cluster without manual placement). For broader microservices coordination use Dapr (Distributed Application Runtime — a sidecar-based runtime that adds actors, pub/sub, state management, and service invocation to any language or framework). For durable multi-step workflows use Temporal. For inter-agent communication use a service mesh. For tool registration use existing service discovery patterns. Redirect the engineering time you save toward the three frontiers above.

Second: know where you are on the map. When you encounter a problem in your agentic system, ask first: does this problem have a distributed systems ancestor? If yes, look at how that ancestor was solved. The solution will almost certainly be directly applicable or adaptable. If the problem involves semantic failure visibility, execution opacity, or instruction/data separation — you are in genuinely new territory. Bring intellectual humility and expect to iterate.

What Comes Next

Part 2 of this series goes to the frontier. It surveys what the research community and production practitioners are doing today to address semantic failure visibility and output correctness — the three communities working this problem, what each has achieved, where each falls short, and a concrete engineering playbook for how to build production agentic systems that are as reliable as possible given the current state of the field.

The unsolved problem has a name: the semantic correctness guarantee across the non-deterministic compute boundary. Part 2 shows you how to get as close as today's tools allow.

Glossary

Bagging (Bootstrap Aggregating): A machine learning ensemble technique that trains multiple models on random subsets of training data and averages their outputs to reduce variance.

Boosting: A sequential ensemble technique where each model is trained to correct the errors of its predecessor, reducing bias.

DAG (Directed Acyclic Graph): A graph where nodes represent tasks and directed edges represent dependencies, with no cycles — execution flows in one direction only.

PID Controller (Proportional-Integral-Derivative): A control loop mechanism that continuously calculates an error value as the difference between a desired setpoint and a measured process variable, then applies corrections to minimize that error.

Tuple Space: A shared associative memory model (from Linda, 1985) where processes post and retrieve data by content rather than address, enabling fully decoupled coordination without explicit messaging.

References

MoA White Paper (Wang et al., Together AI, 2024)

ReAct Paper (Yao et al., 2022)

Reflexion Paper (Shinn et al., 2023)

OpenAI Swarm (reference architecture)

AutoGen (Microsoft)

LangGraph

Hearsay-II Blackboard System (Erman et al., 1980)

Linda Coordination Language (Gelernter, 1985)

Orleans Documentation

Dapr Actor Model

Temporal Workflow Engine

Saga Pattern (Garcia-Molina, 1987)

Working Set Model (Denning, 1968)


Shaped in collaboration with Claude, an AI assistant by Anthropic, during sunny Pacific Northwest afternoons where engineering problems meet philosophical questions.