Agent-Native Data Infrastructure & Lakebase
Traditional databases were designed for humans making deliberate changes. Agents do the opposite — they branch constantly, test ideas in parallel, and throw most of them away. This pattern shows what a database looks like when it is built for that.
Why databases need to change for agents
Think about how a developer uses a database. They write a careful migration, test it in staging, review it with the team, and deploy it once. That is a slow, deliberate process — because mistakes are expensive and hard to reverse.
An AI agent works completely differently. In a single reasoning loop, an agent might want to try ten different approaches simultaneously — run this migration, see if it breaks anything, roll it back, try a slightly different one. Not once, but hundreds of times per minute. It treats the database the same way it treats variables: something to create, modify, and discard as quickly as possible.
Traditional databases cannot serve this workload. Creating a branch means physically copying data — which is slow and expensive. Running compute costs money even when nothing is happening. An agent fleet with thousands of idle databases would be prohibitively wasteful.
Lakebase architectures solve this at the foundation. A branch is just a pointer to where your data diverges — not a copy of the data itself. Compute boots when a query arrives and shuts off when it is done. Storage lives cheaply in object storage. This makes it economically viable for agents to create and discard database environments as casually as they create variables.
Fleet Control Plane
In a traditional setup, getting a new database requires a ticket to a DBA, a provisioning script, and several minutes of waiting. That model falls apart when an agent needs a fresh database for each task it runs — potentially thousands of times per day. The control plane exposes a simple API that agents call directly. Create a database. Fork it from a parent. Delete it when the task ends. A coordinator agent can spin up one isolated database per subtask, keep the blast radius contained, and clean everything up automatically when it is done. No human involvement, no waiting, no sprawl.
Provisioning API
Agents call an HTTP endpoint to create, fork, or delete a database. The operation completes in under a second — no provisioning pipeline, no waiting for storage to be allocated.
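The create/fork/delete lifecycle can be sketched with a toy in-memory control plane. This is an illustration, not a real API: the class, method names, and the idea of tracking only an `id -> parent` mapping are assumptions standing in for the HTTP endpoints a real provisioning service would expose.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class ControlPlane:
    """Hypothetical in-memory stand-in for the provisioning API."""
    databases: dict = field(default_factory=dict)  # id -> parent id (None for roots)

    def create(self) -> str:
        db_id = str(uuid.uuid4())
        self.databases[db_id] = None
        return db_id

    def fork(self, parent_id: str) -> str:
        if parent_id not in self.databases:
            raise KeyError(f"unknown database {parent_id}")
        db_id = str(uuid.uuid4())
        self.databases[db_id] = parent_id  # a branch records its parent; no data is copied
        return db_id

    def delete(self, db_id: str) -> None:
        self.databases.pop(db_id, None)

# A coordinator agent: one isolated fork per subtask, cleaned up when done.
plane = ControlPlane()
main = plane.create()
forks = [plane.fork(main) for _ in range(3)]  # three parallel subtasks
for f in forks:
    plane.delete(f)                           # blast radius contained, then reclaimed
```

Because forking is a metadata operation, the coordinator pays nothing extra for running subtasks against forks rather than sharing one database.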
Per-Agent Quotas
Each agent or agent fleet gets a hard cap on how many databases it can create and how much storage it can use. If something goes wrong and an agent starts creating thousands of databases, the quota stops it before costs spiral.
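One way such a cap could work is a counter the control plane charges on every create and releases on every delete. The class and limit below are illustrative, not part of any documented system:

```python
class QuotaExceeded(Exception):
    pass

class QuotaTracker:
    """Hypothetical per-agent database cap; names and limits are illustrative."""
    def __init__(self, max_databases: int):
        self.max_databases = max_databases
        self.counts: dict[str, int] = {}

    def charge(self, agent_id: str) -> None:
        used = self.counts.get(agent_id, 0)
        if used >= self.max_databases:
            raise QuotaExceeded(f"{agent_id} already holds {used} databases")
        self.counts[agent_id] = used + 1

    def release(self, agent_id: str) -> None:
        self.counts[agent_id] = max(0, self.counts.get(agent_id, 0) - 1)

quota = QuotaTracker(max_databases=2)
quota.charge("agent-7")
quota.charge("agent-7")
try:
    quota.charge("agent-7")  # runaway loop: the third create is refused
    blocked = False
except QuotaExceeded:
    blocked = True
```

The hard cap fails the create call itself, so a misbehaving agent stops at the limit rather than after the bill arrives.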
Scoped Credentials
Each database branch gets its own short-lived credentials tied specifically to that branch. An agent with access to one branch cannot read or write any other branch in the fleet.
Auto-Reaper
Databases that have been idle longer than their configured time-to-live are automatically deleted. No manual housekeeping required — the fleet stays clean without anyone managing it.
O(1) Copy-on-Write Branching
The key technical idea is changing what "a branch" means at the storage level. In a traditional database, a branch means copying data — which means copying gigabytes or terabytes. That is slow, expensive, and fundamentally incompatible with agents that need to branch hundreds of times per minute. Lakebase branches work on a different principle. Every write to a database goes into a write-ahead log — a sequential record of changes. A branch is nothing more than a pointer to a specific position in that log. Two branches share all the same underlying data pages until one of them writes something. At that point, only the pages that changed diverge — everything else is still shared. This is called copy-on-write. The result: forking a 10TB database is just as fast as forking an empty one. There is no data to copy, only a pointer to set.
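The mechanics above can be modeled in a few lines: a branch holds a parent pointer, a log position, and a map of diverged pages, and a read falls through the parent chain for anything it has not overwritten. This is a toy model of the principle, not the actual storage engine; the class and page layout are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    parent: "Branch | None"          # where this branch forked from
    base_lsn: int                    # position in the shared write log at fork time
    diverged: dict = field(default_factory=dict)  # page number -> new contents

    def fork(self, at_lsn: int) -> "Branch":
        # O(1): a new branch is just a pointer; no pages are copied.
        return Branch(parent=self, base_lsn=at_lsn)

    def read_page(self, page_no: int) -> bytes:
        # Copy-on-write lookup: serve a diverged page if this branch wrote it,
        # otherwise fall through to the parent chain (shared storage).
        if page_no in self.diverged:
            return self.diverged[page_no]
        if self.parent is not None:
            return self.parent.read_page(page_no)
        return b"\x00" * 8           # never-written page (toy default)

    def write_page(self, page_no: int, data: bytes) -> None:
        self.diverged[page_no] = data  # only the written page diverges

main = Branch(parent=None, base_lsn=0)
main.write_page(1, b"orders-v1")
child = main.fork(at_lsn=100)        # instant, regardless of database size
child.write_page(1, b"orders-v2")    # child diverges on page 1 only
```

Note that `fork` allocates nothing but a small metadata record, which is why a 10TB database and an empty one fork in the same time.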
Branch Metadata
A branch is stored as three things: which database it came from, the position in the write log where it starts, and a map of which pages have diverged since then. The data itself is not duplicated.
Pageserver
A shared service that reconstructs the state of any page at any point in time. Unchanged pages are served from the same storage location for all branches — there is no redundant copying.
Safekeeper (WAL)
The write-ahead log that all branches share. Every change is recorded here in order. This shared log is what makes instant branching and point-in-time recovery possible without copying data.
Point-in-Time Recovery
Any past position in the write log is a valid starting point for a new branch. Want to see what the database looked like ten minutes ago? Create a branch from that position. It is instant and costs almost nothing, because no data is copied.
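Why any log position works as a branch point is easiest to see with a toy write-ahead log: reconstructing a page "as of" a position just means replaying records up to it. The record format and function below are illustrative, not the real pageserver protocol:

```python
# Toy write-ahead log: ordered (lsn, page_no, data) records.
WAL = [
    (1, 0, b"v1"),
    (2, 0, b"v2"),
    (3, 1, b"x1"),
]

def page_at(wal, page_no: int, as_of_lsn: int):
    """Reconstruct one page as of a past log position. Any LSN is a
    valid branch point, which is all point-in-time recovery needs."""
    state = None
    for lsn, page, data in wal:
        if lsn > as_of_lsn:
            break                  # ignore everything after the branch point
        if page == page_no:
            state = data           # last write at or before the position wins
    return state
```

Here `page_at(WAL, 0, 1)` sees the old value `b"v1"` while `page_at(WAL, 0, 3)` sees `b"v2"`: two branches from different positions read different pasts from the same shared log.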
Scale-to-Zero Serverless Compute
Even after solving the branching problem, there is still a cost problem with compute. A traditional database server runs continuously — it uses CPU and memory even when no queries are happening. An agent fleet might have thousands of databases, most of which are idle most of the time. Running all of them as always-on servers is extremely wasteful. Serverless compute solves this by treating database compute as something you rent per request. When a query arrives, a compute node starts up, handles the query, and then shuts itself down after a period of inactivity. Storage is completely separate from compute — it lives in cheap object storage and is always available. An idle database costs almost nothing, which makes large agent fleets economically viable.
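The lifecycle described above, boot on first query, serve while warm, suspend after an idle timeout, can be sketched as a small state machine. This is a behavioral model under assumed names, not any vendor's implementation:

```python
class ServerlessCompute:
    """Toy model: compute exists only between first query and idle timeout."""
    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.running = False
        self.last_query = None

    def query(self, sql: str, now: float) -> str:
        self._maybe_suspend(now)
        cold = not self.running
        if cold:
            self.running = True    # cold start: boot compute on demand
        self.last_query = now
        return "cold" if cold else "warm"

    def _maybe_suspend(self, now: float) -> None:
        if self.running and self.last_query is not None \
                and now - self.last_query > self.idle_timeout:
            self.running = False   # scale to zero: idle compute is released

db = ServerlessCompute(idle_timeout=60.0)
first = db.query("select 1", now=0.0)    # cold start pays the warm-up cost
second = db.query("select 1", now=1.0)   # still warm: fast
third = db.query("select 1", now=500.0)  # idle timeout elapsed: cold again
```

Storage is deliberately absent from this model: it lives in object storage and is always available, so only the compute node follows this on/off cycle.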
Autoscaling Compute
The CPU and memory given to a database session scales based on what the query actually needs. Between sessions, it scales all the way to zero — no idle cost at all.
Object-Storage Pages
Database pages are stored in S3 or equivalent object storage rather than on expensive attached disks. This makes idle storage cost close to zero and makes capacity effectively unlimited.
Lakehouse Sync
Transactional changes flow automatically into analytics formats like Delta Lake or Apache Iceberg. Agents write to the operational database and the analytics layer stays in sync.
Cold-Start Budget
The first query after the database has been idle for a while pays a warm-up cost — typically 100ms to 2 seconds. After that, the session is warm and subsequent queries are fast.