Agent-Native Data Infrastructure & Lakebase
Traditional databases were designed for humans making deliberate changes. Agents do the opposite — they branch constantly, test ideas in parallel, and throw most of them away. This pattern shows what a database looks like when it is built for that.
Why databases need to change for agents
Think about how a developer uses a database. They write a careful migration, test it in staging, review it with the team, and deploy it once. That is a slow, deliberate process — because mistakes are expensive and hard to reverse.
An AI agent works completely differently. In a single reasoning loop, an agent might want to try ten different approaches simultaneously — run this migration, see if it breaks anything, roll it back, try a slightly different one. Not once, but hundreds of times per minute. It treats the database the same way it treats variables: something to create, modify, and discard as quickly as possible.
Traditional databases cannot serve this workload. Creating a branch means physically copying data — which is slow and expensive. Running compute costs money even when nothing is happening. An agent fleet with thousands of idle databases would be prohibitively wasteful.
Lakebase architectures solve this at the foundation. A branch is just a pointer to where your data diverges — not a copy of the data itself. Compute boots when a query arrives and shuts off when it is done. Storage lives cheaply in object storage. This makes it economically viable for agents to create and discard database environments as casually as they create variables.
Fleet Control Plane
In a traditional setup, getting a new database requires a ticket to a DBA, a provisioning script, and several minutes of waiting. That model falls apart when an agent needs a fresh database for each task it runs — potentially thousands of times per day. The control plane exposes a simple API that agents call directly. Create a database. Fork it from a parent. Delete it when the task ends. A coordinator agent can spin up one isolated database per subtask, keep the blast radius contained, and clean everything up automatically when it is done. No human involvement, no waiting, no sprawl.
Provisioning API
Agents call an HTTP endpoint to create, fork, or delete a database. The operation completes in under a second — no provisioning pipeline, no waiting for storage to be allocated.
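The create/fork/delete lifecycle can be sketched with a toy in-memory control plane. This is an illustration, not a real API: the class, method names, and the idea of tracking only an `id -> parent` mapping are assumptions standing in for the HTTP endpoints a real provisioning service would expose.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class ControlPlane:
    """Hypothetical in-memory stand-in for the provisioning API."""
    databases: dict = field(default_factory=dict)  # id -> parent id (None for roots)

    def create(self) -> str:
        db_id = str(uuid.uuid4())
        self.databases[db_id] = None
        return db_id

    def fork(self, parent_id: str) -> str:
        if parent_id not in self.databases:
            raise KeyError(f"unknown database {parent_id}")
        db_id = str(uuid.uuid4())
        self.databases[db_id] = parent_id  # a branch records its parent; no data is copied
        return db_id

    def delete(self, db_id: str) -> None:
        self.databases.pop(db_id, None)

# A coordinator agent: one isolated fork per subtask, cleaned up when done.
plane = ControlPlane()
main = plane.create()
forks = [plane.fork(main) for _ in range(3)]  # three parallel subtasks
for f in forks:
    plane.delete(f)                           # blast radius contained, then reclaimed
```

Because forking is a metadata operation, the coordinator pays nothing extra for running subtasks against forks rather than sharing one database.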
Per-Agent Quotas
Each agent or agent fleet gets a hard cap on how many databases it can create and how much storage it can use. If something goes wrong and an agent starts creating thousands of databases, the quota stops it before costs spiral.
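One way such a cap could work is a counter the control plane charges on every create and releases on every delete. The class and limit below are illustrative, not part of any documented system:

```python
class QuotaExceeded(Exception):
    pass

class QuotaTracker:
    """Hypothetical per-agent database cap; names and limits are illustrative."""
    def __init__(self, max_databases: int):
        self.max_databases = max_databases
        self.counts: dict[str, int] = {}

    def charge(self, agent_id: str) -> None:
        used = self.counts.get(agent_id, 0)
        if used >= self.max_databases:
            raise QuotaExceeded(f"{agent_id} already holds {used} databases")
        self.counts[agent_id] = used + 1

    def release(self, agent_id: str) -> None:
        self.counts[agent_id] = max(0, self.counts.get(agent_id, 0) - 1)

quota = QuotaTracker(max_databases=2)
quota.charge("agent-7")
quota.charge("agent-7")
try:
    quota.charge("agent-7")  # runaway loop: the third create is refused
    blocked = False
except QuotaExceeded:
    blocked = True
```

The hard cap fails the create call itself, so a misbehaving agent stops at the limit rather than after the bill arrives.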
Scoped Credentials
Each database branch gets its own short-lived credentials tied specifically to that branch. An agent with access to one branch cannot read or write any other branch in the fleet.
Auto-Reaper
Databases that have been idle longer than their configured time-to-live are automatically deleted. No manual housekeeping required — the fleet stays clean without anyone managing it.
O(1) Copy-on-Write Branching
The key technical idea is changing what "a branch" means at the storage level. In a traditional database, a branch means copying data — which means copying gigabytes or terabytes. That is slow, expensive, and fundamentally incompatible with agents that need to branch hundreds of times per minute. Lakebase branches work on a different principle. Every write to a database goes into a write-ahead log — a sequential record of changes. A branch is nothing more than a pointer to a specific position in that log. Two branches share all the same underlying data pages until one of them writes something. At that point, only the pages that changed diverge — everything else is still shared. This is called copy-on-write. The result: forking a 10TB database is just as fast as forking an empty one. There is no data to copy, only a pointer to set.
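The mechanics above can be modeled in a few lines: a branch holds a parent pointer, a log position, and a map of diverged pages, and a read falls through the parent chain for anything it has not overwritten. This is a toy model of the principle, not the actual storage engine; the class and page layout are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    parent: "Branch | None"          # where this branch forked from
    base_lsn: int                    # position in the shared write log at fork time
    diverged: dict = field(default_factory=dict)  # page number -> new contents

    def fork(self, at_lsn: int) -> "Branch":
        # O(1): a new branch is just a pointer; no pages are copied.
        return Branch(parent=self, base_lsn=at_lsn)

    def read_page(self, page_no: int) -> bytes:
        # Copy-on-write lookup: serve a diverged page if this branch wrote it,
        # otherwise fall through to the parent chain (shared storage).
        if page_no in self.diverged:
            return self.diverged[page_no]
        if self.parent is not None:
            return self.parent.read_page(page_no)
        return b"\x00" * 8           # never-written page (toy default)

    def write_page(self, page_no: int, data: bytes) -> None:
        self.diverged[page_no] = data  # only the written page diverges

main = Branch(parent=None, base_lsn=0)
main.write_page(1, b"orders-v1")
child = main.fork(at_lsn=100)        # instant, regardless of database size
child.write_page(1, b"orders-v2")    # child diverges on page 1 only
```

Note that `fork` allocates nothing but a small metadata record, which is why a 10TB database and an empty one fork in the same time.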
Branch Metadata
A branch is stored as three things: which database it came from, the position in the write log where it starts, and a map of which pages have diverged since then. The data itself is not duplicated.
Pageserver
A shared service that reconstructs the state of any page at any point in time. Unchanged pages are served from the same storage location for all branches — there is no redundant copying.
Safekeeper (WAL)
The write-ahead log that all branches share. Every change is recorded here in order. This shared log is what makes instant branching and point-in-time recovery possible without copying data.
Point-in-Time Recovery
Any past position in the write log is a valid starting point for a new branch. Want to see what the database looked like ten minutes ago? Create a branch from that position. It is instant and costs almost nothing, because no data is copied.
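Why any log position works as a branch point is easiest to see with a toy write-ahead log: reconstructing a page "as of" a position just means replaying records up to it. The record format and function below are illustrative, not the real pageserver protocol:

```python
# Toy write-ahead log: ordered (lsn, page_no, data) records.
WAL = [
    (1, 0, b"v1"),
    (2, 0, b"v2"),
    (3, 1, b"x1"),
]

def page_at(wal, page_no: int, as_of_lsn: int):
    """Reconstruct one page as of a past log position. Any LSN is a
    valid branch point, which is all point-in-time recovery needs."""
    state = None
    for lsn, page, data in wal:
        if lsn > as_of_lsn:
            break                  # ignore everything after the branch point
        if page == page_no:
            state = data           # last write at or before the position wins
    return state
```

Here `page_at(WAL, 0, 1)` sees the old value `b"v1"` while `page_at(WAL, 0, 3)` sees `b"v2"`: two branches from different positions read different pasts from the same shared log.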
Scale-to-Zero Serverless Compute
Even after solving the branching problem, there is still a cost problem with compute. A traditional database server runs continuously — it uses CPU and memory even when no queries are happening. An agent fleet might have thousands of databases, most of which are idle most of the time. Running all of them as always-on servers is extremely wasteful. Serverless compute solves this by treating database compute as something you rent per request. When a query arrives, a compute node starts up, handles the query, and then shuts itself down after a period of inactivity. Storage is completely separate from compute — it lives in cheap object storage and is always available. An idle database costs almost nothing, which makes large agent fleets economically viable.
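The lifecycle described above, boot on first query, serve while warm, suspend after an idle timeout, can be sketched as a small state machine. This is a behavioral model under assumed names, not any vendor's implementation:

```python
class ServerlessCompute:
    """Toy model: compute exists only between first query and idle timeout."""
    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.running = False
        self.last_query = None

    def query(self, sql: str, now: float) -> str:
        self._maybe_suspend(now)
        cold = not self.running
        if cold:
            self.running = True    # cold start: boot compute on demand
        self.last_query = now
        return "cold" if cold else "warm"

    def _maybe_suspend(self, now: float) -> None:
        if self.running and self.last_query is not None \
                and now - self.last_query > self.idle_timeout:
            self.running = False   # scale to zero: idle compute is released

db = ServerlessCompute(idle_timeout=60.0)
first = db.query("select 1", now=0.0)    # cold start pays the warm-up cost
second = db.query("select 1", now=1.0)   # still warm: fast
third = db.query("select 1", now=500.0)  # idle timeout elapsed: cold again
```

Storage is deliberately absent from this model: it lives in object storage and is always available, so only the compute node follows this on/off cycle.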
Autoscaling Compute
The CPU and memory given to a database session scales based on what the query actually needs. Between sessions, it scales all the way to zero — no idle cost at all.
Object-Storage Pages
Database pages are stored in S3 or equivalent object storage rather than on expensive attached disks. This makes idle storage cost close to zero and makes capacity effectively unlimited.
Lakehouse Sync
Transactional changes flow automatically into analytics formats like Delta Lake or Apache Iceberg. Agents write to the operational database and the analytics layer stays in sync.
Cold-Start Budget
The first query after the database has been idle for a while pays a warm-up cost — typically 100ms to 2 seconds. After that, the session is warm and subsequent queries are fast.