Antti Rauhala
Co-founder
May 15, 2026 • 13 min read
When a SaaS team evaluates Aito, the first technical question is rarely about accuracy. It is about scale. Will it still work at 1 million invoices? At 10 million? With dozens of tenants on shared infrastructure, all querying at the same time?
These are fair questions. A predictive database that delivers great accuracy on a 10,000-row demo but melts at 5 million rows is not a product; it is a science project. So we put Aito on a bench, ran a multi-tenant invoice routing workload from 1,000 rows up to 10 million, and measured what actually happens. We made sure to measure the realistic case: a real HTTP request hitting Aito's API the way a production app would, not an in-process call that skips the parser, the protocol, and the text-tokenisation work. In-process numbers are easy to make look good. They are also, as we found out, not what production looks like.
This post walks through what the realistic numbers actually are.
Throughput benchmarks for normal databases focus on inserts per second and query latency. A predictive database has the same dimensions plus a few extra:
The benchmark below targets all five. The headline result is the one most people care about: steady-state predict latency at 10 million rows, end-to-end through HTTP, stays under 200 milliseconds.
The dataset is a synthetic invoice routing workload. The schema mirrors what we see in production: a 4-table linked structure with companies, employees, GL codes, and invoices. Each invoice has three text fields (sender, product, free-text description) and three prediction targets: the processor, the acceptor, and the GL code.
The query is the same one our customers actually run in production:
```json
{
  "from": "invoices",
  "where": {
    "sender": "...",
    "product": "...",
    "description": "...",
    "processor.company": 13
  },
  "predict": "processor",
  "limit": 144
}
```
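For readers who want to reproduce the request shape, here is a minimal Python sketch that builds this query as an HTTP POST using only the standard library. The endpoint path `/api/v1/_predict`, the `x-api-key` header, and the instance URL are assumptions made for illustration, not details taken from this post; check them against Aito's API reference before relying on them. The `"..."` text values are kept as placeholders.

```python
import json
import urllib.request

# Hypothetical instance URL and key: replace with your own values.
AITO_URL = "https://your-instance.aito.app"
API_KEY = "your-read-key"


def build_predict_request(base_url: str, api_key: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a predict query.

    Assumes the /api/v1/_predict path and the x-api-key header.
    """
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/_predict",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


query = {
    "from": "invoices",
    "where": {
        "sender": "...",        # placeholder text values, as in the query above
        "product": "...",
        "description": "...",
        "processor.company": 13,
    },
    "predict": "processor",
    "limit": 144,
}

request = build_predict_request(AITO_URL, API_KEY, query)
# Sending it requires network access to a real instance:
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.load(response)
```

Timing `urlopen` around this request, rather than an in-process call, is what makes a measurement include parsing, protocol, and tokenisation overhead.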
The test loads a fresh database, optimises the state (a one-time merge step that reduces query latency at the cost of extra ingestion time: 37 seconds at 1 million rows, about 9 minutes at 10 million), and then fires 63 random predict queries at the HTTP API in consecutive batches of 1, 2, 4, 8, 16, and 32. The first batch is a single query against a cold state. The later batches run back-to-back, so the JVM is hot, the on-disk pages are paged in, and the per-token caches are populated.
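A small sketch of how the doubling batch schedule maps to the query-index ranges used as row labels in the latency table, and how a per-batch mean is computed. The function names are hypothetical; this is an illustration of the bookkeeping, not the booktest harness itself.

```python
from statistics import mean

BATCH_SIZES = [1, 2, 4, 8, 16, 32]  # doubling batches, as described above


def batch_ranges(sizes):
    """Return half-open (start, end) query-index ranges for consecutive batches."""
    ranges, start = [], 0
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges


def batch_means(latencies_ms, sizes):
    """Mean latency per batch; latencies_ms must be ordered by query index."""
    return [mean(latencies_ms[a:b]) for a, b in batch_ranges(sizes)]


print(batch_ranges(BATCH_SIZES))
# [(0, 1), (1, 3), (3, 7), (7, 15), (15, 31), (31, 63)]
```

The ranges sum to 63 queries, and the first range holds only the cold query, which is why it is reported separately from the steady-state batches.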
We ran this on a single developer workstation with JDK 17 and a 12 GB heap. No special tuning. The booktest sources are public in the Aito core repository.
Mean predict latency in milliseconds per query, on the optimised 10-million-row state, by batch:
| Batch (query indices) | Processor | Acceptor | GL code |
|---|---|---|---|
| 0–1 (cold) | 5,547 ms | 1,753 ms | 957 ms |
| 1–3 | 342 ms | 195 ms | 118 ms |
| 3–7 | 213 ms | 192 ms | 102 ms |
| 7–15 | 231 ms | 136 ms | 169 ms |
| 15–31 | 204 ms | 163 ms | 109 ms |
| 31–63 (steady) | 186 ms | 144 ms | 109 ms |
Two stories in one table.
The cold first query takes a few seconds. When Aito first opens a 10-million-row state and runs its first predict, it pays a roughly 5.5-second tax on the processor query. This is not a bug. It is the cost of warming up: the JVM JIT has to compile the predict path, the per-text-field token analysis structures need to be materialised in memory, and the on-disk index pages need to be paged in by the OS. None of this happens for free. The acceptor and GL code first queries are cheaper (1.8 seconds and 1.0 second) because by the time they run, most of that work is already done. This processor cold cost used to be far worse (around 18 seconds in an earlier build), and engineering work over the spring brought it down roughly 3.3x, into normal-database territory. There is still headroom, but it is no longer the number the post has to explain away.
Once warm, latency settles in the low hundreds of milliseconds. From batch 1–3 onward, every batch mean lands between roughly 100 and 340 ms across all three target fields, and settles under 200 ms once fully warm. Variance is tight. There is no thermal runaway, no GC stall pile-up, and no memory growth that gradually slows the system down across 63 queries.
For a SaaS use case, this means one thing: pre-warm Aito on deploy by firing a representative query against each tenant's state, and from then on you are in the predictable 100-to-200 ms band. That is comfortable territory for a predict-on-button-click UI feature, and it is fine for a workflow step in an automated pipeline.
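The pre-warm step can be sketched as a deploy hook that fires one representative query per tenant and records how long each took. `prewarm`, `tenant_queries`, and the `send` callback are hypothetical names introduced here; `send` stands in for whatever HTTP call your client already makes against a tenant's state.

```python
import time
from typing import Callable, Dict


def prewarm(tenant_queries: Dict[str, dict],
            send: Callable[[str, dict], object]) -> Dict[str, float]:
    """Fire one representative predict query per tenant; return latency in ms per tenant.

    Errors are deliberately not caught: a failed warm-up should fail the deploy check.
    """
    timings = {}
    for tenant, query in tenant_queries.items():
        started = time.perf_counter()
        send(tenant, query)
        timings[tenant] = (time.perf_counter() - started) * 1000.0
    return timings


# Usage with a stand-in sender that only records which tenant was warmed.
calls = []
timings = prewarm(
    {"tenant-a": {"from": "invoices"}, "tenant-b": {"from": "invoices"}},
    lambda tenant, query: calls.append(tenant),
)
```

Tenants are warmed one at a time so that their cold-start costs do not compete with each other for CPU and disk; the returned timings are a convenient place to spot a tenant whose first query is unusually slow.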
The bigger question is what the scaling curve looks like across orders of magnitude. We ran the same benchmark at 1k, 10k, and 100k rows, and at 1M and 10M rows after optimisation. Steady-state mean predict latency in milliseconds:
| Scale | Processor | Acceptor | GL code |
|---|---|---|---|
| 1 k | 40 ms | 35 ms | 28 ms |
| 10 k | 45 ms | 39 ms | 30 ms |
| 100 k | 51 ms | 40 ms | 28 ms |
| 1 M | 101 ms | 56 ms | 36 ms |
| 10 M | 186 ms | 144 ms | 109 ms |
A 10,000-fold increase in row count, from 1k to 10M, takes processor predict from 40 ms to 186 ms. That is a 4.6x latency increase for a 10,000x data increase. The scaling is dramatically sub-linear because Aito's per-row cost amortises against bitset operations and disk-resident indexes that scale logarithmically (or are constant) for most of the work.
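A worked version of this arithmetic for all three targets, using the 1k and 10M rows of the table above. Fitting a single power law, latency ∝ rows^k, is a simplification introduced here for illustration; linear scaling would give k = 1.0.

```python
import math

# Steady-state mean predict latency (ms) at 1k and 10M rows, from the table above.
latency_1k = {"processor": 40, "acceptor": 35, "gl_code": 28}
latency_10m = {"processor": 186, "acceptor": 144, "gl_code": 109}
data_factor = 10_000_000 / 1_000  # 10,000x more rows

# If latency grew like rows**k, then k = log(latency ratio) / log(data ratio).
exponents = {
    target: math.log(latency_10m[target] / latency_1k[target]) / math.log(data_factor)
    for target in latency_1k
}

for target, k in exponents.items():
    ratio = latency_10m[target] / latency_1k[target]
    print(f"{target}: {ratio:.2f}x latency for {data_factor:,.0f}x data, k ~ {k:.2f}")
```

All three targets come out with an effective exponent between roughly 0.15 and 0.17, far below the 1.0 that linear scaling would imply.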
Predictions stay fast enough for an interactive UI even as tenant data grows from thousands to tens of millions of rows. Under the hood, this comes from disk-resident bitset indexes that scale logarithmically with state size, a persistent cache for cross-table linkage construction so the same evidence is not rebuilt on every query, and lazy candidate scoring whose cost depends on the candidate set per query rather than the total row count. There is no separate retraining step to schedule, no model versioning to manage, and no infrastructure that scales differently from the rest of the database.
The shape of the curve is best read as two regimes. Below ~100k rows, latency is essentially flat (roughly 30 to 50 ms across all three targets) because per-token caches and per-field statistics dominate the cost and the dataset is small enough that they barely move. Above 100k, latency grows but slowly: the per-query cost is dominated by candidate-scoring math that depends on the candidate set size (around 144 employees per company), not the total invoice count. Going from 100k to 10M is a 100x data increase that takes processor predict from 51 ms to 186 ms, a 3.6x latency multiplier for 100x more data.
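The two regimes become visible if the processor column is read one decade at a time. The sketch below computes a local power-law exponent between consecutive scales; the helper name is hypothetical, and the exponent is just a compact way to summarise the table, not a model of Aito's internals.

```python
import math

# Steady-state mean processor latency (ms) by row count, from the table above.
scales = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
processor_ms = [40, 45, 51, 101, 186]


def local_exponents(rows, latencies):
    """Local exponent between consecutive scales: log(t1 / t0) / log(n1 / n0)."""
    return [
        math.log(latencies[i] / latencies[i - 1]) / math.log(rows[i] / rows[i - 1])
        for i in range(1, len(rows))
    ]


ks = local_exponents(scales, processor_ms)
for (n0, n1), k in zip(zip(scales, scales[1:]), ks):
    print(f"{n0:>10,} -> {n1:>10,} rows: local k ~ {k:.2f}")
```

Below 100k the local exponent is about 0.05 per decade (effectively flat); above 100k it rises to between about 0.25 and 0.3, which is growth, but still strongly sub-linear.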
This is the property that makes Aito viable behind a SaaS UI. You do not have to design around prediction latency. You ship a query like any other.
Latency means nothing if accuracy collapses with scale. Top-1 and top-3 accuracy across the 63-query test set, by target, at 10 million rows:
| Field | Top-1 accuracy | Top-3 accuracy |
|---|---|---|
| Processor | 78% | 87% |
| Acceptor | 94% | 97% |
| GL code | 87% | 97% |
The synthetic data is intentionally noisy (random syllable combinations rather than realistic supplier names, probabilistic routing rules rather than deterministic ones), which makes the absolute numbers a conservative floor for what real customer data would deliver. The point is the shape: more data does not break Aito's predictions, it improves them, and there is no regime where accuracy degrades as the dataset grows.
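For reference, top-k accuracy is conventionally computed as the share of queries whose true label appears among the first k ranked predictions. A minimal sketch, assuming each predict response has already been reduced to a ranked list of labels; the function name and input shapes are illustrative, not part of Aito's API. With 63 queries, each hit moves the metric by about 1.6 percentage points, so 78% top-1 corresponds to 49 of 63 queries.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of queries whose true label is among the first k ranked predictions."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    hits = sum(1 for preds, label in zip(ranked, truth) if label in preds[:k])
    return hits / len(truth)


# Toy example: three queries, each with a ranked list of candidate labels.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truth = ["a", "a", "b"]
top1 = top_k_accuracy(ranked, truth, 1)
top3 = top_k_accuracy(ranked, truth, 3)
```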
Two operational numbers worth knowing.
Optimisation wall-clock time. After bulk-loading data, Aito offers an optional optimize step that merges segment files for faster steady-state predict. It is a one-time cost that pays itself back many times over.
| Scale | Optimisation wall-clock time |
|---|---|
| 1 M invoices | 37 s |
| 10 M invoices | 535 s (~8.9 min) |
For a tenant that loads data once and then queries forever, optimisation is run once at onboarding. For a tenant that grows continuously, optimisation runs as a background maintenance task. Either way, predict latency stays in the steady-state band reported above. Without optimisation, the same query path works but with more variance and higher mean latency.
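Normalising the two optimisation timings per row shows how the one-time cost grows with data size. Only two data points are available, so this is an observation about this benchmark run, not a fitted cost model.

```python
# Optimisation wall-clock times from the table above (rows -> seconds).
optimise_seconds = {1_000_000: 37, 10_000_000: 535}

# Per-row cost in microseconds, and the growth factor between the two scales.
per_row_us = {rows: secs / rows * 1e6 for rows, secs in optimise_seconds.items()}
growth = optimise_seconds[10_000_000] / optimise_seconds[1_000_000]

for rows, us in per_row_us.items():
    print(f"{rows:>10,} rows: {us:.1f} µs per row")
print(f"10x more rows -> {growth:.1f}x longer optimisation")
```

Per-row cost rises from about 37 µs to about 54 µs, so ten times the data took roughly 14.5 times as long. That is one more reason to treat optimisation at large scales as a scheduled background task rather than something on the request path.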
Heap and disk. At 1 million rows, the JVM heap settles around 800 MB during ingestion and drops back to a few hundred MB after optimisation. At 10 million, the heap profile is similar (Aito leans on memory-mapped on-disk indexes rather than caching everything in heap). Memory does not grow query-to-query at any scale we have tested. There is no leak. Capacity planning works the way it does for a normal database: look at row counts and disk size, not at how much GPU memory the next training run will need.
Production data does not arrive in a single bulk load. It arrives a row at a time, in small batches, as new invoices are processed. Single-row HTTP commits at the API level take a bit under a second on the legacy path. Bulk batched commits land much faster: a 100,000-row batch loads in 16 seconds (around 160 microseconds per row, end-to-end through the HTTP API including JSON parsing and disk write).
For a multi-tenant SaaS workflow where new invoices arrive a few at a time and need to be queryable immediately, the practical pattern is to batch incoming inserts into windows of a few hundred rows and commit each window in one HTTP call. That keeps the per-row cost amortised and lets predict queries in the same JVM stay in the steady-state band.
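The windowed-commit pattern can be sketched as a small buffer that flushes when it reaches a size limit. `InsertBatcher` and its `commit` callback are hypothetical names; `commit` stands in for the single HTTP call your client uses to write a batch of rows, and the default window of 300 rows is just one reading of "a few hundred".

```python
from typing import Callable, Dict, List


class InsertBatcher:
    """Buffer incoming rows and commit them in windows of up to `max_rows`."""

    def __init__(self, commit: Callable[[List[Dict]], None], max_rows: int = 300):
        if max_rows < 1:
            raise ValueError("max_rows must be positive")
        self._commit = commit
        self._max_rows = max_rows
        self._buffer: List[Dict] = []

    def add(self, row: Dict) -> None:
        self._buffer.append(row)
        if len(self._buffer) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        """Commit whatever is buffered. Commit happens before clearing,
        so rows stay buffered if the commit call raises."""
        if self._buffer:
            self._commit(list(self._buffer))
            self._buffer = []


# Usage with a stand-in commit that just collects batches.
committed: List[List[Dict]] = []
batcher = InsertBatcher(committed.append, max_rows=3)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # push the final partial window
```

Because new invoices should become queryable quickly, a production version would also call `flush()` on a short timer, so that a partial window never waits indefinitely for more rows.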
Two of our customers run roughly opposite shapes of this workload in production.
A high-volume accounting-automation partner routes invoices for accounting clients across many tenants on a dedicated Aito node. Each tenant has its own database with its own employees, its own accounts, and its own historical patterns. Aito sees query traffic that looks essentially like the benchmark: a where-clause over text fields, a predict on a categorical target, scoped to one tenant's candidate set. Steady-state latencies on real customer data are in the same band as the synthetic benchmark above.
Helsingö runs on a shared multitenant Aito instance with smaller per-tenant data. Same engine, same query interface, no per-tenant ML pipeline.
The same predictive engine runs in shared multitenant deployments, dedicated single-tenant nodes, and pure on-premise IP-licensed installations (used by Q-Automate and Sisua Digital). The latency numbers above are not separate "shared cluster" and "dedicated cluster" numbers. They describe the engine's behaviour, which is the same wherever it runs.
A few things this benchmark does not measure that you might want to know:
Translating the numbers into practical points:
You can ship predictions in your UI. Steady-state 100-to-200 ms is fast enough to put a prediction behind a button click without a loading spinner. No background queue, no asynchronous workflow, no "we will email you when it is ready."
You do not need a per-tenant ML pipeline. One Aito node serves many tenants. The engine handles tenant isolation at the database level. Onboarding a new tenant is a database creation, not a model training job.
You do not pay for retraining. Predictions come from queries, which run on whatever data is in the database at query time. New rows are immediately visible. Schema changes do not invalidate a model, because there is no model to invalidate.
You do need to think about warm-up. First query after open is slow. Pre-warm on deploy. After that, latency is predictable.
Capacity planning is normal-database planning. Look at row counts, disk size, and concurrent query rate. Not GPU memory.
The buyer concern that started this post (will it scale?) has a short answer: yes, on the production engine, end-to-end through the HTTP API, up to at least 10 million rows on a developer workstation with realistic SaaS query traffic.
A next-generation storage and query engine is in active development, currently pre-beta and hidden behind a feature flag. On text-free workloads it already runs predict around 25% faster than the production engine in steady state, and the hard case (cross-table link workloads) has closed most of the gap to the production path over the spring. We expect parity-and-faster across the board over the coming quarter. The numbers in this post are from the production engine.
Free trial is at aito.ai. The full benchmark code is open and runs on a developer workstation. If you want to discuss a specific deployment shape (multi-tenant economics, on-premise constraints, or your particular query patterns), reach us at hello@aito.ai. We read every inbound message.
Antti Rauhala is co-founder of Aito.ai. Aito is a predictive database for B2B SaaS platforms, headquartered in Helsinki.
Episto Oy
Putouskuja 6 a 2
01600 Vantaa
Finland
VAT ID FI34337429