A predictive application is a SaaS product where every form, queue, and dashboard is backed by live predictions instead of hard-coded rules. Predictive accounting means GL codes that auto-fill, approvers that route themselves, bank transactions that match invoices automatically, and anomalies that surface before posting — every prediction explained, every override learned from.
Aito is the predictive database underneath. No model training, no MLOps, no data scientist on staff. Add a row, the next prediction reflects it. Companies like Posti run 3,000+ invoices monthly through this pattern at 95% accuracy.
accounting.aito.ai is our open-source reference implementation of a multi-tenant predictive accounting SaaS — 255 customer companies, 128K invoices, all served from a single Aito instance with customer_id in the where clause. Same pattern you'd build on top of Aito for your own accounting product.
Nine predictive features, all from SQL-like queries:
- $why factor decomposition.
- _predict invoice_id call.
- category=telecom & gl_code=6200 → approver=Timo, 15.8× lift) and promote them to rules with audit trail.
- _recommend against click history (CTR-ranked, like product recommendations).
- _evaluate accuracy on held-out samples, with per-case green/red diff vs. always-predict-majority baseline.
- _relate mines correction patterns into promotable rules.
- customer_id → different prediction. No per-tenant infra.

Every prediction returns confidence, top-3 alternatives, and a $why decomposition with token-level highlighting: the same building blocks you'd use to ship explainable predictions in your own product.
Two things are true at once.
The verifiable facts. Posti runs predictive AP through Aito at 95% accuracy on 3,000+ invoices per month, on a SAP + UiPath + Aito stack — same operators you'd use. Industry-wide, automated AP error rates land in the 0.1-0.8% range when properly configured. Best-in-class AP teams hit 60-80% touchless processing rates and process invoices at $2.78 each (vs. $5-10 manual). Top performers achieve 3.1-day cycle times vs. 14.6 days manual. These are real, citable numbers, not aspirations.
The conditional answer. All accounting processes are super-regular at their core — the consistency principle, procurement contracts, and chart-of-accounts taxonomies see to that. What differs between customers is how much of your incoming transaction volume matches prior patterns vs. arrives as a first-time pattern. Predictive accounting is a conditional-probability engine that works the same way on each transaction; how many of your transactions benefit from it depends on the volume mix.
Before the variance, the structural point most ML pitches miss: accounting data is more regular than typical ML training data, by law and by contract.
The GAAP consistency principle and the equivalent IFRS rules require companies to apply the same accounting methods consistently from period to period. That's a legal and audit requirement, not best practice. Same vendor invoicing the same company in two consecutive months → presumed to hit the same GL account unless there's a documented reason otherwise. Procurement contracts and chart-of-accounts taxonomies lock the mapping further.
This is why, on regular accounting data, three observations of a stable multi-field pattern produce ~90% confidence with realized error around 1%. By five observations, error rates well under 1% are typical. The top confidence bucket (>95%) carries near-zero realized error in production: Aito's confidence is calibrated, so when it says >95%, the realized rate matches.
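A calibration claim like this is something you can check yourself. The sketch below is a minimal, generic reliability table in Python, not Aito's implementation: it groups held-out predictions by confidence and reports the realized accuracy per bucket. The bucket edges and the sample data are assumptions made up for illustration.

```python
from collections import defaultdict

def calibration_by_bucket(predictions, edges=(0.5, 0.85, 0.95)):
    """Group (confidence, was_correct) pairs into confidence buckets
    and return the realized accuracy of each bucket."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for confidence, was_correct in predictions:
        # Bucket index = number of edges the confidence exceeds (0 = lowest).
        bucket = sum(confidence > edge for edge in edges)
        totals[bucket] += 1
        hits[bucket] += int(was_correct)
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}

# Hypothetical held-out results: (top-1 confidence, prediction was correct)
sample = [(0.97, True), (0.99, True), (0.96, True), (0.90, True),
          (0.88, False), (0.60, True), (0.55, False), (0.30, False)]
report = calibration_by_bucket(sample)
```

If confidence is calibrated, the realized accuracy in each bucket should sit inside that bucket's confidence range; the top bucket (index 3 here, >95%) is the one that matters for auto-fill.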
Underneath all three profiles in the table below, the process is the same: same coding logic, same conditional-probability math, same calibrated confidence. What changes between deployments is the volume mix: how much of the incoming transaction stream lands on patterns with ≥3 prior examples vs. arrives as a first-time pattern.
| Profile | Examples | What the volume mix looks like | Range we typically see |
|---|---|---|---|
| Mostly-recurring | Recurring B2B AP (utilities, telco, scheduled supplier payments — the Posti shape); sales invoicing; bank reconciliation; PO-backed direct procurement; period-end JV automation | Stable customer/supplier base, same vendors invoicing the same accounts every month — the bulk of incoming transactions match patterns already seen many times | Near the top of industry benchmarks — 70-90%+ touchless; Posti at 95% accuracy is the public anchor |
| Mixed | Typical mid-market AP at a mature tenant; industrial maintenance with stable supplier base; multi-channel retail buying | Combination of repeat-pattern transactions and one-offs; meaningful share of project-specific spend or new-supplier onboardings each month | Mid-tier of industry benchmarks — 50-70% touchless on common configurations; Jokiväri reports 80% GL automation on Lemonsoft, on the higher end |
| Mostly-snowflake | Employee-expense AP (business travel, restaurants, taxis, ad-hoc reimbursements); construction project AP with subcontractor sprawl; professional services with project-specific passthroughs | People-driven or project-driven spend by construction — many incoming transactions are first-time patterns (new vendor, new combination). The coding logic is just as regular; the share of incoming transactions that match prior patterns is low | Headline touchless rate lower — we've seen ranges in the 20-40% area; field-level prediction still produces 3-4 of 5 fields pre-filled per invoice with $why, so manual-effort reduction is much larger than the touchless headline implies |
The conditional-probability machinery is the same across all three profiles. Three observations of any stable pattern produce ~90% confidence with ~1% realized error, regardless of profile. What differs is how many of your incoming transactions land on patterns with ≥3 prior examples vs. arrive as novel ones.
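The volume mix can be estimated before any prediction runs. The following is a rough sketch under stated assumptions: a "pattern" is approximated as a (vendor, category) key, the function name `share_on_known_patterns` is hypothetical, and the vendors other than Telia are invented placeholders.

```python
from collections import Counter

def share_on_known_patterns(history, incoming, min_examples=3):
    """Fraction of incoming transactions whose pattern key already has
    at least `min_examples` occurrences in the tenant's history."""
    if not incoming:
        return 0.0
    seen = Counter(history)
    covered = sum(1 for key in incoming if seen[key] >= min_examples)
    return covered / len(incoming)

# Hypothetical pattern keys: (vendor, category)
history = [("Telia Finland Oyj", "telecom")] * 5 + [("Example Taxi Oy", "travel")]
incoming = [
    ("Telia Finland Oyj", "telecom"),   # seen 5 times: counts as known
    ("Telia Finland Oyj", "telecom"),
    ("Example Taxi Oy", "travel"),      # seen once: below the threshold
    ("Example Cafe Oy", "meals"),       # never seen: first-time pattern
]
share = share_on_known_patterns(history, incoming)
```

A high share points toward the mostly-recurring profile, a low share toward mostly-snowflake. It is a coarse proxy only; the per-field evaluation described next is the real test.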
Find your profile in 15 minutes: run _evaluate on a held-out slice of your real data. Per-field accuracy and per-confidence-bucket calibration tell you which range your tenants sit in. Cheaper than a procurement cycle.
Where consistency rules apply, accounting data isn't noisy. The same supplier maps to the same GL 95%+ of the time; the same vendor × category routes to the same approver. The conditional probability collapses fast because there's not much disagreement to average over.
| Repetitions of a stable pattern | Top-1 confidence | Typical realized error |
|---|---|---|
| 0 | base rate ~5-15% — "no prediction yet" | n/a |
| 1 | ~50-70% — single suggestion | high |
| 3 | ~85-92% — auto-fill territory | often ~1% on regular data |
| 5 | ~92-95%+ | well under 1%, approaching the calibration floor |
| 10+ | at the drift ceiling | calibrated to confidence; near zero in the top bucket |
A subtle but important distinction: even when the workload is regular, not every field is equally stable. The split matters because it tells you which predictions need to be more reactive to recent overrides.
| Field type | Stability | Why |
|---|---|---|
| GL account, cost center, expense category, VAT % | High — long-term stable | Locked by chart-of-accounts taxonomy and consistency principle. Same vendor → same account year over year. |
| Payment terms, payment method | High — derived from amount, country, category | Not vendor- or person-dependent. |
| Approver / processor / reviewer | Medium — changes with personnel | Vacations, role changes, team restructuring break recurring person-keyed patterns. |
| One-off vendor accounts (snowflake tail) | Low by construction | No prior history to learn from; vendor-locked fields require human judgement. |
Two practical implications: (a) person-keyed predictions need to be more reactive to recent overrides than account-keyed predictions — handled naturally because every override is the next prediction's training signal; (b) chart-of-accounts reorganizations are documented events, so a brief drop in account-keyed accuracy after a reorg is expected and recoverable within ~3 observations of the new mapping.
Purchase invoices in particular carry a heavier snowflake tail than other accounting workloads. Employee business trips alone generate transportation, accommodation, restaurant, and taxi receipts that may never repeat in the same tenant. Statistical snowflakes by construction. No amount of data resolves them.
The honest answer: vendor-keyed predictions don't work on the snowflake tail. A vendor seen for the first time has zero conditional history. Aito returns honest low confidence, not a wrong high-confidence answer.
Different prediction targets depend on different signals. Not every field is vendor-locked:
| Field on a snowflake invoice | Predicted from | Useful confidence? |
|---|---|---|
| GL category | Description text + amount range | ✅ "Lunch meeting" → meals & entertainment |
| Approver | Employee / submitter | ✅ Their manager approves their expenses |
| Cost center | Project code or department | ✅ Tied to the person, not the vendor |
| VAT % | Amount + country + category | ✅ Not vendor-dependent |
| Specific GL account | Vendor history | ❌ Vendor-locked, falls to low confidence |
| Vendor master ID | Vendor history | ❌ Vendor-locked, falls to low confidence |
Smart Form Fill on a snowflake doesn't return blank fields. It returns field-level predictions with field-level confidence, with $why rendered so the reviewer sees what was used. Three or four of five fields auto-fill; the vendor-locked fields surface for human judgement.
Anchor 1 — recurring B2B AP (the Posti shape, "highly regular"). A workload dominated by recurring supplier payments looks like:
Posti runs at 95% accuracy on a workload close to this anchor.
Anchor 2 — heavy snowflake tail (employee-expense-heavy AP, "highly diverse"). Splits roughly into thirds:
- $why rendered, so the human judges 1-2 fields, not all 5.

Both are real production observations. Most products serve a mix of tenant shapes and land between the two anchors.
_evaluate per field

The single best 15-minute test is to run _evaluate on a held-out slice of your real data. Pay attention to two things:
Your UI keys off field-level confidence per prediction: auto-fill the high-confidence fields, suggest the medium, route the low to human review with $why. Cold-start handles itself field by field, confidence-bucket by confidence-bucket — the first three observations of any stable pattern already produce auto-fill-quality confidence. See the Cold-Start Honesty use case for the full breakdown.
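The three-way split can be sketched in a few lines of Python. The thresholds below are assumptions for illustration (0.85 borrows from the "auto-fill territory" row of the repetition table; 0.50 is arbitrary), and the field names and confidences are invented. Tune both against your own calibration data.

```python
AUTO_FILL_AT = 0.85  # assumed threshold, not an Aito default
SUGGEST_AT = 0.50    # assumed threshold, not an Aito default

def route_field(confidence):
    """Map one field-level confidence to a UI action."""
    if confidence >= AUTO_FILL_AT:
        return "auto_fill"
    if confidence >= SUGGEST_AT:
        return "suggest"
    return "human_review"

def route_invoice(field_confidences):
    """Route every predicted field of an invoice independently."""
    return {field: route_field(p) for field, p in field_confidences.items()}

# Hypothetical confidences for a snowflake invoice
snowflake = {
    "gl_category": 0.91,
    "approver": 0.88,
    "cost_center": 0.86,
    "vat_pct": 0.70,
    "vendor_master_id": 0.12,
}
actions = route_invoice(snowflake)
```

Routing per field rather than per invoice is what lets a snowflake invoice still pre-fill most of its form while the vendor-locked fields go to a person.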
Per-field accuracy on the recurring core caps around 92-95% from drift (vendor account changes, tax-code shifts, org reorgs, personnel changes for person-keyed fields). Per-field accuracy on the snowflake tail caps lower; that ceiling is structural. Tenant-wide headline accuracy is the volume-weighted blend.
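The blend is plain weighted-average arithmetic. A small Python sketch, with made-up volume shares and a made-up 50% tail accuracy (only the ~93% recurring-core figure echoes the range above):

```python
def headline_accuracy(segments):
    """Volume-weighted accuracy over (volume_share, accuracy) segments."""
    total = sum(share for share, _ in segments)
    if total <= 0:
        raise ValueError("volume shares must sum to a positive number")
    return sum(share * acc for share, acc in segments) / total

# Hypothetical tenants: (share of volume, per-field accuracy)
recurring_heavy = headline_accuracy([(0.9, 0.93), (0.1, 0.50)])
expense_heavy = headline_accuracy([(0.4, 0.93), (0.6, 0.50)])
```

Under these assumed numbers the recurring-heavy tenant lands near 89% and the expense-heavy tenant near 67%, with identical per-segment accuracy in both. The difference is entirely the mix.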
A predictive AP product covering employee-expense-heavy tenants will see lower headline accuracy than one serving recurring-B2B tenants — that's the data, not the model. Both numbers are correct, and both are useful, as long as your UI is field-level confidence-aware.
User changes a predicted GL from 6200 to 6210 at 14:00:23. The next request that hits a similar invoice — different invoice, same vendor, same description — reflects the new pattern in its conditional probability. No retrain queue. No model deploy. No A/B holdout.
The override IS the training signal because there is no separate model file. Aito stores rows; predictions are computed from rows at query time. Write the override row, the next prediction reads it.
| In a typical ML pipeline | In a predictive database |
|---|---|
| User correction → label store | User correction → INSERT |
| Nightly retraining job | (none) |
| Model artifact + deploy | (none) |
| Canary, holdout, regression monitoring | (none) |
| 24-72 hours from correction → live | <1 second |
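
To make the "predictions are computed from rows at query time" idea concrete, here is a toy Python stand-in. It is not Aito's algorithm or API: `ToyRowStore` is a hypothetical class that uses naive frequency counting over matching rows, purely to show why an inserted override changes the very next prediction without any retraining step.

```python
from collections import Counter

class ToyRowStore:
    """Toy stand-in for a predictive database: no model file,
    predictions are computed from the stored rows on every query."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(dict(row))

    def predict(self, where, target):
        """Return {value: probability} over rows matching `where`."""
        matches = [r for r in self.rows
                   if all(r.get(k) == v for k, v in where.items())]
        counts = Counter(r[target] for r in matches if target in r)
        total = sum(counts.values())
        return {value: n / total for value, n in counts.items()}

store = ToyRowStore()
for _ in range(2):
    store.insert({"vendor": "Telia Finland Oyj", "gl_code": "6200"})

where = {"vendor": "Telia Finland Oyj"}
before = store.predict(where, "gl_code")

# The user's override is just another row.
store.insert({"vendor": "Telia Finland Oyj", "gl_code": "6210"})
after = store.predict(where, "gl_code")
```

In this toy, `before` puts all probability on 6200, and `after` already assigns a third of it to 6210. A real engine weighs evidence far more carefully, but the data flow is the same: write a row, read it on the next query.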
Two consequences that matter to a product team:
- $why from the prediction.
- "Right to be forgotten" → DELETE the rows. There's no model artifact derived from user data sitting in a separate registry.

Run the same predictions used by accounting.aito.ai (and by Posti and Jokiväri in production) directly from this page. These queries hit our hosted demo database: no signup, no setup.
Predict the GL code for a Telia Finland Oyj invoice belonging to customer CUST-0000
{
  "from": "invoices",
  "where": {
    "customer_id": "CUST-0000",
    "vendor": "Telia Finland Oyj",
    "amount": 890.5
  },
  "predict": "gl_code",
  "select": [
    "$p",
    "feature",
    "$why"
  ]
}

The fastest way to disqualify Aito for your use case is to run _evaluate on a sample of your real invoices.
- _evaluate on a held-out slice. Get per-class accuracy, top-3 accuracy, confusion matrix, and a baseline-vs-Aito diff.

If the lift over baseline is convincing, schedule a technical review. If not, you've spent an afternoon and saved a procurement cycle.
_evaluate is the same operator that powers the Quality Dashboard at accounting.aito.ai. Same code path you'd use in production to monitor accuracy per tenant.
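For orientation, the baseline comparison itself is simple to reason about. This Python sketch does not call Aito or reproduce _evaluate; it only shows the arithmetic of "accuracy vs. always-predict-majority" on a hypothetical held-out set, with invented labels and invented model output.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Share of positions where the prediction equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return accuracy([majority] * len(test_labels), test_labels)

# Hypothetical GL codes: training history and a held-out slice
train = ["6200"] * 6 + ["6210"] * 3 + ["4100"]
test_actual = ["6200", "6210", "6200", "4100", "6200"]
test_predicted = ["6200", "6210", "6200", "6200", "6200"]  # invented model output

baseline = majority_baseline(train, test_actual)
model = accuracy(test_predicted, test_actual)
lift = model - baseline
```

Here the majority baseline gets 3 of 5 right and the invented model output gets 4 of 5, so the lift is 20 percentage points. On a workload dominated by one GL code the baseline is already high, which is why the diff matters more than the raw accuracy.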
Step 1: Capture & extract — Invoice arrives via OCR, e-invoicing, or manual entry. Vendor, amount, line items, dates land in your accounting database.
Step 2: Predict & explain — Your app calls Aito with the captured fields. Aito returns the predicted GL code, approver, cost centre, and payment terms — each with a probability, top-3 alternatives, and a $why decomposition that shows which input fields drove the prediction.
Step 3: Auto-apply or escalate — High-confidence predictions auto-fill and submit. Mid-confidence rows show the prediction but ask for confirmation. Low-confidence rows route to human review with the $why rendered as a tooltip.
Step 4: Learn from corrections — Every override goes back into the same Aito table. The next prediction reflects it instantly — no retraining, no model deployment.
If you're a 5-15 person product team building a predictive accounting SaaS, here's the realistic comparison:
Build it yourself. A working prediction loop with multi-tenancy, retraining, evaluation, and override capture takes ~6 months and ~1.5 FTEs (one ML engineer, ~0.5 backend engineer for plumbing). At €100-130K loaded ML salary plus infrastructure: year-1 TCO €200-400K, year-2 €120-200K maintenance. You get total control over the model internals.
Drop in Aito. API key and SQL-like queries get you to a working prediction loop in 1-2 weeks. Multi-tenancy is customer_id in where. Aito plans start at €0 (free sandbox), €75/mo (small production), €350/mo (growth), with custom enterprise tiers — see pricing. You give up some control over model internals; you gain ~5 months of engineering you didn't have to spend.
The breakeven question. If your product needs predictions as a feature, build-vs-buy is rarely the right framing — the question is time to first paying tenant. Six months later means six months less revenue, six months more burn, and a smaller dataset to learn from when you do ship.
The three honest alternatives we hear most:
vs. Rossum / Hypatos / Klippa (invoice-extraction-as-a-service). They solve "extract structured fields from a PDF." Aito solves "given the structured fields, predict the next ones (GL code, approver, cost center) using your tenant's own history." They're complementary, not competitive — most predictive accounting products use one of these for OCR and Aito for downstream prediction.
vs. GPT-4 + a vector database. LLMs are great at one-shot reasoning over unstructured input; they're expensive, slow (~1-3 sec), opaque (no $why), and don't get smarter when a user corrects them. Aito returns conditional probability with confidence and explanation in 20-200ms. For volumetric AP work — thousands of invoices/day per tenant — the unit economics don't work with an LLM in the hot path.
vs. building it on scikit-learn / Vertex AI / SageMaker. You can. You'll spend 6 months on it (see Build vs. Buy above). The thing you're building is a worse version of a predictive database, with retraining infrastructure you have to operate.
The wedge: a predictive database is the substrate. OCR sits in front of it. LLMs sit beside it for the cases that need free-form reasoning. Aito does the part that's repetitive, volumetric, and per-tenant — which is the bulk of accounting work.
If you're not building a new SaaS but augmenting an existing accounting/ERP install, Aito works as a prediction engine called via API. No migration, no rip-and-replace. Send invoice data to Aito, get back predictions with $why explanations. Your ERP applies high-confidence predictions automatically and flags the rest.
Works with any ERP that has an API — SAP, Oscar Software, Lemonsoft, NetSuite, or custom-built systems. Typical integration takes days, not months. See the ERP solutions page if you're building a new predictive ERP product rather than augmenting an existing one.
Finland's largest postal service operator automated their entire invoice processing workflow with Aito, achieving 95% accuracy in production deployment.
"Aito has provided Posti a fast and easy tool to implement machine learning to business processes, adding new opportunities to our Intelligent Automation toolkit."
This 40-year-old home renovation company automated their complex invoice processing across multiple subcontracted projects using Aito's predictive capabilities.
"The purchase invoice robot is able to account for invoices almost independently using artificial intelligence."
This accounting and payroll service provider saved 4 hours per week of CEO time by automating ticket routing to the right accounting specialists.
"Ability to create smart automations with Aito.ai is fundamentally helping us as we are growing."
If you're building a predictive accounting product for multiple customers, the same Aito table serves every tenant — customer_id in the where clause scopes the conditional probability to that tenant's history. accounting.aito.ai runs 255 tenants this way (3 enterprise / 12 large / 48 midmarket / 192 small), one shared Aito DB, sub-100ms predictions.
{
  "from": "invoices",
  "where": { "customer_id": "CUST-0001", "vendor": "Telia Finland Oyj" },
  "predict": "gl_code"
}
Same vendor, different customer_id returns a different prediction — because the conditional probability is computed only over that tenant's rows. No per-tenant index, no per-tenant model, no per-tenant retraining schedule.
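In application code, the tenant scope is worth enforcing in one place. A minimal Python sketch, assuming a hypothetical helper `build_predict_query` of your own; it only assembles the query body in the shape shown above and does not send it anywhere.

```python
import json

def build_predict_query(customer_id, known_fields, target, table="invoices"):
    """Build a tenant-scoped query body; customer_id always goes into where."""
    if "customer_id" in known_fields:
        # Refuse caller-supplied tenant ids so the scope cannot be overridden.
        raise ValueError("customer_id must be passed as its own argument")
    where = {"customer_id": customer_id}
    where.update(known_fields)
    return {"from": table, "where": where, "predict": target}

fields = {"vendor": "Telia Finland Oyj"}
q0 = build_predict_query("CUST-0000", fields, "gl_code")
q1 = build_predict_query("CUST-0001", fields, "gl_code")
body = json.dumps(q1, indent=2)
```

Keeping `customer_id` as a mandatory argument, rather than one field among many, makes it harder for a code path to issue an unscoped query by accident.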
- customer_id values) is opt-in for products that want shared infrastructure.
- DELETE the rows and the predictions reflect it from the next query forward.

We do not currently hold a SOC 2 Type II attestation; we share our security posture and pen-test results under NDA on request. If your procurement requires SOC 2 today, contact us; we work with several customers in regulated finance under DPA + security questionnaire.
Ready to ship a predictive accounting product — or add predictive accounting to one you already have?
Try accounting.aito.ai → — see what a predictive accounting SaaS looks like end-to-end before you build one.
Read the source on GitHub → — Apache 2.0, 255-tenant reference architecture, 9 use-case guides.
Schedule a Technical Review → — walk through your specific accounting use cases with our team.
Start Free Trial → — point Aito at your own invoice data in our sandbox environment.
Episto Oy
Putouskuja 6 a 2
01600 Vantaa
Finland
VAT ID FI34337429