
The problem
An unknown vendor is the slowest fraud and shadow-spend pattern to surface in a typical AP system. A new vendor is added; one or two invoices go through; nobody notices for months, until the quarterly vendor review surfaces an unfamiliar name. By then there's a vendor-master entry, several posted invoices, and possibly a small expense pattern that bypassed procurement.
The signal is available at entry time. A vendor with no prior history in the database is, by definition, unknown to the prediction system. The query returns low confidence; the system says "no prediction yet — first-time vendor." That should be a flag at PO entry, not a discovery at quarterly review.
How it works
_predict on the PO entry returns a calibrated probability per prediction target (account, cost center, approver). When the vendor has no prior history, all predictions return low confidence — typically below 30%. The application reads this as the unknown-vendor signal and flags the PO for manual review.
The mechanism composes with the broader anomaly pattern. Inverse-prediction anomaly catches misrouted invoices from known vendors; unknown-vendor anomaly catches invoices from vendors with no history. Together with amount-spike they form the anomaly trio; all three surface from the same _predict operator and the same calibration.
{
  "from": "invoices",
  "where": {
    "vendor": "NewVendor LLC",
    "category": "office_supplies",
    "amount": 1240.00
  },
  "predict": "gl_code",
  "select": ["$p", "$why"],
  "limit": 1
}
// $p < 0.30 across all predictions → flag as unknown-vendor.
For the full architecture, see the technology overview. For the broader narrative across multiple use cases, read The Predictive Application.
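The "all predictions below 30%" check can be sketched in application code as follows. This is a minimal illustration, not the demo's actual implementation: `run_query` is a hypothetical callable standing in for whatever client sends the query, the `"hits"` key in the response is an assumption about the payload shape, and the target names `cost_center` and `approver` are placeholder field names (only `gl_code` appears in the query above).

```python
from typing import Callable, Dict, List

UNKNOWN_VENDOR_THRESHOLD = 0.30  # from the text: top-1 confidence below 30%


def build_predict_query(where: dict, target: str) -> dict:
    """Build a query body in the same shape as the example query above."""
    return {
        "from": "invoices",
        "where": where,
        "predict": target,
        "select": ["$p", "$why"],
        "limit": 1,
    }


def top1_confidence(response: dict) -> float:
    """Return $p of the top hit, or 0.0 when nothing was predicted.

    Assumes the response carries a "hits" list; adjust to the real payload.
    """
    hits = response.get("hits", [])
    return hits[0]["$p"] if hits else 0.0


def is_unknown_vendor(
    where: dict,
    targets: List[str],
    run_query: Callable[[dict], dict],  # hypothetical transport function
    threshold: float = UNKNOWN_VENDOR_THRESHOLD,
) -> bool:
    """Flag the entry when every target's top-1 confidence is below threshold."""
    if not targets:
        raise ValueError("at least one prediction target is required")
    confidences: Dict[str, float] = {
        t: top1_confidence(run_query(build_predict_query(where, t)))
        for t in targets
    }
    return all(p < threshold for p in confidences.values())
```

Passing the transport in as a callable keeps the threshold logic testable without a live database; in production it would wrap the HTTP call to the prediction endpoint.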
See it live
This use case runs in the 📋 ERP demo today. Click through to the live application and inspect the queries that produce the result. Source is on GitHub under Apache 2.0.
Frequently asked
How is this different from a procurement-policy vendor-master rule?
Procurement policy enforces "new vendors require approval before first invoice." Unknown-vendor detection enforces "first invoice from any vendor surfaces for review, regardless of whether the vendor was pre-approved." The two layers compose; the unknown-vendor detection catches the edge cases where the policy was bypassed.
Does this work in multi-tenant deployments?
Yes. Conditional probability is scoped per tenant via customer_id. A vendor known to one tenant is treated as unknown to another tenant the first time it appears there. Multi-tenant accounting deployments at accounting.aito.ai run this pattern across 255 tenants.
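One way to picture the per-tenant scoping is a small helper that adds the tenant key to a query's `where` clause before it is sent. This is a sketch under assumptions: the text names `customer_id` as the scoping field, but whether it is applied in the `where` clause exactly like this, and the helper name `scope_to_tenant`, are illustrative guesses.

```python
import copy


def scope_to_tenant(query: dict, customer_id: str) -> dict:
    """Return a copy of the query with the tenant key added to "where".

    The original query is left untouched so it can be reused across tenants.
    """
    scoped = copy.deepcopy(query)
    scoped.setdefault("where", {})["customer_id"] = customer_id
    return scoped


base_query = {
    "from": "invoices",
    "where": {"vendor": "NewVendor LLC"},
    "predict": "gl_code",
    "select": ["$p", "$why"],
    "limit": 1,
}

# The same vendor is evaluated separately against each tenant's history.
query_for_a = scope_to_tenant(base_query, "tenant-a")
query_for_b = scope_to_tenant(base_query, "tenant-b")
```

Because each tenant's query conditions on its own `customer_id`, history accumulated under one tenant does not raise confidence for another.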
What threshold defines "unknown" vs "low-confidence-known"?
Typical threshold: top-1 prediction confidence < 30% across the prediction targets. At or above 30%, the vendor is treated as known-but-uncertain (predictions surface as suggestions). Below 30%, the vendor is treated as unknown (manual review required).
Can the detection be tuned per category or amount?
Yes. Use stricter thresholds for high-amount or high-stakes categories and looser ones for routine spend. "Unknown-vendor + amount > €5,000" is the strictest cell; "unknown-vendor + amount < €100" can be looser. This is application-level configuration on top of the prediction.
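As a rough sketch of such application-level tuning, the confidence cutoff can be chosen from the invoice amount before the unknown-vendor check runs. The €5,000 and €100 boundaries and the 30% default come from the text; the 0.50 and 0.15 cutoffs are made-up illustrative values, and reading "stricter" as "a higher cutoff, so more entries are flagged" is one possible interpretation.

```python
from typing import Sequence


def review_cutoff(amount_eur: float) -> float:
    """Pick the confidence cutoff for an invoice amount (illustrative values)."""
    if amount_eur > 5000:
        return 0.50  # strictest cell: flag unless confidence is high (assumed value)
    if amount_eur < 100:
        return 0.15  # looser for small routine spend (assumed value)
    return 0.30      # default threshold from the text


def needs_manual_review(top1_confidences: Sequence[float], amount_eur: float) -> bool:
    """True when every target's top-1 confidence falls below the cutoff."""
    if not top1_confidences:
        raise ValueError("at least one confidence value is required")
    cutoff = review_cutoff(amount_eur)
    return all(p < cutoff for p in top1_confidences)
```

The same confidences can therefore lead to different outcomes: values of 0.40 and 0.35 would pass at €1,240 but be flagged at €6,000 under these assumed cutoffs. A per-category table could be layered on in the same way.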
Does the system learn from approved unknown-vendor cases?
Yes. Once the vendor's first invoice is approved and posted, the data enters the index. The next invoice from the same vendor has 1 observation in history; by the third, it's typically known with usable confidence. The transition from unknown to known is automatic.



