The Science Behind Aito's Predictive Database

Deep Dive into Bayesian Inference, Representation Learning, and Statistical Modeling

From Statistics to Predictions in Milliseconds

How Aito combines Bayesian inference with database indexing for instant ML

Aito takes a different approach to implementing machine learning. Instead of training models ahead of time, it calculates statistics on demand at query time, enabling real-time predictions without the traditional ML pipeline.

✓ Bayesian inference engine for probabilistic reasoning

✓ Representation learning eliminates feature engineering

✓ Statistical indexes enable millisecond-scale predictions

✓ Query-time modeling instead of pre-trained models

The Bayesian Approach: Why Statistics Beat Models

Traditional ML trains fixed models on historical data. Aito calculates probabilities dynamically using Bayes' theorem, enabling more flexible and explainable predictions.

No Training Phase

Predictions are calculated from raw statistics, not trained models

P(Y|X) calculated directly using indexed conditional probabilities

Automatic Feature Discovery

Bayesian networks discover relationships without manual feature engineering

Mutual information metrics identify predictive relationships

Built-in Uncertainty

Every prediction includes confidence scores for automation decisions

Posterior probabilities provide calibrated uncertainty estimates
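The mutual information metric mentioned under Automatic Feature Discovery can be illustrated with a minimal sketch. It estimates, from raw co-occurrence counts, how many bits a feature reveals about a target. The function name and the sample rows are hypothetical, and this is not presented as Aito's internal implementation.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information in bits between x and y, from (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (x_counts[x] * y_counts[y]))
    return mi

# Hypothetical rows: the vendor determines the category, the weekday does not.
vendor_rows = [("Acme", "IT"), ("Acme", "IT"), ("Bolt", "Travel"), ("Bolt", "Travel")]
weekday_rows = [("Mon", "IT"), ("Tue", "IT"), ("Mon", "Travel"), ("Tue", "Travel")]

print(mutual_information(vendor_rows))
print(mutual_information(weekday_rows))
```

A feature with high mutual information is a strong candidate predictor; a value near zero means the feature says little about the target.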

The Mathematics Behind Instant Predictions

Aito implements sophisticated Bayesian inference through specialized database indexes

P(Y|X) = P(X|Y) × P(Y) / P(X)

Likelihood P(X|Y): Pre-computed conditional statistics

Prior P(Y): Category frequencies indexed for instant access

Evidence P(X): Normalized across all possible outcomes

Result: Calibrated probabilities in 20-200ms
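To make the three terms concrete, here is a small sketch that computes P(Y|X) from plain counts: the likelihood and prior come from counters, and the evidence is obtained by normalizing across all outcomes. The function name and the sample rows are illustrative assumptions, and the fallback to the prior for unseen feature values is a choice made for this sketch.

```python
from collections import Counter, defaultdict

def posterior(rows, feature_value):
    """Return P(y | x = feature_value) for every outcome y, from (x, y) rows."""
    prior_counts = Counter(y for _, y in rows)   # counts behind P(Y)
    cond_counts = defaultdict(Counter)           # counts behind P(X|Y)
    for x, y in rows:
        cond_counts[y][x] += 1
    n = len(rows)

    unnormalized = {}
    for y, n_y in prior_counts.items():
        likelihood = cond_counts[y][feature_value] / n_y   # P(X|Y)
        prior = n_y / n                                    # P(Y)
        unnormalized[y] = likelihood * prior

    evidence = sum(unnormalized.values())                  # P(X)
    if evidence == 0:
        # Feature value never seen: fall back to the prior (sketch-level choice).
        return {y: n_y / n for y, n_y in prior_counts.items()}
    return {y: score / evidence for y, score in unnormalized.items()}

# Hypothetical (vendor, category) rows.
rows = [("Acme", "IT"), ("Acme", "IT"), ("Acme", "Office"), ("Bolt", "Office")]
print(posterior(rows, "Acme"))
```

Because the evidence is the sum of the unnormalized scores, the returned probabilities always sum to one.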

How Aito Calculates Statistics Fast Enough for Ad-Hoc Modeling

The key innovation: specialized indexes that make statistical calculations as fast as database queries

Inverted indexes for features

Similar to search engines, but using indexes optimized for statistical operations

Word 'invoice' → items {id1, id2, id3} with counts

Database optimized for statistical operations

Database contains indexes, caches and precomputation designed for fast inference

Precomputed indexes and offsets over links and row feature vectors

Deep integration between inference and database

Inference uses database primitives directly and maintains low-level caches

Inference operates directly on indexes and maintains specialized segment-level caches
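The idea of an inverted index that doubles as a statistics engine can be sketched as follows. Each feature maps to the set of row ids containing it, so a conditional probability becomes a set intersection plus two counts. The class, its method names, and the sample rows are hypothetical and only illustrate the principle, not Aito's actual index layout.

```python
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        # (field, value) -> set of row ids containing that feature
        self.postings = defaultdict(set)

    def add(self, row_id, row):
        for field, value in row.items():
            self.postings[(field, value)].add(row_id)

    def count(self, *features):
        """Number of rows that contain all of the given features."""
        sets = [self.postings.get(f, set()) for f in features]
        if not sets:
            return 0
        return len(set.intersection(*sets))

    def conditional(self, target, given):
        """Estimate P(target | given) as count(target and given) / count(given)."""
        denom = self.count(given)
        return self.count(target, given) / denom if denom else 0.0

index = InvertedIndex()
index.add(1, {"word": "invoice", "category": "finance"})
index.add(2, {"word": "invoice", "category": "finance"})
index.add(3, {"word": "invoice", "category": "it"})
index.add(4, {"word": "memo", "category": "it"})

print(index.postings[("word", "invoice")])
print(index.conditional(("category", "finance"), ("word", "invoice")))
```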

Performance Optimizations

Lazy evaluation: Only the statistics needed for the specific query are computed

Caching layer: Frequently accessed statistics are kept in memory

Parallel computation: Statistical aggregation runs across multiple threads

Incremental updates: Statistics are updated on write, not at query time
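Three of these optimizations (incremental updates, lazy evaluation, and caching) can be shown together in a toy sketch. Counters are bumped on every write, probabilities are derived only when asked for, and derived values are cached until a write touches the same feature. The class and its invalidation strategy are assumptions made for illustration.

```python
from collections import Counter

class IncrementalStats:
    def __init__(self):
        self.joint = Counter()     # (feature, target) -> count
        self.feature = Counter()   # feature -> count
        self._cache = {}           # (feature, target) -> cached probability

    def write(self, feature, target):
        """Update counts at write time and drop cache entries for this feature."""
        self.joint[(feature, target)] += 1
        self.feature[feature] += 1
        self._cache = {k: v for k, v in self._cache.items() if k[0] != feature}

    def p(self, target, feature):
        """Lazily compute and cache P(target | feature)."""
        key = (feature, target)
        if key not in self._cache:
            total = self.feature[feature]
            self._cache[key] = self.joint[key] / total if total else 0.0
        return self._cache[key]

stats = IncrementalStats()
for vendor, category in [("Acme", "IT"), ("Acme", "IT"), ("Acme", "Office")]:
    stats.write(vendor, category)
print(stats.p("IT", "Acme"))
```

No retraining step appears anywhere: a new row changes the next answer simply because the counts behind it have changed.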

Three Inference Approaches for Different Use Cases

Aito provides specialized inference methods optimized for different prediction scenarios

Predict: Classification & Regression

Direct Bayesian inference for single-target predictions

{
  "from": "invoices",
  "where": { 
    "vendor": "Acme Corp" 
  },
  "predict": "category"
}

Mechanics: Calculates P(category|vendor) using Bayes' theorem

Use Case: Invoice categorization, churn prediction, fraud detection

Recommend: Collaborative Filtering

Find items that maximize a goal probability

{
  "from": "impressions",
  "where": { 
    "user": "alice" 
  },
  "recommend": "product",
  "goal": { 
    "purchased": true 
  }
}

Mechanics: Ranks items by P(goal|item,context)

Use Case: Product recommendations, next-best-action, content discovery
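The ranking step can be approximated in a few lines. The sketch below filters impressions by context, estimates the goal probability per item, and sorts. The Laplace smoothing (adding one success and one failure) is an assumption of this sketch, used so that items with few impressions do not get extreme scores; the function and field names are hypothetical.

```python
from collections import defaultdict

def recommend(impressions, context, item_field="product", goal_field="purchased", top=3):
    """Rank items by a smoothed estimate of P(goal | item, context)."""
    shown = defaultdict(int)
    hits = defaultdict(int)
    for row in impressions:
        if all(row.get(k) == v for k, v in context.items()):
            shown[row[item_field]] += 1
            hits[row[item_field]] += bool(row[goal_field])
    # Laplace smoothing: (successes + 1) / (trials + 2)
    scored = [(item, (hits[item] + 1) / (shown[item] + 2)) for item in shown]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top]

# Hypothetical impression rows.
impressions = [
    {"user": "alice", "product": "milk", "purchased": True},
    {"user": "alice", "product": "milk", "purchased": True},
    {"user": "alice", "product": "bread", "purchased": False},
    {"user": "alice", "product": "bread", "purchased": True},
    {"user": "bob", "product": "eggs", "purchased": True},
]
print(recommend(impressions, {"user": "alice"}))
```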

Generic Queries: Custom Statistics

Flexible statistical queries for complex scenarios

{
  "from": "impressions",
  "where": { 
    "user": "alice", 
    "product.name": {
      "$match": "milk"
    }
  },
  "get": "products",
  "orderBy": { 
    "$multiply": [
      { 
        "$similarity": { 
          "name": "milk" 
        } 
      },
      { 
        "$p": {
          "$context": { 
            "click": true 
          } 
        } 
      }
    ]
  }
}

Mechanics: Combines similarity search with probabilistic ranking

Use Case: Semantic search, anomaly detection, pattern discovery
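The effect of multiplying a similarity score by a probability can be seen in a short sketch. Token-level Jaccard overlap stands in for the similarity term, and the click probabilities are supplied as a plain dictionary; both are assumptions for illustration and say nothing about how Aito computes either term.

```python
def jaccard(a, b):
    """Token overlap between two strings, between 0 and 1."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def rank(products, query, click_prob):
    """Order products by similarity(name, query) * P(click)."""
    scored = [(name, jaccard(name, query) * click_prob.get(name, 0.0)) for name in products]
    return sorted(scored, key=lambda s: s[1], reverse=True)

# Hypothetical catalogue and click probabilities.
products = ["milk", "oat milk", "milk chocolate", "bread"]
click_prob = {"milk": 0.4, "oat milk": 0.9, "milk chocolate": 0.2, "bread": 0.8}

for name, score in rank(products, "milk", click_prob):
    print(name, round(score, 3))
```

In this toy data the exact text match does not win: "oat milk" matches the query less closely but is clicked far more often, so the product of the two terms ranks it first, while "bread" drops to the bottom despite its high click probability.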

Beyond Traditional ML: Automatic Context Recognition

Aito's representation learning automatically discovers high-level concepts from raw data, eliminating the need for manual feature engineering

Traditional ML requires explicit feature engineering: manually creating features like 'is_weekend' or 'high_value_customer'. Aito instead discovers such patterns automatically through representation learning based on the minimum description length (MDL) principle, which favors the representation that encodes the data most compactly.
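A deliberately simplified sketch of the MDL intuition: if two features co-occur more often than independence would predict, encoding them as one combined concept costs fewer bits than encoding them separately. The function below measures that saving and ignores the cost of storing the concept itself, which a full MDL treatment would include. The names and sample rows are hypothetical.

```python
import math

def concept_saving_bits(rows, a, b):
    """Bits saved by coding features a and b as one concept instead of two.

    rows is a list of feature sets. Returns 0.0 if a and b never co-occur.
    """
    n = len(rows)
    n_a = sum(a in r for r in rows)
    n_b = sum(b in r for r in rows)
    n_ab = sum(a in r and b in r for r in rows)
    if n_ab == 0:
        return 0.0
    independent_bits = -math.log2(n_a / n) - math.log2(n_b / n)
    joint_bits = -math.log2(n_ab / n)
    return n_ab * (independent_bits - joint_bits)

rows = [
    {"real", "estate", "rent"},
    {"real", "estate"},
    {"invoice"},
    {"invoice", "rent"},
]
print(concept_saving_bits(rows, "real", "estate"))   # always co-occur
print(concept_saving_bits(rows, "rent", "invoice"))  # co-occur only at chance level
```

A positive saving suggests the pair is worth treating as a single higher-level feature; a saving of zero or less suggests it is not.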

Pattern Recognition

Identifies co-occurring features that form meaningful concepts

Walking like a duck + Swimming like a duck + Quacking like a duck
→ Concept: "duck"

Automatic Feature Combination

Combines low-level features into high-level representations

{
  "$and": [
    { "$has": "rent" },
    { "$has": "real" },
    { "$has": "estate" }
  ]
}
→ Concept: "real estate rental"

Context-Aware Matching

Recognizes entities and relationships without explicit configuration

"Invoice to Johnson in IT department"
→ Automatically matches to employee records and department patterns

Benefits of Representation Learning

✓ No manual feature engineering required

✓ Discovers domain-specific patterns automatically

✓ Handles new entities through similarity matching

✓ Adapts to data changes without retraining

Intelligent Matching and Prior Knowledge

How Aito uses word matching and Bayesian priors for robust predictions

Prediction target priors

Uses metadata to assign initial probabilities to prediction targets

Technique: Bayesian priors based on prediction target metadata

If a customer prefers bread, the purchase probability is increased for items that have the bread property

Text match priors

Automatically leverages word and feature matches in inference

Technique: Bayesian priors based on cross-field matching statistics

If the description field contains 'Sarah', assume the correct processor is Sarah, even if this exact combination has not been seen before

Metadata-based priors

Leverages known metadata to inform predictions about new features

Technique: Bayesian priors based on metadata similarity

If a user has a vegetarian metadata tag, infer vegetarian preferences from similar users

Smart Use of Prior Knowledge

Bayesian priors prevent overfitting and handle sparse data

New vendor with no history

Approach: Uses category distribution of similar vendors as prior

Benefit: Reasonable predictions even with zero examples

Rare edge case

Approach: Balances specific evidence with general patterns

Benefit: Avoids overfitting to anomalies

Evolving patterns

Approach: Continuous prior updating as data accumulates

Benefit: Adapts to drift without retraining
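The three scenarios above share one mechanism, which can be sketched with a standard pseudo-count estimate: the prior acts like a fixed number of imaginary observations that real data gradually outweighs. The `strength` parameter and the example numbers are assumptions chosen for illustration.

```python
def smoothed_probability(count, total, prior, strength=10.0):
    """Posterior mean when the prior counts as `strength` pseudo-observations."""
    return (count + strength * prior) / (total + strength)

prior = 0.30  # hypothetical share of the category among similar vendors

# New vendor with no history: the estimate equals the prior.
print(smoothed_probability(0, 0, prior))

# Rare edge case: one observation moves the estimate only slightly.
print(smoothed_probability(1, 1, prior))

# Evolving patterns: with plenty of data the evidence dominates the prior.
print(smoothed_probability(900, 1000, prior))
```

With zero observations the estimate is exactly the prior, and as observations accumulate it converges to the observed frequency, which is why no separate retraining step is needed.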

Advanced Query Operators for Ad-Hoc Feature Engineering

Powerful operators that enable custom feature engineering at query time

$on: Conditional Context

Focus predictions on specific data subsets

{
  "from": "invoices",
  "where": {
    "region": "EU", 
    "$on": [
      { 
        "amount": { 
          "$gt": 10000 
        } 
      },
      { "region": "EU" }
    ]            
  },
  "predict": "approval_time"
}

Use case: Segment-specific predictions without separate models

$knn: K-Nearest Neighbors

Find similar items using weighted feature similarity

{
  "from": "products",
  "where": {
    "$knn": {
      "k": 10,
      "near": { 
        "name": "Acme CRM", 
        "category": "software"
      }
    }
  }
}

Use case: Similarity-based recommendations and matching
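For intuition, a nearest-neighbor lookup over weighted feature similarity might look like the sketch below. Per-field token overlap (Jaccard) and the optional field weights are assumptions of this sketch; the actual similarity measure behind `$knn` is not described in this document.

```python
import heapq

def tokens(value):
    return set(str(value).lower().split())

def similarity(item, near, weights):
    """Weighted average of per-field Jaccard token overlap."""
    total = sum(weights.get(field, 1.0) for field in near)
    score = 0.0
    for field, wanted in near.items():
        a, b = tokens(item.get(field, "")), tokens(wanted)
        if a | b:
            score += weights.get(field, 1.0) * len(a & b) / len(a | b)
    return score / total if total else 0.0

def knn(items, near, k, weights=None):
    """Return the k items most similar to `near`, best first."""
    weights = weights or {}
    return heapq.nlargest(k, items, key=lambda it: similarity(it, near, weights))

# Hypothetical product rows.
items = [
    {"name": "Desk", "category": "furniture"},
    {"name": "Acme ERP", "category": "software"},
    {"name": "Acme CRM", "category": "software"},
]
near = {"name": "Acme CRM", "category": "software"}
print(knn(items, near, k=2))
```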

$numeric: Smart Binning

Automatic discretization of continuous variables

{
  "from": "customers",
  "where": {
    "age": { "$numeric": 25 }
  },
  "predict": "churn_risk"
}

Use case: Handle numeric features without manual binning
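One common way to discretize a continuous variable is equal-frequency binning, sketched below with the standard library. This is an illustrative stand-in and not a description of how `$numeric` chooses its bins; the function names and sample ages are hypothetical.

```python
import bisect
import statistics

def fit_bins(values, n_bins=4):
    """Learn equal-frequency cut points from observed values."""
    return statistics.quantiles(values, n=n_bins)

def bin_index(cuts, x):
    """Map a numeric value to the index of the bin it falls into."""
    return bisect.bisect_right(cuts, x)

ages = [18, 22, 25, 31, 38, 45, 52, 67]
cuts = fit_bins(ages)

print(cuts)
print(bin_index(cuts, 25))
```

Once a value is mapped to a bin, it can be treated like any other categorical feature when counting co-occurrences with the prediction target.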

Combining Operators for Complex Logic

{
  "from": "transactions",
  "where": {
    "$on": [
      {
        "$knn": { 
          "k": 5, 
          "near": { "description": "AWS Consulting" }
        },
        "amount": { "$numeric": { "$gte": 1000 } }
      },
      { "region": "EU" }
    ]
  },
  "predict": "fraud_risk"
}

Build sophisticated feature engineering logic without code

Performance Validation: Benchmarks and Evaluations

Rigorous testing demonstrates Aito's performance compared to traditional ML approaches

Accuracy Comparison Across Datasets

Comprehensive benchmarks across multiple UCI datasets compared to traditional ML methods

Dataset	Aito	Random Forest	SVM	Insight
Spam Detection	95%	97%	87%	Competitive with best ensemble method
Splice Junction	93%	96%	94%	Strong performance on genomic data
Shuttle	98%	99%	78%	Near-optimal multi-class classification
Invoice Processing	98%	99%	26%	Excellent real-world business performance

Aito achieves 96% mean accuracy across benchmarks, ranking 3rd out of 8 methods while requiring zero feature engineering
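The mean accuracy figure can be checked directly against the four rows of the table. The sketch below only reproduces that arithmetic; the ranking among 8 methods cannot be verified from the three columns shown here.

```python
from statistics import mean

# Accuracy percentages copied from the table above.
accuracy = {
    "Aito": [95, 93, 98, 98],
    "Random Forest": [97, 96, 99, 99],
    "SVM": [87, 94, 78, 26],
}

means = {method: mean(scores) for method, scores in accuracy.items()}
for method, value in means.items():
    print(method, value)
```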

Enterprise-Scale Performance Metrics

Query Response Time

20-200ms

Even at 10M+ record scale

Massive Scale Processing

Sub-200ms at 10M records

Linear scalability with 100% reliability

High-Accuracy Predictions

84-91% at scale

Better accuracy with larger datasets

Memory Efficiency

150-300MB for 10M records

Predictable resource scaling

Feature Engineering

Automatic

vs weeks of manual work

Model Management

Zero maintenance

vs continuous retraining pipelines

Production Deployments

Posti Group

Purchase invoice automation

Scale: 3,000 invoices/month

Accuracy: Automated the majority of invoices

Implementation: RPA integration via UiPath

Helsingö

Product catalogue matching (IKEA to Helsingö)

Scale: Started with 47 samples, demonstrating small-data capability

Accuracy: Learns and adapts continuously

Implementation: Replaced unmaintainable if-else logic

Jokiväri

Purchase invoice processing

Scale: Home renovation company operations

Accuracy: 80% automation rate

Implementation: Hours from evaluation to deployment

Try It Yourself: Interactive Query Examples

The example below shows one of the query types available in the interactive widget

Prediction with Confidence

Classify with calibrated confidence scores

Aito Query

{
  "from": "invoices",
  "where": {
    "Description": "Aws Cloud"
  },
  "predict": "Processor",
  "select": [
    "$p",
    "Name",
    "Department",
    "Role"
  ]
}
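A typical use of the `$p` score is to decide whether a prediction is confident enough to automate. The sketch below assumes a response containing a `hits` list whose entries carry the selected fields, including `$p`; that response shape, the threshold value, and the sample data are assumptions for illustration and should be checked against the API documentation.

```python
def decide(response, threshold=0.9):
    """Return ("automate", hit) if the best hit is confident enough, else ("review", hit)."""
    hits = response.get("hits", [])
    if not hits:
        return ("review", None)
    best = max(hits, key=lambda h: h["$p"])
    action = "automate" if best["$p"] >= threshold else "review"
    return (action, best)

# Hypothetical response for the query above.
response = {
    "hits": [
        {"$p": 0.95, "Name": "Sarah", "Department": "IT", "Role": "Manager"},
        {"$p": 0.03, "Name": "John", "Department": "Finance", "Role": "Analyst"},
    ]
}

action, hit = decide(response)
print(action, hit["Name"], hit["$p"])
```

Raising the threshold sends more cases to manual review; lowering it automates more at the cost of more errors.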

Ready to Explore the Science?

Dive deeper into Aito's technical documentation or try it with your own data

Address

Episto Oy

Putouskuja 6 a 2

01600 Vantaa

Finland

VAT ID FI34337429


