The Science Behind Aito's Predictive Database

Deep Dive into Bayesian Inference, Representation Learning, and Statistical Modeling

From Statistics to Predictions in Milliseconds

How Aito combines Bayesian inference with database indexing for instant ML

Aito represents a fundamental shift in how machine learning is implemented. Instead of training models, it calculates statistics on demand, enabling real-time predictions without the traditional ML training pipeline.

Bayesian inference engine for probabilistic reasoning
Representation learning eliminates feature engineering
Statistical indexes enable millisecond-scale predictions
Query-time modeling instead of pre-trained models

The Bayesian Approach: Why Statistics Beat Models

Traditional ML trains fixed models on historical data. Aito calculates probabilities dynamically using Bayes' theorem, enabling more flexible and explainable predictions.

No Training Phase

Predictions are calculated from raw statistics, not trained models

P(Y|X) calculated directly using indexed conditional probabilities

Automatic Feature Discovery

Bayesian networks discover relationships without manual feature engineering

Mutual information metrics identify predictive relationships
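
To make the mutual-information idea concrete, here is a minimal Python sketch that estimates I(feature; target) from raw co-occurrence counts. It uses the textbook plug-in estimator, which is not necessarily the metric Aito computes internally. The field names and rows are hypothetical.

```python
import math
from collections import Counter


def mutual_information(rows, feature, target):
    """Estimate I(feature; target) in bits from empirical counts."""
    n = len(rows)
    joint = Counter((r[feature], r[target]) for r in rows)
    fx = Counter(r[feature] for r in rows)
    fy = Counter(r[target] for r in rows)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (fx[x] * fy[y]))
    return mi


# Hypothetical invoice rows: vendor predicts category, weekday does not.
rows = [
    {"vendor": "Acme", "weekday": "Mon", "category": "software"},
    {"vendor": "Acme", "weekday": "Tue", "category": "software"},
    {"vendor": "Globex", "weekday": "Mon", "category": "travel"},
    {"vendor": "Globex", "weekday": "Tue", "category": "travel"},
]
vendor_mi = mutual_information(rows, "vendor", "category")    # 1.0 bit
weekday_mi = mutual_information(rows, "weekday", "category")  # 0.0 bits
```

Here the vendor fully determines the category and scores 1 bit. The weekday is independent of the category and scores 0. This kind of ranking lets predictive fields surface without hand-built features.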

Built-in Uncertainty

Every prediction includes confidence scores for automation decisions

Posterior probabilities provide calibrated uncertainty estimates

The Mathematics Behind Instant Predictions

Aito implements sophisticated Bayesian inference through specialized database indexes

P(Y|X) = P(X|Y) × P(Y) / P(X)
Likelihood P(X|Y): Pre-computed conditional statistics
Prior P(Y): Category frequencies indexed for instant access
Evidence P(X): Normalizing constant, the sum of P(X|Y) × P(Y) over all possible outcomes Y
Result: Calibrated probabilities in 20-200ms
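
The formula above can be evaluated directly from counts. Below is a minimal Python sketch. It assumes the history is a list of dicts and that a single observed field stands in for X. The function name and data are hypothetical, and a real engine would read these counts from indexes rather than scan rows.

```python
from collections import Counter


def posterior(rows, target, feature, value):
    """P(target = y | feature = value) for every y, via Bayes' theorem on counts."""
    n = len(rows)
    prior_counts = Counter(r[target] for r in rows)
    joint_counts = Counter(r[target] for r in rows if r[feature] == value)

    # Numerator of Bayes' theorem: P(X|y) * P(y) for each outcome y
    numerators = {
        y: (joint_counts[y] / prior_counts[y]) * (prior_counts[y] / n)
        for y in prior_counts
    }
    # Evidence: P(X) = sum over y of P(X|y) * P(y)
    evidence = sum(numerators.values())
    if evidence == 0:
        raise ValueError(f"{feature}={value!r} never observed; a prior is needed")
    return {y: num / evidence for y, num in numerators.items()}


# Hypothetical invoice history
rows = (
    [{"vendor": "Acme Corp", "category": "software"}] * 3
    + [{"vendor": "Acme Corp", "category": "hardware"}]
    + [{"vendor": "Globex", "category": "travel"}] * 2
)
print(posterior(rows, "category", "vendor", "Acme Corp"))
```

For "Acme Corp" this gives roughly 0.75 for software, 0.25 for hardware and 0 for travel. The counts cancel to the share of Acme invoices in each category. The explicit numerator and evidence terms are kept only to mirror the equation.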

How Aito Calculates Statistics Fast Enough for Ad-Hoc Modeling

The key innovation: specialized indexes that make statistical calculations as fast as database queries

Inverted indexes for features

Similar to search engines, but using indexes optimized for statistical operations

Word 'invoice' → items {id1, id2, id3} with counts

Database optimized for statistical operations

Database contains indexes, caches and precomputation designed for fast inference

Precomputed indexes and offsets over links and row feature vectors

Deep integration between inference and database

Inference uses database primitives directly and maintains low-level caches

Inference operates directly on indexes and maintains specialized segment-level caches
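
The inverted-index idea can be illustrated with posting sets. Counting rows that contain several features is a set intersection, and a conditional probability is a ratio of two such counts. Below is a minimal Python sketch in which the `InvertedIndex` class is hypothetical. Production systems typically use compressed bitmaps or sorted posting lists rather than Python sets. The document does not describe Aito's actual index layout.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each feature (e.g. a word) to the set of row ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.n_rows = 0

    def add(self, row_id, features):
        # Assumes each row id is added once.
        self.n_rows += 1
        for f in set(features):
            self.postings[f].add(row_id)

    def count(self, *features):
        """Number of rows containing all given features (posting-set intersection)."""
        if not features:
            return self.n_rows
        # Start from the smallest posting set to keep intersections cheap.
        sets = sorted((self.postings.get(f, set()) for f in features), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return len(result)

    def conditional(self, target, given):
        """P(target | given) estimated as count(target, given) / count(given)."""
        denom = self.count(given)
        return self.count(target, given) / denom if denom else 0.0


idx = InvertedIndex()
idx.add(1, ["invoice", "aws", "category:cloud"])
idx.add(2, ["invoice", "aws", "category:cloud"])
idx.add(3, ["invoice", "office", "category:supplies"])
```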

Performance Optimizations

Lazy evaluation: Only compute statistics needed for specific query
Caching layer: Frequently accessed statistics kept in memory
Parallel computation: Multi-threaded statistical aggregation
Incremental updates: Statistics updated on write, not query
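
Two of these optimizations can be sketched together: incremental updates on write, and lazy, cached evaluation at query time. This is a hypothetical Python illustration, not Aito's implementation, and the class and method names are made up.

```python
from collections import Counter


class IncrementalStats:
    """Count statistics kept up to date on every write; queries only read them."""

    def __init__(self, target):
        self.target = target
        self.target_counts = Counter()  # n(y)
        self.pair_counts = Counter()    # n(field=value, y)
        self._cache = {}                # memoized answers to earlier queries

    def write(self, row):
        # Incremental update: work proportional to the row's field count, no retraining.
        y = row[self.target]
        self.target_counts[y] += 1
        for field, value in row.items():
            if field != self.target:
                self.pair_counts[(field, value, y)] += 1
        self._cache.clear()  # crude invalidation: any write drops cached answers

    def p_target_given(self, field, value):
        # Lazy evaluation: only the statistic this query needs is computed.
        key = (field, value)
        if key not in self._cache:
            joint = {y: self.pair_counts[(field, value, y)] for y in self.target_counts}
            total = sum(joint.values())
            self._cache[key] = {y: c / total for y, c in joint.items()} if total else {}
        return self._cache[key]


stats = IncrementalStats(target="category")
stats.write({"vendor": "Acme Corp", "category": "software"})
stats.write({"vendor": "Acme Corp", "category": "hardware"})
before = stats.p_target_given("vendor", "Acme Corp")  # software 0.5, hardware 0.5
stats.write({"vendor": "Acme Corp", "category": "software"})
after = stats.p_target_given("vendor", "Acme Corp")   # software ~0.67
```

Clearing the whole cache on every write keeps the sketch short. A finer-grained scheme would invalidate only the entries touched by the written row's fields.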

Three Inference Approaches for Different Use Cases

Aito provides specialized inference methods optimized for different prediction scenarios

Predict: Classification & Regression

Direct Bayesian inference for single-target predictions

{
  "from": "invoices",
  "where": { 
    "vendor": "Acme Corp" 
  },
  "predict": "category"
}
Mechanics: Calculates P(category|vendor) using Bayes' theorem
Use Case: Invoice categorization, churn prediction, fraud detection

Recommend: Collaborative Filtering

Find items that maximize a goal probability

{
  "from": "impressions",
  "where": { 
    "user": "alice" 
  },
  "recommend": "product",
  "goal": { 
    "purchased": true 
  }
}
Mechanics: Ranks items by P(goal|item,context)
Use Case: Product recommendations, next-best-action, content discovery
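
The ranking criterion P(goal|item,context) can be sketched with counts. This minimal Python version keeps only the impressions that match the context, as the `where` clause does. It then smooths each item's goal rate toward the context-wide rate. The smoothing weight `alpha` and the function name are assumptions. Real collaborative filtering also pools evidence from other users and from item features, which this sketch leaves out.

```python
from collections import Counter


def recommend(impressions, context, item_field, goal_field, alpha=1.0, top_k=3):
    """Rank items by a smoothed estimate of P(goal | item, context)."""
    shown, hits = Counter(), Counter()
    for imp in impressions:
        # Keep only impressions matching every field of the context (the "where" clause).
        if all(imp.get(k) == v for k, v in context.items()):
            shown[imp[item_field]] += 1
            hits[imp[item_field]] += bool(imp[goal_field])
    if not shown:
        return []
    # The context-wide goal rate acts as a prior mean; alpha is its weight in pseudo-counts.
    base_rate = sum(hits.values()) / sum(shown.values())
    scores = {
        item: (hits[item] + alpha * base_rate) / (shown[item] + alpha)
        for item in shown
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical impression log
impressions = [
    {"user": "alice", "product": "milk", "purchased": True},
    {"user": "alice", "product": "milk", "purchased": True},
    {"user": "alice", "product": "bread", "purchased": False},
    {"user": "alice", "product": "bread", "purchased": True},
    {"user": "alice", "product": "cheese", "purchased": False},
    {"user": "bob", "product": "cheese", "purchased": True},
]
ranking = recommend(impressions, {"user": "alice"}, "product", "purchased")
```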

Generic Queries: Custom Statistics

Flexible statistical queries for complex scenarios

{
  "from": "impressions",
  "where": { 
    "user": "alice", 
    "product.name": {
      "$match": "milk"
    }
  },
  "get": "product",
  "orderBy": { 
    "$multiply": [
      { 
        "$similarity": { 
          "name": "milk" 
        } 
      },
      { 
        "$p": {
          "$context": { 
            "click": true 
          } 
        } 
      }
    ]
  }
}
Mechanics: Combines similarity search with probabilistic ranking
Use Case: Semantic search, anomaly detection, pattern discovery
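
The `$multiply` ordering above combines two scores per candidate. Below is a minimal Python sketch. It uses word-set Jaccard overlap as a stand-in for `$similarity`, and a precomputed, hypothetical click rate stands in for `$p` with a `click: true` context. Aito's own similarity scoring may differ.

```python
def token_similarity(a, b):
    """Jaccard overlap of lowercase word sets (a stand-in for $similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def rank(products, click_rate, query):
    """Order products by similarity(query) * P(click | product)."""
    scored = [
        (p, token_similarity(p, query) * click_rate.get(p, 0.0))
        for p in products
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)


products = ["Oat milk", "Whole milk", "Milk chocolate", "Rye bread"]
# Hypothetical click probabilities per product
click_rate = {"Oat milk": 0.2, "Whole milk": 0.6, "Milk chocolate": 0.5, "Rye bread": 0.9}
ranking = rank(products, click_rate, "milk")
```

"Rye bread" is highly clickable but irrelevant, and "Oat milk" is relevant but rarely clicked. Both fall below "Whole milk", which scores reasonably on both factors.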

Beyond Traditional ML: Automatic Context Recognition

Aito's representation learning automatically discovers high-level concepts from raw data, eliminating the need for manual feature engineering

Traditional ML requires explicit feature engineering: manually creating features like 'is_weekend' or 'high_value_customer'. Aito automatically discovers these patterns through representation learning based on minimum description length (MDL) principles.

Pattern Recognition

Identifies co-occurring features that form meaningful concepts

Walking like a duck + Swimming like a duck + Quacking like a duck
→ Concept: "duck"

Automatic Feature Combination

Combines low-level features into high-level representations

{
  "$and": [
    { "$has": "rent" },
    { "$has": "real" },
    { "$has": "estate" }
  ]
}
→ Concept: "real estate rental"
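
Finding such feature combinations starts with spotting words that co-occur more often than chance. The source attributes Aito's approach to MDL principles. The sketch below swaps in a simpler pointwise mutual information (PMI) test to illustrate the idea. The function name, thresholds and documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_pairs(documents, min_count=2, min_pmi=1.0):
    """Return word pairs that co-occur more often than independence predicts."""
    n = len(documents)
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))  # each unordered pair once per document
    result = {}
    for (a, b), c in pair_df.items():
        if c < min_count:
            continue
        # PMI = log2(P(a,b) / (P(a) P(b))), using document frequencies
        pmi = math.log2(c * n / (word_df[a] * word_df[b]))
        if pmi >= min_pmi:
            result[(a, b)] = pmi
    return result


docs = [
    "rent real estate office",
    "real estate rent payment",
    "office chairs",
    "office printer toner",
    "printer paper",
    "travel hotel",
]
pairs = frequent_pairs(docs)
```

On this toy data, only the pairs among "rent", "real" and "estate" pass. Together they correspond to the `$and` of `$has` conditions shown above.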

Context-Aware Matching

Recognizes entities and relationships without explicit configuration

"Invoice to Johnson in IT department"
→ Automatically matches to employee records and department patterns

Benefits of Representation Learning

No manual feature engineering required
Discovers domain-specific patterns automatically
Handles new entities through similarity matching
Adapts to data changes without retraining

Intelligent Matching and Prior Knowledge

How Aito uses word matching and Bayesian priors for robust predictions

Prediction target priors

Uses metadata to assign initial probabilities to prediction targets

Technique: Bayesian priors based on prediction target metadata

If a customer prefers bread, increase the purchase probability of items with the bread property

Text match priors

Automatically leverages word and feature matches in inference

Technique: Bayesian priors based on cross-field matching statistics

If the description field contains 'Sarah', assume the correct processor is Sarah, even if this exact combination has not been seen before
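
A text-match prior can be sketched as a pseudo-count that grows when a candidate's name appears in the input text. In this minimal Python version, the boost factor, the pseudo-count `alpha` and the whitespace tokenization are illustrative assumptions, not Aito parameters.

```python
def predict_processor(description, observed_counts, candidates, match_boost=5.0, alpha=1.0):
    """Posterior-style scores: observed counts plus a pseudo-count prior per candidate."""
    words = set(description.lower().split())
    scores = {}
    for name in candidates:
        # Prior pseudo-count, multiplied when the candidate's name appears in the text.
        prior = alpha * (match_boost if name.lower() in words else 1.0)
        scores[name] = observed_counts.get(name, 0) + prior
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


# "Sarah" has never processed a matching invoice, but her name appears in the text.
result = predict_processor(
    "Laptop repair for Sarah",
    observed_counts={"Mike": 3},
    candidates=["Mike", "Sarah"],
)
```

Sarah ends up with about 0.56 against Mike's 0.44. The prior outweighs Mike's three matching cases even though Sarah has none. With more contrary evidence, the observed counts would dominate.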

Metadata-based priors

Leverages known metadata to inform predictions about new features

Technique: Bayesian priors based on metadata similarity

If a user has a vegetarian metadata tag, infer vegetarian preferences from similar users

Smart Use of Prior Knowledge

Bayesian priors reduce overfitting and handle sparse data

New vendor with no history

Approach: Uses category distribution of similar vendors as prior

Benefit: Reasonable predictions even with zero examples

Rare edge case

Approach: Balances specific evidence with general patterns

Benefit: Avoids overfitting to anomalies

Evolving patterns

Approach: Continuous prior updating as data accumulates

Benefit: Adapts to drift without retraining
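
All three scenarios follow from one standard technique: adding prior pseudo-counts to observed counts, which gives the posterior mean of a Dirichlet-multinomial model. Below is a minimal Python sketch. The document does not describe how Aito selects "similar vendors" or weights the prior, so `prior_dist` and `strength` are assumptions.

```python
def smoothed_distribution(vendor_counts, prior_dist, strength=5.0):
    """Posterior mean of a Dirichlet-multinomial: observed counts plus pseudo-counts.

    prior_dist is assumed to sum to 1; strength is its weight in pseudo-counts.
    """
    categories = set(prior_dist) | set(vendor_counts)
    n = sum(vendor_counts.values())
    return {
        c: (vendor_counts.get(c, 0) + strength * prior_dist.get(c, 0.0)) / (n + strength)
        for c in categories
    }


# Hypothetical category distribution of vendors similar to the new one
prior = {"software": 0.8, "travel": 0.2}
new_vendor = smoothed_distribution({}, prior)              # no history
rare = smoothed_distribution({"travel": 1}, prior)         # one unusual example
established = smoothed_distribution({"travel": 95}, prior) # plenty of evidence
```

With no history, the prediction equals the prior (0.8 / 0.2). A single travel invoice only shifts it to about 0.67 / 0.33. After 95 travel invoices, travel reaches 0.96. Each new row moves the estimate without any retraining step.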

Advanced Query Operators for Ad-Hoc Feature Engineering

Powerful operators that enable custom feature engineering at query time

$on: Conditional Context

Focus predictions on specific data subsets

{
  "from": "invoices",
  "where": {
    "region": "EU", 
    "$on": [
      { 
        "amount": { 
          "$gt": 10000 
        } 
      },
      { "region": "EU" }
    ]            
  },
  "predict": "approval_time"
}

Use case: Segment-specific predictions without separate models

$knn: K-Nearest Neighbors

Find similar items using weighted feature similarity

{
  "from": "products",
  "where": {
    "$knn": {
      "k": 10,
      "near": { 
        "name": "Acme CRM", 
        "category": "software"
      }
    }
  }
}

Use case: Similarity-based recommendations and matching
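
Weighted feature similarity for `$knn` can be sketched as a per-field similarity combined with field weights. This Python version uses word-set Jaccard similarity per field. The weights, the similarity measure and the product rows are assumptions, since the query above does not specify how Aito weights `name` against `category`.

```python
def similarity(a, b, weights):
    """Weighted average of per-field Jaccard overlaps between word sets."""
    score = 0.0
    for field, w in weights.items():
        ta = set(str(a.get(field, "")).lower().split())
        tb = set(str(b.get(field, "")).lower().split())
        if ta or tb:
            score += w * len(ta & tb) / len(ta | tb)
    return score / sum(weights.values())  # weights assumed positive


def knn(rows, near, k, weights):
    """Return the k rows most similar to `near`."""
    return sorted(rows, key=lambda r: similarity(r, near, weights), reverse=True)[:k]


# Hypothetical product table and field weights
products = [
    {"name": "Acme CRM Pro", "category": "software"},
    {"name": "Globex CRM", "category": "software"},
    {"name": "Acme Stapler", "category": "office"},
    {"name": "Initech Payroll", "category": "software"},
]
nearest = knn(products, {"name": "Acme CRM", "category": "software"}, k=2,
              weights={"name": 2.0, "category": 1.0})
```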

$numeric: Smart Binning

Automatic discretization of continuous variables

{
  "from": "customers",
  "where": {
    "age": { "$numeric": 25 }
  },
  "predict": "churn_risk"
}

Use case: Handle numeric features without manual binning
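
One common way to discretize a continuous field automatically is to place bin edges at empirical quantiles. The Python sketch below shows that approach. The document does not state whether `$numeric` uses quantiles or another scheme, so treat the method and the bin count as assumptions.

```python
import bisect


def quantile_edges(values, n_bins=4):
    """Bin edges at empirical quantiles, so each bin holds roughly equal counts."""
    s = sorted(values)
    return [s[len(s) * i // n_bins] for i in range(1, n_bins)]


def to_bin(value, edges):
    """Index of the bin a value falls into (0 .. len(edges))."""
    return bisect.bisect_right(edges, value)


# Hypothetical customer ages
ages = [18, 22, 25, 29, 34, 41, 47, 52, 60, 68, 73, 80]
edges = quantile_edges(ages)  # [29, 47, 68]
query_bin = to_bin(25, edges)
```

With these ages, 25 falls into the lowest bin (below 29). A prediction conditioned on it would therefore draw on statistics from customers in that range.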

Combining Operators for Complex Logic

{
  "from": "transactions",
  "where": {
    "$on": [
      {
        "$knn": { 
          "k": 5, 
          "near": { "description": "AWS Consulting" }
        },
        "amount": { "$numeric": { "$gte": 1000 } }
      },
      { "region": "EU" }
    ]
  },
  "predict": "fraud_risk"
}

Build sophisticated feature engineering logic without code

Performance Validation: Benchmarks and Evaluations

Rigorous testing demonstrates Aito's performance compared to traditional ML approaches

Accuracy Comparison Across Datasets

UCI Machine Learning Repository benchmark results

Dataset | Aito | Random Forest | SVM | Insight
Spam Detection | 95% | 97% | 87% | Competitive with ensemble methods
Shuttle | 98% | 99% | 78% | Excellent on multi-class problems
Letter Recognition | 88% | 91% | 82% | Strong pattern recognition
Adult Income | 82% | 85% | 79% | Handles mixed data types well

Aito achieves 96% mean accuracy across standard benchmarks, ranking 3rd out of 8 methods tested

Speed vs Accuracy Trade-offs

Query response time: 20-200ms, vs hours or days for model training
Accuracy: 90-98% typical, vs 92-99% for specialized models
Feature engineering: Automatic, vs weeks of manual work
Model management: None, vs continuous retraining

Production Deployments

Posti Group

Purchase invoice automation

Scale: 3,000 invoices/month
Accuracy: Automated the majority of invoices
Implementation: RPA integration via UiPath

Helsingö

Product catalogue matching (IKEA to Helsingö)

Scale: Started with 47 samples, demonstrating small-data capability
Accuracy: Learns and adapts continuously
Implementation: Replaced unmaintainable if-else logic

Jokiväri

Purchase invoice processing

Scale: Home renovation company operations
Accuracy: 80% automation rate
Implementation: Hours from evaluation to deployment

Try It Yourself: Interactive Query Examples

Explore different query types with our interactive widget


Prediction with Confidence

Classify with calibrated confidence scores

Aito Query
{
  "from": "invoices",
  "where": {
    "Description": "Aws Cloud"
  },
  "predict": "Processor",
  "select": [
    "$p",
    "Name",
    "Department",
    "Role"
  ]
}
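
The `$p` value returned with each candidate is what an automation rule can act on. Below is a minimal Python sketch. It assumes the response lists candidates under `hits`, each with the selected fields plus `$p`; check the exact shape in Aito's API reference. The threshold and example values are made up.

```python
def route(response, threshold=0.9):
    """Auto-accept the top prediction only when its probability clears the threshold."""
    hits = response.get("hits", [])  # assumed response shape
    if not hits:
        return ("manual", None)
    best = max(hits, key=lambda h: h["$p"])
    decision = "auto" if best["$p"] >= threshold else "manual"
    return (decision, best)


# Made-up response for the query above
example = {
    "hits": [
        {"$p": 0.93, "Name": "Sarah", "Department": "IT", "Role": "Cloud administrator"},
        {"$p": 0.04, "Name": "Mike", "Department": "Finance", "Role": "Controller"},
    ]
}
decision, best = route(example)
```

Raising the threshold trades automation rate for precision. At 0.95, the same response is routed to manual review.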

Ready to Explore the Science?

Dive deeper into Aito's technical documentation or try it with your own data
