Customer Segmentation

Behavioral segments derived from purchase and engagement history. No pre-labeled training set; segments emerge from co-occurrence in the data.

Customer Segmentation in production: screenshot from the 🛒 E-commerce demo

The problem

Customer segmentation is where most analytics teams burn time without producing actionable insight. The team picks a clustering algorithm (k-means, hierarchical, DBSCAN), engineers some features, runs it, and gets segments that don't map to anything anyone can act on. Then the team redoes the work because the segments need to be "more interpretable," and the cycle repeats every quarter.

The segments the business actually needs are conditional on behavior, not derived from feature engineering. "Customers who buy category A also buy category B at high lift." "Customers from region X have a particular browse pattern." The conditional probability defines the segment; the clustering is unnecessary.
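
Assuming `$lift` follows the standard association-rule definition (this page does not spell it out), lift compares how often behavior B occurs among customers who show behavior A with how often B occurs across all customers:

```latex
\mathrm{lift}(A \Rightarrow B) = \frac{P(B \mid A)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)}
```

A lift of 1 means A and B are independent. A lift well above 1 means customers who do A are disproportionately likely to also do B, which is what makes "A and B together" a usable segment definition.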

How it works

Running _relate over the customer × behavior matrix surfaces the strongest behavioral patterns. The resulting "segments" are defined by conditional behavior: customers who exhibit pattern A together with pattern B at high lift form a coherent behavioral cluster. No clustering algorithm is needed; the conditional probability does the work.
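
The "high lift" computation can be sketched in plain Python. This is the textbook pairwise association-rule calculation over a sparse customer × behavior matrix, not a description of how _relate is implemented; the function name `pairwise_lift`, the `min_support` cutoff, and the sample behaviors are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(behaviors_by_customer, min_support=2):
    """Score every pair of behaviors by lift over a customer x behavior matrix.

    behaviors_by_customer maps a customer id to the set of behaviors observed
    for that customer (one row of the matrix, stored sparsely).
    Returns (a, b, support_count, lift) tuples, highest lift first.
    """
    n = len(behaviors_by_customer)
    single = Counter()  # behavior -> number of customers showing it
    pair = Counter()    # (a, b) -> number of customers showing both
    for behaviors in behaviors_by_customer.values():
        single.update(behaviors)
        pair.update(combinations(sorted(behaviors), 2))

    results = []
    for (a, b), both in pair.items():
        if both < min_support:
            continue  # too few customers to trust the estimate
        # lift = P(A and B) / (P(A) * P(B)); the factors of n simplify to this form
        lift = both * n / (single[a] * single[b])
        results.append((a, b, both, lift))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


customers = {
    "c1": {"dog-dryfood", "dog-dentaltreats", "grain-free"},
    "c2": {"dog-dryfood", "dog-dentaltreats", "grain-free"},
    "c3": {"dog-dryfood", "grain-free"},
    "c4": {"dog-dryfood", "cat-food"},
    "c5": {"cat-food", "cat-litter"},
    "c6": {"cat-food", "cat-litter"},
    "c7": {"cat-food"},
    "c8": {"dog-dryfood", "cat-food"},
}
for a, b, support, lift in pairwise_lift(customers):
    print(f"{a} + {b}: support={support}, lift={lift:.2f}")
```

On this sample the top-ranked pair is dog-dentaltreats + grain-free (lift ≈ 2.67), while cat-food + dog-dryfood scores 0.64, below the independence baseline of 1. Patterns with more than two behaviors would need the usual Apriori-style extension to larger itemsets, left out here to keep the sketch short.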

Segments derived this way are interpretable by construction. "Segment X = customers who buy dog-dryfood AND dog-dentaltreats AND filter for grain-free." The segment definition IS its description; the team can act on it directly. As behavior shifts (new product, new pattern, seasonality), the conditional probability updates and the segment definitions evolve naturally.
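
One way to picture "the definition IS its description" is a segment stored only as its list of conditions, with both the human-readable label and the membership test derived from that list. The `Segment` class and the condition strings below are hypothetical; this is a minimal sketch, not how the platform stores segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A segment stored only as a conjunction of behavioral conditions."""

    name: str
    conditions: tuple  # e.g. ("buy dog-dryfood", "filter for grain-free")

    def describe(self) -> str:
        # The description is generated from the definition, so the two cannot drift apart.
        return f"{self.name} = customers who " + " AND ".join(self.conditions)

    def matches(self, behaviors: set) -> bool:
        # Membership: the customer must exhibit every condition (logical AND).
        return set(self.conditions) <= set(behaviors)


segment_x = Segment(
    "Segment X",
    ("buy dog-dryfood", "buy dog-dentaltreats", "filter for grain-free"),
)
print(segment_x.describe())
print(segment_x.matches({"buy dog-dryfood", "buy dog-dentaltreats", "filter for grain-free", "buy cat-food"}))
print(segment_x.matches({"buy dog-dryfood", "filter for grain-free"}))
```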

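For example, this query relates grain-free dog dry food purchases (from order_lines) to customer region and lifetime-value band:

```json
{
  "from": "order_lines",
  "where": {
    "product.category": "dog-dryfood",
    "product.tag": "grain-free"
  },
  "relate": ["customer.region", "customer.lifetime_value_band"],
  "select": ["$lift", "$p", "$support", "customer.region"],
  "limit": 10
}
```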

For the full architecture, see the technology overview. For the broader narrative across multiple use cases, read The Predictive Application.

See it live

This use case runs in the 🛒 E-commerce demo today. Click through to the live application and inspect the queries that produce the result. The source is on GitHub under Apache 2.0.


Frequently asked

How does this compare to k-means or hierarchical clustering?

Clustering algorithms produce numeric cluster IDs that need post-hoc interpretation. _relate-based segmentation produces interpretable conditions by construction: the segment IS its behavioral pattern. Both approaches can be useful, but the lift-based one is faster to act on.

Can segments be exported to email or campaign tools?

Yes. A segment is defined by its where clause, so its customer list is simply the set of customer_ids that match that clause. Export that list and pipe it to the campaign tool of your choice (HubSpot, Mailchimp, Klaviyo); the integration is HTTP/JSON.
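
A minimal sketch of the export step, assuming order lines arrive as flat records and the where clause is a dict of equality filters like the query example on this page (richer operators would need more matching logic). The function names, sample rows, and payload fields are hypothetical; the actual request format depends on the campaign tool.

```python
import json


def segment_customer_ids(order_lines, where):
    """Distinct customer_ids having at least one order line that matches every filter in `where`."""
    return sorted({
        row["customer_id"]
        for row in order_lines
        if all(row.get(field) == value for field, value in where.items())
    })


def campaign_payload(segment_name, customer_ids):
    # Hypothetical JSON body; map the field names to whatever the campaign tool expects.
    return json.dumps({"segment": segment_name, "customer_ids": customer_ids})


order_lines = [
    {"customer_id": "c1", "product.category": "dog-dryfood", "product.tag": "grain-free"},
    {"customer_id": "c2", "product.category": "dog-dryfood", "product.tag": "grain-free"},
    {"customer_id": "c1", "product.category": "dog-dryfood", "product.tag": "grain-free"},
    {"customer_id": "c3", "product.category": "cat-food", "product.tag": "grain-free"},
]
where = {"product.category": "dog-dryfood", "product.tag": "grain-free"}

ids = segment_customer_ids(order_lines, where)
print(campaign_payload("grain-free dog dry food buyers", ids))
```

Delivering the payload is then one HTTP POST to the campaign tool's API, whose endpoint and schema are tool-specific and not shown here.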

How do segments handle customers who span multiple behaviors?

A customer can be in multiple segments; there is no requirement for mutual exclusivity. The behaviors that define the segments overlap, so customers who exhibit multiple behaviors appear in multiple segments. Where a campaign needs exactly one segment per customer, campaign logic can assign a primary segment based on lift or most recent action.
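
When a campaign does need one segment per customer, the "lift or last action" rule can be sketched as a single ranking. This assumes one possible policy (highest lift wins, most recent action breaks ties); the function name, tuple layout, and sample values are hypothetical.

```python
def primary_segment(memberships):
    """Choose one primary segment for a customer who belongs to several.

    memberships: list of (segment_name, lift, last_action_ts) tuples, where
    lift scores the segment's defining pattern and last_action_ts is when the
    customer last showed that behavior (epoch seconds here).
    Returns None for a customer in no segment.
    """
    if not memberships:
        return None
    # Compare by lift first; among equal lifts, the more recent action wins.
    name, _lift, _ts = max(memberships, key=lambda m: (m[1], m[2]))
    return name


memberships = [
    ("grain-free dog owners", 2.7, 1_700_000_000),
    ("cat households", 1.6, 1_700_500_000),
    ("dental-treat buyers", 2.7, 1_700_200_000),
]
print(primary_segment(memberships))
```

Swapping the key to `(m[2], m[1])` gives a recency-first policy instead; the underlying memberships stay overlapping either way.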

Does this work for B2B customer segmentation (smaller customer counts, longer cycles)?

Yes. B2B has fewer customers but richer per-customer signals (contract value, contact patterns, account-level events). The same _relate machinery applies; support thresholds are set lower because absolute volumes are smaller.

Can segments be used as input to other predictions?

Yes. Once a segment is defined, it can be used in the where clause of any downstream prediction. "Predict churn for customers in segment X" is just _predict with the segment condition in where. Segments compose with the rest of the predictive operators.
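
Composition works because a segment is just a where clause. Here is a minimal sketch of merging a segment's condition into a downstream query's filters, assuming flat equality filters as in the query example on this page. `compose_where` and the `customer.region` value are hypothetical, and the remaining fields of a _predict query are omitted because this page does not document them.

```python
def compose_where(segment_where, extra_where):
    """Combine a segment's where clause with a downstream query's own filters.

    Both arguments map field -> required value. Conflicting values on the same
    field raise ValueError, because the combined condition could never match.
    """
    merged = dict(segment_where)
    for field, value in extra_where.items():
        if field in merged and merged[field] != value:
            raise ValueError(f"conflicting conditions on {field!r}: {merged[field]!r} vs {value!r}")
        merged[field] = value
    return merged


segment_x = {"product.category": "dog-dryfood", "product.tag": "grain-free"}
where = compose_where(segment_x, {"customer.region": "EU"})
print(where)
```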