4.1 Customer Segmentation

A useful segmentation project should deliver an operating artifact, not just a slide deck. The real output is a segment_id that integrates with your CRM and ad platforms, along with a one-page brief for each segment: who they are, why they matter, and what your team should do differently.

This chapter builds that artifact in three steps:

  1. Rank customers by current value. Decile analysis and RFM give you a fast operating layer.
  2. Group customers by behavior. Behavioral clustering uncovers patterns that simple value tiers miss.
  3. Add purchase meaning. Embedding-based segmentation reveals what customers actually buy, not just how much.

Build Segments from Behavior, Profile Them with Demographics

Before building segments, separate two types of variables.

Behavioral variables describe what customers do: how often they buy, how recently they bought, how much they spend, which categories they buy, how broadly they shop, how often they return products, and how dependent they are on discounts.

Profile variables describe who they are or where you can reach them: age, income, household composition, geography, loyalty tier, and channel preference.

Use behavioral variables to create the segments. Use profile variables afterward to describe, target, and communicate with those segments.

This rule holds throughout the chapter. RFM uses behavioral signals to build value tiers. K-Means adds richer behavioral features to uncover broader patterns. Embedding-based segmentation brings in purchase text to capture what customers actually buy. Demographics still matter, but their main role comes after the segments are built: to explain, reach, and activate them.

Start Simple: Deciles and RFM

The quickest way to a usable segmentation skips clustering algorithms entirely. Start by ranking customers by revenue decile, then layer on RFM if you need more nuance.

Decile Analysis

Decile analysis ranks customers by revenue and splits them into ten equal-sized groups, from D1 (the top 10 percent) to D10 (the bottom 10 percent). The pattern usually follows the Pareto principle: a small share of customers drives most of the revenue. D1 gets retention and VIP treatment. D2 to D5 get growth campaigns, because this is where the biggest upside is. D8 to D10 get only low-cost automated nurture. You can set this up in under an hour and give your team a clear action plan.

Figure 1: Typical decile distribution. Decile 1 contributes roughly 35% of revenue, with a steep drop-off.
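Decile analysis can be sketched in pandas as shown below. The sketch assumes a transaction table with hypothetical columns `customer_id` and `revenue`. It sums revenue per customer and ranks customers. It then cuts the ranks into ten equal-count groups, so D1 holds the top 10 percent, and reports each decile's share of revenue. The data is random toy data that shows the mechanics only; it says nothing about real revenue shares.

```python
import numpy as np
import pandas as pd


def revenue_deciles(tx: pd.DataFrame) -> pd.DataFrame:
    """Assign each customer to a revenue decile (D1 = top 10% of customers)."""
    revenue = tx.groupby("customer_id")["revenue"].sum()
    # Rank with method="first" so ties cannot collapse bins, then cut the
    # ranks into 10 equal-count groups; rank 1 (highest revenue) lands in D1.
    ranks = revenue.rank(method="first", ascending=False)
    labels = [f"D{i}" for i in range(1, 11)]
    return pd.DataFrame({"revenue": revenue, "decile": pd.qcut(ranks, 10, labels=labels)})


def decile_summary(customers: pd.DataFrame) -> pd.DataFrame:
    """Customer count, total revenue, and share of all revenue per decile."""
    summary = customers.groupby("decile", observed=True)["revenue"].agg(["count", "sum"])
    summary["revenue_share"] = summary["sum"] / summary["sum"].sum()
    return summary


# Toy data for illustration only: 1,000 customers with 20 transactions each.
rng = np.random.default_rng(0)
tx = pd.DataFrame({
    "customer_id": np.arange(20_000) % 1_000,
    "revenue": rng.lognormal(mean=3.0, sigma=1.2, size=20_000),
})
summary = decile_summary(revenue_deciles(tx))
print(summary.round(3))
```

Ranking before `qcut` is a deliberate choice. Cutting raw revenue directly can fail or produce uneven groups when many customers share the same spend.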

RFM: Adding Behavioral Nuance

RFM (Recency, Frequency, Monetary) gives a more nuanced view. It is the most widely used segmentation framework in direct marketing because marketing teams already think in these terms: lapsed high-spenders, new frequent buyers, and so on.

Table 1: RFM variables and their interpretation.

| Variable | Definition | Better direction |
|---|---|---|
| Recency (R) | Days since the customer’s last purchase | Lower is better |
| Frequency (F) | Number of purchases in the observation period | Higher is better |
| Monetary (M) | Total spend in the observation period | Higher is better |

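Independent sorting splits each variable into quintiles separately and combines the three scores into a three-digit code such as 543 (R=5, F=4, M=3), giving up to 125 (5 × 5 × 5) groups. Recency is scored in reverse, so 5 marks the most recent buyers. Sequential sorting nests the splits: first by Recency, then by Frequency within each Recency group, then by Monetary within each Recency-Frequency group. As a result, Recency gets the most weight.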
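Both sorting schemes can be sketched in pandas as follows. The sketch assumes hypothetical columns `customer_id`, `order_id`, `date`, and `revenue`, plus a chosen snapshot date for measuring recency. Every score runs from 1 to 5, with 5 always at the "better" end. Recency is therefore ranked in reverse.

```python
import numpy as np
import pandas as pd


def rfm_table(tx: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Raw Recency (days), Frequency (orders), and Monetary (spend) per customer."""
    g = tx.groupby("customer_id")
    return pd.DataFrame({
        "recency": (snapshot - g["date"].max()).dt.days,
        "frequency": g["order_id"].nunique(),
        "monetary": g["revenue"].sum(),
    })


def quintile(values: pd.Series, lower_is_better: bool = False) -> pd.Series:
    """Score 1-5 in equal-count bins; 5 is always the 'better' end."""
    # rank(method="first") breaks ties by row order so qcut never sees duplicate edges.
    ranks = values.rank(method="first", ascending=not lower_is_better)
    return pd.qcut(ranks, 5, labels=[1, 2, 3, 4, 5]).astype(int)


def independent_rfm(rfm: pd.DataFrame) -> pd.DataFrame:
    """Each variable is cut into quintiles on its own."""
    out = rfm.copy()
    out["R"] = quintile(rfm["recency"], lower_is_better=True)
    out["F"] = quintile(rfm["frequency"])
    out["M"] = quintile(rfm["monetary"])
    out["rfm_code"] = out["R"].astype(str) + out["F"].astype(str) + out["M"].astype(str)
    return out


def sequential_rfm(rfm: pd.DataFrame) -> pd.DataFrame:
    """Nested quintiles: F within each R group, then M within each (R, F) group."""
    out = rfm.copy()
    out["R"] = quintile(rfm["recency"], lower_is_better=True)
    out["F"] = out.groupby("R")["frequency"].transform(quintile)
    out["M"] = out.groupby(["R", "F"])["monetary"].transform(quintile)
    out["rfm_code"] = out["R"].astype(str) + out["F"].astype(str) + out["M"].astype(str)
    return out


# Toy transactions for illustration only.
rng = np.random.default_rng(1)
n = 6_000
tx = pd.DataFrame({
    "customer_id": rng.integers(0, 500, n),
    "order_id": np.arange(n),
    "date": pd.Timestamp("2024-01-01") + pd.to_timedelta(rng.integers(0, 365, n), unit="D"),
    "revenue": rng.gamma(shape=2.0, scale=20.0, size=n),
})
rfm = rfm_table(tx, snapshot=pd.Timestamp("2025-01-01"))
ind = independent_rfm(rfm)
seq = sequential_rfm(rfm)
print(ind["rfm_code"].value_counts().head())
```

The sequential version needs at least five customers in every (R, F) cell, or `qcut` cannot form five bins. On small customer bases, independent scoring is the safer default.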

Decile analysis and RFM are quick to set up and easy to use. But they only create value tiers, not full customer profiles. Two customers with similar RFM scores can behave very differently: one might buy full-price items across many categories, while another buys only discounted items in a single category. To see those differences, you need to move from value tiers to richer behavioral variables.

Behavioral Clustering with K-Means

K-Means lets you cluster on multiple behavioral variables without manual cutoffs. The practical setup is straightforward. First, scale features with StandardScaler so dollar amounts do not dominate the Euclidean distances K-Means relies on. Then start with 6 to 10 behavioral variables. Using fewer often just recreates RFM with extra steps.
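The setup described above fits in a short scikit-learn pipeline. The six feature names below are hypothetical stand-ins for the behavioral variables listed earlier, and the data is random toy data. The pipeline standardizes each feature before K-Means runs. Profiles are then read back in original units so the segment brief stays interpretable.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; substitute your own 6-10 behavioral variables.
FEATURES = ["frequency", "monetary", "recency_days",
            "category_breadth", "return_rate", "discount_share"]


def fit_behavioral_segments(features: pd.DataFrame, k: int, seed: int = 42):
    """Standardize, then run K-Means; returns the fitted pipeline and one label per customer."""
    model = make_pipeline(StandardScaler(), KMeans(n_clusters=k, n_init=10, random_state=seed))
    labels = model.fit_predict(features[FEATURES])
    return model, pd.Series(labels, index=features.index, name="segment_id")


# Toy data for illustration only: unscaled, "monetary" would dominate the distances.
rng = np.random.default_rng(7)
n = 600
features = pd.DataFrame({
    "frequency": rng.poisson(6, n),
    "monetary": rng.gamma(2.0, 150.0, n),
    "recency_days": rng.integers(1, 365, n),
    "category_breadth": rng.integers(1, 12, n),
    "return_rate": rng.beta(1, 9, n),
    "discount_share": rng.beta(2, 5, n),
})
model, segment_id = fit_behavioral_segments(features, k=4)
# Segment profiles in original units, for the one-page brief.
print(features.groupby(segment_id).mean().round(2))
```

Keeping the scaler inside the pipeline means new customers are scored with the same means and standard deviations, via `model.predict(...)`, when segments are refreshed.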

Choosing K is part statistics, part judgment. The elbow method and silhouette score help narrow the range. For most businesses, 4 to 8 segments is the practical sweet spot. If your marketing team cannot describe a different action for each segment, you probably have too many.

Figure 2: Elbow (left) and silhouette (right) plots; both point to K=3.
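The numbers behind elbow and silhouette plots like these can be produced with a short loop over candidate K values. In this sketch, synthetic `make_blobs` data stands in for behavioral features, and the range 2 to 8 is an assumption. The loop records inertia for the elbow plot and the silhouette score for each K.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def k_diagnostics(X, k_values, seed: int = 42) -> pd.DataFrame:
    """Inertia (for the elbow plot) and silhouette score for each candidate K."""
    X_scaled = StandardScaler().fit_transform(X)
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_scaled)
        rows.append({
            "k": k,
            "inertia": km.inertia_,  # within-cluster sum of squares; look for the bend
            "silhouette": silhouette_score(X_scaled, km.labels_),  # higher is better, max 1
        })
    return pd.DataFrame(rows).set_index("k")


# Synthetic blobs stand in for behavioral features.
X, _ = make_blobs(n_samples=600, centers=5, n_features=6, random_state=3)
diag = k_diagnostics(X, range(2, 9))
print(diag.round(3))
print("highest silhouette at K =", diag["silhouette"].idxmax())
```

Treat the output as a shortlist, not an answer. Among the candidate K values, keep the one where marketing can name a distinct action for every segment.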

Numeric clustering still flattens product meaning. With inputs like frequency, spend, recency, category breadth, and discount dependency, two customers can look similar even if the products they buy are very different. To capture that meaning, bring product text directly into the segmentation.

Embedding-Based Segmentation

To capture what customers buy, treat each customer’s purchase history as text. Concatenate their purchased product descriptions into a single string (the customer document) and use an embedding model to turn that string into a dense vector. In this embedding space, birthday candles and party plates end up close together, while an antique teapot sits far away. Cluster on these vectors and the segments split by purchase meaning, not just purchase volume.

Constructing the Customer Document

Three practical decisions shape the quality of the segments more than anything else.

Which products go in. Top-N purchased items (by frequency or revenue) usually outperform the full purchase history. The full list adds noise from one-off buys; the top items capture the customer’s actual pattern.

Whether to weight by revenue. Listing every purchased item once treats a high-value item and a low-value item as equally important, which can misrepresent what the customer cares about. Revenue-weighting (repeating product mentions in proportion to spend) sharpens the segments for customers with mixed baskets. Cap the weights or use log-spend weighting when price differences across products are large; otherwise a single luxury purchase can dominate the document.

Filtering generic products. Items like “Gift Card,” “Postage,” or generic SKUs are noise: they appear across all segments and dilute the keywords. Filter them out by StockCode patterns or maintain a small exclusion list. This single step often makes the difference between segments that mean something and segments that mean nothing.
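The three decisions can be combined in a single document builder, sketched below. Several details are assumptions and are labeled as such in the code. The column names (`customer_id`, `stock_code`, `description`, `revenue`) are hypothetical. The exclusion set and gift-card prefix are illustrative only. The weighting rule (round of log1p(spend), clipped to 1..`max_repeats`) is one possible capped log-spend scheme, not necessarily the companion notebook's.

```python
import numpy as np
import pandas as pd

# Illustrative exclusions only; build your own list from your catalog's StockCodes.
GENERIC_CODES = {"POST", "DOT", "M", "C2", "BANK CHARGES", "AMAZONFEE"}
GIFT_PATTERN = r"gift_"  # hypothetical prefix for gift-card codes


def build_customer_documents(tx: pd.DataFrame, top_n: int = 20, max_repeats: int = 4) -> pd.Series:
    """One text 'customer document' per customer from purchased product descriptions."""
    # 1. Drop generic, non-product lines (postage, fees, gift cards).
    code = tx["stock_code"].astype(str).str.strip()
    generic = code.str.upper().isin(GENERIC_CODES) | code.str.match(GIFT_PATTERN, case=False)
    items = (tx.loc[~generic]
               .groupby(["customer_id", "description"], as_index=False)["revenue"].sum())
    items = items[items["revenue"] > 0]
    # 2. Keep each customer's top-N items by revenue rather than the full history.
    items = items.sort_values(["customer_id", "revenue"], ascending=[True, False])
    items = items.groupby("customer_id").head(top_n).copy()
    # 3. Log-spend weighting, capped so one luxury purchase cannot dominate.
    items["repeats"] = np.clip(np.round(np.log1p(items["revenue"])), 1, max_repeats).astype(int)
    items["text"] = ["; ".join([desc.strip().lower()] * r)
                     for desc, r in zip(items["description"], items["repeats"])]
    return items.groupby("customer_id")["text"].agg("; ".join)


# Toy transactions for illustration only.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 2, 2, 2],
    "stock_code": ["10001", "10002", "POST", "10003", "10004", "10004", "gift_0001_10"],
    "description": ["BIRTHDAY CANDLES", "ANTIQUE TEAPOT", "POSTAGE", "PARTY PLATES",
                    "LUNCH BAG", "LUNCH BAG", "GIFT VOUCHER"],
    "revenue": [15.0, 150.0, 18.0, 2.0, 30.0, 20.0, 10.0],
})
docs = build_customer_documents(tx)
print(docs.loc[1])
```

In this toy example, the cap keeps the single expensive teapot at four mentions, so the cheaper party items still shape customer 1's document.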

A lightweight sentence-embedding model is enough for most retail catalogs. The quality of your segments depends much more on how you design and validate the customer document than on which embedding model you pick.
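The embed-and-cluster step can be sketched with a pluggable embedding function. `sentence_transformer_embedder` assumes the `sentence-transformers` package is installed and uses `all-MiniLM-L6-v2` as one example of a lightweight model. The model choice is an assumption, not a recommendation from this chapter. The demo runs offline with `tfidf_embedder`, a lexical TF-IDF stand-in that is *not* a semantic embedding. K-Means is used here for operational segments; HDBSCAN is the discovery alternative discussed below.

```python
from typing import Callable, Sequence

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

Embedder = Callable[[Sequence[str]], np.ndarray]


def sentence_transformer_embedder(model_name: str = "all-MiniLM-L6-v2") -> Embedder:
    """Lightweight sentence-embedding model (requires the sentence-transformers package)."""
    from sentence_transformers import SentenceTransformer  # optional dependency
    model = SentenceTransformer(model_name)
    return lambda docs: model.encode(list(docs))


def tfidf_embedder(docs: Sequence[str]) -> np.ndarray:
    """Offline stand-in: lexical TF-IDF vectors, NOT semantic embeddings."""
    return TfidfVectorizer().fit_transform(docs).toarray()


def cluster_documents(docs: Sequence[str], embed: Embedder, k: int, seed: int = 42) -> np.ndarray:
    """Embed customer documents, L2-normalize them, and cluster with K-Means."""
    # For unit vectors, squared Euclidean distance = 2 - 2 * cosine similarity.
    vectors = normalize(np.asarray(embed(docs), dtype=float))
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(vectors)


docs = [
    "birthday candles; party plates; party napkins",
    "party plates; birthday banner; party napkins",
    "birthday candles; party balloons; party plates",
    "antique teapot; vintage teacup; teapot stand",
    "vintage teacup; antique teapot; tea towel",
    "teapot stand; vintage teacup; antique clock",
]
labels = cluster_documents(docs, tfidf_embedder, k=2)
# With sentence-transformers installed:
# labels = cluster_documents(docs, sentence_transformer_embedder(), k=2)
print(labels)
```

Passing the embedder in as a function keeps the model swappable. This fits the point above: document design, not model choice, drives segment quality.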

Reading the Output

The payoff is a keyword fingerprint for each segment:

Keywords such as candle, holder, glass, and vintage signal a home-décor segment; bag, lunch, school, and kids signal a children's-product segment. Numeric-only clustering on RFM features cannot reveal these differences.
Figure 3: Top distinctive keywords per segment from product-description embeddings.

Give these panels to a marketer and they should quickly see which categories, campaigns, and creative fit each group.

When One Theme Dominates

Real catalogs often have one broad theme covering most customers and a tail of sharp niches at the edges. HDBSCAN surfaces both. However, after outlier reduction (reassigning HDBSCAN's noise points to existing topics), the broad theme inflates into a single mega-topic that is too large to act on as one segment. The fix is surgical: run K-Means on just the mega-topic’s embeddings and split it into activation-sized sub-segments, while keeping the small niche topics as discovery findings. This applies the “HDBSCAN for discovery, K-Means for operations” pattern inside the dominant cluster, not on the full base. The companion notebook does this automatically when any topic exceeds 50% of the base.
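The split can be sketched in a few lines. The sketch assumes `vectors` holds the customer-document embeddings and `labels` holds integer topic ids from the discovery step, for example HDBSCAN after outlier reduction. The 50% threshold comes from the text. `n_sub=3` is a hypothetical choice; in practice you would pick it with the K diagnostics shown earlier.

```python
import numpy as np
from sklearn.cluster import KMeans


def split_dominant_topic(vectors, labels, max_share=0.5, n_sub=3, seed=42):
    """If one topic holds more than max_share of customers, re-cluster only that
    topic's embeddings with K-Means; all other topics keep their labels."""
    labels = np.asarray(labels).copy()
    topics, counts = np.unique(labels, return_counts=True)
    top = np.argmax(counts)
    if counts[top] / counts.sum() <= max_share:
        return labels  # no mega-topic: leave the discovery result as is
    mask = labels == topics[top]
    sub = KMeans(n_clusters=n_sub, n_init=10, random_state=seed).fit_predict(vectors[mask])
    # New ids start after the largest existing topic id, so nothing collides.
    labels[mask] = labels.max() + 1 + sub
    return labels


# Toy embeddings for illustration only: topic 0 holds 70% of customers.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 8))
labels = np.array([0] * 140 + [1] * 30 + [2] * 30)
new_labels = split_dominant_topic(vectors, labels)
print(np.unique(new_labels, return_counts=True))
```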

The LLM-Naming Trap

LLM-assisted naming can make segments feel more credible than they really are. Feed any cluster into an LLM, even a random one, and ask for a name and persona: you will get a polished, convincing description. The narrative sounds plausible whether the cluster is meaningful or not.

Two safeguards:

  • Validate the keyword evidence before reading the LLM’s narrative. If you cannot see a clear theme in the top keywords yourself, the LLM’s name is fiction.
  • Test the name with marketing. If they would write the same campaign brief for two differently-named segments, the names are decorative, not informative.
Note

Companion notebooks

The full working code is split across two notebooks, each on the dataset that suits its method:

  • traditional_segmentation.py — decile, RFM, and K-Means on the Dunnhumby “The Complete Journey” dataset. This is the same dataset the CLV chapter uses, so segment membership carries through.
  • semantic_segmentation.py — the embedding-based pipeline (customer-document construction with revenue weighting, generic-product filtering by StockCode, embedding, clustering, keyword extraction, and LLM-assisted naming) on the UCI Online Retail II dataset, which has the rich product descriptions this method needs.
Note

Run these notebooks

Committed with outputs; Colab is best-effort. Full list on the Notebooks & Code page.

Is Your Segmentation Any Good?

Three checks, in priority order: actionability, stability, and size.

Actionability is what matters most. Show the segment profiles to your marketing team. For each pair of segments, ask: “Would you use a different message, channel, offer, or frequency?” If the answer is “I’d do the same thing for both,” the segmentation is not actionable, no matter how clean the math looks. The usual culprit is a segment that is average on everything; merge it with a neighbor or revisit your variable selection.

Stability. Re-run clustering on two non-overlapping time periods and check that customers present in both periods mostly land in the same segments. The Adjusted Rand Index (ARI) puts a number on this. If too many customers swap segments between periods, you cannot rely on the segments for campaigns. The companion notebook shows the ARI computation alongside a migration-matrix view.
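A minimal version of both views is sketched below. It assumes each period's clustering result is a pandas Series mapping a customer id to a segment label. ARI ignores label names, which matters because segment ids from two independent runs are arbitrary. The migration matrix shows where each period-1 segment ends up. The toy labels are invented for illustration.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score


def stability_report(period_1: pd.Series, period_2: pd.Series):
    """ARI and row-normalized migration matrix for customers segmented in both periods."""
    common = period_1.index.intersection(period_2.index)
    a, b = period_1.loc[common], period_2.loc[common]
    # ARI is invariant to renaming: segment 0 in one run may be segment 7 in the other.
    ari = adjusted_rand_score(a, b)
    # Share of each period-1 segment landing in each period-2 segment.
    migration = pd.crosstab(a, b, rownames=["period_1"], colnames=["period_2"], normalize="index")
    return ari, migration


period_1 = pd.Series([0, 0, 0, 1, 1, 1, 2, 2, 2, 2], index=range(10))
# Same grouping under different ids, plus two customers seen only in period 2.
period_2 = pd.concat([period_1.map({0: 7, 1: 5, 2: 9}), pd.Series([5, 9], index=[10, 11])])
ari, migration = stability_report(period_1, period_2)
print(round(ari, 3))
print(migration)

# Move one customer to a different segment: ARI drops below 1.
shifted = period_2.copy()
shifted.loc[3] = 7
ari_shifted, _ = stability_report(period_1, shifted)
```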

Segment profile heatmap. This diagnostic supports the actionability check. Each row is a segment, columns are base variables, and colors show deviation from the population mean. Two rows that look nearly identical signal segments that should be merged.
Figure 4: Segment profile heatmap: rows are segments, columns are base variables, color is deviation from the mean.
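The table behind such a heatmap is just per-segment means of standardized variables. The sketch below computes it and flags near-duplicate rows automatically. Two choices are assumptions: the root-mean-square gap as the similarity measure, and the 0.25 threshold. The toy features and labels are invented for illustration.

```python
import numpy as np
import pandas as pd


def segment_profile(features: pd.DataFrame, labels) -> pd.DataFrame:
    """Rows = segments, columns = base variables, values = mean z-score
    (deviation from the population mean, in standard deviations)."""
    z = (features - features.mean()) / features.std(ddof=0)
    return z.groupby(np.asarray(labels)).mean()


def near_duplicate_segments(profile: pd.DataFrame, threshold: float = 0.25):
    """Pairs of segments whose profiles differ by less than `threshold` (RMS z-score gap)."""
    pairs = []
    for i, a in enumerate(profile.index):
        for b in profile.index[i + 1:]:
            gap = float(np.sqrt(((profile.loc[a] - profile.loc[b]) ** 2).mean()))
            if gap < threshold:
                pairs.append((a, b, round(gap, 3)))
    return pairs


features = pd.DataFrame({
    "frequency": [10, 11, 10, 11, 2, 3],
    "monetary": [500, 520, 505, 515, 40, 60],
})
profile = segment_profile(features, [0, 0, 1, 1, 2, 2])
print(profile.round(2))
print(near_duplicate_segments(profile))  # segments 0 and 1 are merge candidates
# To draw the heatmap itself, pass `profile` to a plotting call such as seaborn.heatmap(profile, center=0).
```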

Size. For broad campaign planning, each segment should usually cover at least 10 to 15 percent of the base. Niche but high-value segments can be exceptions. The practical range is 4 to 8 segments total.
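The size rule is simple to automate. The sketch below uses the lower end of the text's 10 to 15 percent guideline as a default. It flags small segments unless they are explicitly exempted as deliberate niche, high-value groups. It also checks the 4-to-8 segment range. The `exempt` parameter and the toy labels are hypothetical.

```python
import pandas as pd


def size_check(labels, min_share: float = 0.10, exempt=()):
    """Segment shares, segments below `min_share` (unless exempt), and a 4-8 count check."""
    shares = pd.Series(labels).value_counts(normalize=True).sort_index()
    too_small = [seg for seg, share in shares.items() if share < min_share and seg not in exempt]
    in_range = 4 <= len(shares) <= 8
    return shares, too_small, in_range


labels = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5
shares, too_small, in_range = size_check(labels)
print(shares, too_small, in_range)
```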

Segments are built from behavioral differences, but the campaigns assigned to those segments still need to be tested. Refer back to A/B Test Design before declaring victory.

Method Cheat Sheet

Table 2: Choose the method by the question it answers, not by its sophistication.

| Question | Method | Input | Output |
|---|---|---|---|
| Which customers deserve more attention right now? | Decile / RFM | Transaction log | Value tiers or R/F/M scores |
| What behavioral patterns exist beyond value tiers? | K-Means | Numeric behavioral features | Behavioral segment labels |
| What do customers actually buy? | Embedding-based clustering | Customer documents from product descriptions | Meaning-based segments with keyword fingerprints |

Key Takeaways

  • Build segments from behavior, profile with demographics. Put behavioral variables into the algorithm; use profile variables afterward to describe, target, and reach the segments.
  • Each method answers a different question. RFM ranks customers by current value. K-Means uncovers behavioral patterns beyond value tiers. Embedding-based segmentation reveals what customers actually buy. Choose the method that matches your decision.
  • Validate first for actionability, then stability and size. If your marketing team would treat two segments the same way, merge them.