1.2 The Data Landscape of Marketing
Marketing data is never just one table. It is a stack of sources, each capturing a different part of the customer journey. In this section, I will map that stack using three lenses: the funnel, the KPIs at each stage, and the datasets behind them.
The Funnel as a Measurability Map
Customers rarely buy on first exposure. They move, sometimes over months and sometimes within minutes, from “not knowing the brand exists” to “buying” to “buying again.” The most common model of this journey is the funnel:
Awareness → Consideration → Conversion → Retention
The funnel is useful not because customers move through it in order (many do not), but because it acts as a measurability map. As you move down the funnel, data gets more granular, measurement becomes more direct, signals arrive faster, and experiments are easier to run.
This is why method choice depends on the business model. E-commerce and SaaS teams can often test behavior directly. CPG and brand advertisers have to infer upstream effects from signals that are noisier, slower, and more aggregated.
A common trap is the streetlight effect: focusing only on what is easy to measure. Performance channels can report ROAS every day; brand campaigns have no comparable daily readout, so their effect is easy to overlook. Before relying on any KPI, dataset, or method, first ask which part of the funnel it actually covers.
KPIs across the Funnel
Each funnel stage has its own set of KPIs. You do not need to memorize them, but you should be able to tell which stage a metric belongs to. That way, you can spot when a dashboard is mixing up different layers.
| Funnel Stage | Typical KPIs | What They Measure |
|---|---|---|
| Awareness | Reach, Impressions, Aided / Unaided Brand Awareness, Share of Voice | Whether people know you exist |
| Consideration | CTR, Site Visits, Time on Site, Search Volume, Branded Search Lift | Whether people are evaluating you |
| Conversion | CVR, CAC, AOV, ROAS | Whether people buy, and how efficiently |
| Retention | Repeat Rate, Churn Rate, Retention Curve, LTV / CLV, NPS | Whether people come back |
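
To make the abbreviations in the Consideration and Conversion rows concrete, here is a minimal sketch of how they are commonly computed from aggregate totals. The `ChannelTotals` fields and the example numbers are hypothetical, and the denominators are assumptions: teams differ on whether CVR divides by clicks, sessions, or visitors, and on which revenue counts as attributed. Treat this as one plausible set of definitions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class ChannelTotals:
    """Hypothetical aggregate totals for one channel over one period."""
    spend: float          # media spend in currency units
    impressions: int      # ad impressions served
    clicks: int           # ad clicks
    orders: int           # orders attributed to the channel
    new_customers: int    # first-time buyers among those orders
    revenue: float        # revenue attributed to the channel


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def funnel_kpis(t: ChannelTotals) -> dict:
    """Compute common KPI ratios under the assumed definitions above."""
    return {
        "CTR": safe_div(t.clicks, t.impressions),    # click-through rate
        "CVR": safe_div(t.orders, t.clicks),         # conversion rate (per click)
        "CAC": safe_div(t.spend, t.new_customers),   # customer acquisition cost
        "AOV": safe_div(t.revenue, t.orders),        # average order value
        "ROAS": safe_div(t.revenue, t.spend),        # return on ad spend
    }


# Made-up numbers, purely for illustration.
totals = ChannelTotals(
    spend=5000.0,
    impressions=200_000,
    clicks=4_000,
    orders=100,
    new_customers=80,
    revenue=12_000.0,
)
kpis = funnel_kpis(totals)
for name, value in kpis.items():
    print(f"{name}: {value:.4g}")
```

With these made-up totals, CTR is 2%, CVR is 2.5%, CAC is 62.5, AOV is 120, and ROAS is 2.4. Note that the Awareness and Retention rows do not reduce to a single ratio like this: brand awareness comes from surveys, and retention metrics need customer-level history over time.
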
Typical Datasets in Marketing
The Data box in Figure 1 lists the main dataset categories that drive everything downstream. Here is what those categories look like in practice, and the pitfalls to watch for as a data scientist.
| Category | Examples | What a DS Newcomer Should Know |
|---|---|---|
| Customer / User | CRM, loyalty data, demographics, account / contract data | Schemas drift; definitions of “active user” change with org restructures. Identity resolution across IDs is harder than it looks. |
| Transaction | Purchase history, subscription / billing, cart, returns | Owned directly by e-commerce and SaaS businesses; only partially visible in CPG (POS data via retailers). Granularity (line item vs. order vs. daily) matters for every downstream model. |
| Product / Pricing | SKU master, price lists, inventory, planograms, promotion calendars | SKU hierarchies are messy. Prices shift with promotions, competitor actions, and seasons, so a “price” column is rarely as stable as it looks. |
| Digital behavior | Clicks, page views, sessions, in-app events, email opens / clicks | Volume is huge, but these signals often explain only a small fraction of actual purchase variance. Easy to over-trust. |
| Media | Spend, impressions, GRP, creative metadata, campaign tags, email sends, push logs | Every platform reports differently. Attribution windows, “conversion” definitions, and viewability standards diverge. Joining across platforms is a recurring tax. |
| Survey | Brand tracking, NPS / CSAT, ad recall, intent surveys | Pre-aggregated and sample-based. Useful for awareness-stage signals you cannot get from logs. Watch for sampling and response bias. |
| Third-party | Panel data (NielsenIQ, Circana), syndicated category reports, search trends, weather, economic indicators, competitor data | Pre-aggregated and often delivered with a lag. Use for context, not for fine-grained causal claims. |
| Voice of customer | Reviews, support tickets, social mentions | Unstructured. Strong selection bias (only certain customers post). Useful for hypothesis generation, dangerous as a representative signal. |
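
The granularity point in the Transaction row is easier to see with a small example. The sketch below rolls hypothetical line-item rows up to order level and then to daily level, using only the Python standard library. The column layout and values are invented for illustration; a real transaction table will have its own schema, and returns or discounts would need extra handling.

```python
from collections import defaultdict
from datetime import date

# Hypothetical line-item rows: (order_id, order_date, sku, quantity, unit_price)
line_items = [
    ("o1", date(2024, 1, 1), "sku_a", 2, 10.0),
    ("o1", date(2024, 1, 1), "sku_b", 1, 30.0),
    ("o2", date(2024, 1, 1), "sku_a", 1, 10.0),
    ("o3", date(2024, 1, 2), "sku_c", 3, 20.0),
]

# Line item -> order: sum quantity * price within each order.
order_totals = defaultdict(float)
order_dates = {}
for order_id, order_date, _sku, quantity, unit_price in line_items:
    order_totals[order_id] += quantity * unit_price
    order_dates[order_id] = order_date

# Order -> daily: sum order totals and count orders per calendar day.
daily_revenue = defaultdict(float)
daily_orders = defaultdict(int)
for order_id, total in order_totals.items():
    day = order_dates[order_id]
    daily_revenue[day] += total
    daily_orders[day] += 1

total_revenue = sum(order_totals.values())

# AOV must be computed at order level. Dividing by the number of
# line-item rows gives a different (and smaller) number.
aov = total_revenue / len(order_totals)
avg_line_value = total_revenue / len(line_items)

print("order totals:", dict(order_totals))
print("daily revenue:", dict(daily_revenue))
print("AOV (order level):", aov)
print("average per line-item row:", avg_line_value)
```

Here the same 120 in revenue yields an AOV of 40 at order level but 30 if averaged over line-item rows. Once the data is aggregated to daily totals, the line-item detail cannot be recovered, so it is worth confirming the grain of a table before building anything on top of it.
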