Notebooks & Code
Every hands-on chapter has a companion notebook, committed with its outputs — so you can read the code and see the figures and results right on GitHub, open it in Google Colab and run it yourself with zero local setup, or read a static render on nbviewer.
Colab is best-effort. The heavier notebooks (PyMC-Marketing, Google Meridian, CausalML, BERTopic) pull large dependencies and some need external datasets, so a Colab session may need extra pip installs or be slow. For a reliable run, follow the local setup. The CLV chapter also ships a three-step LightGBM companion pipeline under sec4.2-clv/lightgbm-companion/ (01_data_features.py, 02_modeling.py, 03_evaluation.py), linked directly from that chapter.
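Before running one of the heavier notebooks in a fresh Colab session, it can help to check which large dependencies are already importable. The sketch below is illustrative only: the import names in `HEAVY_PACKAGES` are assumptions about how each library is imported, and `missing_packages` is a hypothetical helper, not something shipped with the notebooks.

```python
import importlib.util

# Assumed top-level import names for the heavier dependencies (not taken from the repo).
HEAVY_PACKAGES = ["pymc_marketing", "meridian", "causalml", "bertopic"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(HEAVY_PACKAGES)
if missing:
    # In Colab you would pip install the matching distributions before re-running.
    print("Not installed:", ", ".join(missing))
else:
    print("All heavy dependencies are importable.")
```

If anything is reported missing, the local setup remains the more reliable route.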
Status legend. Done — ready to read as-is, outputs match the chapter prose. Almost Done — published output is usable as a draft; minor caveats (noted below each table) remain. WIP — a re-execution, dependency, or chapter-prose alignment is still unresolved, so treat the current output as a preview only.
Part 3 — Causal Inference for Marketing
| Notebook | Topic | Status | Run |
|---|---|---|---|
| simple_criteo_meta_learners_econml.ipynb | Meta-learners, decile targeting, and uplift / Qini diagnostics (EconML, DoWhy, scikit-uplift) | Done | Colab · nbviewer · GitHub |
The meta-learners notebook is executed against the real Criteo 10% subset (~1.4M samples), so the CATE distributions, decile targeting, and uplift / Qini diagnostics in §10 all reflect production-grade data. §3.5 (Uplift Modeling for Targeting) explains the conceptual framing (four customer types, the uplift-tree splitting criterion, AUUC / Qini) and reuses this same notebook for the runnable workflow.
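For readers who want the decile-targeting and Qini ideas in miniature, here is a minimal NumPy sketch on synthetic data. It is not the notebook's code and does not use EconML or scikit-uplift. The function names, the synthetic data-generating process, and the use of the true driver `x` as the ranking score are all assumptions made for illustration.

```python
import numpy as np


def uplift_by_decile(score, treatment, outcome, n_bins=10):
    """Observed uplift (treated rate minus control rate) per score bin, best scores first.

    Assumes every bin contains both treated and control rows.
    """
    order = np.argsort(-score)
    uplifts = []
    for idx in np.array_split(order, n_bins):
        treated = treatment[idx] == 1
        uplifts.append(outcome[idx][treated].mean() - outcome[idx][~treated].mean())
    return np.array(uplifts)


def qini_curve(score, treatment, outcome):
    """Cumulative incremental responders when targeting in descending score order."""
    order = np.argsort(-score)
    t, y = treatment[order], outcome[order]
    cum_treated = np.cumsum(t)
    cum_control = np.cumsum(1 - t)
    # Rescale control responders to the treated group size; 0 while no control rows seen.
    ratio = np.divide(
        cum_treated,
        cum_control,
        out=np.zeros(len(t), dtype=float),
        where=cum_control > 0,
    )
    return np.cumsum(y * t) - np.cumsum(y * (1 - t)) * ratio


# Synthetic randomized experiment: the treatment effect grows with x (illustrative only).
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(size=n)
treatment = rng.integers(0, 2, size=n)
outcome = rng.binomial(1, 0.05 + 0.10 * x * treatment)

deciles = uplift_by_decile(x, treatment, outcome)
qini = qini_curve(x, treatment, outcome)
print(np.round(deciles, 3))
print(round(float(qini[-1]), 1))
```

A useful ranking shows higher observed uplift in the top bins than in the bottom ones, and a Qini curve that rises above the straight line joining its start and end points.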
Part 4 — Customer Analytics
| Notebook | Topic | Status | Run |
|---|---|---|---|
| traditional_segmentation.ipynb | Segmentation — decile / RFM / K-Means (scikit-learn) | Done | Colab · nbviewer · GitHub |
| semantic_segmentation.ipynb | Segmentation — embedding / BERTopic (BERTopic, sentence-transformers) | Done | Colab · nbviewer · GitHub |
| PyMC_Marketing_CLV_demo.ipynb | Customer lifetime value (PyMC-Marketing) | Done | Colab · nbviewer · GitHub |
traditional_segmentation agrees with the chapter (silhouette and elbow both point to K=3) and ships with per-cluster action tiers and a CRM-ready export schema.
semantic_segmentation runs the LLM-naming cell against a real API in the published outputs. In the marketing-science conda env, litellm and anthropic are pre-pinned; Colab users still need to pip install litellm and set OPENAI_API_KEY or ANTHROPIC_API_KEY to reproduce the naming step.
PyMC_Marketing_CLV_demo fits BG/NBD and Gamma-Gamma with NUTS MCMC and exports per-customer revenue and profit CLV at 3-, 5-, and 10-year horizons with HDI bounds. The export also includes a behavioral-segment join from sec4.1, an allowable-CPA column with a conservative (HDI 3%) variant, a 30-day P(alive)-drop churn watchlist, and a 180-day holdout calibration table (Pearson r ≈ 0.80, cohort revenue ratio 0.92, inside the chapter’s ±20% retrain tolerance). The 18 high-frequency customers whose 10-year DCF posterior becomes numerically unstable are flagged clv_estimation_status="nonfinite_review" rather than silently zeroed.
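The holdout calibration check can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions rather than the notebook's implementation. It assumes the cohort revenue ratio is total predicted revenue divided by total actual holdout revenue, and that the ±20% tolerance is applied to that ratio's distance from 1. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def holdout_calibration(predicted, actual, tolerance=0.20):
    """Summarize per-customer and cohort-level agreement on a holdout window.

    Assumption: ratio = sum(predicted) / sum(actual); the source does not state the direction.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    revenue_ratio = float(predicted.sum() / actual.sum())
    return {
        "pearson_r": float(np.corrcoef(predicted, actual)[0, 1]),
        "cohort_revenue_ratio": revenue_ratio,
        "within_tolerance": bool(abs(revenue_ratio - 1.0) <= tolerance),
    }


# Toy per-customer holdout revenue (made-up values, not the notebook's data).
report = holdout_calibration(predicted=[90, 60, 30, 0], actual=[100, 50, 40, 10])
print(report)
```

A ratio outside the tolerance band would be the signal to retrain, while a low correlation with an acceptable ratio would suggest that the cohort total is right but the per-customer ranking is not.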
Part 5 — Commercial Analytics
| Notebook | Topic | Status | Run |
|---|---|---|---|
| price_elasticity_scanner_data.ipynb | Price elasticity from scanner data (statsmodels) | Done | Colab · nbviewer · GitHub |
| assortment_optimization.ipynb | Assortment optimization — revenue + basket-reach ABC, substitution, matched-store DiD (statsmodels) | Done | Colab · nbviewer · GitHub |
| demand_forecasting.ipynb | Demand forecasting — models, evaluation & hierarchical reconciliation (Nixtla) | Done | Colab · nbviewer · GitHub |
Chapter 5.2 (Product Assortment Optimization) ships both the runnable notebook above and the companion assortment_optimization.py script, linked from that chapter.
Part 6 — Media Investment and Optimization
| Notebook | Topic | Status | Run |
|---|---|---|---|
| mmm_end_to_end_demo.ipynb | Media mix modeling (Google Meridian) | Done | Colab · nbviewer · GitHub |
The MMM demo is now end-to-end re-executable. TF 2.19 / TFP 0.25 / Meridian 1.6 are pinned in environment.yml, and the published outputs use the current synthetic-data CSV with positive ground-truth ROI for all five channels. The pre-fit identifiability check from §6.3’s Reliability Checklist runs in Step 2b (history, OFF rate, share std, channel correlations) with concrete PASS/WARN/FAIL verdicts. A random 25% week-level holdout produces in-sample R² ≈ 0.95 and out-of-sample R² ≈ 0.94, and the optimizer’s reallocation table (channel-level deltas plus expected lift ≈ +7%) lands in the notebook itself.
The uncalibrated 90% credible interval contains the true ROAS for four of the five channels; the fifth (Google Search) narrowly misses. The notebook then runs the full §6.3 calibration loop on Google: convert a hypothetical geo-lift DiD into a tight LogNormal prior, refit the model, compare before/after, and re-optimize from the calibrated posterior. After calibration, all five channels’ 90% CIs contain the true ROAS, the other four channels are essentially unchanged, and Google’s posterior mean shrinks from 0.957 (5× truth) to 0.144 (close to truth 0.18). The calibrated optimizer recovers Google’s true ROAS exactly (0.18) and shifts the TV/CTV recommendation from -13% to -19% while easing TikTok’s cut from -15% to -11%, so calibration changes both the estimate and the budget recommendation it drives.
A credible-interval guardrail cell converts each channel’s ROI posterior into an act/hold rule on the uncalibrated fit; the resulting planner-ready move is much smaller than the optimizer’s raw point-estimate recommendation. The §6.3 worked TikTok example (mu ≈ 1.42, sigma ≈ 0.81) is reproduced byte-for-byte via an embedded assert. Local runtime on CPU is roughly ten minutes with the demo budget (4 chains × 1000 keep; the two MCMC passes share an XLA cache); production should still use 10 chains × 2000 keep.
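Two steps in this workflow, turning an experiment result into a LogNormal prior and turning a ROI posterior into an act/hold decision, can be sketched without Meridian. The code below is a hypothetical illustration, not the notebook's implementation and not Meridian's API. It assumes the experiment is summarized as a central interval on ROAS, and the act/hold thresholds (`breakeven`, `level`) and the interval values are invented for the example. It does not attempt to reproduce the §6.3 TikTok numbers.

```python
import math
from statistics import NormalDist

import numpy as np


def lognormal_from_interval(lower, upper, level=0.90):
    """Return (mu, sigma) of the LogNormal whose central `level` interval is (lower, upper)."""
    if not 0 < lower < upper:
        raise ValueError("need 0 < lower < upper")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    mu = (math.log(lower) + math.log(upper)) / 2
    sigma = (math.log(upper) - math.log(lower)) / (2 * z)
    return mu, sigma


def act_or_hold(roi_draws, breakeven=1.0, level=0.90):
    """Hypothetical guardrail: act only when the whole credible interval is on one side."""
    lo, hi = np.quantile(roi_draws, [(1 - level) / 2, 1 - (1 - level) / 2])
    if lo > breakeven:
        return "act: increase"
    if hi < breakeven:
        return "act: decrease"
    return "hold"


# Made-up experiment summary: 90% interval on ROAS of (0.10, 0.30).
mu, sigma = lognormal_from_interval(0.10, 0.30)
draws = np.random.default_rng(0).lognormal(mu, sigma, size=4000)
print(round(mu, 3), round(sigma, 3))
print(act_or_hold(draws, breakeven=0.2))
```

A tighter experiment interval gives a smaller `sigma`, which is what lets the prior pull the refit toward the experimental evidence. The guardrail only recommends a move when the interval excludes the threshold, which is one way a planner-ready move can end up smaller than a point-estimate recommendation.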