Propensity Model Or Lookalike Model, That Is The Question

This article introduces two common targeting models for identifying ideal customer segments. Concepts aside, they are defined by different underlying assumptions and objectives in marketing practice.

Before starting a new marketing campaign, advertisers need to decide on the media budget, target audience, ad platform, ad creatives, and so on. With all of these factors in mind, the advertiser has to understand how likely the campaign is to meet its ROI goal. This process usually involves the analytics team and data scientists. Sometimes the database holds historical customer and transaction data for similar campaigns or products, and sometimes it does not. In both cases, a data scientist can build a predictive model to answer two fundamental questions that the advertiser wants answered. First, which customer segment should be targeted? Second, what is the expected ROI of the campaign?

Targeting: Lookalike Model vs Propensity Model

A best practice approach is illustrated as follows.

Step 1: Are quality historical data available?

If answer is Yes:

Collect all customer profile and historical transaction data for similar products. Pre-process the data so that it is ready for building a predictive model in Step 2.


If answer is No:

Start with a “prior” estimate based on the practitioner’s domain knowledge and experience, or take a Bayesian perspective.
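One way to read the Bayesian perspective is as a Beta-Binomial update: encode the practitioner's belief about the response rate as a Beta prior, then revise it as campaign results arrive. The sketch below is only an illustration under that assumption. The function name and all numbers are hypothetical and do not come from any real campaign.

```python
def posterior_response_rate(prior_alpha, prior_beta, responders, non_responders):
    """Update a Beta(prior_alpha, prior_beta) belief about the response rate.

    The Beta prior is conjugate to binomial response data, so the update
    is just adding observed counts to the prior pseudo-counts.
    """
    post_alpha = prior_alpha + responders
    post_beta = prior_beta + non_responders
    post_mean = post_alpha / (post_alpha + post_beta)
    return post_alpha, post_beta, post_mean


# Hypothetical prior: domain experience suggests roughly a 2% response rate,
# held with the weight of about 100 past observations.
# Hypothetical data: 30 responders among the first 1,000 customers reached.
alpha, beta, mean = posterior_response_rate(2, 98, 30, 970)
print(f"Posterior: Beta({alpha}, {beta}), mean response rate = {mean:.4f}")
```

A larger prior pseudo-count makes the estimate move more slowly away from the practitioner's initial belief, which is a judgment call rather than something the data can decide.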

Step 2: Is there a tight deadline for building a model?

If answer is Yes, within 1–2 months:

Go with a Lookalike model. Set a reasonable goal, such as finding the top 10 percent of the audience whose response rate is x times higher than that of the bottom 50 percent. Implementing that result in the campaign could already be a big success.
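The "x times higher" goal can be checked with a simple lift calculation once a model has scored a historical audience. The following is a minimal sketch, assuming each customer has a model score and a 0/1 response flag. The function name, the default fractions, and the toy data are illustrative assumptions.

```python
import math


def top_vs_bottom_lift(scores, responded, top_frac=0.10, bottom_frac=0.50):
    """Ratio of the response rate in the top-scored slice to the bottom slice."""
    if not scores or len(scores) != len(responded):
        raise ValueError("scores and responded must be non-empty and equal length")
    # Rank customers from highest to lowest model score.
    ranked = [r for _, r in sorted(zip(scores, responded),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    n_top = max(1, int(n * top_frac))
    n_bottom = max(1, int(n * bottom_frac))
    top_rate = sum(ranked[:n_top]) / n_top
    bottom_rate = sum(ranked[-n_bottom:]) / n_bottom
    if bottom_rate == 0:
        return math.inf  # no responders in the bottom slice
    return top_rate / bottom_rate


# Toy audience of 20 customers, already scored by some lookalike model.
scores = list(range(20, 0, -1))
responded = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0,
             0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
lift = top_vs_bottom_lift(scores, responded)
print(f"Top 10% responds {lift:.1f}x as often as the bottom 50%")
```

On real data the lift should be measured on a holdout set that the model did not see during training, otherwise it will look better than it is.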


If answer is No:

With another 1–2 months, a Propensity model can be further developed to measure the causal effect of the campaign. In other words, it becomes feasible to predict the incremental response rate driven by the campaign rather than by other factors.
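To make "incremental response rate" concrete, here is a minimal sketch that assumes a randomized holdout group that did not see the campaign, so the two groups are comparable. The function name and the counts are hypothetical.

```python
def incremental_response_rate(exposed, holdout):
    """Difference in response rate between exposed and holdout customers.

    Both arguments are lists of 0/1 response flags. The difference is only
    a causal estimate if the holdout was chosen at random.
    """
    exposed_rate = sum(exposed) / len(exposed)
    holdout_rate = sum(holdout) / len(holdout)
    return exposed_rate - holdout_rate


# Hypothetical campaign: 8 of 100 exposed customers responded,
# while 5 of 100 holdout customers responded without seeing the ad.
exposed = [1] * 8 + [0] * 92
holdout = [1] * 5 + [0] * 95
increment = incremental_response_rate(exposed, holdout)
print(f"Incremental response rate: {increment:.2%}")
```

The holdout responders are customers who would have bought anyway, so only the difference can reasonably be credited to the campaign.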

Step 3: Not enough?

If answer is No:

A Propensity model plus a Lookalike model sounds perfect in theory, but in practice a compromise between time and effort on one side and prediction accuracy on the other is almost always necessary. Not surprisingly, a “perfect” model can become imperfect as situations change, because its underlying assumptions may no longer hold under new market dynamics. Revising the assumptions and updating the model with new data are useful measures for getting a better model in the following iterations.


A Lookalike model is good enough in most cases.

Some candidate models are logistic regression, decision trees, and neural networks. When building a Propensity model, statistical techniques such as matching can adjust for self-selection bias and help extract causality rather than correlation. Another piece of advice is to choose the model with scoring in mind when you need to score a new audience. Tree-based models have some advantages over other machine learning models with respect to interpretability and scoring.
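The matching idea can be sketched as greedy one-to-one nearest-neighbor matching on a propensity score. This is only one of several matching schemes, and it assumes the scores were already estimated by some model (for example, logistic regression). The function name, the caliper value, and the data below are illustrative assumptions.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 matching without replacement on propensity score.

    treated, control: lists of (propensity_score, outcome) tuples.
    Returns the matched pairs and the mean outcome difference over pairs,
    or None for the difference when nothing could be matched.
    """
    available = list(control)
    pairs = []
    for t_score, t_outcome in treated:
        if not available:
            break
        # Closest remaining control customer by propensity score.
        best = min(available, key=lambda c: abs(c[0] - t_score))
        if abs(best[0] - t_score) <= caliper:
            pairs.append(((t_score, t_outcome), best))
            available.remove(best)  # each control is used at most once
    if not pairs:
        return pairs, None
    effect = sum(t[1] - c[1] for t, c in pairs) / len(pairs)
    return pairs, effect


# Hypothetical customers: (estimated propensity to be exposed, responded 0/1).
treated = [(0.80, 1), (0.60, 1), (0.30, 0)]
control = [(0.78, 1), (0.62, 0), (0.33, 0), (0.10, 0)]
pairs, effect = match_on_propensity(treated, control)
print(f"{len(pairs)} matched pairs, estimated effect on response = {effect:.3f}")
```

Comparing each exposed customer only with an unexposed customer of similar propensity is what reduces self-selection bias. Treated customers with no control inside the caliper are dropped, so a tighter caliper trades sample size for closer matches.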

For more hands-on practice, please see my GitHub repo.