Propensity Score
Overview
In this post, I introduce the propensity score, which is widely used in causal inference.
Note that this post is meant as a summary, reorganization, and reminder of related knowledge. If you are interested in this topic, please find more details in the references.
This is my first post in English. I want to use the 5W2H method, a popular management tool. 5W2H stands for What, Why, Where, When, Who, How, and How much.
Taking this post as an example, I’ll introduce what the propensity score is, why it is important, where we can use it, when it originated, who invented and developed it, and how we can use it.
I think the 5W2H method helps in understanding a research problem or method.
Introduction
- What is the propensity score? The propensity score is the probability of treatment assignment conditional on observed covariates. It is a balancing score: conditional on the propensity score, the distribution of observed covariates will be similar between treated and untreated groups.
- Why is it important, and where can we use it? In any application that relies on observational studies instead of randomized controlled trials, we can calculate the propensity score and apply derived methods such as inverse propensity scoring (IPS) and inverse probability of treatment weighting (IPTW) to evaluate the effect of an intervention or treatment. For example, we may be curious about whether a treatment is helpful for smokers or whether a new recommender policy is better than the existing one.
- When did it originate, and who invented and developed it? The propensity score was defined by Rosenbaum and Rubin in 1983 as the probability of treatment assignment conditional on observed baseline covariates. This post mainly follows the work by Austin in 2011. Follow-up works build on the original definition and derive propensity score matching (Rubin 1985), inverse probability of treatment weighting (Austin 2015), normalized inverse propensity scoring (Tobias Schnabel 2016), doubly robust methods (Heejung Bang 2005; Jonsson Funk 2011), and so on.
- How can we use it? This post introduces the propensity score matching method and walks through an experiment to help understand its usage.
Background
We often want to find out whether a treatment has any effect on a specific outcome. Randomized controlled trials (RCTs) are considered the gold standard for estimating treatment effects. However, RCTs are time-consuming and sometimes unethical.
There is growing interest in using observational (or nonrandomized) studies to estimate the effects of treatments on outcomes. In observational studies, treatment selection is often influenced by subject characteristics, so treated and untreated subjects may differ systematically in their baseline covariates.
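Let’s use some math. Given a binary treatment $T \in \{0, 1\}$, each sample $i$ has a pair of potential outcomes:
$$Y_i(0) \quad \text{and} \quad Y_i(1),$$
the outcomes without and with the treatment, respectively. Only one of the two is observed for each sample.
We want the average treatment effect (ATE), defined as
$$\mathrm{ATE} = E[Y_i(1) - Y_i(0)].$$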
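To make the role of confounding concrete, here is a minimal simulation sketch. All names and numbers are assumptions made for illustration and are not part of the referenced experiment: a single covariate `x` drives both treatment assignment and outcome, and the true effect is fixed at 1. Because both potential outcomes are simulated, the true ATE can be computed directly, which is never possible with real data.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

n = 20000
y0, y1, t = [], [], []
for _ in range(n):
    x = random.random()                        # covariate, uniform on [0, 1)
    treated = 1 if random.random() < x else 0  # higher x -> more likely treated
    base = 2.0 * x                             # outcome without treatment, Y(0)
    y0.append(base)
    y1.append(base + 1.0)                      # outcome with treatment, Y(1)
    t.append(treated)

# True ATE: mean of Y(1) - Y(0) over all samples (needs both potential outcomes)
true_ate = sum(a - b for a, b in zip(y1, y0)) / n

# Naive estimate: compare observed outcomes of treated vs. untreated samples
obs_treated = [y1[i] for i in range(n) if t[i] == 1]
obs_control = [y0[i] for i in range(n) if t[i] == 0]
naive = sum(obs_treated) / len(obs_treated) - sum(obs_control) / len(obs_control)

print(f"true ATE = {true_ate:.3f}, naive difference = {naive:.3f}")
```

In this setup the naive difference overstates the effect, because treated samples tend to have a larger `x` and therefore larger outcomes regardless of treatment.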
Propensity Score Matching
The propensity score is the probability of treatment assignment conditional on observed covariates. With $T$ the treatment indicator and $X$ the observed covariates, it can be defined as:
$$e(X) = P(T = 1 \mid X).$$
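When the covariate is discrete, this definition can be evaluated by simple counting. The sketch below uses made-up records with one hypothetical binary covariate (`is_smoker`) and estimates the propensity score as the fraction of treated samples within each covariate group.

```python
from collections import defaultdict

# Hypothetical records: (is_smoker, received_treatment)
records = [
    (1, 1), (1, 1), (1, 1), (1, 0),   # smokers: 3 of 4 treated
    (0, 1), (0, 0), (0, 0), (0, 0),   # non-smokers: 1 of 4 treated
]

counts = defaultdict(lambda: [0, 0])  # covariate value -> [n_treated, n_total]
for x, t in records:
    counts[x][0] += t
    counts[x][1] += 1

# e(x) = P(T = 1 | X = x), estimated as the treated fraction within each group
propensity = {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}
print(propensity)  # {1: 0.75, 0: 0.25}
```

With many or continuous covariates, counting no longer works, which is why a model such as logistic regression is typically used to estimate the score.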
Two assumptions are required for an unbiased estimate of the treatment effect:
- $(Y(1), Y(0)) \perp T \mid X$; the no unmeasured confounders assumption: all variables that affect treatment assignment and outcome have been measured.
- $0 < P(T = 1 \mid X) < 1$; every sample has a nonzero probability of receiving each treatment.
Recall that the propensity score is used to balance the distribution of observed covariates between the treated and untreated groups.
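One common way to check balance on a covariate is the standardized mean difference between the two groups. The sketch below is an illustration only: the helper name `standardized_mean_difference` and the sample values are hypothetical, and sample variances are used for the pooled standard deviation.

```python
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd

# Hypothetical covariate values for two small groups
treated_x = [1.0, 2.0, 3.0]
control_x = [2.0, 3.0, 4.0]
smd = standardized_mean_difference(treated_x, control_x)
print(smd)
```

Values close to zero indicate that the covariate is similarly distributed in both groups; comparing the value before and after matching shows whether the matching improved balance.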
There are four main propensity score methods:
- propensity score matching;
- stratification on the propensity score;
- inverse probability of treatment weighting using the propensity score;
- covariate adjustment using the propensity score.
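As an illustration of the weighting approach, a commonly used form of the IPTW estimator of the ATE is sketched below. The notation is an assumption chosen for this post: $T_i$ is the treatment indicator, $Y_i$ the observed outcome, $e_i$ the estimated propensity score of sample $i$, and $n$ the number of samples.

$$\widehat{\mathrm{ATE}}_{\mathrm{IPTW}} = \frac{1}{n}\sum_{i=1}^{n}\frac{T_i\,Y_i}{e_i} - \frac{1}{n}\sum_{i=1}^{n}\frac{(1 - T_i)\,Y_i}{1 - e_i}$$

Each sample is weighted by the inverse of the probability of the treatment it actually received, so samples that are underrepresented in their group count more.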
In this part, I introduce propensity score matching, specifically 1:1 matching with replacement. You can find more details in ‘An introduction to propensity score methods for reducing the effects of confounding in observational studies’ (Austin 2011).
I use the experiment from ‘Propensity Score Matching in Python’ to illustrate the method.
The matching procedure consists of the following steps:
- Estimate the propensity score from the observational data using logistic regression.
- Use nearest neighbors on the propensity score to identify matching candidates, then perform 1-to-1 matching by pairing treated (T=1) and untreated (T=0) samples.
- For each treated sample, take its matched untreated sample from the matching candidates. In the experiment, this reduces the dataset from 712 samples to 282.
- Calculate the average treatment effect on the matched dataset.
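The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code of the referenced experiment: the toy data, the single covariate, and the helper names `fit_propensity` and `match_with_replacement` are all hypothetical, and the logistic regression is fitted with a hand-written gradient ascent loop so that the block needs no third-party packages. In practice one would more likely use a library such as scikit-learn for both the regression and the nearest-neighbor search.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(xs, ts, lr=1.0, steps=5000):
    """Fit P(T=1 | x) = sigmoid(b + w*x) by gradient ascent on the log-likelihood."""
    b, w = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [t - sigmoid(b + w * x) for x, t in zip(xs, ts)]
        b += lr * sum(residuals) / n
        w += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return b, w

def match_with_replacement(scores, ts):
    """Map each treated index to the control index with the closest score.
    A control may be reused for several treated samples (with replacement)."""
    controls = [i for i, t in enumerate(ts) if t == 0]
    return {i: min(controls, key=lambda j: abs(scores[i] - scores[j]))
            for i, t in enumerate(ts) if t == 1}

# Toy data (assumption): x raises both the chance of treatment and the outcome;
# the true treatment effect is 1.0 for every sample.
xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
ts = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
ys = [2.0 * x + 1.0 * t for x, t in zip(xs, ts)]

# Step 1: estimate propensity scores with logistic regression
b, w = fit_propensity(xs, ts)
scores = [sigmoid(b + w * x) for x in xs]

# Steps 2-3: 1:1 nearest-neighbor matching on the score, with replacement
pairs = match_with_replacement(scores, ts)

# Step 4: average outcome difference over the matched pairs
matched_effect = sum(ys[i] - ys[j] for i, j in pairs.items()) / len(pairs)

# For comparison: unadjusted difference in mean outcomes
treated_ys = [y for y, t in zip(ys, ts) if t == 1]
control_ys = [y for y, t in zip(ys, ts) if t == 0]
naive = sum(treated_ys) / len(treated_ys) - sum(control_ys) / len(control_ys)

print(f"naive = {naive:.3f}, matched = {matched_effect:.3f}")
```

Because every treated sample is paired with a control of similar propensity, the matched estimate should land closer to the true effect of 1.0 than the naive difference does. Strictly speaking, matching each treated sample to a control estimates the effect among the treated; in this toy data the effect is constant, so the two coincide.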
In the experiment, the author also visualizes the propensity score distributions to help understand the method.

References