Propensity Score

Overview

In this post, I introduce the propensity score, a tool that is widely used in causal inference.

Note that this post is meant as a summary, reorganization, and reminder of related knowledge. If you are interested in this topic, you can find more details in the references.

This is my first blog post in English. I want to use the 5W2H method, a management tool for framing a problem. 5W2H stands for What, Why, Where, When, Who, How, and How much.

Taking this post as an example: What is the propensity score? Why is it important? Where can we use it? When did it originate? Who invented and developed it? And how can we use it?

I think the 5W2H method helps in understanding a research problem or method.

Introduction

  1. What is the propensity score? The propensity score is the probability of treatment assignment conditional on observed covariates. It is a balancing score: among subjects with the same propensity score, the distribution of observed covariates is similar between the treated and untreated groups.
  2. Why is it important, and where can we use it? In any application that relies on observational studies rather than randomized controlled trials, we can estimate the propensity score and apply derived methods such as inverse propensity scoring (IPS) and inverse probability of treatment weighting (IPTW) to evaluate the effect of an intervention or treatment. For example, we may want to know whether a treatment helps smokers, or whether a new recommender policy is better than the existing one.
  3. When did it originate, and who invented and developed it? The propensity score was defined by Rosenbaum and Rubin in 1983 as the probability of treatment assignment conditional on observed baseline covariates. This post mainly follows the work by Austin in 2011. Follow-up works build on the original definition and derive propensity score matching (Rubin 1985), inverse probability of treatment weighting (Austin 2015), normalized inverse propensity scoring (Tobias Schnabel 2016), doubly robust methods (Heejung Bang 2005; Jonsson Funk 2011), and so on.
  4. How can we use it? This post introduces the propensity score matching method and walks through an experiment to illustrate its usage.

Background

We often want to find out whether a treatment has any effect on a specific outcome. Randomised controlled trials (RCTs) are considered the gold standard for estimating treatment effects. However, RCTs are time-consuming and sometimes unethical.

There is a growing interest in using observational (or nonrandomized) studies to estimate the effects of treatments on outcomes. In observational studies, treatment selection is often influenced by subject characteristics.

Let’s use some math. Given a binary treatment, each sample $i$ has a pair of potential outcomes: $Y_i(0)$ and $Y_i(1)$. We use $Z$ as an indicator of the treatment received ($Z=0$ for the control group and $Z=1$ for the treated group).

We want the average treatment effect (ATE), defined as $E[Y_i(1) - Y_i(0)]$. In RCTs, because assignment is randomized, we can get an unbiased estimator of the ATE from $E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$, estimating each term by the mean outcome in the corresponding group. So it’s easy to estimate the ATE with RCTs. In observational studies, however, the data we get satisfy $E[Y \mid Z] \neq E[Y]$ in general: the mean outcome observed in a treatment group is not the mean potential outcome over the whole population, so the difference in group means is not an unbiased estimate of the ATE. Due to selection bias, the distribution of covariates in the treated group differs from that in the control group. That’s why the propensity score was proposed: we use it to balance the two distributions.
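A small simulation can make this contrast concrete. The sketch below is not taken from the referenced experiment: it assumes a single binary confounder `x`, a true treatment effect of 1.0, and hypothetical helper names `simulate` and `diff_in_means`.

```python
import random

random.seed(0)

def simulate(n, randomized):
    """Simulate n units with one binary confounder x and a true effect of 1.0."""
    rows = []
    for _ in range(n):
        x = 1 if random.random() < 0.5 else 0
        # In an RCT, treatment ignores x; otherwise x = 1 makes treatment more likely.
        p_treat = 0.5 if randomized else (0.8 if x == 1 else 0.2)
        z = 1 if random.random() < p_treat else 0
        y0 = 2.0 * x + random.gauss(0.0, 1.0)  # potential outcome without treatment
        y1 = y0 + 1.0                          # potential outcome with treatment
        rows.append((x, z, y1 if z == 1 else y0))  # only one outcome is observed
    return rows

def diff_in_means(rows):
    """Naive estimate: mean observed outcome of treated minus that of controls."""
    treated = [y for _, z, y in rows if z == 1]
    control = [y for _, z, y in rows if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rct_estimate = diff_in_means(simulate(20000, randomized=True))
obs_estimate = diff_in_means(simulate(20000, randomized=False))
print(f"RCT: {rct_estimate:.2f}, observational: {obs_estimate:.2f}")
```

Under these assumptions the RCT estimate should land near the true effect, while the observational estimate is inflated because treated units are more likely to have `x = 1`, which also raises the outcome.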


Propensity Score Matching

The propensity score is the probability of treatment assignment conditional on observed covariates, defined as $e_i = \Pr(Z_i = 1 \mid X_i)$. It is usually estimated with a logistic regression model. Bagging, boosting, and neural networks have also been examined for this purpose.
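As a rough illustration of the estimation step, the sketch below fits a one-covariate logistic regression by plain gradient ascent and then evaluates the fitted score for every sample. It is a dependency-free stand-in, not the referenced experiment's code: the simulated data, the coefficients -0.5 and 1.5, and the names `sigmoid` and `fit_logistic` are all assumptions made for the example. In practice a library implementation of logistic regression would normally be used instead.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(xs, zs, lr=1.0, epochs=300):
    """Fit Pr(Z=1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, z in zip(xs, zs):
            err = z - sigmoid(b0 + b1 * x)  # residual of the current fit
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Simulated data: treatment becomes more likely as the covariate x grows.
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
zs = [1 if random.random() < sigmoid(-0.5 + 1.5 * x) else 0 for x in xs]

b0, b1 = fit_logistic(xs, zs)
scores = [sigmoid(b0 + b1 * x) for x in xs]  # estimated propensity score per sample
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

The fitted coefficients should be close to the values used to generate the data, and every estimated score lies strictly between 0 and 1.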

Two assumptions are needed for an unbiased estimator.

  1. $(Y(1), Y(0)) \perp Z \mid X$; the no-unmeasured-confounders assumption: all variables that affect both treatment assignment and outcome have been measured.
  2. $0 < P(Z=1 \mid X) < 1$; every sample has a nonzero probability of receiving each treatment.
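To see where each assumption enters, here is a short derivation sketch for the treated arm (the control arm is symmetric). It additionally assumes consistency, meaning the observed outcome equals the potential outcome of the treatment actually received, $Y = Y(Z)$; that assumption is not stated in the list above.

```latex
\begin{aligned}
E[Y(1)] &= E_X\big[\, E[Y(1) \mid X] \,\big]
          && \text{(law of total expectation)} \\
        &= E_X\big[\, E[Y(1) \mid Z = 1, X] \,\big]
          && \text{(assumption 1)} \\
        &= E_X\big[\, E[Y \mid Z = 1, X] \,\big]
          && \text{(consistency; well defined by assumption 2)}
\end{aligned}
```

The last line contains only observable quantities. Without assumption 2, some values of $X$ would have no treated samples and the inner expectation would be undefined there.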

Recall that the propensity score is used to balance the covariate distributions between the treatment groups.

There are four main propensity score methods:

  • propensity score matching;
  • stratification on the propensity score;
  • inverse probability of treatment weighting using the propensity score;
  • covariate adjustment using the propensity score.
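As one illustration from the list above, here is a minimal sketch of inverse probability of treatment weighting on simulated data. The data-generating process, the true effect of 1.0, and the names `stratum_propensity` and `iptw_ate` are assumptions made for the example. To keep it short, the propensity score is estimated as the treated fraction within each level of a binary covariate rather than by logistic regression.

```python
import random

def stratum_propensity(xs, zs):
    """Estimate Pr(Z=1 | x) as the treated fraction within each value of x."""
    counts = {}
    for x, z in zip(xs, zs):
        total, treated = counts.get(x, (0, 0))
        counts[x] = (total + 1, treated + z)
    return {x: treated / total for x, (total, treated) in counts.items()}

def iptw_ate(xs, zs, ys, propensity):
    """Weight treated units by 1/e and control units by 1/(1-e), then take the difference."""
    n = len(ys)
    treated_term = sum(z * y / propensity[x] for x, z, y in zip(xs, zs, ys)) / n
    control_term = sum((1 - z) * y / (1 - propensity[x]) for x, z, y in zip(xs, zs, ys)) / n
    return treated_term - control_term

# Simulated observational data: x confounds both treatment and outcome.
random.seed(3)
xs, zs, ys = [], [], []
for _ in range(20000):
    x = 1 if random.random() < 0.5 else 0
    z = 1 if random.random() < (0.8 if x == 1 else 0.2) else 0
    y = 2.0 * x + 1.0 * z + random.gauss(0.0, 1.0)
    xs.append(x)
    zs.append(z)
    ys.append(y)

ate_estimate = iptw_ate(xs, zs, ys, stratum_propensity(xs, zs))
print(f"IPTW estimate of the ATE: {ate_estimate:.2f}")
```

The weights rebuild a pseudo-population in which treatment no longer depends on `x`, so the estimate should be close to the simulated effect even though the raw group means are confounded.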

In this part, I introduce propensity score matching, specifically 1:1 matching with replacement. You can find more details in ‘An introduction to propensity score methods for reducing the effects of confounding in observational studies’.

I use the experiment from ‘Propensity Score Matching in Python’ to illustrate the method.

The matching procedure is:

  1. Estimate the propensity score from the observational data using logistic regression.
  2. Use nearest neighbors to identify matching candidates. Then perform 1-to-1 matching by pairing treated ($T=1$) and untreated ($T=0$) samples.
  3. For each treated sample, take its matched untreated sample from the matching candidates. In the experiment, this reduces the dataset from 712 samples to 282.
  4. Calculate the average treatment effect on the matched dataset.
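The steps above can be sketched in a few lines. This is an illustration on simulated data, not the code or data of the referenced experiment: the function name `match_with_replacement`, the data-generating process, and the true effect of 1.0 are assumptions, and the true propensity score stands in for a fitted one to keep the block short.

```python
import bisect
import math
import random

def match_with_replacement(scores, treated_flags):
    """For each treated unit, pair it with the control unit whose score is closest."""
    controls = sorted((s, i) for i, (s, z) in enumerate(zip(scores, treated_flags)) if z == 0)
    control_scores = [s for s, _ in controls]
    pairs = []
    for i, (s, z) in enumerate(zip(scores, treated_flags)):
        if z != 1:
            continue
        pos = bisect.bisect_left(control_scores, s)
        # The nearest control sits just below or just above the insertion point.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(controls)]
        best = min(candidates, key=lambda p: abs(control_scores[p] - s))
        pairs.append((i, controls[best][1]))  # a control may be reused by several treated units
    return pairs

# Simulated data: x drives both treatment assignment and the outcome.
random.seed(2)
n = 5000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
scores = [1.0 / (1.0 + math.exp(-x)) for x in xs]  # assumed known here; normally estimated
zs = [1 if random.random() < e else 0 for e in scores]
ys = [2.0 * x + 1.0 * z + random.gauss(0.0, 1.0) for x, z in zip(xs, zs)]

pairs = match_with_replacement(scores, zs)
matched_effect = sum(ys[t] - ys[c] for t, c in pairs) / len(pairs)

treated_ys = [y for y, z in zip(ys, zs) if z == 1]
control_ys = [y for y, z in zip(ys, zs) if z == 0]
naive = sum(treated_ys) / len(treated_ys) - sum(control_ys) / len(control_ys)
print(f"naive: {naive:.2f}, matched: {matched_effect:.2f}")
```

Because every treated unit is matched to a control, the matched difference strictly estimates the effect among the treated. In this simulation the effect is constant, so it coincides with the ATE. Matching with replacement lets one control serve several treated units, which keeps matches close at the cost of reusing data.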

In the experiment, the author visualizes the distributions to help understand the propensity score, for example:

https://tva1.sinaimg.cn/large/e6c9d24egy1gzu2qlvdv8j218c0u0q5q.jpg

References

Guanglin Zhou
PhD Candidate at School of Computer Science and Engineering

My research interests include Causal Representation Learning, Reinforcement Learning and Recommendation Systems.