Conditional Exchangeability

Uriah Finkel

Causal Question

What is the Average Treatment Effect of Smoking 🚬 (vs not smoking 🚭) on Lung Cancer 🦀?

Need for confounder adjustment:

  1. A parent who smokes 😀 (unlike a non-smoking parent 😐) increases the probability that the child smokes 🚬.

  2. A parent who smokes 😀 increases the child's probability of lung cancer through passive smoking.

😇 God-Given Counterfactual Data

  • With God-given knowledge we have access to the true counterfactual outcomes of each individual, both under smoking 🚬 and under not smoking 🚭.

  • We can therefore calculate each Causal Individual Treatment Effect and the true Causal Average Treatment Effect:

\(\text{Causal ATE} = \frac{11}{16} - \frac{7}{16} = \frac{4}{16} = 0.25\)

  • The observed outcome is the counterfactual outcome that corresponds to the treatment actually received.

😈 Observed Data

  • In real life we can analyse only the observed outcomes.

The naive Associational Average Treatment Effect is:

\(\text{Associational ATE} = \frac{7}{10} - \frac{2}{6} \approx 0.367\)

This is biased: it overstates the true Causal ATE of 0.25.
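As a rough check, the naive estimate can be reproduced from the observed stratum counts. This is only a sketch: the counts below are assumed, inferred from the conditional probabilities and stratum-specific risks used later in this deck, and the names `counts` and `observed_risk` are illustrative rather than taken from the original slides.

```python
from fractions import Fraction

# (X, A) -> (people, lung cancer cases).
# X: parent smokes (1) or not (0); A: individual smokes (1) or not (0).
# Assumed counts, inferred from the fractions used in this deck.
counts = {
    (0, 0): (4, 1),
    (0, 1): (2, 1),
    (1, 0): (2, 1),
    (1, 1): (8, 6),
}

def observed_risk(a):
    """Crude lung cancer risk among everyone with treatment A = a."""
    cells = [cell for (x, ai), cell in counts.items() if ai == a]
    people = sum(n for n, _ in cells)
    cases = sum(c for _, c in cells)
    return Fraction(cases, people)

# Difference of crude risks, ignoring the confounder X entirely.
associational_ate = observed_risk(1) - observed_risk(0)
print(float(associational_ate))
```

The result is about 0.367, which is larger than the true Causal ATE of 0.25 because the confounder X is ignored.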

First Solution: IPW ⚖️

IPW ⚖️

  • IPW (inverse probability weighting) reweights the sample to create a pseudo-population. The weights depend only on the treatment and the confounders, so we don't need the outcome for the first step.

  • But we do need to estimate, for each individual, the probability of receiving the treatment given the confounders:

\(p(A|X)\)

In the non-parametric case this is just the proportion of individuals with treatment \(A=a\) within the subpopulation \(X=x\):

\(p(A = 0|X = 0) = \frac{4}{6}\)

\(p(A = 1|X = 0) = \frac{2}{6}\)

\(p(A = 0|X = 1) = \frac{2}{10}\)

\(p(A = 1|X = 1) = \frac{8}{10}\)
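The four proportions above can be computed mechanically from the cell sizes. The following is a minimal sketch, assuming the cell sizes 4, 2, 2 and 8 implied by those fractions; `n_cell` and `propensity` are hypothetical names introduced here for illustration.

```python
from fractions import Fraction

# Number of people in each (X, A) cell, as implied by the fractions above.
n_cell = {(0, 0): 4, (0, 1): 2, (1, 0): 2, (1, 1): 8}

def propensity(a, x):
    """Non-parametric estimate of p(A = a | X = x): a simple proportion."""
    n_stratum = sum(n for (xi, ai), n in n_cell.items() if xi == x)
    return Fraction(n_cell[(x, a)], n_stratum)

for x in (0, 1):
    for a in (0, 1):
        # Fraction prints in lowest terms, e.g. 4/6 is shown as 2/3.
        print(f"p(A={a}|X={x}) = {propensity(a, x)}")
```

Note that the outcome is never used here, which is why the first IPW step can be done before looking at lung cancer at all.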

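Weights for Non-Smoking Parents 😐

  • Each individual is weighted by the inverse of the probability of the treatment actually received, \(W = \frac{1}{p(A|X)}\):

\(W_{A=0, X=0} = \frac{6}{4} = 1.5\)

\(W_{A=1, X=0} = \frac{6}{2} = 3\)

Weights for Smoking Parents 😀

\(W_{A=0, X=1} = \frac{10}{2} = 5\)

\(W_{A=1, X=1} = \frac{10}{8} = 1.25\)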
Calculate estimate for Causal ATE ⚖️

Non-Smokers Outcome Proportion (Pseudo-population)
  • 6.5 cases of lung cancer (half a person is not a problem in a pseudo-population) out of 16.
Smokers Outcome Proportion (Pseudo-population)
  • 10.5 cases of lung cancer out of 16.
Estimate of Causal Average Treatment Effect

\(\widehat{\text{ATE}} = \frac{10.5}{16} - \frac{6.5}{16} = 0.25\)
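The pseudo-population figures can be reproduced with a few lines of code. This sketch assumes the observed counts implied by the fractions in this deck, and weights each individual by the inverse of the probability of the treatment actually received; the helper names (`stratum_size`, `weight`, `pseudo_risk`) are illustrative only.

```python
from fractions import Fraction

# (X, A) -> (people, lung cancer cases); assumed counts inferred from the deck.
counts = {
    (0, 0): (4, 1),
    (0, 1): (2, 1),
    (1, 0): (2, 1),
    (1, 1): (8, 6),
}

def stratum_size(x):
    """Number of people with confounder value X = x."""
    return sum(n for (xi, ai), (n, cases) in counts.items() if xi == x)

def weight(a, x):
    """Inverse probability weight 1 / p(A = a | X = x)."""
    return Fraction(stratum_size(x), counts[(x, a)][0])

def pseudo_risk(a):
    """Lung cancer risk under A = a in the weighted pseudo-population."""
    weighted_cases = sum(weight(a, x) * counts[(x, a)][1] for x in (0, 1))
    weighted_people = sum(weight(a, x) * counts[(x, a)][0] for x in (0, 1))
    return weighted_cases / weighted_people

ipw_ate = pseudo_risk(1) - pseudo_risk(0)
print(float(pseudo_risk(1)), float(pseudo_risk(0)), float(ipw_ate))
```

Each treatment arm is reweighted to 16 pseudo-individuals, so both risks share the denominator 16 used above.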

Second Solution: Standardization 🗃

Standardization 🗃

  • The marginal counterfactual risk \(Pr[Y^{a=1}]\) is the weighted average of the stratum-specific risks \(Pr[Y^{a=1}|X=0]\) and \(Pr[Y^{a=1}|X=1]\), weighted by the relative size of each stratum. In other words:

\(Pr[Y^{a=1}] = \sum_{x}Pr[Y^{a=1}|X=x] \cdot Pr(X=x)\)

  • Under conditional exchangeability:

\(\widehat{Pr}[Y^{a=1}] = \sum_{x}Pr[Y=1|X=x, A=1] \cdot Pr(X=x)\)

  • Therefore we can estimate the average treatment effect directly from estimates of the outcome risk in every stratum of the confounders (X) and the treatment (A).

  • The same goes for:

\(\widehat{Pr}[Y^{a=0}] = \sum_{x}Pr[Y=1|X=x, A=0] \cdot Pr(X=x)\)
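The step from counterfactual to observed risks can be spelled out for a single stratum. This is the standard textbook argument, sketched in the notation of the slides; it also assumes positivity (every stratum contains both treated and untreated individuals), which the slides do not state explicitly.

\(Pr[Y^{a=1}|X=x] = Pr[Y^{a=1}|X=x, A=1]\) (conditional exchangeability: within a stratum of X, the counterfactual outcome is independent of the treatment received)

\(Pr[Y^{a=1}|X=x, A=1] = Pr[Y=1|X=x, A=1]\) (consistency: for treated individuals the observed outcome is the counterfactual outcome under treatment)

Averaging the right-hand side over the distribution of X gives the standardized risk.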

Standardization 🗃

\(\widehat{Pr}[Y^{a=1}] = \frac{1}{2} \cdot \frac{6}{16} + \frac{6}{8} \cdot \frac{10}{16} = \frac{10.5}{16} \approx 0.656\)

\(\widehat{Pr}[Y^{a=0}] = \frac{1}{4} \cdot \frac{6}{16} + \frac{1}{2} \cdot \frac{10}{16} = \frac{6.5}{16} \approx 0.406\)

Estimate of Causal Average Treatment Effect

\(\widehat{\text{ATE}} = \frac{10.5}{16} - \frac{6.5}{16} = 0.25\)
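The standardized risks can be verified with exact arithmetic. The sketch below assumes the same observed counts implied by the fractions in this deck; `p_x`, `stratum_risk` and `standardized_risk` are hypothetical helper names, not part of the original material.

```python
from fractions import Fraction

# (X, A) -> (people, lung cancer cases); assumed counts inferred from the deck.
counts = {
    (0, 0): (4, 1),
    (0, 1): (2, 1),
    (1, 0): (2, 1),
    (1, 1): (8, 6),
}
N = sum(n for n, _ in counts.values())  # total sample size

def p_x(x):
    """Marginal probability Pr(X = x)."""
    n_x = sum(n for (xi, ai), (n, cases) in counts.items() if xi == x)
    return Fraction(n_x, N)

def stratum_risk(a, x):
    """Observed risk Pr[Y = 1 | X = x, A = a]."""
    n, cases = counts[(x, a)]
    return Fraction(cases, n)

def standardized_risk(a):
    """Stratum-specific risks averaged over the distribution of X."""
    return sum(stratum_risk(a, x) * p_x(x) for x in (0, 1))

std_ate = standardized_risk(1) - standardized_risk(0)
print(float(standardized_risk(1)), float(standardized_risk(0)), float(std_ate))
```

In this non-parametric setting the result matches the IPW estimate exactly, since both use the same stratum proportions.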

Parametric Versions

  • Unlike the non-parametric versions, the parametric versions of IPW and Standardization will generally yield different estimates from each other.

  • For IPW: we can use ML to estimate the propensity score! 🤖

  • For Standardization: we can use ML 🤖 (T/S/X Learners) or old-school outcome regressions 👴 (just don't adjust for colliders).

Benefits

  • IPW: we model only the treatment, without overfitting to the outcome.

  • Standardization: also gives conditional (CATE) and individual (ITE) treatment effect estimates.

  • Doubly Robust: estimates are consistent even if either the treatment model or the outcome model is wrong (but not both).
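To make the doubly robust property concrete, here is a small sketch using the augmented IPW (AIPW) estimator, which is one common doubly robust estimator; the slides do not say which one is meant, so this choice is an assumption. The data are the observed counts implied by the fractions in this deck, and `wrong_model` is a deliberately misspecified stand-in that always returns 1/2.

```python
from fractions import Fraction

# (X, A) -> (people, lung cancer cases); assumed counts inferred from the deck.
counts = {(0, 0): (4, 1), (0, 1): (2, 1), (1, 0): (2, 1), (1, 1): (8, 6)}

# Expand the counts into one (x, a, y) row per individual.
rows = [
    (x, a, int(i < cases))
    for (x, a), (n, cases) in counts.items()
    for i in range(n)
]

def true_propensity(a, x):
    """Saturated treatment model: p(A = a | X = x) as a proportion."""
    n_x = sum(1 for xi, ai, y in rows if xi == x)
    n_xa = sum(1 for xi, ai, y in rows if xi == x and ai == a)
    return Fraction(n_xa, n_x)

def true_outcome(a, x):
    """Saturated outcome model: Pr[Y = 1 | X = x, A = a] as a proportion."""
    ys = [y for xi, ai, y in rows if xi == x and ai == a]
    return Fraction(sum(ys), len(ys))

def wrong_model(a, x):
    """Deliberately misspecified model that ignores both a and x."""
    return Fraction(1, 2)

def aipw_risk(a, outcome_model, propensity_model):
    """AIPW estimate of the counterfactual risk under treatment a."""
    total = Fraction(0)
    for x, ai, y in rows:
        m = outcome_model(a, x)
        total += m  # outcome-model prediction for everyone
        if ai == a:
            # weighted residual correction for those who received a
            total += (y - m) / propensity_model(a, x)
    return total / len(rows)

def aipw_ate(outcome_model, propensity_model):
    return (aipw_risk(1, outcome_model, propensity_model)
            - aipw_risk(0, outcome_model, propensity_model))

print(float(aipw_ate(wrong_model, true_propensity)))  # outcome model wrong
print(float(aipw_ate(true_outcome, wrong_model)))     # treatment model wrong
print(float(aipw_ate(wrong_model, wrong_model)))      # both wrong
```

In this toy sample the first two estimates recover 0.25 exactly, while the third does not. With real data and parametric models the guarantee is consistency, not exact equality.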

Check out our paper!