6 Introduction to causal mediation analysis
In this session we will go over fundamental concepts of causal inference in general and then focus more specifically on causal mediation analysis.
6.1 What is a cause?
We employ causal inference constantly in our daily lives. Consider a simple example: when you press a light switch, you expect the light to turn on. Or in another scenario, you anticipate that performing action A will result in outcome Y. In some instances, we can identify deterministic relationships where action A will lead to outcome Y. While these straightforward cause-and-effect relationships are intuitive and help us navigate the world, they become significantly more complex when we explore health outcomes.
A = received treatment/intervention/exposure (e.g., 1 = intervention, 0 = no intervention)
Y = observed outcome (e.g., 1 = developed the outcome, 0 = no outcome)
\(Y^{a=1}\) = Counterfactual outcome under treatment a=1 (i.e., the outcome had everyone, counter to the fact, received treatment a = 1)
\(Y^{a=0}\) = Counterfactual outcome under treatment a =0 (i.e., the outcome had everyone, counter to the fact, received treatment a = 0)
6.1.1 Individual causal effect
When investigating health outcomes, we would ideally want to know if you do X, then Y will happen. We could have a specific question:
Will eating more red meat give me higher blood glucose in 1 year?
To answer this question we would ideally have you consume more red meat over 1 year and measure your blood glucose levels. Then, we would turn back time, and make you eat something else over 1 year and then measure your blood glucose levels again. If there is a difference between your two outcomes, then we say there is a causal effect.
But we can never do this in the real world.
6.1.2 Average causal effect
Instead, we can perform a randomized controlled trial. We now ask a slightly different question:
Will eating more red meat give adults higher blood glucose in 1 year?
We can randomely assigning one group to consume more red meat and the other group to consume more of something else over 1 year. Then we compare the average blood glucose levels after 1 year in each of the groups. If there is a difference, we could say there is an average causal effect.
We now modify the terms a little.
\(E[Y^{a=1}]\) the average counterfactual outcome, had all subjects in the population received treatment a = 1.
\(Pr[Y^{a=1}]\) the proportion of subjects that would have developed the outcome Y had all subjects in the population of interest received treatment a = 1.
\(E[Y^{a=0}]\) the average counterfactual outcome, had all subjects in the population received no treatment a = 0.
\(Pr[Y^{a=0}]\) the proportion of subjects that would have developed the outcome Y had all subjects in the population of interest received no treatment a = 0.
6.1.3 Definition of a causal effect
More formally we can now define a causal effect (1):
\(E[Y^{a=1} = 1] - E[Y^{a=0} = 1] \ne 0\)
6.2 Association vs causation
What makes it complicated to estimate a causal effect is that we cannot observe the outcome under different treatments.
When we only have a subset of the outcomes, we have an association. This is illustrated in Figure 6.1.

{#fig-causation-association}
If we want to infer a causal effect (i.e., what would have happened, had everyone done A=1 vs A=0), we need three assumptions to be fulfilled:
Exchangeability
Consistency
Positivity
Causal Assumptions- Exchangeability
The risk of the outcome in A = 1 would have been the same as the risk of the outcome in A = 0, had those in A = 1 received A = 0.
Think about a randomized trial where you, by mistake, give the intervention to the other group. The effect should be the same as the one you would have observed had the groups been correct.
We can also have conditional exchangeability. The risk is similar in subsets of the population. That could be within the levels of a variable W (e.g., education).
- Consistency
The treatments under comparison are well-defined and correspond to the versions of treatment observed in the data:
- Precise definition of \(Y^a\) via a
- Link counterfactual outcomes with observed outcomes
\(Y^a = Y\) for every individual with A = a.
The observed outcome for all treated equals the outcome if they had received the treatment.
- Positivity
The probability of receiving every value of treatment conditional on L is greater than zero. In other words, there must be a probability of being assigned to each treatment level.
Image if we don’t have anyone with A = 0 among those with L=1 (e.g. long education), then we cannot estimate the conditional probability of the outcome.
6.3 Causal mediation analysis
- Non-linearity
- Interactions
- Confounding
- Multiple mediators
Causal mediation analysis is an extension of the traditional approach by:
- Outlining all confounding assumptions needed
- Handling non-linearity and interaction
- Clearly defining estimands of interest
6.4 Confounding assumptions
By using DAGs, the assumptions about confounding are made much more explicit. DAG under no mediator-outcome relation affected by treatment:
We can see that we not only have to take confounding between the treatment and outcome into account, but we also have to take mediator-outcome (\(A \leftarrow W \rightarrow Y\)) confounding into account.
In addition, we can have a more complicated situation where the treatment also impacts another mediator that is also a mediator-outcome confounder.
Below DAG shows where the mediator-outcome relation is affected by treatment:
From the DAG rules, we have a special problem that we cannot solve with traditional regression approaches. If we adjust for W2 we open a backdoor path by adjusting for the collider \(W1 \rightarrow W2 \leftarrow A\). We will work on how to solve this problem later in the course.
6.5 Non-linearity and interactions
Neither the product method nor the difference method in traditional mediation analysis can effectively account for interaction effects or non-linear relationships between variables.
Causal mediation analysis addresses these limitations through more sophisticated approaches. Specifically, it can:
- Utilize regression-based methods and alternative causal inference techniques like g-computation
- Fundamentally differ from traditional approaches by:
- Constructing a comprehensive causal model that explicitly incorporates non-linear relationships and interaction effects
- Artificially manipulating the data to set the treatment and the mediator to certain values
- Predicting outcomes using the developed causal model
- Comparing predicted outcomes to understand causal mechanisms
Note: additional approaches also exists, but we will not focus on these in this course.
6.6 Causal mediation estimation
Counterfactual-based direct and indirect effects can be estimated from regression models, provided that the no confounding assumption hold. To be able to do so, we need a model for the mediator and a model for the outcome.
Model for the mediator:
\(E(M|A = a, W = w) = \beta_0 + \beta_1a + \beta'_2w\)
Model for the outcome:
\(E(Y|A = a, M = m, W = w) = \sigma_0 + \sigma_1a + \sigma_2m + \sigma_3am + \sigma'_4w\)
From these two regression models we can estimate the CDE, NDE and NIE.
6.7 Example
The example here include a continuous treatment, mediator and outcome. We use these equations on an additive scale because it is easier to explain the intuition behind the estimands. Similar equations exist for other types and combinations of variables.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# This includes readr, or use library(readr) specifically
<- read_csv(here::here("data/NHANES.csv")) nhanes
Rows: 10000 Columns: 76
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (31): SurveyYr, Gender, AgeDecade, Race1, Race3, Education, MaritalStatu...
dbl (45): ID, Age, AgeMonths, HHIncomeMid, Poverty, HomeRooms, Weight, Lengt...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# keep the variables used in the analyses
<- nhanes %>%
data ::select(ID,
dplyrw1 = Age,
w2 = Gender,
w3 = Education,
w4 = Smoke100,
a = PhysActive, # this is the exposure
m = BMI, # this is the mediator
y = Pulse, # this is the outcome
y2 = Diabetes ) %>%
na.omit()
We can now run a model for the mediator only adjusting for a single variable. In practice you would adjust for many more variables to satisfy the confounding assumptions.
<- lm(m ~ a + w1 , data = data)
lm_m summary(lm_m)
Call:
lm(formula = m ~ a + w1, data = data)
Residuals:
Min 1Q Median 3Q Max
-14.741 -4.578 -1.086 3.421 51.417
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.440321 0.267707 109.972 <2e-16 ***
aYes -1.883085 0.162184 -11.611 <2e-16 ***
w1 0.007546 0.004793 1.575 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.627 on 6913 degrees of freedom
Multiple R-squared: 0.02101, Adjusted R-squared: 0.02073
F-statistic: 74.19 on 2 and 6913 DF, p-value: < 2.2e-16
Here, we see that physical activity (a) is associated with lower pulse (y).
Now, we run the model for the outcome, notice we have included an interaction term.
<- lm(y ~ a + m + a:m + w1 , data = data)
lm_y summary(lm_y)
Call:
lm(formula = y ~ a + m + a:m + w1, data = data)
Residuals:
Min 1Q Median 3Q Max
-32.464 -8.254 -1.007 7.254 68.847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 75.200976 0.954088 78.820 < 2e-16 ***
aYes -5.625108 1.252664 -4.491 7.22e-06 ***
m 0.143529 0.028201 5.090 3.68e-07 ***
w1 -0.122102 0.008402 -14.533 < 2e-16 ***
aYes:m 0.125536 0.042449 2.957 0.00311 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.61 on 6911 degrees of freedom
Multiple R-squared: 0.04701, Adjusted R-squared: 0.04646
F-statistic: 85.23 on 4 and 6911 DF, p-value: < 2.2e-16
In this model we can see:
higher physical activity (a) decreases pulse, independent of the mediator (BMI)
excess body weight (BMI) is associated with higher pulse, independent of physical activity
there is statistically significant interaction between physical activity (a) and BMI (m)
Based on the coefficients of these two models, we can now estimate the different effects.
The equations are from (2) and (3).
6.7.1 Controlled direct effect
The controlled direct effect can be obtained using this formula based on the regression coefficients:
\(CDE(m) = (\sigma_1 + \sigma_3*m)(a - a^*)\)
\(a^*\) is for a change from level \(a^*\) (=control) to level \(a\) (=intervention).
For the controlled direct effect we set m to a specific value.
Let’s look at the distribution of m (BMI).
%>%
data select(m) %>%
summary()
m
Min. :15.02
1st Qu.:24.10
Median :27.80
Mean :28.80
3rd Qu.:32.22
Max. :81.25
<- round(median(data$m), digits = 1) median_m
In this example we will set it to the median of BMI, at 27.8 which is a low level. You can also try to set BMI to 25kg/m2.
<- (lm_y$coefficients[2] + lm_y$coefficients[5] * median_m) * (1 - 0)
CDE_m round(as.numeric(CDE_m), digits = 2)
[1] -2.14
The CDE is how much the outcome, here the pulse, would change on average if the mediator, here BMI, were fixed at level (m = 27.8) uniformly in the population but the treatment, physical activity, was changed from 0 to 1.
The CDE answer the question, what would be the effect of A on Y, when fixing M at a specific value for everyone in the population (4)?
6.7.2 (Pure) natural direct effect
PNDE can be obtained using this formula:
\(PNDE = (\sigma_1 + \sigma_3 * (\beta_0 + \beta_1*a^* + \beta'_2*w))(a - a^*)\)
We set the value for the confounder w1 for the interaction to be the mean value of w1.
<- mean(data$w1)
mean_w1 mean_w1
[1] 47.17467
Then we calculate the PNDE.
<- (lm_y$coefficients[2] + (lm_y$coefficients[5] * (lm_m$coefficients[1] + lm_m$coefficients[2] * 0 + lm_m$coefficients[3] * mean_w1))) * (1 - 0)
PNDE round(as.numeric(PNDE), digits = 2)
[1] -1.88
The PNDE is how much the outcome, pulse, would change if the treatment a, physical activity, was set at 1 versus 0 but for each individual the mediator was kept at the level it would have taken, for that individual, in the absence of the exposure.
how pulse rate would change if the treatment (or exposure) physically active (a=1) was set at 1 versus 0, while for each individual the mediator (BMI) was kept at the level it would naturally be in the absense of physical activity(a=0).
In other words, it measures the direct effect of physical activity on pulse rate that doesn’t work through changing BMI.
6.7.3 (Total) natural direct effect
TNDE can be obtained by this formula:
\(TNDE = (\sigma_1 + \sigma_3 * (\beta_0 + beta_1*a + \beta'_2*w))(a - a^*)\)
<- (lm_y$coefficients[2] + (lm_y$coefficients[5] * (lm_m$coefficients[1] + lm_m$coefficients[2] * 1 + lm_m$coefficients[3] * mean_w1))) * (1 - 0)
TNDE
round(as.numeric(TNDE), digits = 2)
[1] -2.12
Here, the value of M is enabled to act (as opposed to the PNDE). Here we see a little lower pulse rate, this means we’re measuring the direct effect of physical activity on pulse rate while allowing BMI to be at the level it would naturally be with physical activity (a=1).
The TNDE asks the question: “to what extent does physical activity cause lower pulse via pathways other than through BMI, allowing BMI to boost up or tune down such effect at the same time?” (4)
6.7.4 Total natural indirect effect
The TNIE can be obtained by this formula:
\(TNIE = (\sigma_2 * \beta_1 + \sigma_3 * \beta_1 * a)(a - a^*)\)
<- ((lm_y$coefficients[3] * lm_m$coefficients[2]) + (lm_y$coefficients[5] * lm_m$coefficients[2] * 1)) * (1 - 0)
TNIE
round(as.numeric(TNIE), digits = 2)
[1] -0.51
The TNIE is how much the outcome would change on average if the treatment was fixed at level a = 1 but the mediator was changed from the level it would take if a* = 0 (physical inactive) to the level it would take if a =1 (physical active).
Note that exposure has to have an effect on M otherwise this will be zero.
The TNIE asks the question: to what extent does physical activity cause pulse via BMI and the possible interaction between physical activity and BMI in affecting pulse rate? In other words, the effect of exposure that ‘would be prevented if the exposure did not cause the mediator’ (i.e., the portion of the effect for which mediation is ‘necessary’) (4).
This is often the effect we are interested in in biomedical research for questions regarding mediation.
6.7.5 Pure natural indirect effect
The PNIE can be obtained by this formula:
\(PNIE = (\sigma_2 * \beta_1 + \sigma_3 * \beta_1 * a^*)(a - a^*)\)
<- ((lm_y$coefficients[3] * lm_m$coefficients[2]) + (lm_y$coefficients[5] * lm_m$coefficients[2] * 0)) * (1 - 0)
PNIE
round(as.numeric(PNIE), digits = 2)
[1] -0.27
The PNIE is different from the TNIE because it does not include the interaction effect. We estimate the effect of red meat on inflammation and then subsequent effect of inflammation on blood glucose.
The PNIE answer the question: “to what extent does physical activity cause lower pulse rate via BMI only (i.e., due to physical activity affecting BMI and subsequently, BMI affecting pulse rate), not accounting for the possible interaction between physical activity and BMI? In other words, the effect that the exposure would have had if ‘its only action were to cause the mediator’ (i.e., the portion of the effect for which mediation is ‘’sufficient’’)” (4).
6.7.6 Total effect
The total effect can be decomposed as:
\(TE = PNDE + TNIE\)
<- PNDE + TNIE
TE
round(as.numeric(TE), digits = 2)
[1] -2.39
This is the overall effect of red meat on blood glucose levels. Higher red meat is associated with higher blood glucose.
6.7.7 Proportion mediation
From this, we can calculate the proportion mediated.
\(PM = \frac{TNIE}{TE}\)
<- TNIE / TE
PM
as.numeric(PM) * 100
[1] 21.18849
21% of the association between physical activity and pulse rate is mediated by the mediator BMI.
6.8 Using R package
We can easily conduct mediation analysis using R package once you understand the basic concepts of mediation.
library(CMAverse)
<- cmest(
res_rb data = data,
model = "rb", #rb means regression based
outcome = "y", #this is the name of outcome
exposure = "a", #this is the name of exposure
mediator = "m", #this is the name of mediator
basec = c("w1"), #this is confounding factors, you can add more confounders here
EMint = TRUE, #whether include interaction, TRUE indicates to include
mreg = list("linear"), #specify the mediator model, we use linear regression here
yreg = "linear", #specify the outcome model, we use linear regression here
astar = 0, # the level of exposure (not exposed)
a = 1, # the level of exposure (exposed)
mval = list(27.8), #specifiy the value of controlled mediator
estimation = "paramfunc", #parameteric function
inference = "delta"
)
Warning in estinf(): a is not a value of the exposure; Yes is used
Warning in estinf(): astar is not a value of the exposure; No is used
summary(res_rb)
Causal Mediation Analysis
# Outcome regression:
Call:
glm(formula = y ~ a + m + a * m + w1, family = gaussian(), data = getCall(x$reg.output$yreg)$data,
weights = getCall(x$reg.output$yreg)$weights)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 75.200976 0.954088 78.820 < 2e-16 ***
aYes -5.625108 1.252664 -4.491 7.22e-06 ***
m 0.143529 0.028201 5.090 3.68e-07 ***
w1 -0.122102 0.008402 -14.533 < 2e-16 ***
aYes:m 0.125536 0.042449 2.957 0.00311 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 134.882)
Null deviance: 978151 on 6915 degrees of freedom
Residual deviance: 932170 on 6911 degrees of freedom
AIC: 53553
Number of Fisher Scoring iterations: 2
# Mediator regressions:
Call:
glm(formula = m ~ a + w1, family = gaussian(), data = getCall(x$reg.output$mreg[[1L]])$data,
weights = getCall(x$reg.output$mreg[[1L]])$weights)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 29.440321 0.267707 109.972 <2e-16 ***
aYes -1.883085 0.162184 -11.611 <2e-16 ***
w1 0.007546 0.004793 1.575 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 43.91474)
Null deviance: 310099 on 6915 degrees of freedom
Residual deviance: 303583 on 6913 degrees of freedom
AIC: 45790
Number of Fisher Scoring iterations: 2
# Effect decomposition on the mean difference scale via the regression-based approach
Closed-form parameter function estimation with
delta method standard errors, confidence intervals and p-values
Estimate Std.error 95% CIL 95% CIU P.val
cde -2.135202 0.289670 -2.702945 -1.567 1.69e-13 ***
pnde -1.884592 0.290945 -2.454834 -1.314 9.33e-11 ***
tnde -2.120987 0.289390 -2.688180 -1.554 2.32e-13 ***
pnie -0.270278 0.057982 -0.383921 -0.157 3.14e-06 ***
tnie -0.506673 0.073986 -0.651682 -0.362 7.48e-12 ***
te -2.391265 0.286308 -2.952417 -1.830 < 2e-16 ***
intref 0.250610 0.086005 0.082043 0.419 0.003570 **
intmed -0.236395 0.082486 -0.398066 -0.075 0.004159 **
cde(prop) 0.892918 0.029698 0.834710 0.951 < 2e-16 ***
intref(prop) -0.104802 0.038003 -0.179288 -0.030 0.005821 **
intmed(prop) 0.098858 0.036181 0.027944 0.170 0.006290 **
pnie(prop) 0.113027 0.027197 0.059721 0.166 3.24e-05 ***
pm 0.211885 0.038694 0.136046 0.288 4.35e-08 ***
int -0.005945 0.006247 -0.018189 0.006 0.341319
pe 0.107082 0.029698 0.048875 0.165 0.000311 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(cde: controlled direct effect; pnde: pure natural direct effect; tnde: total natural direct effect; pnie: pure natural indirect effect; tnie: total natural indirect effect; te: total effect; intref: reference interaction; intmed: mediated interaction; cde(prop): proportion cde; intref(prop): proportion intref; intmed(prop): proportion intmed; pnie(prop): proportion pnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
Relevant variable values:
$a
[1] "Yes"
$astar
[1] "No"
$mval
$mval[[1]]
[1] 27.8
$basecval
$basecval[[1]]
[1] 47.17467
6.9 Defining estimands
Imagine we have a hypothetical randomized controlled trial where we give participants treatment or no treatment on a specific outcome Y.
\(Y^{a=1} - Y^{a=0}\)
For mediation, we are also interested in the effect of a mediator on this pathway. Now image that we also intervene on the mediator in a new hypothetical randomized controlled trial.
\(Y^{m=1} - Y^{m=0}\)
Now consider if we, in the same trial, could intervene on both because we are interested in whether treatment causes the outcome because it causes the mediator.
\(Y^a\) = a subject’s outcome if treatment A were set, possible contrary to fact, to a
\(M^a\) = a subject’s value of the mediator if the exposure A were set to the value of a
\(Y^{a,m}\) = a subject’s outcome if A were set to a and M were set to m
\(Y^{a,M_a}\) = a subject’s outcome if A were set to a and M were set the value m would have had had a been set to a. Note, this is a nested counterfactual
We can now define these estimands:
- the controlled direct effect (CDE)
- natural direct effect (NDE)
- natural indirect effect (NIE)
6.9.1 Controlled direct effect
The effect of A on Y not mediated through M. Fixing the value of M to m.
\(Y^{a=1,m}\) - \(Y^{a=0,m}\)
We intervene on \(a\) but fix \(m\) to a certain value. The CDE is how much the outcome would change on average if the mediator were fixed at level m uniformly in the population but the treatment were changed from 0 to 1.
This could be relevant in the context of a change in a policy that impacted the mediator for everyone. For instance, if air pollution was a mediator between physical activity and cardiovascular disease risk. If a new policy would change the level of air pollution for all while we implement an intervention to increase biking in the city.
This effect is not used that often. But can be highly relevant in some situations.
6.9.2 (Pure) Natural direct effect
The effect that would remain, if we were to disable the pathway from exposure to mediator.
\(Y^{a=1,M_a=0}\) - \(Y^{a=0,M_a=0}\)
The PNDE is how much the outcome would change if the exposure was set at a = 1 versus a* = 0 but for each individual the mediator was kept at the level it would have taken, for that individual, in the absence of the exposure.
Note that the word “natural” refers to the nested counterfactual, the level the mediator would have taken in the absence of exposure. What it would naturally have been in the absence of exposure.
6.9.3 Total natural direct effect
\(Y^{a=1,M_a=1}\) - \(Y^{a=0,M_a=1}\)
Note, different from above in that the mediator is kept at the level it would have taken in the presence of the exposure.
6.9.4 (Total) Natural indirect effect
The effect of the mediator pathway.
\(Y^{a=1,M_a=1}\) - \(Y^{a=1,M_a=0}\)
The NIE is how much the outcome would change on average if the exposure were fixed at level a = 1 but the mediator were changed from the level it would take if a* = 0 to the level it would take if a = 1.
Note that exposure has to have an effect on M otherwise this will be zero.
6.9.5 Pure natural indirect effect
\(Y^{a=0,M_a=1}\) - \(Y^{a=0,M_a=0}\)
Note, this is different from the TNIE in that the exposure is set to no intervention.
6.9.6 Interaction effects
More information on the interaction effects can be found in (3).
Reference interaction:
\(INT_{ref} = PNDE - CDE\)
Mediation interaction:
\(INT_{med} = TNIE - PNIE\)
6.9.7 Effect decomposition
Using the causal inference framework also allow for effect decomposition, even when there are interaction and non-linearity.
Effect decomposition is important when we want to assess relative contributions such as the proportion mediated and eliminated.
TE = PNDE + TNIE = TNDE + PNIE
More information about decomposition of effects can be found in (3).
6.9.8 Proportions
Proportion mediated:
\(PM = NIE / TE\)
Proportion eliminated:
\(PE = (TE-CDE(m)) / TE\)