Comorbidity

Comorbidity as a covariate: individual diagnoses or a combined score

Published

July 21, 2026

Under development. Structural outline - to be expanded with concrete examples.

Comorbidity at baseline is a common covariate. How you handle it belongs in your analysis plan and is typically determined by your DAG (see Phase 1 - Choose your covariates using a DAG). There are essentially two approaches:

Adjust for individual comorbidities as separate covariates, e.g. specific diagnoses you identified as confounders in your DAG. You extract these as ordinary diagnosis variables (the same LPR pattern as Outcomes).
Adjust for the overall comorbidity burden with one summary index - a multimorbidity index based on hospital ICD-10 diagnoses (a dedicated one is in development). Use a ready-made, validated index rather than coding it from scratch. Note: NMI, despite its name, is a mortality-prediction score, not a comorbidity-burden measure - use it only for mortality risk (see NMI).
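
As an illustration of the first approach, here is a minimal sketch that turns diagnosis rows into one TRUE/FALSE covariate per comorbidity. Everything in it is an assumption made for the example: the toy data, the column name diag, the "D"-prefixed code format, and the two prefix definitions (diabetes and COPD). Take the actual diagnoses and code lists from your own DAG and from the Outcomes pattern.

```r
library(dplyr)

# Toy stand-in for diagnoses already limited to the cohort and to before index.
# Column names and the "D"-prefixed code format are assumptions for this sketch.
diagnoses_before_index <- tibble(
  pnr  = c("001", "001", "002", "003", "003"),
  diag = c("DE119", "DI109", "DJ449", "DE109", "DJ441")
)

cohort <- tibble(pnr = c("001", "002", "003", "004"))

# Hypothetical code definitions; replace them with the confounders in your DAG
comorbidity_flags <- diagnoses_before_index %>%
  group_by(pnr) %>%
  summarise(
    diabetes = any(grepl("^DE1[0-4]", diag)),
    copd     = any(grepl("^DJ44", diag)),
    .groups  = "drop"
  )

# Join back to the full cohort so that people with no diagnoses at all
# get FALSE instead of dropping out or becoming NA
covariates <- cohort %>%
  left_join(comorbidity_flags, by = "pnr") %>%
  mutate(across(c(diabetes, copd), ~ coalesce(.x, FALSE)))
```

The left join from the cohort is the important design choice: a person without any hospital diagnosis before index has no rows in the diagnosis table, and should count as not having the comorbidity.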

Which is best: individual comorbidities or a combined score?

It depends on your DAG and your outcome:

Individual comorbidities give the most control and transparency, but require you to know in advance exactly which diagnoses are confounders. Use this when your DAG points to a few specific comorbidities.
A combined multimorbidity index captures the general comorbidity burden in one variable and is practical when many comorbidities are potential confounders, or when you just want to describe the burden in Table 1. (NMI is not such an index - it targets mortality; see NMI.)
Beware of over-adjustment: a score like NMI is built to predict mortality, and your own outcome must not be part of the score you adjust for. If your outcome (or a strong predictor of it) is one of the diagnoses the score weights, you partly adjust for the outcome itself. NMI handles dementia studies, for example, by dropping the dementia predictor (see NMI).
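
The over-adjustment point can be sketched with a made-up weighted score. The component names, weights, and data below are hypothetical and are not taken from NMI or any published index. The sketch only shows the mechanics of leaving the study outcome out of the sum. Whether a validated index stays valid after a component is dropped is a question for that index's own documentation.

```r
library(dplyr)

# Hypothetical score definition; NOT the NMI or any published index
score_components <- tibble(
  component = c("dementia", "heart_failure", "copd"),
  weight    = c(3, 2, 1)
)

# Toy data: one row per person and component found before index
person_components <- tibble(
  pnr       = c("001", "001", "002", "003"),
  component = c("dementia", "copd", "heart_failure", "dementia")
)

cohort <- tibble(pnr = c("001", "002", "003"))

# The outcome of this (hypothetical) study
study_outcome <- "dementia"

# Drop the outcome component before summing the weights
score_without_outcome <- person_components %>%
  filter(component != study_outcome) %>%
  inner_join(score_components, by = "component") %>%
  group_by(pnr) %>%
  summarise(score = sum(weight), .groups = "drop")

# People whose only component was the outcome get a score of 0
scores <- cohort %>%
  left_join(score_without_outcome, by = "pnr") %>%
  mutate(score = coalesce(score, 0))
```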

If you use a combined index, the pattern is: take your LPR diagnoses (filtered to the cohort and to the time before index), run them through the algorithm, and get a score per pnr, which you save as .rds and link in Phase 12.

read_register() requires that fastreg is set up with the path to your registers - see Phase 4 if you did not convert them from SAS yourself.

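library(dplyr) # %>%, rename_with(), semi_join(), tibble(), collect()
# read_register() comes from fastreg; open_dataset() comes from the arrow package

# Unique person identifiers in your cohort
cohort_pnrs <- unique(readRDS("sti/til/full_cohort.rds")$pnr)

diagnoses_before_index <- read_register("lpr_diag") %>% # without fastreg: open_dataset("path/to/lpr_diag/")
  rename_with(tolower) %>% # lower-case column names
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>% # ONLY your cohort
  # ... get the contact date, keep diagnoses before index_date ...
  collect() # pull the filtered rows into memory
# → pass the diagnoses to the multimorbidity algorithm to get a score per pnr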
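
The pattern described above (filter to before index, score, save one row per pnr) can be sketched end to end on toy data. This is only an illustration under stated assumptions: the column names contact_date and index_date, the toy rows, and the output path are invented for the example, and placeholder_index() is a hypothetical stand-in that just counts distinct codes. It is not a multimorbidity algorithm; swap in the validated index you actually use.

```r
library(dplyr)

# Toy cohort with one index date per person (assumed column names)
cohort <- tibble(
  pnr        = c("001", "002", "003"),
  index_date = as.Date(c("2015-06-01", "2016-01-15", "2017-03-01"))
)

# Toy diagnoses with the date of the hospital contact already attached
diagnoses <- tibble(
  pnr  = c("001", "001", "001", "002", "002"),
  diag = c("DE119", "DI109", "DJ449", "DI109", "DE119"),
  contact_date = as.Date(c("2014-03-10", "2013-01-01", "2015-09-01",
                           "2015-12-31", "2016-01-15"))
)

# Keep only diagnoses recorded strictly before each person's index date
diagnoses_before_index <- diagnoses %>%
  inner_join(cohort, by = "pnr") %>%
  filter(contact_date < index_date)

# Hypothetical stand-in for the real algorithm: counts distinct codes per pnr
placeholder_index <- function(dx) {
  dx %>%
    group_by(pnr) %>%
    summarise(score = n_distinct(diag), .groups = "drop")
}

# One row per pnr in the cohort; no diagnoses before index means a score of 0
comorbidity_score <- cohort %>%
  select(pnr) %>%
  left_join(placeholder_index(diagnoses_before_index), by = "pnr") %>%
  mutate(score = coalesce(score, 0L))

# Save for later linkage; the temporary path is only for this example
out_file <- file.path(tempdir(), "comorbidity_score.rds")
saveRDS(comorbidity_score, out_file)
```

The strict `<` excludes diagnoses recorded on the index date itself, which matches "before index" in the text. If your analysis plan counts same-day diagnoses as baseline, that comparison is the line to change.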

See also