Comorbidity
Comorbidity as a covariate: individual diagnoses or a combined score
Under development. Structural outline - to be expanded with concrete examples.
Comorbidity at baseline is a common covariate. How you handle it is part of your analysis plan and is typically determined by your DAG (see Phase 1 - Choose your covariates using a DAG). There are essentially two approaches:
- Adjust for individual comorbidities as separate covariates, e.g. specific diagnoses you identified as confounders in your DAG. You extract these as ordinary diagnosis variables (the same LPR pattern as Outcomes).
- Adjust for the overall comorbidity burden with one summary index, e.g. NMI (the Nordic Multimorbidity Index), a validated, weighted multimorbidity score. Use a ready-made, validated index rather than coding it from scratch; see NMI and Algorithms & special packages.
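The first approach can be sketched as one 0/1 indicator per comorbidity and person. This is a minimal illustration on toy data, not the guide's own extraction code: the column names `pnr` and `diag`, the two comorbidities, and the code prefixes are all assumptions made for the example, and the prefixes should be replaced by the code lists from your own DAG and protocol.

```r
library(dplyr)

# Toy data: all names and codes below are hypothetical placeholders
cohort <- tibble(pnr = c("A", "B", "C"))
diagnoses <- tibble(
  pnr  = c("A", "A", "B"),
  diag = c("DE119", "DI109", "DE109")   # LPR-style ICD-10 codes with a leading "D"
)

# Illustrative code prefixes for comorbidities named as confounders in a DAG
comorbidity_prefixes <- c(diabetes = "^DE1[0-4]", hypertension = "^DI1[0-5]")

# Start from the full cohort so persons without any diagnosis keep a row
flags <- cohort
for (name in names(comorbidity_prefixes)) {
  # Persons with at least one diagnosis matching this comorbidity
  hits <- diagnoses %>%
    filter(grepl(comorbidity_prefixes[[name]], diag)) %>%
    distinct(pnr)
  # 1 if the person has the comorbidity, 0 otherwise
  flags[[name]] <- as.integer(flags$pnr %in% hits$pnr)
}

flags
```

Starting from the cohort rather than from the diagnosis table means that a person with no recorded diagnoses gets 0 in every column instead of dropping out of the data.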
Which is best: individual comorbidities or a combined score?
It depends on your DAG and your outcome:
- Individual comorbidities give the most control and transparency, but require you to know in advance exactly which diagnoses are confounders. Use this when your DAG points to a few specific comorbidities.
- A combined score (e.g. NMI) captures the general comorbidity burden in one variable and is practical when many comorbidities are potential confounders, or when you just want to describe the burden in Table 1.
- Beware of over-adjustment: a score like NMI is built to predict morbidity and mortality outcomes, and your own outcome must not be part of the score you adjust for. If your outcome (or a strong predictor of it) is one of the diagnoses the score weights, you partly adjust for the outcome itself. NMI handles dementia studies, for example, by dropping the dementia predictor (see NMI).
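The over-adjustment point can be illustrated with a toy weighted score that leaves out the component corresponding to the outcome. The components, the weights, and the helper `score_without()` are hypothetical and exist only for this sketch; they are not the published NMI predictors or weights, and a validated implementation should be used in a real analysis.

```r
library(dplyr)

# Hypothetical components and weights, NOT the published NMI weights
weights <- tibble(
  component = c("dementia", "heart_failure", "copd"),
  weight    = c(3, 2, 1)
)

# Toy data: one row per person and component present before index
present <- tibble(
  pnr       = c("A", "A", "B"),
  component = c("dementia", "copd", "heart_failure")
)

# Hypothetical helper: sum the weights per person, skipping one component
score_without <- function(present, weights, drop_component) {
  present %>%
    inner_join(filter(weights, component != drop_component), by = "component") %>%
    group_by(pnr) %>%
    summarise(score = sum(weight), .groups = "drop")
}

# In a study with dementia as the outcome, the dementia component is left out
score_no_dementia <- score_without(present, weights, drop_component = "dementia")
score_no_dementia
```

In this toy example person A scores 1 instead of 4 once dementia is dropped, which shows how much of a score can come from the outcome itself. Persons whose only component is the dropped one disappear from the result and need a score of 0 when you link back to the cohort.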
If you use a combined index, the pattern is: take your LPR diagnoses (filtered to the cohort and to the time before index), run them through the algorithm, and get one score per pnr, which you save as an .rds file and link in Phase 12.
library(arrow)
library(dplyr)

# Unique person identifiers (pnr) in the study cohort
cohort_pnrs <- unique(readRDS("path/to/full_cohort.rds")$pnr)

diagnoses_before_index <- open_dataset("path/to/lpr_diag/") %>%
  rename_with(tolower) %>%                              # lower-case column names
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%  # ONLY your cohort
  # ... get the contact date, keep diagnoses before index_date ...
  collect()                                             # pull the filtered rows into memory
# → pass the diagnoses to the NMI algorithm to get a score per pnr
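The last steps of the pattern above (keep only diagnoses before index, reduce to one score per pnr, save as .rds) can be sketched on toy data as follows. The column names `diag_date`, `index_date`, and `weight` are assumptions, and the simple sum of placeholder weights only stands in for the real index algorithm, which you should take from a validated implementation.

```r
library(dplyr)

# Toy stand-ins; column names (diag_date, index_date, weight) are assumptions
cohort <- tibble(
  pnr        = c("A", "B", "C"),
  index_date = as.Date(c("2015-06-01", "2016-01-15", "2017-03-10"))
)
diagnoses <- tibble(
  pnr       = c("A", "A", "B"),
  diag_date = as.Date(c("2014-02-01", "2015-09-01", "2015-12-31")),
  weight    = c(2, 5, 1)   # placeholder per-diagnosis weights, not NMI weights
)

score_per_pnr <- diagnoses %>%
  inner_join(cohort, by = "pnr") %>%
  filter(diag_date < index_date) %>%                  # baseline only: strictly before index
  group_by(pnr) %>%
  summarise(score = sum(weight), .groups = "drop") %>%
  right_join(select(cohort, pnr), by = "pnr") %>%     # keep persons without diagnoses
  mutate(score = coalesce(score, 0)) %>%              # no baseline diagnoses means score 0
  arrange(pnr)

# Save one row per pnr; tempfile() is used here only so the sketch runs anywhere
out_file <- tempfile(fileext = ".rds")
saveRDS(score_per_pnr, out_file)
```

The `right_join()` back to the cohort is a deliberate choice: without it, persons with no diagnoses before index would be missing from the saved file and would turn into NA rather than 0 when the score is linked later.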