Medication (ATC)
Extract drug exposure from LMDB - filtered to your cohort
Under development. Structural outline - to be expanded. The pattern is the same as in Extract from LPR: open the register, filter to the cohort, keep your codes, reduce to one row per person.
Prescription medication lives in LMDB (the Prescription Register). The register has one row per dispensed prescription, so the same person can have hundreds of rows. The two columns you almost always use are:
- atc: the drug’s ATC code (Anatomical Therapeutic Chemical), e.g. "A10BA02" for metformin.
- eksd: the dispensing date (when the prescription was collected). Use it as the prescription date, just like date_contact in LPR.
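Matching on the start of atc works because ATC codes are hierarchical: each level is a fixed-length prefix of the full code (1 character for the anatomical main group, then 3, 4, 5 and 7 characters, e.g. A, A10, A10B, A10BA, A10BA02 for metformin). A minimal sketch; atc_level() is a hypothetical helper written for this illustration, not part of any package:

```r
# ATC levels are fixed-length prefixes of the full 7-character code.
# Prefix length per level: 1st = 1, 2nd = 3, 3rd = 4, 4th = 5, 5th (substance) = 7.
atc_level <- function(atc, level) {
  prefix_length <- c(1, 3, 4, 5, 7)[level]
  substr(atc, 1, prefix_length)
}

codes <- c("A10BA02", "A10BJ06")  # metformin, semaglutide
atc_level(codes, 4)               # "A10BA" "A10BJ"
```

So a condition like substr(atc, 1, 5) == "A10BJ" selects one 4th-level group, while a comparison on all 7 characters selects a single substance.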
The full confirmed column list is in Register reference.
LMDB only covers prescriptions dispensed at community pharmacies. Three things are therefore systematically missing: drugs given during hospital admission, drugs dispensed directly by hospitals (e.g. chemotherapy and immunosuppressants) and drugs for certain institutionalized people (Pottegård et al. 2017). If you need in-hospital medication, a newer Hospital Medication Register (Sygehusmedicinregisteret, data from 2018) exists, but it is still unvalidated, incomplete and accessed via the Danish Health Data Authority, not DST - use it with caution (Rosenkrantz et al. 2024).
Exposure, outcome or covariate? Same extract
Medication can play all three roles. It is handled exactly as for diagnoses (Extract from LPR): you make one extract of the dispensings, and the role is decided by when eksd falls relative to index_date - not by writing a separate extract per role:
| Role | When | What you do | Page |
|---|---|---|---|
| Exposure | The date exposure starts | First eksd = the person’s index date | Phase 10 |
| Covariate (medication at baseline) | eksd < index_date (e.g. 6-12 months before) | Ever/never or a count in a window before index | Comorbidity |
| Outcome (new treatment) | eksd > index_date | First eksd after index | Outcomes |
| Time-varying (on/off treatment) | Changes during follow-up | Start/stop format | Time-varying |
So you do not write three different medication extracts. You write one, and the date filter relative to index determines the role.
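To make the "one extract, role by date" idea concrete, here is a minimal sketch on made-up data. It assumes the cohort table has an index_date column and uses a 365-day baseline window purely as an example choice; the same dispensings become a baseline covariate (any dispensing in the window before index) and an outcome (first dispensing after index):

```r
library(dplyr)

# Toy data (made up): one extract of dispensings and a cohort with index dates.
dispensings <- tibble(
  pnr  = c(1, 1, 1, 2, 3),
  eksd = as.Date(c("2019-03-01", "2020-02-01", "2020-09-15", "2021-05-01", "2018-01-10"))
)
cohort <- tibble(
  pnr        = c(1, 2, 3),
  index_date = as.Date(c("2020-06-01", "2020-06-01", "2020-06-01"))
)

joined <- dispensings %>% inner_join(cohort, by = "pnr")

# Covariate: any dispensing in the 365 days before index (assumed window length).
baseline <- joined %>%
  group_by(pnr) %>%
  summarise(med_baseline = any(eksd < index_date & eksd >= index_date - 365),
            .groups = "drop")

# Outcome: first dispensing after index (NA if there is none).
outcome <- joined %>%
  filter(eksd > index_date) %>%
  group_by(pnr) %>%
  summarise(first_eksd_after = min(eksd), .groups = "drop")

# Back onto the cohort; people with no dispensings at all get FALSE at baseline.
roles <- cohort %>%
  left_join(baseline, by = "pnr") %>%
  left_join(outcome, by = "pnr") %>%
  mutate(med_baseline = coalesce(med_baseline, FALSE))
roles
```

Person 1 is exposed at baseline (dispensing on 2020-02-01) and has a new dispensing after index; person 3 only has a dispensing outside the window and none after index.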
The pattern
ATC codes have no D-prefix (unlike ICD in LPR, see Understand LPR), so you match directly on the start of the code:
```r
library(arrow); library(dplyr)

cohort_pnrs <- unique(readRDS("path/to/full_cohort.rds")$pnr) # your cohort from Phase 10

medication <- open_dataset("path/to/lmdb/") %>%
  rename_with(tolower) %>%                               # lower-case column names
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%   # ONLY your cohort
  filter(substr(atc, 1, 5) == "A10BJ") %>%               # GLP-1 analogues - filter BEFORE collect
  select(pnr, atc, eksd, vnr) %>%
  collect() %>%                                          # only HERE is data pulled into RAM
  group_by(pnr) %>% arrange(eksd) %>% slice(1) %>% ungroup() # first dispensing per person

saveRDS(medication, "path/to/extract_medication.rds")
```
substr(atc, 1, 5) == "A10BJ" matches everything in the 4th-level ATC group A10BJ (all GLP-1 analogues). For several groups at once, use a regular expression as in LPR: grepl("^A10BJ|^A10BA", atc). The code-matching pattern (regex, %in% with a code list, !!) is explained in Extract from LPR and Function guide. Reducing to one row per person is explained in Long ↔︎ wide format - here slice(1) keeps the first dispensing, but it could also be ever/never or a count in a window (see the role table above).
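When the list of ATC groups gets longer, the regex can be built from a code list instead of typed by hand. A minimal sketch on a made-up local tibble standing in for the collected extract; atc_codes and atc_pattern are illustrative names. arrow translates grepl() in lazy queries, so the same filter line should also work before collect(), but check this against your arrow version:

```r
library(dplyr)

# Illustrative code list: ATC prefixes of interest.
atc_codes   <- c("A10BJ", "A10BA")
atc_pattern <- paste0("^(", paste(atc_codes, collapse = "|"), ")")  # "^(A10BJ|A10BA)"

# Toy dispensings (made up).
toy <- tibble(
  pnr  = c(1, 1, 2, 3),
  atc  = c("A10BJ06", "C07AB02", "A10BA02", "A10AB01"),
  eksd = as.Date(c("2021-01-05", "2021-02-01", "2021-03-10", "2021-04-01"))
)

matched <- toy %>%
  filter(grepl(!!atc_pattern, atc)) %>%  # !! injects the local pattern into the query
  group_by(pnr) %>% arrange(eksd) %>% slice(1) %>% ungroup()
matched
```

Persons 1 and 2 are kept; person 3 only has a code outside both groups.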
ATC is not enough: same substance, different product
ATC classifies by active substance, not by product or indication. Two brands with the same substance therefore get the same ATC - and cannot be told apart on atc alone:
- Ozempic (semaglutide, type 2 diabetes) and Wegovy (semaglutide, weight loss) both have ATC A10BJ06. Filter on ATC only, and you mix diabetes treatment with weight-loss treatment.
Two columns separate them:
- vnr (varenummer): the unique key to the actual product (package). It is the only reliable way to isolate one specific product. The vnr-to-product lookup comes from the medicine taxonomy (KAT / Danish Health Data Authority); name/packtext hold the product text if you want to recognise it by eye.
- indo (indication code): a coded indication (from the Medicinpriser catalogue, LMS 25), not free text. The code is recorded only when the prescriber picks an indication from the drop-down menu in the electronic prescription. If the doctor types the indication as free text instead, it is not transferred to the register and indo is left blank. So it can in principle separate the same substance across indications, but it is often empty.
Use vnr as the primary product key and indo as a supporting signal, not a clean filter.
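```r
library(arrow); library(dplyr)

# 1. Build your own list of the varenumbers that belong to the product
#    (look them up in the medicine taxonomy - one product has several varenumbers):
ozempic_vnr <- c("xxxxxx", "yyyyyy") # placeholders - replace with your looked-up numbers
                                     # (same type as the vnr column: character or numeric)

# 2. Keep only the dispensings whose vnr is on your list - in the lazy query, before collect():
ozempic <- open_dataset("path/to/lmdb/") %>%
  rename_with(tolower) %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>% # ONLY your cohort (cohort_pnrs as in the pattern)
  filter(atc == "A10BJ06") %>%                         # semaglutide (Ozempic AND Wegovy)
  filter(vnr %in% !!ozempic_vnr) %>%                   # keep only rows with a vnr from your list
  collect()
```
Keep the two names in the vnr filter line apart: vnr (on its own) is the register’s column holding the varenummer - the product’s ID, just as pnr is the person’s ID. ozempic_vnr is your own R vector of the varenumbers you looked up for Ozempic; you choose the name yourself (same pattern as the code list in Extracting data step by step). %in% keeps the rows where vnr is found in your list, and !! sends your local list into the lazy query (explained in Extract from LPR and Function guide). Filter on vnr before reducing to one row per person: if you first keep each person’s first GLP-1 dispensing and filter on vnr afterwards, you lose everyone whose first dispensing was a different product.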
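Before trusting the vnr list, it can help to see every varenummer that actually occurs under the ATC code in your data and whether it is on your list. A minimal sketch on made-up data (the vnr values are placeholders, like ozempic_vnr above); any unlisted vnr is either a Wegovy package or an Ozempic package you have not yet looked up:

```r
library(dplyr)

# Placeholders - replace with the varenumbers you looked up.
ozempic_vnr <- c("xxxxxx", "yyyyyy")

# Toy semaglutide dispensings (made-up vnr values).
sema <- tibble(
  pnr = c(1, 2, 2, 3, 4),
  atc = "A10BJ06",
  vnr = c("xxxxxx", "yyyyyy", "yyyyyy", "zzzzzz", "zzzzzz")
)

# Each vnr under the ATC code, how often it occurs, and whether it is on your list.
vnr_check <- sema %>%
  count(vnr, name = "n_dispensings") %>%
  mutate(on_list = vnr %in% ozempic_vnr) %>%
  arrange(on_list, desc(n_dispensings))  # unlisted, frequent varenumbers first
vnr_check
```

On real data you would run the count() in the lazy query and collect() only the small summary.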
Which vnr belongs to which product depends on package and strength (one product has several varenumbers). Look them up in the medicine taxonomy for your exact study period rather than assuming, and document the list in your code.
For indo, Harbi & Pottegård 2024 found a recorded indication code on 82% of prescriptions (about 88% after correction) and almost 100% correct codes when a code was present. Missingness is, however, markedly higher before 1 October 2017 (when electronic prescribing became mandatory) and varies by drug class (about 8% missing for systemic anti-infectives versus 28% for blood-related agents). Between 5.6% and 36% of codes are nonspecific (e.g. “for the heart” for beta-blockers); whether a nonspecific code is usable depends on your question. The value set is the Danish Health Data Authority’s drug classification (Medicinpriser). Note also that a validly recorded code does not mean the prescriber chose the correct indication: drop-down menus make wrong choices easy, so the code is not necessarily the clinical truth.
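Because indo completeness depends on period and drug class, it is worth checking it in your own extract before using it. A minimal sketch on made-up data; it assumes a missing code may appear either as NA or as an empty string, and it uses the 1st ATC level as a rough stand-in for "drug class" (an assumption, not necessarily the grouping used in the validation):

```r
library(dplyr)

# Toy dispensings (made up); indo is NA where no indication code was recorded.
disp <- tibble(
  atc  = c("J01CA04", "J01CA04", "B01AC06", "B01AC06", "B01AC06", "J01CA04"),
  eksd = as.Date(c("2016-05-01", "2018-03-01", "2016-07-01",
                   "2018-02-01", "2019-01-01", "2019-06-01")),
  indo = c(NA, "123", NA, NA, "456", "789")
)

indo_completeness <- disp %>%
  mutate(
    period    = if_else(eksd < as.Date("2017-10-01"), "before 2017-10-01", "from 2017-10-01"),
    atc_group = substr(atc, 1, 1),        # 1st ATC level: J = anti-infectives, B = blood
    no_code   = is.na(indo) | indo == ""  # treat NA and empty string as "no code"
  ) %>%
  group_by(atc_group, period) %>%
  summarise(n = n(), share_missing = mean(no_code), .groups = "drop")
indo_completeness
```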
A dispensing is not the same as intake. eksd tells you the prescription was collected at the pharmacy, not that the patient took the drug, and certainly not for how long. For an exposure that lasts over time (treatment episodes, grace periods, adherence measures) you must build exposure windows from quantity and strength, not just count dispensings. A grace period is the allowed gap between two prescriptions before a person counts as having stopped; PDC (proportion of days covered) is the share of days in the period covered by dispensed medicine; MPR (medication possession ratio) is the days’ supply dispensed divided by the number of days in the period. The relevant LMDB fields are apk (number of packages), packsize (pack size, i.e. units per package) and strnum (the numeric strength; the unit is in strunit). The dosage field itself (doso) is essentially empty (recorded for ~0.06% of prescriptions in the Harbi & Pottegård 2024 validation), so the amount dispensed must be derived from these package fields and combined with an assumed daily dose. Ready-made tools for this are in heaven (drug exposure windows from LMDB).
If your index date or exposure status depends on future medication, you get immortal time bias: if you require a person to fill the prescription (or two prescriptions) after index to count as exposed, then by construction they survived until that date - and the “immortal” time between index and the first fill makes the exposure look artificially protective. Instead, start follow-up for the exposed at the first dispensing, or treat medication as a time-varying exposure.
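The basic idea behind exposure windows, grace periods and PDC can be sketched in a few lines. This is not the heaven algorithm, only a minimal illustration on made-up data under loudly stated assumptions: each dispensing supplies apk × packsize units; the patient is assumed to use a hypothetical units_per_day = 1 (if you instead assume a daily dose in mg and strunit is mg, units per day would be that dose divided by strnum); overlapping supplies are not carried forward (no stockpiling); and the grace period is a hypothetical 30 days:

```r
library(dplyr)

# Hypothetical assumptions - replace with choices justified for your drug:
units_per_day <- 1    # assumed daily use in units (doso is essentially empty)
grace_days    <- 30   # allowed gap after a supply runs out before treatment counts as stopped

# Toy dispensings (made up); apk = number of packages, packsize = units per package.
disp <- tibble(
  pnr      = c(1, 1, 1, 2),
  eksd     = as.Date(c("2020-01-01", "2020-03-15", "2020-09-01", "2020-02-01")),
  apk      = c(1, 1, 2, 1),
  packsize = c(100, 100, 50, 30)
)

# 1. Exposure window per dispensing: eksd up to eksd + days of supply - 1.
supply <- disp %>%
  mutate(days_supply = apk * packsize / units_per_day,
         supply_end  = eksd + days_supply - 1)

# 2. Treatment episodes: a new episode starts when a dispensing comes more than
#    grace_days after the latest supply end seen so far for that person.
episodes <- supply %>%
  arrange(pnr, eksd) %>%
  group_by(pnr) %>%
  mutate(prev_end    = lag(cummax(as.numeric(supply_end))),
         new_episode = is.na(prev_end) | as.numeric(eksd) > prev_end + grace_days,
         episode     = cumsum(new_episode)) %>%
  group_by(pnr, episode) %>%
  summarise(episode_start = min(eksd), episode_end = max(supply_end), .groups = "drop")

# 3. PDC: share of days in a fixed period covered by at least one supply window.
covered_days <- function(from, to, lo, hi) {
  from <- pmax(as.numeric(from), as.numeric(lo))    # clip windows to the period
  to   <- pmin(as.numeric(to), as.numeric(hi))
  keep <- to >= from
  len  <- to[keep] - from[keep] + 1
  days <- rep(from[keep], len) + sequence(len) - 1  # every covered day as a day number
  length(unique(days))                              # overlapping days counted once
}
period_start <- as.Date("2020-01-01")
period_end   <- as.Date("2020-12-31")
period_days  <- as.numeric(period_end - period_start) + 1

pdc <- supply %>%
  group_by(pnr) %>%
  summarise(covered = covered_days(eksd, supply_end, period_start, period_end),
            .groups = "drop") %>%
  mutate(pdc = covered / period_days)
episodes
pdc
```

Person 1 gets two episodes (the gap from 22 June to 1 September exceeds the 30-day grace period) and a PDC of 274/366 for 2020; the 26 days where the first two supplies overlap are counted once. In line with the immortal-time advice above, follow-up for the exposed would start at the first episode_start, not at a later fill.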
See also
- Extract from LPR: code matching in detail (regex, code lists, !!) - same technique without the D-prefix
- Outcomes: where a code list comes from (applies to ATC too)
- Register reference: all confirmed LMDB columns
- Algorithms & special packages: heaven for exposure windows; OSDC and NMI already use ATC codes
- Time-varying variables: on/off treatment over time
- Phase 12 - Assemble and prepare the dataset: join the medication extract onto the cohort
- RECORD-PE: reporting standard for pharmacoepidemiology on routinely collected data (extends STROBE/RECORD)