Medication (ATC)

Extract drug exposure from LMDB - filtered to your cohort

Published

July 2, 2026

Warning

Under development. Structural outline - to be expanded. The pattern is the same as in Extract from LPR: open the register, filter to the cohort, keep your codes, reduce to one row per person.

Prescription medication lives in LMDB (the Prescription Register). The register has one row per dispensed prescription, so the same person can have hundreds of rows. The two columns you almost always use are:

  • atc: the ATC code of the dispensed drug
  • eksd: the date the prescription was dispensed

The full confirmed column list is in Register reference.

Note

LMDB only covers prescriptions dispensed at community pharmacies. Three things are therefore systematically missing: drugs given during hospital admission, drugs dispensed directly by hospitals (e.g. chemotherapy and immunosuppressants) and drugs for certain institutionalized people (Pottegård et al. 2017). If you need in-hospital medication, a newer Hospital Medication Register (Sygehusmedicinregisteret, data from 2018) exists, but it is still unvalidated, incomplete and accessed via the Danish Health Data Authority, not DST - use it with caution (Rosenkrantz et al. 2024).

Exposure, outcome or covariate? Same extract

Medication can be all three. The approach is the same as for diagnoses (Extract from LPR): you make one extract of the dispensings, and the role is decided by when eksd falls relative to index_date, not by a separate page per role:

| Role | When | What you do | Page |
|---|---|---|---|
| Exposure | The date exposure starts | First eksd = the person’s index date | Phase 10 |
| Covariate (medication at baseline) | eksd < index_date (e.g. 6-12 months before) | Ever/never or a count in a window before index | Comorbidity |
| Outcome (new treatment) | eksd > index_date | First eksd after index | Outcomes |
| Time-varying (on/off treatment) | Changes during follow-up | Start/stop format | Time-varying |

So you do not write three different medication extracts. You write one, and the date filter relative to index determines the role.
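As a minimal sketch of the one-extract idea, the toy example below derives a baseline covariate and an outcome from the same table of dispensings. The data, the 365-day baseline window and the object names (dispensings, cohort, baseline_use, first_after_index) are invented for illustration; only pnr, atc, eksd and index_date come from the text above.

```r
library(dplyr)

# Toy dispensings (one row per dispensed prescription) - invented data
dispensings <- tibble(
  pnr  = c(1, 1, 1, 2, 2, 3),
  atc  = "A10BJ06",
  eksd = as.Date(c("2019-03-01", "2019-11-15", "2020-06-01",
                   "2020-02-10", "2021-01-05", "2018-01-20"))
)

# Toy cohort with one index date per person - invented data
cohort <- tibble(
  pnr        = c(1, 2, 3),
  index_date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01"))
)

# One join; after this, only the date filter decides the role
disp <- inner_join(dispensings, cohort, by = "pnr")

# Covariate: dispensings in the 365 days before index (count and ever/never)
baseline_use <- disp %>%
  filter(eksd < index_date, eksd >= index_date - 365) %>%
  count(pnr, name = "n_baseline") %>%
  right_join(cohort, by = "pnr") %>%                # keep people with no dispensing
  mutate(n_baseline   = coalesce(n_baseline, 0L),
         baseline_use = n_baseline > 0) %>%
  arrange(pnr)

# Outcome: first dispensing after index
first_after_index <- disp %>%
  filter(eksd > index_date) %>%
  group_by(pnr) %>%
  summarise(first_eksd = min(eksd), .groups = "drop")
```

Both results come from the same disp table. The right_join back to the cohort keeps people without any baseline dispensing, so they are counted as unexposed rather than dropped.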

The pattern

ATC codes have no D-prefix (unlike ICD in LPR, see Understand LPR), so you match directly on the start of the code:

library(arrow); library(dplyr)

cohort_pnrs <- unique(readRDS("path/to/full_cohort.rds")$pnr)   # your cohort from Phase 10

medication <- open_dataset("path/to/lmdb/") %>%
  rename_with(tolower) %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%   # ONLY your cohort
  filter(substr(atc, 1, 5) == "A10BJ") %>%               # GLP-1 analogues - filter BEFORE collect
  select(pnr, atc, eksd, vnr) %>%
  collect() %>%                                          # only HERE is data pulled into RAM
  group_by(pnr) %>% arrange(eksd) %>% slice(1) %>% ungroup()   # first dispensing per person

saveRDS(medication, "path/to/extract_medication.rds")

substr(atc, 1, 5) == "A10BJ" matches the whole ATC level (all GLP-1 analogues). For several groups at once, use regex as in LPR: grepl("^A10BJ|^A10BA", atc). The code-matching pattern (regex, %in% with a code list, !!) is explained in Extract from LPR and Function guide. Reducing to one row per person is explained in Long ↔︎ wide format - here slice(1) for the first dispensing, but it could also be ever/never or a count in a window (see the role table above).
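The three matching styles mentioned above can be compared side by side on a small in-memory table. This is only a sketch: the rows and the code list my_codes are invented, and the count at the end is one possible alternative to slice(1).

```r
library(dplyr)

# Toy dispensings - invented data
toy <- tibble(
  pnr = c(1, 1, 2, 3, 3, 3),
  atc = c("A10BJ06", "A10BA02", "C07AB02", "A10BJ02", "A10BJ06", "A10BH01")
)

# (a) One ATC level: compare the first 5 characters
one_level <- toy %>% filter(substr(atc, 1, 5) == "A10BJ")

# (b) Several levels at once: regex anchored at the start of the code
two_levels <- toy %>% filter(grepl("^A10BJ|^A10BA", atc))

# (c) Exact codes from a list you maintain yourself (hypothetical list)
my_codes  <- c("A10BJ06", "A10BH01")
from_list <- toy %>% filter(atc %in% my_codes)

# Alternative reduction to one row per person: a count instead of slice(1)
n_per_person <- one_level %>% count(pnr, name = "n_dispensings")
```

On an in-memory tibble like this, !! is not needed in front of my_codes. It matters when the same filter runs inside the lazy query, before collect().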

ATC is not enough: same substance, different product

ATC classifies by active substance, not by product or indication. Two brands with the same substance therefore get the same ATC - and cannot be told apart on atc alone:

  • Ozempic (semaglutide, type 2 diabetes) and Wegovy (semaglutide, weight loss) both have ATC A10BJ06. Filter on ATC only, and you mix diabetes treatment together with weight-loss treatment.

Two columns separate them:

  • vnr (varenummer): the unique key to the actual product (package). It is the only reliable way to isolate one specific product. The vnr-to-product lookup comes from the medicine taxonomy (KAT / Danish Health Data Authority); name/packtext hold the product text if you want to recognise it by eye.
  • indo (indication code): a coded indication (from the Medicinpriser catalogue, LMS 25), not free text. The code is recorded only when the prescriber picks an indication from the drop-down menu in the electronic prescription. If the doctor types the indication as free text instead, it is not transferred to the register and indo is left blank. So it can in principle separate the same substance across indications, but it is often empty. Use vnr as the primary product key and indo as a supporting signal, not a clean filter.
# 1. Build your own list of the varenumbers that belong to the product
#    (look them up in the medicine taxonomy - one product has several varenumbers):
ozempic_vnr <- c("xxxxxx", "yyyyyy")   # placeholders - replace with your looked-up numbers

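# 2. Keep only the dispensings whose vnr is on your list.
#    If medication is the one-row-per-person table saved above, this keeps only
#    people whose FIRST dispensing was the product; apply the filter before
#    slice(1) if you want all dispensings of the product:
medication %>%
  filter(atc == "A10BJ06") %>%         # semaglutide (Ozempic AND Wegovy)
  filter(vnr %in% !!ozempic_vnr)       # keep only rows with a vnr from your list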

Keep the two names in the last line apart: vnr (on its own) is the register’s column holding the varenummer - the product’s ID, just as pnr is the person’s ID. ozempic_vnr is your own R vector of the varenumbers you looked up for Ozempic; you choose the name yourself (same pattern as the code list in Extracting data step by step). %in% keeps the rows where vnr is found in your list, and !! sends your local list into the lazy query (explained in Extract from LPR and Function guide).

Note

Which vnr belongs to which product depends on package and strength (one product has several varenumbers). Look them up in the medicine taxonomy for your exact study period rather than assuming, and document the list in your code.

For indo, Harbi & Pottegård 2024 found a recorded indication code on 82% of prescriptions (about 88% corrected), and the code was almost 100% correct when present. Missingness is markedly higher before 1 October 2017 (when electronic prescribing became mandatory) and varies by drug class (about 8% missing for systemic anti-infectives versus 28% for blood-related agents). Between 5.6% and 36% of codes are nonspecific (e.g. “for the heart” for beta-blockers); whether a nonspecific code is usable depends on your question. The value set is the Danish Health Data Authority’s drug classification (Medicinpriser). As a side note: a validly recorded code does not mean the prescriber chose the correct indication - drop-down menus make wrong choices easy, so the code is not necessarily the clinical truth.
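Because indo coverage varies by period and drug class, it can help to tabulate missingness in your own extract before using indo as a filter. The sketch below does this on invented data. The vnr and indo values are placeholders, and it is an assumption that a blank indication is stored as NA or as an empty string; check how it is stored in your data.

```r
library(dplyr)

# Toy dispensings - invented data; vnr and indo values are placeholders
toy_indo <- tibble(
  pnr  = c(1, 2, 3, 4, 5, 6),
  vnr  = c("xxxxxx", "xxxxxx", "yyyyyy", "zzzzzz", "zzzzzz", "xxxxxx"),
  indo = c("code_a", NA, "", "code_b", "code_b", "code_a"),
  eksd = as.Date(c("2016-05-01", "2016-09-12", "2019-02-03",
                   "2021-07-19", "2022-01-30", "2022-03-08"))
)

indo_check <- toy_indo %>%
  mutate(
    # split at the date electronic prescribing became mandatory
    period       = if_else(eksd < as.Date("2017-10-01"), "before", "after"),
    # ASSUMPTION: a blank indication is stored as NA or ""
    indo_missing = is.na(indo) | indo == ""
  ) %>%
  group_by(period) %>%
  summarise(n = n(), share_missing = mean(indo_missing), .groups = "drop")
```

If the share of missing codes is high in the period or drug class you study, that supports using vnr as the primary product key and indo only as a supporting signal.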

Warning

A dispensing is not the same as intake. eksd tells you the prescription was collected at the pharmacy, not that the patient took the drug, and certainly not for how long.

For an exposure that lasts over time - treatment episodes, grace periods (the allowed gap between two prescriptions before a person counts as having stopped) and adherence measures such as PDC (proportion of days covered: the share of follow-up time covered by medicine) or MPR (medication possession ratio: amount dispensed divided by the length of the period) - you must build exposure windows from quantity and strength, not just count dispensings. The relevant LMDB fields are apk (number of packages), packsize (pack size, i.e. units per package) and strnum (the numeric strength; the unit is in strunit). Note that the dosage field itself (doso) is essentially empty (recorded for ~0.06% of prescriptions in the same validation), so dose must be derived from these package fields. Ready-made tools for this are in heaven (drug exposure windows from LMDB).

If your index date or exposure status depends on future medication, you get immortal time bias: if you require a person to fill the prescription (or two prescriptions) after index to count as exposed, then by construction they survived until that date - and the “immortal” time between index and the first fill makes the exposure look artificially protective. Instead, start follow-up for the exposed at the first dispensing, or treat medication as a time-varying exposure.
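To make the idea of exposure windows concrete, here is a minimal sketch that builds treatment episodes with a grace period and computes PDC for one person. It is not the method used by heaven. The data, the 30-day grace period and the follow-up window are invented, and the strong assumption that one unit covers one day stands in for a proper dose derived from strnum and strunit.

```r
library(dplyr)

# Toy dispensings for one person - invented data
disp <- tibble(
  pnr      = c(1, 1, 1, 1),
  eksd     = as.Date(c("2020-01-01", "2020-01-29", "2020-03-20", "2020-08-01")),
  apk      = c(1, 1, 2, 1),      # number of packages
  packsize = c(28, 28, 28, 28)   # units per package
)

grace <- 30   # hypothetical grace period in days

episodes <- disp %>%
  arrange(pnr, eksd) %>%
  group_by(pnr) %>%
  mutate(
    days_supply = apk * packsize,        # ASSUMPTION: one unit covers one day
    cover_end   = eksd + days_supply,    # first day that is no longer covered
    # latest end of coverage among all earlier dispensings
    prev_end    = lag(as.Date(cummax(as.numeric(cover_end)), origin = "1970-01-01")),
    new_episode = is.na(prev_end) | eksd > prev_end + grace,
    episode     = cumsum(new_episode)
  ) %>%
  group_by(pnr, episode) %>%
  summarise(start = min(eksd), stop = max(cover_end), .groups = "drop")

# PDC over a hypothetical follow-up window: share of days with medicine on hand
fu_start <- as.Date("2020-01-01")
fu_end   <- as.Date("2020-12-31")

day_numbers <- unlist(Map(function(s, n) s + seq_len(n) - 1,
                          as.numeric(disp$eksd), disp$apk * disp$packsize))
covered <- unique(day_numbers)           # overlapping days are counted once
covered <- covered[covered >= as.numeric(fu_start) & covered <= as.numeric(fu_end)]
pdc     <- length(covered) / (as.numeric(fu_end) - as.numeric(fu_start) + 1)
```

The episodes table is already in start/stop format, with stop as the first uncovered day. Starting follow-up for the exposed at start, rather than at an earlier index date, is what avoids the immortal time described above.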
