Extract from LPR

Practical recipes - the two approaches, helper functions and integration with the cohort

Published

July 2, 2026

This page shows how to extract diagnoses from LPR with code. It builds on the structure from Understand LPR: the periods (LPR2/LPR3), the D-prefix, the diagnosis types (A/B/G) and the filter for retracted diagnoses. Read that page first if you haven’t already.

You use the same extraction pattern as in Phase 5 and Phase 6, applied to LPR’s two generations. Extracting from LPR is the most important and probably the most complex part of the guide.

Note

Circular dependency with Phase 10 - deliberate. To extract diagnoses (this page) we assume you already have a cohort; but you build the cohort in Phase 10 using exactly the extraction pattern you learn here. Read the phases in order, and come back to the code when your cohort is ready. The code uses inner_join() and bind_rows(); if these are new, they are explained in detail in Phase 12.

Warning

LPR3 files and column names - check before you load.

  • Use the LPR_A files consistently. DST has switched LPR3 from the old LPR_F format (kontakter, diagnoser, forloeb) to the new LPR_A format (lpr_a_kontakt, lpr_a_diagnose). Both versions may sit in your folder and cover the same years - loading both (or mixing old and new) gives you duplicated rows. Use lpr_a_* as in the examples here, and leave the LPR_F files alone. Some projects also have older data inside lpr_a_kontakt that is already in LPR2; filter it out with lprindberetningssystem == "LPR3". This is project-specific - on DARTER it is handled in pitfall 5; otherwise check with your data manager.
  • Verify column names. The switch to LPR_A changed a number of variable names (often opaque Danish abbreviations). The examples use the names the registers typically have, but always check before relying on a column name. On a lazy open_dataset() object use arrow::schema(your_data) - it lists the column names and their types without reading data into RAM; colnames(your_data) also works (and is what you use for the DuckDB-backed objects from read_register()). Look names up in Overview of registers.
  • Mind the 2019-2020 diagnosis spike. The LPR3 contact model registers far more outpatient diagnoses than LPR2 did, producing a spike in diagnosis counts around 2019-2020 that matters for any count- or trend-based analysis. See the diagnosis spike.
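
The column-name check from the second bullet can be sketched as below. This is a minimal illustration on a throwaway parquet file, not on real LPR data: the toy folder, its two columns and the helper name missing_columns() are assumptions made up for the example.

```r
library(arrow)

# Toy stand-in for a register folder (hypothetical data, NOT real LPR)
toy_dir <- file.path(tempdir(), "toy_lpr_a_kontakt")
dir.create(toy_dir, showWarnings = FALSE)
write_parquet(
  data.frame(PNR = c("001", "002"), DW_EK_KONTAKT = c(11L, 12L)),
  file.path(toy_dir, "part-0.parquet")
)

ds   <- open_dataset(toy_dir)   # lazy: no rows are read into RAM here
cols <- tolower(names(ds))      # column names only, lower-cased as in the examples

# Hypothetical helper: which of the columns you plan to use are absent?
missing_columns <- function(available, required) setdiff(required, available)

missing <- missing_columns(cols, c("pnr", "dw_ek_kontakt", "kont_starttidspunkt"))
print(missing)                  # any name listed here must be looked up before you rely on it
```

Running such a check right after open_dataset() makes a renamed column fail loudly at the top of the script instead of deep inside a join.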

Fetch diagnoses from LPR - choose your approach

Choose one of two approaches depending on your study:

            | Approach 1 - direct extraction                | Approach 2 - alle_dx
------------|-----------------------------------------------|-----------------------------------------------
Best when   | You have fewer outcomes                       | You have multiple outcomes from LPR
Workflow    | Fetch specific codes → Exclude → done         | Fetch all → Exclude → filter per outcome
Advantage   | Simpler and faster for single-outcome studies | LPR queried only once; reused for all outcomes

Approach 1 is best for a smaller number of outcomes. Filter on specific ICD codes directly in the filter() step before collect(). DuckDB/Arrow pushes the filter down to the storage layer - only matching rows are loaded into RAM.

Approach 2 is best when your study has multiple outcomes. You query LPR once and build alle_dx: a shared table with all A and B diagnoses. For each new outcome, filter alle_dx on the relevant codes - the only line you change is the code list.

Note

The examples require parquet files and a completed study population. kohort is the data.frame with pnr and index_date per person - see Phase 10.

Adapt the path in open_dataset(...) to your project’s data folder (the examples’ E:/workdata/[projectnumber]/... is project-specific). Note: LPR is one of the largest registers, so the data must be in parquet. That lets Arrow/DuckDB filter before loading and fetch only the rows you ask for; reading LPR directly from e.g. SAS pulls the whole register into RAM. Convert SAS to parquet first: Phase 4 - SAS to Parquet.

Tip

Why semi_join(tibble(pnr = ...)) and not filter(pnr %in% ...)? Both keep only the cohort’s rows, but a semi_join against a small tibble of pnr’s pushes down into Arrow/DuckDB more efficiently and reliably. A large %in% filter with an R vector can get slow or be rejected outright (especially with an older duckplyr). It also frees you from !!: semi_join takes an ordinary local table as its argument. You still need !! when you pass a local R vector into a filter(), e.g. a code list (substr(c_diag, 2, 4) %in% !!CODES).
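
To see that the two forms select the same rows, here is a small in-memory sketch with made-up data (contacts and cohort_pnrs are toy objects). On ordinary tibbles neither form needs !!; the efficiency difference described above only applies to lazy Arrow/DuckDB tables, which this toy example does not measure.

```r
library(dplyr)

# Hypothetical toy data: four contacts, a cohort of two persons
contacts    <- tibble(pnr = c("001", "002", "003", "002"), recnum = 1:4)
cohort_pnrs <- c("002", "003")

via_join   <- contacts %>% semi_join(tibble(pnr = cohort_pnrs), by = "pnr")
via_filter <- contacts %>% filter(pnr %in% cohort_pnrs)

same_rows <- isTRUE(all.equal(via_join, via_filter))
print(same_rows)   # both approaches keep the same three rows
```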

Approach 1 - fetch specific diagnoses directly (start here for one outcome)

Filter on specific codes before collect(). The example fetches diabetes mellitus (E10–E14) - replace CODES_REGEX with your own codes.

#=====================================================
# Extract diabetes diagnoses from LPR (Approach 1)
#=====================================================
library(arrow)
library(dplyr)

cohort_pnrs <- unique(kohort$pnr)
CODES_REGEX <- "^DE1[0-4]"   # diabetes mellitus (E10–E14) - with D-prefix

#-----------------------------------------------------
# LPR2 (somatic): lpr_adm + lpr_diag
#-----------------------------------------------------
lpr_adm  <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_adm/")  %>% rename_with(tolower)
lpr_diag <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_diag/") %>% rename_with(tolower)

# Here we combine LPR2's two tables: the contact register lpr_adm (pnr + dates) and the diagnosis
# register lpr_diag (the ICD code), filter on your codes/diagnosis types, and select the columns you need.
# You can pull as many codes at once as you like (via CODES_REGEX), and you choose the object name YOURSELF
# (here 'lpr2_dm' = LPR2 + diabetes mellitus, because the example is diabetes).
lpr2_dm <- lpr_adm %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%   # ONLY your cohort. OMIT this line if you want the WHOLE population
  select(pnr, recnum, date_contact = d_inddto) %>%  # pick columns from LPR2; d_inddto is RENAMED to date_contact (see note below)
  inner_join(
    lpr_diag %>%                                    # diagnosis register: has recnum + ICD code, but neither pnr nor date
      filter(c_diagtype %in% c("A", "B"),           # A + B = action/secondary diagnoses; add "G" (-> c("A","B","G")) if you also want underlying conditions
             grepl(CODES_REGEX, c_diag)) %>%         # keep only your codes - filter BEFORE collect (D-prefix in the regex)
      select(recnum, c_diag, c_diagtype),           # the diagnosis columns we need
    by = "recnum"                                   # recnum = the key linking contact and diagnosis in LPR2
  ) %>%
  collect() %>%                                      # ONLY here is data pulled into RAM (everything above runs in the database)
  mutate(icd3 = substr(c_diag, 2, 4))                # strip the D-prefix: "DE11" -> "E11"

#-----------------------------------------------------
# LPR3: lpr_a_kontakt + lpr_a_diagnose
#-----------------------------------------------------
lpr3_k <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_a_kontakt/")  %>% rename_with(tolower)
lpr3_d <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_a_diagnose/") %>% rename_with(tolower)

lpr3_dm <- lpr3_k %>%                                 # same pattern as LPR2 - but LPR3 tables and different column names
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%   # ONLY your cohort. OMIT this line for the WHOLE population
  select(pnr, dw_ek_kontakt, date_contact = kont_starttidspunkt) %>%  # LPR3's date column renamed to the SAME name: date_contact
  inner_join(
    lpr3_d %>%
      filter(diag_kode_type %in% c("A", "B"),         # as in LPR2: add "G" if you also want underlying conditions
             is.na(senere_afkraeftet) | senere_afkraeftet != "Ja",  # drop withdrawn diagnoses (LPR3 only)
             grepl(CODES_REGEX, diag_kode)) %>%        # same codes; filter BEFORE collect
      select(dw_ek_kontakt, c_diag = diag_kode, c_diagtype = diag_kode_type),  # rename LPR3 names -> same names as LPR2
    by = "dw_ek_kontakt"                               # LPR3's contact key (NOT recnum as in LPR2)
  ) %>%
  collect() %>%
  mutate(date_contact = as.Date(date_contact),         # the LPR3 date is a datetime -> make it a plain date
         icd3 = substr(c_diag, 2, 4))                  # strip the D-prefix

#-----------------------------------------------------
# Combine LPR2 + LPR3 into one extract
#-----------------------------------------------------
# Works because we gave the two extracts the SAME column names above (date_contact, c_diag, ...):
dm_dx <- bind_rows(lpr2_dm, lpr3_dm)                   # one combined extract (LPR2 + LPR3)
# Columns: pnr | date_contact | c_diag | c_diagtype | icd3

Note

What is date_contact = d_inddto? (why is date_contact coloured?) Inside select(), new_name = old_name means you rename a column. d_inddto is LPR2’s actual date column, and date_contact is a name you choose - the editor colours it like an argument, but it’s just the new column name (not a function argument). We rename because LPR3 has a different date column (kont_starttidspunkt); giving both the name date_contact makes the two extracts share column names, so bind_rows() can stack them.

Must vs. choice: the register names (pnr, recnum, d_inddto, c_diag, c_diagtype, dw_ek_kontakt, kont_starttidspunkt, diag_kode …) must be spelled exactly as in the register. The new names (date_contact, icd3) and the object name (lpr2_dm) are yours to choose - just keep the harmonized names identical across LPR2 and LPR3.
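
A minimal sketch of the renaming idea on toy tibbles (the rows are invented; only the column names follow the examples above). It shows that select(new_name = old_name) renames, and that bind_rows() stacks the two extracts once the names and types agree.

```r
library(dplyr)

# Hypothetical one-row stand-ins for the LPR2 and LPR3 contact tables
lpr2_toy <- tibble(pnr = "001", recnum = 1L, d_inddto = as.Date("2015-03-01"))
lpr3_toy <- tibble(pnr = "002", dw_ek_kontakt = 9L,
                   kont_starttidspunkt = as.POSIXct("2020-06-15 08:30:00", tz = "UTC"))

a <- lpr2_toy %>% select(pnr, date_contact = d_inddto)             # rename inside select()
b <- lpr3_toy %>% select(pnr, date_contact = kont_starttidspunkt) %>%
  mutate(date_contact = as.Date(date_contact))                     # datetime -> plain date

stacked <- bind_rows(a, b)   # works because both tables now share names and types
print(stacked)
```

If you skip the as.Date() step, the two date_contact columns have different types (Date vs. datetime) and bind_rows() will refuse to combine them.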

Tip

Many specific codes? Build the regex programmatically:

codes <- c("E10", "E11", "E12", "E13", "E14")
CODES_REGEX <- paste0("^D(", paste(codes, collapse = "|"), ")")
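
Before using a generated regex on the register, it can help to try it on a handful of hand-written codes. The sketch below repeats the two lines above and tests them against an invented vector, test_codes, which is only an illustration.

```r
codes <- c("E10", "E11", "E12", "E13", "E14")
CODES_REGEX <- paste0("^D(", paste(codes, collapse = "|"), ")")

# Hypothetical test codes: two that should match, three that should not
test_codes <- c("DE119", "DE10", "DE15", "DI21", "E11")
hits <- grepl(CODES_REGEX, test_codes)
print(setNames(hits, test_codes))   # "E11" without the D-prefix does not match
```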

Note

Do you have F-codes (e.g. dementia, depression)? Extend the regex to include them, e.g. "^DE1[0-4]|^DF0[0-3]|^DG30", and add psychiatric LPR2 - see Approach 2 below for the code.

Alternative: compact extraction (single-table approach)

A colleague may have shown you this shorter approach:

lpr <- left_join(lpr_adm, lpr_diag, by = "RECNUM") %>%
  filter(C_DIAGTYPE == "A",
         grepl("^S72", C_DIAG)) %>%
  group_by(PNR) %>%
  filter(D_INDDTO == min(D_INDDTO)) %>%
  slice(1) %>%
  ungroup()

It is shorter but has three pitfalls on DST data:

  1. D-prefix error: "^S72" does NOT match "DS72..." in DST data - returns zero rows with no error message. Use "^DS72" (with D) or strip the prefix first.
  2. left_join instead of inner_join: Keeps all admissions from lpr_adm - including those with no matching diagnosis. Unnecessarily heavy on national registers.
  3. No pnr filter: Loads the entire population’s data. Correct when building a cohort (Phase 10), not when extracting from an existing one.
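
The first two pitfalls are easy to reproduce on toy data. The tables adm and diag below are invented for the illustration; only the column names mirror LPR2.

```r
library(dplyr)

# Hypothetical toy tables: two contacts, only one of them has a diagnosis row
adm  <- tibble(pnr = c("001", "002"), recnum = c(1L, 2L))
diag <- tibble(recnum = 1L, c_diag = "DS720", c_diagtype = "A")

# Pitfall 1: without the D-prefix the pattern silently matches nothing
n_without_d <- sum(grepl("^S72",  diag$c_diag))
n_with_d    <- sum(grepl("^DS72", diag$c_diag))

# Pitfall 2: left_join keeps the contact that has no diagnosis, inner_join drops it
n_left  <- nrow(left_join(adm,  diag, by = "recnum"))
n_inner <- nrow(inner_join(adm, diag, by = "recnum"))

print(c(without_d = n_without_d, with_d = n_with_d, left = n_left, inner = n_inner))
```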

Approach 2 - fetch all diagnoses + filter outcome (for multiple outcomes)

Part 1 - build alle_dx

#=====================================================
# Fetch ALL diagnoses from LPR (Approach 2)
#=====================================================
library(arrow)
library(dplyr)

cohort_pnrs <- unique(kohort$pnr)

#-----------------------------------------------------
# LPR2 somatic (up to March 2019)
#-----------------------------------------------------
lpr_adm  <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_adm/")  %>% rename_with(tolower)
lpr_diag <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_diag/") %>% rename_with(tolower)

lpr2_dx <- lpr_adm %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%
  select(pnr, recnum, date_contact = d_inddto) %>%
  inner_join(
    lpr_diag %>%
      filter(c_diagtype %in% c("A", "B")) %>%
      select(recnum, c_diag, c_diagtype),
    by = "recnum"
  ) %>%
  collect() %>%
  mutate(icd3 = substr(c_diag, 2, 4))

#-----------------------------------------------------
# LPR3 (March 2019 and onwards)
#-----------------------------------------------------
lpr3_k <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_a_kontakt/")  %>% rename_with(tolower)
lpr3_d <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/lpr_a_diagnose/") %>% rename_with(tolower)

lpr3_dx <- lpr3_k %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%
  select(pnr, dw_ek_kontakt, date_contact = kont_starttidspunkt) %>%
  inner_join(
    lpr3_d %>%
      filter(diag_kode_type %in% c("A", "B"),
             is.na(senere_afkraeftet) | senere_afkraeftet != "Ja") %>%
      select(dw_ek_kontakt, c_diag = diag_kode, c_diagtype = diag_kode_type),
    by = "dw_ek_kontakt"
  ) %>%
  collect() %>%
  mutate(date_contact = as.Date(date_contact), icd3 = substr(c_diag, 2, 4))

#-----------------------------------------------------
# Combine LPR2 + LPR3 into one extract
#-----------------------------------------------------
alle_dx <- bind_rows(lpr2_dx, lpr3_dx)
# Columns: pnr | date_contact | c_diag | c_diagtype | icd3
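
Once alle_dx exists, a quick tabulation per calendar year is one way to spot the 2019-2020 diagnosis spike mentioned in the warning at the top of the page. The sketch below uses a small invented table, alle_dx_toy, with the same column names; on real data you would run the same two steps on alle_dx.

```r
library(dplyr)

# Hypothetical stand-in for alle_dx (same column names, invented rows)
alle_dx_toy <- tibble(
  pnr          = c("001", "001", "002", "003", "003"),
  date_contact = as.Date(c("2017-05-02", "2019-09-10", "2018-11-20",
                           "2020-01-15", "2020-03-03")),
  icd3         = c("E11", "E11", "I21", "E11", "I25")
)

per_year <- alle_dx_toy %>%
  mutate(year = as.integer(format(date_contact, "%Y"))) %>%   # calendar year of the contact
  count(year, name = "n_rows")                                # diagnosis rows per year

print(per_year)
```

A sudden jump between adjacent years in this table is a cue to check whether it reflects the register change rather than a real change in disease occurrence.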

Note

Do you have F-codes (e.g. dementia, depression)? Psychiatric diagnoses recorded before March 2019 are in separate registers. Add them before bind_rows():

psyk_adm  <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/t_psyk_adm/") %>%
  rename_with(tolower) %>% rename(pnr = v_cpr, recnum = k_recnum)
psyk_diag <- open_dataset("E:/workdata/[projectnumber]/cleaned-data/parquet-registers/t_psyk_diag/") %>%
  rename_with(tolower) %>% rename(recnum = v_recnum)

lpr2_psyk_dx <- psyk_adm %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%
  select(pnr, recnum, date_contact = d_inddto) %>%
  inner_join(psyk_diag %>% filter(c_diagtype %in% c("A", "B")) %>%
               select(recnum, c_diag, c_diagtype), by = "recnum") %>%
  collect() %>% mutate(icd3 = substr(c_diag, 2, 4))

alle_dx <- bind_rows(lpr2_dx, lpr2_psyk_dx, lpr3_dx)

Tip

Using duckplyr? union_all() combines tables before collect() and requires identical column names and types. Rename LPR3 columns to match the LPR2 format before combining - see the onboarding document for an example.

Filter your extracted table for specific outcomes

CODES <- c("G30", "F00", "F01", "F02", "F03")   # dementia - change to your outcome

# kohort = your study population from Phase 10 (pnr + index_date per person)
outcome <- alle_dx %>%
  filter(icd3 %in% CODES) %>%
  inner_join(kohort %>% select(pnr, index_date), by = "pnr") %>%   # use cohort_clean after exclusion (Phase 10 Step 2)
  filter(date_contact > index_date) %>%   # post-index; use < for baseline covariate
  group_by(pnr) %>%
  arrange(date_contact) %>%
  slice(1) %>%
  ungroup() %>%
  select(pnr, event_date = date_contact)

# Join to cohort - NA = no event (censored at end of study)
result <- kohort %>%
  select(pnr) %>%
  left_join(outcome, by = "pnr")

saveRDS(result, "path/to/extract_dementia.rds")   # change filename for each new outcome

Note

Exclusion of prevalent cases - persons who already had the diagnosis before index date - happens in Phase 10, Step 2. Use cohort_clean instead of kohort in the code above after completing that step.
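
As a rough idea of what that exclusion step does, here is a sketch on invented data. It is not the Phase 10 code: the objects kohort_toy, dx_toy and cohort_clean_toy are hypothetical, and whether a diagnosis on the index date itself counts as prevalent (<= versus <) is a study decision you should make and document yourself.

```r
library(dplyr)

# Hypothetical cohort and diagnosis extract
kohort_toy <- tibble(pnr = c("001", "002", "003"),
                     index_date = as.Date(rep("2015-01-01", 3)))
dx_toy <- tibble(pnr          = c("001", "002", "002"),
                 date_contact = as.Date(c("2012-04-01", "2016-02-01", "2018-02-01")),
                 icd3         = c("G30", "F00", "G30"))
CODES <- c("G30", "F00", "F01", "F02", "F03")

# Persons with the diagnosis on or before the index date (assumed definition)
prevalent <- dx_toy %>%
  filter(icd3 %in% CODES) %>%
  inner_join(kohort_toy, by = "pnr") %>%
  filter(date_contact <= index_date) %>%
  distinct(pnr)

# Drop them from the cohort
cohort_clean_toy <- kohort_toy %>% anti_join(prevalent, by = "pnr")
print(cohort_clean_toy)
```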


Try it yourself - runnable example with synthetic data (Approach 1)
Important

This example requires R and RStudio installed locally on your computer - not the DST server. The synthetic dataset (fakeregs) is not available on DST. Download R: cran.r-project.org · Download RStudio: posit.co/download/rstudio-desktop. The example uses open_dataset() on local synth_data/ folders; fastreg’s read_register() is for a configured DST project, not ad-hoc local folders.

The example extracts CVD diagnoses (ischaemic heart disease, ICD-10 I20–I25) from LPR2 and LPR3 combined - the complete pattern from the theory section above, but runnable locally with synthetic data. It follows Approach 1: specific codes are filtered out before collect().

The synthetic LPR data is generated with the fakeregs package, which you already know from Phase 6 - First extraction. If you have already generated and saved data there, synth_data/lpr_adm/ is ready and you can skip the preparation block.

Adapted from Anders Aasted Isaksen’s dev/common_tasks_datatable.qmd in fakeregs (MIT licence, Steno Diabetes Center Aarhus). Rewritten to dplyr + arrow and adapted to this guide’s pattern.

# Install fakeregs for the first time:
# install.packages("pak"); pak::pak("steno-aarhus/fakeregs")

library(fakeregs)   # synthetic DST register data
library(dplyr)      # filter, select, mutate, inner_join, bind_rows
library(arrow)      # open_dataset, write_parquet

#=====================================================
# Preparation: generate synthetic data (run only once)
#=====================================================
bp             <- generate_background_pop()
lpr_adm_synth  <- generate_lpr_adm(background_df = bp)
lpr_diag_synth <- generate_lpr_diag(background_df = lpr_adm_synth)
lpr_a_k_synth  <- generate_lpr_a_kontakt(background_df = bp)
lpr_a_d_synth  <- generate_lpr_a_diagnose(background_df = lpr_a_k_synth)

dir.create("synth_data/lpr_adm",        recursive = TRUE, showWarnings = FALSE)
dir.create("synth_data/lpr_diag",       recursive = TRUE, showWarnings = FALSE)
dir.create("synth_data/lpr_a_kontakt",  recursive = TRUE, showWarnings = FALSE)
dir.create("synth_data/lpr_a_diagnose", recursive = TRUE, showWarnings = FALSE)
write_parquet(lpr_adm_synth,  "synth_data/lpr_adm/lpr_adm.parquet")
write_parquet(lpr_diag_synth, "synth_data/lpr_diag/lpr_diag.parquet")
write_parquet(lpr_a_k_synth,  "synth_data/lpr_a_kontakt/lpr_a_kontakt.parquet")
write_parquet(lpr_a_d_synth,  "synth_data/lpr_a_diagnose/lpr_a_diagnose.parquet")

Tip

The path is relative to your working directory - check with getwd(). If you have already run the preparation block in Phase 6, synth_data/lpr_adm/ is already saved.

#=====================================================
# Extract CVD diagnoses (same pattern as Approach 1)
#=====================================================
# The ICD codes we are looking for - change these to your own outcome
CVD_CODES <- c("I20", "I21", "I22", "I23", "I24", "I25")   # ischaemic heart disease

#-----------------------------------------------------
# LPR2 somatic (up to March 2019)
#-----------------------------------------------------
lpr_adm  <- open_dataset("synth_data/lpr_adm/")  %>% rename_with(tolower)   # LPR2 contact table - synthetic
lpr_diag <- open_dataset("synth_data/lpr_diag/") %>% rename_with(tolower)   # LPR2 diagnosis table - synthetic

lpr2_cvd <- lpr_adm %>%
  select(pnr, recnum, date_contact = d_inddto) %>%           # select only necessary columns
  inner_join(
    lpr_diag %>%
      filter(c_diagtype %in% c("A", "B"),                    # only action and secondary diagnoses
             substr(c_diag, 2, 4) %in% !!CVD_CODES) %>%       # !! inlines the local R vector into the Arrow query
      select(recnum, c_diag),                    # only join key and diagnosis code
    by = "recnum"                                             # join key in LPR2
  ) %>%
  collect() %>%                                              # HERE data is fetched into R
  mutate(icd3 = substr(c_diag, 2, 4))                        # save cleaned code as new column

#-----------------------------------------------------
# LPR3 (March 2019 and onwards)
#-----------------------------------------------------
lpr3_k <- open_dataset("synth_data/lpr_a_kontakt/")  %>% rename_with(tolower)   # LPR3 contact table - synthetic
lpr3_d <- open_dataset("synth_data/lpr_a_diagnose/") %>% rename_with(tolower)   # LPR3 diagnosis table - synthetic

lpr3_cvd <- lpr3_k %>%
  select(pnr, dw_ek_kontakt, date_contact = kont_starttidspunkt) %>%   # dw_ek_kontakt is join key to lpr_a_diagnose
  inner_join(
    lpr3_d %>%
      filter(diag_kode_type %in% c("A", "B"),
             is.na(senere_afkraeftet) | senere_afkraeftet != "Ja",  # exclude retracted diagnoses
             substr(diag_kode, 2, 4) %in% !!CVD_CODES) %>%   # !! inlines the local R vector into the Arrow query
      select(dw_ek_kontakt, c_diag = diag_kode),             # rename to c_diag for consistency with LPR2
    by = "dw_ek_kontakt"                                     # join key in LPR3
  ) %>%
  collect() %>%                                              # fetch into R
  mutate(
    date_contact = as.Date(date_contact),                    # datetime → date
    icd3         = substr(c_diag, 2, 4)                      # strip D-prefix: "DI21" → "I21"
  )

#-----------------------------------------------------
# Combine and save
#-----------------------------------------------------
alle_cvd <- bind_rows(lpr2_cvd, lpr3_cvd)                   # stack LPR2 and LPR3

nrow(alle_cvd)                                               # check: number of diagnosis rows
length(unique(alle_cvd$pnr))                                 # check: number of unique individuals
table(alle_cvd$icd3)                                         # distribution across codes

saveRDS(alle_cvd, "path/to/extract_cvd.rds")                # save - change path to your own folder

Wrap the pattern in a reusable function (for multiple outcomes)

If you extract diagnoses for several outcomes, it pays off to encapsulate the Approach 2 pattern in one reusable function rather than copying ~40 lines for each new outcome. Define it at the top of your script or in a separate functions.R file. The function keeps the diagnosis-type column (c_diagtype) in its output, so you can later restrict the case definition for a sensitivity analysis (see Diagnosis types) without re-querying LPR.

Advantages:

  • One place to fix if something changes (e.g. a new register or a new column)
  • The code block for each outcome shrinks from ~40 lines to one function call
  • Errors can only creep in at one place instead of in every copy

Tip

Using fastreg, or working on DARTER? In both cases, replace open_dataset("E:/workdata/.../<register>/") with fastreg’s read_register("<register>"), which opens the register by name. See DARTER - overview and pipeline for the fully adapted variant - it is kept up to date with the current, confirmed register names (as of June 2026).

See the full get_lpr_diagnoses() function and usage

library(arrow)
library(dplyr)

#=====================================================
# Function: get_lpr_diagnoses() - reusable LPR extract
#=====================================================
get_lpr_diagnoses <- function(pnr_vector, diagtypes = c("A", "B"), inpatient_only = FALSE) {
  base <- "E:/workdata/[projectnumber]/cleaned-data/parquet-registers/"

  # Open registers
  lpr_adm   <- open_dataset(paste0(base, "lpr_adm/"))   %>% rename_with(tolower)   # LPR2 somatic contacts
  lpr_diag  <- open_dataset(paste0(base, "lpr_diag/"))  %>% rename_with(tolower)   # LPR2 somatic diagnoses
  psyk_adm  <- open_dataset(paste0(base, "t_psyk_adm/"))  %>% rename_with(tolower) %>%
    rename(pnr = v_cpr, recnum = k_recnum)                            # LPR2 psychiatric contacts
  psyk_diag <- open_dataset(paste0(base, "t_psyk_diag/")) %>% rename_with(tolower) %>%
    rename(recnum = v_recnum)                                          # LPR2 psychiatric diagnoses
  lpr3_k    <- open_dataset(paste0(base, "lpr_a_kontakt/"))  %>% rename_with(tolower) %>%
    filter(lprindberetningssystem == "LPR3")                               # CRITICAL (DARTER): keep only rows from the LPR3 system - avoid overlapping rows
  lpr3_d    <- open_dataset(paste0(base, "lpr_a_diagnose/")) %>% rename_with(tolower)  # LPR3 diagnoses

  # Filter on admission type if desired
  if (inpatient_only) {
    lpr_adm <- lpr_adm %>% filter(c_pattype == "0")          # "0" = inpatient (full-day) in LPR2
    lpr3_k  <- lpr3_k  %>% filter(kont_type == "ALCA00")     # ALCA00 = physical attendance (NOT = inpatient) - see "Classify patient type" below
  }

  # LPR2 somatic
  lpr2_dx <- lpr_adm %>%
    semi_join(tibble(pnr = pnr_vector), by = "pnr") %>%
    select(pnr, recnum, date_contact = d_inddto) %>%
    inner_join(
      lpr_diag %>% filter(c_diagtype %in% !!diagtypes) %>% select(recnum, c_diag, c_diagtype),
      by = "recnum"
    ) %>%
    collect() %>%
    mutate(icd3 = substr(c_diag, 2, 4))                       # strip D-prefix

  # LPR2 psychiatric
  lpr2_psyk_dx <- psyk_adm %>%
    semi_join(tibble(pnr = pnr_vector), by = "pnr") %>%
    select(pnr, recnum, date_contact = d_inddto) %>%
    inner_join(
      psyk_diag %>% filter(c_diagtype %in% !!diagtypes) %>% select(recnum, c_diag, c_diagtype),
      by = "recnum"
    ) %>%
    collect() %>%
    mutate(icd3 = substr(c_diag, 2, 4))

  # LPR3
  lpr3_dx <- lpr3_k %>%
    semi_join(tibble(pnr = pnr_vector), by = "pnr") %>%
    select(pnr, dw_ek_kontakt, date_contact = kont_starttidspunkt) %>%
    inner_join(
      lpr3_d %>%
        filter(diag_kode_type %in% !!diagtypes,
               is.na(senere_afkraeftet) | senere_afkraeftet != "Ja") %>%
        select(dw_ek_kontakt, c_diag = diag_kode, c_diagtype = diag_kode_type),
      by = "dw_ek_kontakt"
    ) %>%
    collect() %>%
    mutate(date_contact = as.Date(date_contact),               # datetime → date
           icd3 = substr(c_diag, 2, 4))

  bind_rows(lpr2_dx, lpr2_psyk_dx, lpr3_dx)                   # return combined table
}

Use the function - one call per extraction, only change CODES:

cohort    <- readRDS("path/to/full_cohort.rds")
pnr_list  <- unique(cohort$pnr)

# Fetch all diagnoses for the cohort (Phase 1 - see hospital contacts page)
alle_dx <- get_lpr_diagnoses(
  pnr_vector    = pnr_list,
  diagtypes     = c("A", "B"),
  inpatient_only = FALSE
)
# Returns: pnr | date_contact | c_diag | c_diagtype | icd3

# Extract one outcome - only change CODES (Phase 2)
CODES <- c("F00", "F01", "F02", "F03", "G30", "G31")   # dementia

dementia <- alle_dx %>%
  filter(icd3 %in% CODES) %>%
  inner_join(cohort %>% select(pnr, index_date), by = "pnr") %>%
  filter(date_contact > index_date) %>%
  group_by(pnr) %>% arrange(date_contact) %>% slice(1) %>% ungroup() %>%
  select(pnr, dementia_date = date_contact)

result <- cohort %>% select(pnr) %>% left_join(dementia, by = "pnr")
saveRDS(result, "path/to/extract_dementia.rds")
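
The per-outcome block above is itself repeated for every outcome, so it can be wrapped the same way. The helper below, first_event_after_index(), is a hypothetical name and a sketch rather than part of the guide's function; it uses slice_min() in place of arrange() + slice(1), which picks the same earliest row per person. The toy tables are invented.

```r
library(dplyr)

# Hypothetical helper: first diagnosis among `codes` after each person's index date
first_event_after_index <- function(dx, cohort, codes) {
  dx %>%
    filter(icd3 %in% codes) %>%
    inner_join(cohort %>% select(pnr, index_date), by = "pnr") %>%
    filter(date_contact > index_date) %>%                        # post-index only
    group_by(pnr) %>%
    slice_min(date_contact, n = 1, with_ties = FALSE) %>%        # earliest contact per person
    ungroup() %>%
    select(pnr, event_date = date_contact)
}

# Invented data to try it on
cohort_toy <- tibble(pnr = c("001", "002"),
                     index_date = as.Date(c("2015-01-01", "2015-01-01")))
dx_toy <- tibble(pnr          = c("001", "001", "001", "002"),
                 date_contact = as.Date(c("2014-06-01", "2017-03-01",
                                          "2016-02-01", "2016-05-05")),
                 icd3         = c("F00", "G30", "F01", "I21"))

events <- first_event_after_index(dx_toy, cohort_toy, c("F00", "F01", "F02", "F03", "G30"))
result_toy <- cohort_toy %>% select(pnr) %>% left_join(events, by = "pnr")
print(result_toy)   # NA in event_date = no event for that person
```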

Classify patient type: inpatient, outpatient or emergency

Many studies need to tell inpatient contacts apart from outpatient visits and emergency-room visits (e.g. only admissions as an outcome, or acute contacts as a marker). There is no single shared field across LPR2 and LPR3 - you derive the type.

Warning

Column names and codes must be verified against your own extract. The logic itself is durable (the codes ALCA00 and ATA1 are confirmed LPR3 values as of 2025), but the exact column names in LPR_A may differ from the example. Check with arrow::schema()/colnames() and look them up in Overview of registers. The pattern is adapted from the Plana-Ripoll group’s code on OSF.

LPR2 (up to March 2019). Patient type lives in c_pattype, but the emergency-room coding changed around 2014:

lpr2_type <- lpr_adm %>%                        # lpr_adm from the extract above
  mutate(patienttype = case_when(
    c_pattype %in% c("0", "1", "4", "5") ~ "inpatient",    # full-day/part-day/day/night
    c_pattype == "3"                     ~ "emergency",    # explicit ER (mostly pre-2014)
    c_pattype == "2" & c_indm == "1"     ~ "emergency",    # from ~2014: acute admission mode = ER
    c_pattype == "2"                     ~ "outpatient",
    TRUE                                 ~ NA_character_
  ))

LPR3 (March 2019 and onwards). There is no direct c_pattype. The kont_type code ALCA00 means physical attendance (not admission), and the prioritet code ATA1 means acute. A common research approach derives patient type from the contact’s duration plus an ER/acute marker:

lpr3_type <- lpr3_k %>%                          # lpr3_k from the extract above
  filter(kont_type == "ALCA00") %>%              # keep physical attendances; drop phone/video
  mutate(
    duration_hours = as.numeric(
      difftime(kont_sluttidspunkt, kont_starttidspunkt, units = "hours")  # verify the end-time column name
    ),
    patienttype = case_when(
      duration_hours >= 8                                  ~ "inpatient",    # >= 8 hours ~ admission
      enhedstype_ans == "skadestue" & prioritet == "ATA1"  ~ "emergency",    # acute ER
      TRUE                                                 ~ "outpatient"
    )
  )

Note

The 8-hour cut-off is a heuristic, not an official definition - LPR3 has no “inpatient” field. Pick and document your own threshold, and clarify with your data manager. The column names for start/end time and unit type vary between deliveries; verify them before use.
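
To make the effect of the threshold concrete, the sketch below applies the same duration logic to three invented contacts held in memory. THRESHOLD_HOURS and the logical column acute_er are assumptions for the illustration; acute_er stands in for the unit-type/priority condition, whose real column names you must verify in your own delivery.

```r
library(dplyr)

THRESHOLD_HOURS <- 8   # heuristic cut-off; choose and document your own

# Hypothetical contacts: a short visit, a two-day stay, a short acute ER visit
contacts_toy <- tibble(
  dw_ek_kontakt       = 1:3,
  kont_starttidspunkt = as.POSIXct(c("2020-01-01 08:00:00", "2020-01-02 10:00:00",
                                     "2020-01-03 22:00:00"), tz = "UTC"),
  kont_sluttidspunkt  = as.POSIXct(c("2020-01-01 09:30:00", "2020-01-04 10:00:00",
                                     "2020-01-03 23:00:00"), tz = "UTC"),
  acute_er            = c(FALSE, FALSE, TRUE)
)

typed <- contacts_toy %>%
  mutate(
    duration_hours = as.numeric(
      difftime(kont_sluttidspunkt, kont_starttidspunkt, units = "hours")
    ),
    patienttype = case_when(
      duration_hours >= THRESHOLD_HOURS ~ "inpatient",    # long contact ~ admission
      acute_er                          ~ "emergency",    # short and acute ER
      TRUE                              ~ "outpatient"
    )
  )

print(select(typed, dw_ek_kontakt, duration_hours, patienttype))
```

Keeping the cut-off in one named constant makes it easy to rerun the classification with a different threshold as a sensitivity analysis.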


Remove unwanted diagnoses

A few codes are administrative artefacts rather than the patient’s own disease, and should typically be removed from both outcomes and comorbidity:

  • “Healthy companion” (someone admitted as a companion to another patient, e.g. a parent): ICD-10 DZ763. Related contact/observation codes with no disease: DZ032, DZ038, DZ039.
  • “Diagnosis not found”/unspecified from the ICD-8 era: Y719.

alle_dx <- alle_dx %>%
  filter(
    !substr(toupper(c_diag), 1, 5) %in% c("DZ763", "DZ032", "DZ038", "DZ039"),
    substr(toupper(c_diag), 1, 4) != "Y719"
  )

Note

What to remove depends on project and question - DZ03* (observation for suspected disease) is for instance relevant in some studies and should be kept there. The pattern is adapted from the Plana-Ripoll group’s code on OSF; verify against your own data.
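
Because the right exclusion list is project-specific, it can be useful to count what a filter would remove before applying it. A minimal sketch on invented rows (dx_toy and DROP_CODES are illustrative; the lower-case code is there only to show why toupper() is applied):

```r
library(dplyr)

# Hypothetical diagnosis rows
dx_toy <- tibble(pnr    = c("001", "002", "003", "004"),
                 c_diag = c("DZ763", "DE119", "dz032", "DI219"))
DROP_CODES <- c("DZ763", "DZ032", "DZ038", "DZ039")

# Flag first, so you can inspect the numbers before dropping anything
flagged <- dx_toy %>%
  mutate(drop = substr(toupper(c_diag), 1, 5) %in% DROP_CODES)

print(count(flagged, drop))                            # how many rows would be removed

kept <- flagged %>% filter(!drop) %>% select(-drop)    # then apply the filter
print(kept)
```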


Next steps

You have now extracted diagnoses from two LPR generations. Next steps are to shape and combine your extracts:

Phase 12 - Assemble and prepare the dataset
