Register paths and datastores

Confirmed paths and access methods for all registers on project 708421

Published

July 21, 2026

Check the modification date of cleaned-data before running the pipeline.

file.info("E:/workdata/708421/cleaned-data/parquet-registers/")$mtime
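file.info() on the folder itself returns the directory's own mtime. On most file systems that only changes when entries directly inside the folder are added, removed or renamed, so files rewritten in place or inside per-register subfolders may not move it. A minimal sketch that reports the newest file instead (newest_mtime() is a hypothetical helper, not part of the project code):

```r
# Hypothetical helper: newest modification time among all files below `dir`.
# The folder's own mtime is not used because it ignores changes inside subfolders.
newest_mtime <- function(dir) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) return(NA) # folder missing or empty
  max(file.info(files)$mtime, na.rm = TRUE)
}

newest_mtime("E:/workdata/708421/cleaned-data/parquet-registers/")
```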

The registers are not necessarily up to date. Confirm that their coverage matches your study period.
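A folder date says nothing about which period a register covers. One way to check is to look at the first and last value of a key date column. A minimal sketch, assuming the column is stored as a date; date_coverage() is a hypothetical helper, and the study end date in the usage comment is only an example. On a lazy register only the one-row summary is collected:

```r
library(dplyr)

# Hypothetical helper: first and last value of a date column.
# Works on a local data frame or a lazy register; only one row is collected.
date_coverage <- function(tbl, date_col) {
  tbl %>%
    summarise(
      first = min({{ date_col }}, na.rm = TRUE),
      last  = max({{ date_col }}, na.rm = TRUE)
    ) %>%
    collect()
}

# On the DST server, for example (eksd is the LMDB dispensing date):
# lmdb_cov <- date_coverage(read_register("lmdb") %>% rename_with(tolower), eksd)
# if (lmdb_cov$last < as.Date("2024-12-31")) warning("LMDB ends before the study period")
```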

Base paths

# All paths are defined as constants at the top of each script
path_parquet_reg <- "E:/workdata/708421/cleaned-data/parquet-registers/"
path_parquet_ext <- "E:/workdata/708421/cleaned-data/parquet-external/"
path_dm_pop <- "E:/workdata/708421/cleaned-data/diabetes-register-pop/dm_population_1977_2022.rds"
path_output <- "E:/workdata/708421/workspaces/[yourName]/BS_demens/datasets/"
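Because every script depends on these constants, it can help to fail fast when one of them is wrong. A minimal sketch using only base R; check_paths() is a hypothetical helper, not part of the project code:

```r
# Hypothetical helper: stop with a clear message if any base path is missing,
# e.g. a typo or a [yourName] placeholder that was never filled in.
check_paths <- function(dirs = character(), files = character()) {
  # Trailing slashes are stripped before dir.exists() to avoid platform differences
  missing_dirs  <- dirs[!dir.exists(sub("/+$", "", dirs))]
  missing_files <- files[!file.exists(files)]
  missing <- c(missing_dirs, missing_files)
  if (length(missing) > 0) {
    stop("Missing path(s): ", paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# At the top of a script, after the constants above:
# check_paths(
#   dirs  = c(path_parquet_reg, path_parquet_ext, path_output),
#   files = path_dm_pop
# )
```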

Overview - all registers on project 708421

All entries were confirmed via colnames() on the DST server on 2026-05-15. As of 2026, most registers are updated to the end of 2024 (Anders Aasted Isaksen/DARTER team). Column names are shown after rename_with(tolower).

Register	Access	Join key	Period	Critical column
BEF	`read_register("bef")`	`pnr`	All years	`koen`, `foed_dag`, `familie_id`
DODSAARS	`read_register("dodsaars")`	`pnr`	~1970–2001	`d_dodsdto` (death date)
DOD	not in cleaned-data	`pnr`	~2001–2024	`doddato` - see extraction guide
VNDS	`read_register("vnds")`	`pnr`	All years	`indud_kode`, `haend_dato`
LPR2 contacts	`read_register("lpr_adm")`	`recnum`	Up to March 2019	`d_inddto`, `c_pattype`
LPR2 diagnoses	`read_register("lpr_diag")`	`recnum`	Up to March 2019	`c_diag`, `c_diagtype`
LPR2 psych contacts	`read_register("t_psyk_adm")`	`k_recnum` → `recnum`	1995–March 2019	`v_cpr` → `pnr`
LPR2 psych diagnoses	`read_register("t_psyk_diag")`	`v_recnum` → `recnum`	1995–March 2019	`c_diag`, `c_diagtype`
LPR3 contacts	`read_register("lpr_a_kontakt")`	`dw_ek_kontakt`	March 2019+	`kont_starttidspunkt` (datetime)
LPR3 diagnoses	`read_register("lpr_a_diagnose")`	`dw_ek_kontakt`	March 2019+	`diag_kode`, `diag_kode_type`, `senere_afkraeftet`
LPR3 procedures	`read_register("procedurer_kirurgi")`	`dw_ek_forloeb`	2019+	`procedurekode`, `dato_start`
LPR2 procedures	`read_register("lpr_sksopr")`	`recnum`	1996–2018	`c_opr`, `d_odto`
LMDB	`read_register("lmdb")`	`pnr`	Approx. 1994+	`atc`, `eksd`
UDDA	`read_register("udda")`	`pnr`	All years	`hfaudd`, `aar`
FAIK	`read_register("faik")`	`familie_id`	All years	`famaekvivadisp_13`
AKM	`read_register("akm")`	`pnr`	All years	`socio13`, `aar`
DBSO	`read_register("dbso")`	`pnr`	2010+	`datoper_prim`, surgery flags
OSDC	`readRDS(path_dm_pop)`	`PNR` → rename to `pnr`	1977–2022	`diabetes_type`, `do_dm`
Laboratory results	`read_register("laboratorieproevesvar_")`	`pnr`	Approx. 1994+	`npu`, `samplingdato`, `samplevalue` (character)
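Column names can change between deliveries. A minimal sketch of a guard that stops a script early when a critical column from the table above is missing. check_columns() is hypothetical and relies only on colnames(), the same call used to confirm the table:

```r
# Hypothetical helper: stop early if a register lacks columns the analysis relies on.
check_columns <- function(tbl, expected, register = "table") {
  missing <- setdiff(expected, colnames(tbl))
  if (length(missing) > 0) {
    stop(register, " is missing column(s): ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# On the DST server, for example:
# check_columns(
#   read_register("lpr_a_diagnose") %>% rename_with(tolower),
#   c("dw_ek_kontakt", "diag_kode", "diag_kode_type", "senere_afkraeftet"),
#   register = "lpr_a_diagnose"
# )
```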

Critical notes

DODSAARS and DOD - deaths are split across two registers. dodsaars is in cleaned-data but covers only ~1970–2001 (date of death in d_dodsdto). Deaths after 2001 are in DOD (date of death in doddato, covering ~2001–2024), which is not in cleaned-data: it must be extracted from the raw SAS file by your data manager. Both join on pnr, and a study period that extends past 2001 needs both for full coverage.

Why it matters: if you use only dodsaars, everyone who died after 2001 is treated as alive. That skews censoring and matching in 01_build_cohorts.R. See pitfall 1.
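A minimal sketch of combining the two sources into one death date per person. It assumes both tables have already been collected locally (only pnr and the date column are needed) and that d_dodsdto and doddato are Date values. combine_deaths() is hypothetical, and keeping the earliest date when a person appears in both registers is an illustrative choice, not a DST rule:

```r
library(dplyr)

# Hypothetical helper: one row per pnr with the earliest recorded death date.
# Assumes d_dodsdto (dodsaars) and doddato (DOD) are already Date columns.
combine_deaths <- function(dodsaars, dod) {
  bind_rows(
    dodsaars %>% select(pnr, death_date = d_dodsdto),
    dod      %>% select(pnr, death_date = doddato)
  ) %>%
    filter(!is.na(death_date)) %>%
    group_by(pnr) %>%
    summarise(death_date = min(death_date), .groups = "drop")
}
```

The result can then be left-joined to the cohort on pnr; anyone without a row is alive at the end of DOD's coverage.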

LPR3 - duplicate risk: lpr_a_kontakt and lpr_a_diagnose contain data from two formats (LPR_F and LPR_A). Always filter on lprindberetningssystem == "LPR3". See pitfall 5.
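A minimal sketch of applying the LPR3 filter on both sides before linking diagnoses to contacts. It assumes, as the note implies, that both lpr_a_kontakt and lpr_a_diagnose carry lprindberetningssystem; keep_lpr3() and lpr3_diagnoses() are hypothetical helpers. Both work on lazy registers or local data frames:

```r
library(dplyr)

# Hypothetical helper: keep only rows reported in the LPR3 format
keep_lpr3 <- function(tbl) {
  tbl %>% filter(lprindberetningssystem == "LPR3")
}

# Hypothetical helper: diagnoses with pnr and contact start, both sides restricted to LPR3
lpr3_diagnoses <- function(kontakt, diagnose) {
  inner_join(
    keep_lpr3(diagnose) %>% select(dw_ek_kontakt, diag_kode, diag_kode_type),
    keep_lpr3(kontakt) %>% select(dw_ek_kontakt, pnr, kont_starttidspunkt),
    by = "dw_ek_kontakt"
  )
}

# On the DST server, for example:
# diag <- lpr3_diagnoses(
#   read_register("lpr_a_kontakt") %>% rename_with(tolower),
#   read_register("lpr_a_diagnose") %>% rename_with(tolower)
# )
```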

Laboratory results - use only one source: laboratorieproevesvar_ (>2.2 billion rows) replaces lab_forsker/lab_dm_forsker. The old files still exist and cover the same data, so combining them with the new register creates duplicates. Because the register is so large, restricting to the cohort with semi_join(tibble(pnr = kohort$pnr), by = "pnr") and to the needed columns with select() before collect() is essential. Two things to watch when extracting:

Tests are identified by NPU codes in the npu column - filter on the NPU codes your analysis needs.
samplevalue is a character column - it can contain text like “not detected” or “negative”, not just numbers. Convert with care (as.numeric() turns text values into NA with only a coercion warning).

See pitfall 6 - Laboratory results for a code example.
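A minimal sketch of the extraction order described above: filter on NPU codes, restrict to the cohort, select, then collect, followed by a conversion that flags text results instead of silently losing them. extract_lab() and add_numeric_value() are hypothetical helpers. copy = TRUE is included as an assumption: dbplyr needs it to join a database-backed lazy table to a local tibble, and it has no effect on local data frames:

```r
library(dplyr)

# Hypothetical helper: cut the register down to cohort and NPU codes before collect()
extract_lab <- function(lab, cohort_pnr, npu_codes) {
  lab %>%
    filter(npu %in% npu_codes) %>%
    semi_join(tibble(pnr = cohort_pnr), by = "pnr", copy = TRUE) %>%
    select(pnr, npu, samplingdato, samplevalue) %>%
    collect()
}

# Hypothetical helper: numeric value plus a flag for non-missing text that did not parse
add_numeric_value <- function(df) {
  df %>%
    mutate(
      value = suppressWarnings(as.numeric(samplevalue)),
      non_numeric = !is.na(samplevalue) & is.na(value)
    )
}
```

Inspecting the flagged rows, e.g. with count(samplevalue) on filter(non_numeric), shows which text values occur before you decide how to recode them.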

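procedurer_kirurgi: dw_ek_kontakt is NA for all rows on DST. Join to lpr_a_kontakt via dw_ek_forloeb to fetch pnr. One forløb can contain several contacts, so reduce the lookup to distinct dw_ek_forloeb/pnr pairs before joining; otherwise each procedure is repeated once per contact.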

DBSO: The identifier column is cpr in raw parquet - renamed to pnr by 00_prepare_dbso.R. All code uses pnr after that.

OSDC: PNR is uppercase in the raw file - rename with rename(pnr = PNR) after loading.

Loading templates

fastreg replaces dstDataPrep on DARTER. Registers are now loaded with fastreg::read_register("name"), the same function as in the general guide. Code that uses dstDataPrep::load_database() still runs, but new code should use fastreg. Point fastreg at DARTER’s parquet folder once per script with options(fastreg.project_workdata_dir = path_parquet_reg) (see “Didn’t convert the data yourself?” in Phase 4); after that, read_register("bef") works by name.

library(fastreg) # read_register() - access to DST registers
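library(dplyr) # rename_with, rename, left_join, select

# Point fastreg at DARTER's parquet folder once per script (path_parquet_reg is set under Base paths):
options(fastreg.project_workdata_dir = path_parquet_reg)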

# Standard register - via read_register:
bef <- read_register("bef") %>% rename_with(tolower) # lazy connection; lowercase columns

# Psychiatric LPR2 - requires renaming v_cpr and k_recnum:
psyk_adm <- read_register("t_psyk_adm") %>%
  rename_with(tolower) %>% # lowercase columns
  rename(pnr = v_cpr, recnum = k_recnum) # v_cpr → pnr; k_recnum → recnum

# DBSO - parquet-external (converted from SAS via 00_prepare_dbso.R):
dbso <- read_register("dbso") %>% rename_with(tolower) # lazy connection

# OSDC - RDS file with pre-computed diabetes classification:
dm_pop <- readRDS(path_dm_pop) %>% rename(pnr = PNR) # PNR is uppercase in raw file - rename

# LPR3 procedures - join via dw_ek_forloeb (NOT dw_ek_kontakt, which is NA for all rows):
proc <- read_register("procedurer_kirurgi") %>%
  rename_with(tolower) %>% # lowercase columns
  left_join(
    read_register("lpr_a_kontakt") %>%
      rename_with(tolower) %>%
      select(dw_ek_forloeb, pnr) %>% # fetch pnr via the forloeb key
      distinct(), # one row per forloeb: a forloeb can span several contacts
    by = "dw_ek_forloeb" # join key - dw_ek_kontakt does not work
  )
# proc is still lazy - add filter() and collect() before use
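A minimal sketch of finishing the lazy proc query: keep the procedure codes of interest, restrict to the cohort, select, and collect. collect_procedures() is hypothetical, and exact matching with %in% is an assumption (prefix matching on SKS codes would need a different filter). The trailing distinct() also removes rows duplicated by the forløb lookup; note that it would collapse genuinely identical repeated registrations too:

```r
library(dplyr)

# Hypothetical helper: filter and collect LPR3 procedures for a cohort
collect_procedures <- function(proc, cohort_pnr, codes) {
  proc %>%
    filter(procedurekode %in% codes) %>%
    semi_join(tibble(pnr = cohort_pnr), by = "pnr", copy = TRUE) %>%
    select(pnr, dw_ek_forloeb, procedurekode, dato_start) %>%
    distinct() %>% # drop rows duplicated by the forloeb join
    collect()
}
```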

See also