Register paths and datastores
Confirmed paths and access methods for all registers on project 708421
Check the modification date on cleaned-data before running the pipeline.
file.info("E:/workdata/708421/cleaned-data/parquet-registers/")$mtime
The registers are not necessarily updated to today. Confirm that coverage matches your study period.
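One way to make the coverage check concrete is to compare the date range actually present in a register with your study period. The sketch below is a minimal illustration in base R: check_coverage() and the toy dates are hypothetical and not part of the project code. On the server you would pass a collected date column from the register instead of toy_dates.

```r
# Hypothetical helper: does the observed date range span the study period?
check_coverage <- function(dates, study_start, study_end) {
  observed <- range(as.Date(dates), na.rm = TRUE)
  list(
    first_date   = observed[1],
    last_date    = observed[2],
    covers_study = observed[1] <= as.Date(study_start) &&
      observed[2] >= as.Date(study_end)
  )
}

# Toy dates standing in for a collected register date column (made up)
toy_dates <- as.Date(c("2010-01-15", "2018-06-30", "2024-12-31"))

res_ok  <- check_coverage(toy_dates, "2012-01-01", "2024-12-31")
res_bad <- check_coverage(toy_dates, "2012-01-01", "2025-12-31")
```

Here res_ok$covers_study is TRUE, while res_bad$covers_study is FALSE because the study period runs past the last observed date.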
Base paths
# All paths used as constants at the top of scripts
path_parquet_reg <- "E:/workdata/708421/cleaned-data/parquet-registers/"
path_parquet_ext <- "E:/workdata/708421/cleaned-data/parquet-external/"
path_dm_pop <- "E:/workdata/708421/cleaned-data/diabetes-register-pop/dm_population_1977_2022.rds"
path_output <- "E:/workdata/708421/workspaces/[yourName]/BS_demens/datasets/"
Overview - all registers on project 708421
All confirmed via colnames() on the DST server 2026-05-15. Most registers are updated to end of 2024 as of 2026 (Anders Aasted Isaksen/DARTER team). Column names shown after rename_with(tolower).
| Register | Access | Join key | Period | Critical column |
|---|---|---|---|---|
| BEF | read_register("bef") | pnr | All years | koen, foed_dag, familie_id |
| DODSAARS | read_register("dodsaars") | pnr | ~1970–2001 | d_dodsdto (death date) |
| DOD | not in cleaned-data | pnr | ~2001–2024 | doddato - see extraction guide |
| VNDS | read_register("vnds") | pnr | All years | indud_kode, haend_dato |
| LPR2 contacts | read_register("lpr_adm") | recnum | Up to March 2019 | d_inddto, c_pattype |
| LPR2 diagnoses | read_register("lpr_diag") | recnum | Up to March 2019 | c_diag, c_diagtype |
| LPR2 psych contacts | read_register("t_psyk_adm") | k_recnum → recnum | 1995–March 2019 | v_cpr → pnr |
| LPR2 psych diagnoses | read_register("t_psyk_diag") | v_recnum → recnum | 1995–March 2019 | c_diag, c_diagtype |
| LPR3 contacts | read_register("lpr_a_kontakt") | dw_ek_kontakt | March 2019+ | kont_starttidspunkt (datetime) |
| LPR3 diagnoses | read_register("lpr_a_diagnose") | dw_ek_kontakt | March 2019+ | diag_kode, diag_kode_type, senere_afkraeftet |
| LPR3 procedures | read_register("procedurer_kirurgi") | dw_ek_forloeb | 2019+ | procedurekode, dato_start |
| LPR2 procedures | read_register("lpr_sksopr") | recnum | 1996–2018 | c_opr, d_odto |
| LMDB | read_register("lmdb") | pnr | Approx. 1994+ | atc, eksd |
| UDDA | read_register("udda") | pnr | All years | hfaudd, aar |
| FAIK | read_register("faik") | familie_id | All years | famaekvivadisp_13 |
| AKM | read_register("akm") | pnr | All years | socio13, aar |
| DBSO | read_register("dbso") | pnr | 2010+ | datoper_prim, surgery flags |
| OSDC | readRDS(path_dm_pop) | PNR → rename to pnr | 1977–2022 | diabetes_type, do_dm |
| Laboratory results | read_register("laboratorieproevesvar_") | pnr | Approx. 1994+ | npu, samplingdato, samplevalue (character) |
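The join keys in the table matter most for the diagnosis tables, which are linked to a person through the contact table rather than directly. The sketch below shows that pattern for LPR2 on toy in-memory data. It assumes the diagnosis table carries no pnr of its own, and all values (record numbers, person ids, diagnosis codes) are made up. On the server the two tibbles would be the lazy tables from read_register("lpr_adm") and read_register("lpr_diag").

```r
library(dplyr)

# Toy stand-ins for the LPR2 contact and diagnosis tables after
# rename_with(tolower); every value here is invented for illustration.
lpr_adm <- tibble(
  recnum   = c("r1", "r2", "r3"),
  pnr      = c("p1", "p1", "p2"),
  d_inddto = as.Date(c("2010-03-01", "2015-07-12", "2017-11-30"))
)
lpr_diag <- tibble(
  recnum     = c("r1", "r1", "r3"),
  c_diag     = c("DE119", "DI109", "DF009"),
  c_diagtype = c("A", "B", "A")
)

# Attach pnr and admission date to each diagnosis via the contact key
diag_with_pnr <- lpr_diag %>%
  inner_join(select(lpr_adm, recnum, pnr, d_inddto), by = "recnum")
```

Selecting only the needed contact columns before the join keeps the result narrow, which matters once the same code runs against the full registers.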
Critical notes
DODSAARS and DOD - deaths are split across two registers: the date of death is not in one place. dodsaars is in cleaned-data but covers only ~1970–2001 (date of death in d_dodsdto). Deaths after 2001 are in DOD (date of death in doddato, covering ~2001–2024), which is not in cleaned-data - it requires extraction from the raw SAS file via your data manager. Both join on pnr, and you need both for full coverage over a modern study period.
Why it matters: if you run on dodsaars only, everyone who dies after 2001 is treated as alive. That skews censoring and matching in 01_build_cohorts.R. See pitfall 1.
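A minimal sketch of combining the two death sources into one table, on toy data. The column names d_dodsdto and doddato come from the notes above; the names death_date and source, the toy values, and the assumption that the DOD extract arrives as a table with pnr and doddato are all hypothetical. Keeping the earliest date per person is one possible rule in case someone appears in both registers around the changeover, not a confirmed project convention.

```r
library(dplyr)

# Toy stand-ins: dodsaars is read via read_register("dodsaars"); dod is
# assumed to be the extract you receive from your data manager.
dodsaars <- tibble(
  pnr       = c("p1", "p2"),
  d_dodsdto = as.Date(c("1995-04-02", "2000-12-24"))
)
dod <- tibble(
  pnr     = c("p2", "p3", "p4"),
  doddato = as.Date(c("2000-12-24", "2008-01-09", "2021-05-17"))
)

# Harmonise the date column, stack both sources, keep one date per person
deaths <- bind_rows(
  dodsaars %>% transmute(pnr, death_date = d_dodsdto, source = "dodsaars"),
  dod      %>% transmute(pnr, death_date = doddato,   source = "dod")
) %>%
  group_by(pnr) %>%
  slice_min(death_date, n = 1, with_ties = FALSE) %>%  # earliest if duplicated
  ungroup()
```

The resulting deaths table has one row per pnr and can be joined to the cohort once, instead of joining each register separately.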
LPR3 - duplicate risk: lpr_a_kontakt and lpr_a_diagnose contain data from two formats (LPR_F and LPR_A). Always filter on lprindberetningssystem == "LPR3". See pitfall 5.
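The filter itself is a single line. The sketch below applies it to a toy stand-in for lpr_a_kontakt after rename_with(tolower). The value "OTHER" is a placeholder for whatever non-LPR3 values the column holds on the server, which are not confirmed here.

```r
library(dplyr)

# Toy stand-in for read_register("lpr_a_kontakt") %>% rename_with(tolower)
kontakt <- tibble(
  dw_ek_kontakt          = c(1, 2, 3),
  pnr                    = c("p1", "p1", "p2"),
  lprindberetningssystem = c("LPR3", "OTHER", "LPR3")  # "OTHER" is a placeholder
)

# Inspect which reporting systems are present before filtering
system_counts <- kontakt %>% count(lprindberetningssystem)

# Keep LPR3 rows only, as the note above requires
kontakt_lpr3 <- kontakt %>%
  filter(lprindberetningssystem == "LPR3")
```

Counting the values first makes it visible how many rows the filter drops.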
Laboratory results - use only one source: laboratorieproevesvar_ (>2.2 billion rows) replaces lab_forsker/lab_dm_forsker. The old files still exist and cover the same data - use only one to avoid duplicates. Because the register is so large, semi_join(tibble(pnr = kohort$pnr), by = "pnr") and select() before collect() are essential. Two things to watch when extracting:
- Tests are identified by NPU codes in the npu column - filter on the NPU codes your analysis needs.
- samplevalue is a character column - it can contain text like “not detected” or “negative”, not just numbers. Convert with care (as.numeric() returns NA on text values).
See pitfall 6 - Laboratory results for a code example.
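As a compact illustration of the points above, the sketch below reduces a toy lab table to the cohort and the wanted tests before collect(), then converts samplevalue while flagging text results. The NPU codes are placeholders, and value_num, value_is_text, and the toy data are hypothetical names for illustration. On the server, lab would be the lazy table from read_register("laboratorieproevesvar_").

```r
library(dplyr)

# Toy stand-ins; all values are made up
kohort <- tibble(pnr = c("p1", "p2"))
lab <- tibble(
  pnr          = c("p1", "p1", "p2", "p9"),
  npu          = c("NPU_X", "NPU_Y", "NPU_X", "NPU_X"),  # placeholder codes
  samplingdato = as.Date(c("2015-01-01", "2015-01-01",
                           "2016-02-02", "2017-03-03")),
  samplevalue  = c("48", "negative", "not detected", "52")
)
npu_wanted <- c("NPU_X")  # placeholder - replace with the codes you need

lab_small <- lab %>%
  semi_join(tibble(pnr = kohort$pnr), by = "pnr") %>%  # cohort members only
  filter(npu %in% npu_wanted) %>%                      # tests of interest only
  select(pnr, npu, samplingdato, samplevalue) %>%      # needed columns only
  collect() %>%                                        # materialise the reduced table
  mutate(
    value_num     = suppressWarnings(as.numeric(samplevalue)),  # text becomes NA
    value_is_text = is.na(value_num) & !is.na(samplevalue)      # flag text results
  )
```

Keeping a flag such as value_is_text lets you decide explicitly how to treat results like “not detected” rather than silently losing them as NA.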
procedurer_kirurgi: dw_ek_kontakt is NA for all rows on DST. Join to lpr_a_kontakt via dw_ek_forloeb to fetch pnr.
DBSO: The identifier column is cpr in raw parquet - renamed to pnr by 00_prepare_dbso.R. All code uses pnr after that.
OSDC: PNR is uppercase in the raw file - rename with rename(pnr = PNR) after loading.
Loading templates
fastreg replaces dstDataPrep on DARTER. Registers are now loaded with fastreg::read_register("name") - the same function as in the general guide. If you have been using dstDataPrep::load_database(), that code still runs, but write new code with fastreg. You just point fastreg at DARTER’s parquet folder once per script - options(fastreg.project_workdata_dir = path_parquet_reg) - see “Didn’t convert the data yourself?” in Phase 4. Then read_register("bef") works by name.
library(fastreg) # read_register() - access to DST registers
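library(dplyr) # rename_with, rename, left_join, select
options(fastreg.project_workdata_dir = path_parquet_reg) # point fastreg at the parquet folder (path from Base paths)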
# Standard register - via read_register:
bef <- read_register("bef") %>% rename_with(tolower) # lazy connection; lowercase columns
# Psychiatric LPR2 - requires renaming v_cpr and k_recnum:
psyk_adm <- read_register("t_psyk_adm") %>%
rename_with(tolower) %>% # lowercase columns
rename(pnr = v_cpr, recnum = k_recnum) # v_cpr → pnr; k_recnum → recnum
# DBSO - parquet-external (converted from SAS via 00_prepare_dbso.R):
dbso <- read_register("dbso") %>% rename_with(tolower) # lazy connection
# OSDC - RDS file with pre-computed diabetes classification:
dm_pop <- readRDS(path_dm_pop) %>% rename(pnr = PNR) # PNR is uppercase in raw file - rename
# LPR3 procedures - join via dw_ek_forloeb (NOT dw_ek_kontakt - is NA for all rows):
proc <- read_register("procedurer_kirurgi") %>%
rename_with(tolower) %>% # lowercase columns
left_join(
read_register("lpr_a_kontakt") %>%
rename_with(tolower) %>%
select(dw_ek_forloeb, pnr) %>% # fetch pnr via the forloeb key
distinct(), # one row per forloeb/pnr pair, so the join does not duplicate procedures
by = "dw_ek_forloeb" # join key - dw_ek_kontakt does not work
)
# proc is still lazy - add filter() and collect() before use
See also
- Overview of registers: full confirmed column names
- DARTER pitfalls: project-specific issues