This document describes the data sources needed by the OSDC algorithm and gives a brief overview of each source and what it might look like. The final section explains how to gain access to these data.

The algorithm uses these Danish registers as input data sources:

Danish registers used in the OSDC algorithm.
Register                                             Abbreviation  Years
CPR-registerets befolkningstabel                     bef           1968 - present
Laegemiddelstatistikregisteret                       lmdb          1995 - present
Landspatientregisterets administrationstabel (LPR2)  lpr_adm       1977 - 2018
Landspatientregisterets diagnosetabel (LPR2)         lpr_diag      1977 - 2018
Landspatientregisterets kontakttabel (LPR3)          kontakter     2019 - present
Landspatientregisterets diagnosetabel (LPR3)         diagnoser     2019 - present
Sygesikringsregisteret                               sysi          1990 - 2005
Sygesikringsregisteret                               sssy          2005 - present
Laboratoriedatabasens forskertabel                   lab_forsker   2011 - present

In a future revision, the algorithm may also use the Danish Medical Birth Register to extend the period of valid inclusions further back in time than is possible using obstetric codes from the National Patient Register.

Expected data structure

This section describes how the data sources are expected to look when they are passed to the OSDC algorithm. We try to mimic, as closely as possible, how the raw data looks within Statistics Denmark. Because registers are often stored on a per-year basis, we don’t expect a year variable in the data itself. If you’ve processed the data so that it has a year variable, you will likely need to use a split-apply-combine approach when using the osdc package. We internally convert all variable names to lower case and therefore present them here in lower case, but in real data the case may vary between data sources (and even between years within the same data source).
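
The split-apply-combine approach and the lower-casing of variable names can be sketched as follows. This is a minimal Python illustration rather than part of the osdc package: the year column name, the split_by_year, lower_case_names and apply_and_combine helpers, and the example rows are all assumptions made for this sketch.

```python
from collections import defaultdict


def lower_case_names(row):
    """Return a copy of a record with all variable names in lower case."""
    return {name.lower(): value for name, value in row.items()}


def split_by_year(rows, year_column="year"):
    """Split records into one list per year and drop the year variable."""
    per_year = defaultdict(list)
    for row in rows:
        row = lower_case_names(row)
        year = row.pop(year_column)
        per_year[year].append(row)
    return dict(per_year)


def apply_and_combine(per_year, func):
    """Apply func to each year's records and concatenate the results."""
    combined = []
    for year in sorted(per_year):
        combined.extend(func(per_year[year]))
    return combined


# Hypothetical pre-processed data that carries a year variable
# and uses inconsistent case for its variable names.
rows = [
    {"PNR": "715894021914", "KOEN": 1, "year": 2000},
    {"pnr": "186184788482", "koen": 2, "year": 2001},
    {"Pnr": "569614759560", "Koen": 1, "year": 2000},
]

per_year = split_by_year(rows)
# The identity function stands in for whatever per-year processing is needed.
combined = apply_and_combine(per_year, lambda year_rows: year_rows)
```

Each per-year list has the shape described in this section (no year variable, lower-case names), so it can be processed one year at a time before the results are combined.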

A small note about the National Patient Register: it contains several tables and types of data. The algorithm uses only the hospital diagnosis data, which is contained in four registers that form two pairs of related registers, one used before 2019 (LPR2) and one from 2019 onward (LPR3). The LPR2 to LPR3 equivalents are lpr_adm to kontakter and lpr_diag to diagnoser. Most of the variables have equivalents as well. One exception is the specialty variable: c_spec (LPR2) corresponds to hovedspeciale_ans (LPR3), but the values in hovedspeciale_ans are coded as literal specialty names, whereas c_spec contains padded integer codes.
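
The correspondence between LPR2 and LPR3 variables can be written down as a simple lookup. The mapping below is a sketch inferred from the variable descriptions later in this document, not a mapping taken from the osdc package, and the to_lpr2_names helper is hypothetical. It renames variables only; the specialty values are left untouched because the two formats code them differently.

```python
# LPR3 variable name -> LPR2 equivalent (inferred from the descriptions).
LPR3_TO_LPR2 = {
    "cpr": "pnr",
    "dw_ek_kontakt": "recnum",
    "dato_start": "d_inddto",
    "hovedspeciale_ans": "c_spec",
    "diagnosekode": "c_diag",
    "diagnosetype": "c_diagtype",
}


def to_lpr2_names(row):
    """Rename LPR3 variables to LPR2 names; other variables are kept as-is."""
    return {LPR3_TO_LPR2.get(name, name): value for name, value in row.items()}


# Simulated kontakter row, as in the example table for that register.
kontakt = {
    "cpr": "445785927048",
    "dw_ek_kontakt": "645923013048394355",
    "dato_start": "20130915",
    "hovedspeciale_ans": "Blandet medicin og kirurgi",
}
renamed = to_lpr2_names(kontakt)
```

Variables without an LPR2 counterpart, such as senere_afkraeftet, pass through unchanged.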

At Statistics Denmark, these tables are provided as separate files for each calendar year prior to 2019 (in LPR2 format) and a single file containing all the data from 2019 onward (in LPR3 format). The two tables in each pair can be joined using either the recnum variable (LPR2 data) or the dw_ek_kontakt variable (LPR3 data).
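
Joining a contact-level table to its diagnosis table on the shared record id can be sketched as below. The join_on helper is a hypothetical illustration in plain Python (an inner join in which one contact may match several diagnoses), and the rows are invented for the example.

```python
from collections import defaultdict


def join_on(left_rows, right_rows, key):
    """Inner join two lists of records on a shared key variable."""
    right_by_key = defaultdict(list)
    for row in right_rows:
        right_by_key[row[key]].append(row)
    joined = []
    for left in left_rows:
        # .get avoids creating empty entries for unmatched keys.
        for right in right_by_key.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Invented LPR2-style rows; LPR3 data would use key="dw_ek_kontakt".
lpr_adm = [
    {"pnr": "977591011522", "recnum": "1", "d_inddto": "20130411", "c_spec": "06"},
    {"pnr": "853967828101", "recnum": "2", "d_inddto": "19861231", "c_spec": "78"},
]
lpr_diag = [
    {"recnum": "1", "c_diag": "E8769", "c_diagtype": "A"},
    {"recnum": "1", "c_diag": "24502", "c_diagtype": "B"},
    {"recnum": "3", "c_diag": "65318", "c_diagtype": "B"},
]

lpr2 = join_on(lpr_adm, lpr_diag, key="recnum")
```

Here the first contact matches two diagnoses and so appears twice in the result, while the contact and the diagnosis without a partner are dropped.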

bef: CPR-registerets befolkningstabel

Variables and their descriptions within the bef register.
variable_name  english_description
pnr            Pseudonymised social security number
koen           Sex
foed_dato      Date of birth

Simulated example of what the data looks like for the bef register.
koen  pnr           foed_dato
1     715894021914  19480312
2     186184788482  20111108
1     569614759560  19310105
2     671114564239  19561228

lmdb: Laegemiddelstatistikregisteret

Variables and their descriptions within the lmdb register.
variable_name  english_description
pnr            Pseudonymised social security number
eksd           Date of purchase
atc            ATC code (fully specified)
volume         Number of defined daily doses (DDD) in package
apk            Number of packages purchased
indo           Indication code
name           Drug retail name
vnr            Item code

Simulated example of what the data looks like for the lmdb register.
volume    pnr           eksd      atc      apk       indo     name              vnr
1.262771  072431269871  20030707  A10BK01  9.115327  3874536  silymarin         214334
9.933694  501466021832  20130107  A10BK01  5.398791  3476304  docetaxel         614423
2.224667  401113835050  20000723  J07AP01  3.497905  3118254  sulfaisodimidine  738880
8.817253  277471556351  20220709  A10BK01  3.012397  5445094  enflurane         075769

lpr_adm: Landspatientregisterets administrationstabel (LPR2)

Variables and their descriptions within the lpr_adm register.
variable_name  english_description
pnr            Pseudonymised social security number
recnum         Record id number
d_inddto       Date of admission or initial contact
c_spec         Specialty code of department

Simulated example of what the data looks like for the lpr_adm register.
c_spec  pnr           recnum              d_inddto
06      977591011522  906527079023828789  20130411
78      853967828101  410845599596979597  19861231
93      611478061504  854214110630761577  20240228
65      809745288639  530441469672734767  20160617

lpr_diag: Landspatientregisterets diagnosetabel (LPR2)

Variables and their descriptions within the lpr_diag register.
variable_name  english_description
recnum         Record id number
c_diag         Diagnosis code
c_diagtype     Diagnosis type

Simulated example of what the data looks like for the lpr_diag register.
c_diagtype  recnum              c_diag
B           996899850200054922  65318
B           444280801566183664  94312
A           698845059228061496  E8769
B           713472898421125476  24502

kontakter: Landspatientregisterets kontakttabel (LPR3)

Variables and their descriptions within the kontakter register.
variable_name      english_description
cpr                Pseudonymised social security number
dw_ek_kontakt      Record id number
dato_start         Date of admission or initial contact
hovedspeciale_ans  Specialty of department

Simulated example of what the data looks like for the kontakter register.
cpr           dw_ek_kontakt       dato_start  hovedspeciale_ans
445785927048  645923013048394355  20130915    Blandet medicin og kirurgi
485315300759  722426513916023838  19770131    Psykiatri
606692062808  977867012383313133  19900514    Neurologi
974444048558  374005535351814173  20111029    Kirurgi

diagnoser: Landspatientregisterets diagnosetabel (LPR3)

Variables and their descriptions within the diagnoser register.
variable_name      english_description
dw_ek_kontakt      Record id number
diagnosekode       Diagnosis code
diagnosetype       Diagnosis type
senere_afkraeftet  Was the diagnosis retracted later?

Simulated example of what the data looks like for the diagnoser register.
dw_ek_kontakt       diagnosekode  diagnosetype  senere_afkraeftet
824190727403572180  DF317         B             Ja
362995308374931290  DI098C        B             Nej
320478152080848701  DR670AB       B             Nej
288739488282692166  DJ649         A             Nej

sysi: Sygesikringsregisteret

Variables and their descriptions within the sysi register.
variable_name  english_description
pnr            Pseudonymised social security number
barnmak        Was the service provided to the patient’s child?
speciale       Billing code of the service (fully specified)
honuge         Week and year of service

Simulated example of what the data looks like for the sysi register.
pnr           barnmak  speciale  honuge
126727999466  0        76255     2992
545136589935  0        57526     1696
972085259165  1        04711     798
772714851027  0        56582     592

sssy: Sygesikringsregisteret

Variables and their descriptions within the sssy register.
variable_name  english_description
pnr            Pseudonymised social security number
barnmak        Was the service provided to the patient’s child?
speciale       Billing code of the service (fully specified)
honuge         Week and year of service

Simulated example of what the data looks like for the sssy register.
pnr           barnmak  speciale  honuge
474550758751  0        00166     1208
888864864492  1        45582     4720
631240179696  0        84058     4215
378459921780  0        38628     2311
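
The honuge values in the simulated sysi and sssy data appear to follow a week-then-year layout (wwyy) in which a leading zero may have been dropped, for example 798 for week 7 of 1998. The sketch below parses such values under that assumption. The layout, the parse_honuge helper, and the pivot used to expand two-digit years are not confirmed by this document and should be checked against the real data.

```python
def parse_honuge(honuge, pivot=50):
    """Split a honuge value into (year, week), assuming a wwyy layout.

    Two-digit years at or above `pivot` are read as 19yy, others as 20yy.
    Both the layout and the pivot are assumptions of this sketch.
    """
    text = str(honuge).zfill(4)  # restore a dropped leading zero
    week, yy = int(text[:2]), int(text[2:])
    year = 1900 + yy if yy >= pivot else 2000 + yy
    return year, week


# Values taken from the simulated sysi and sssy examples.
print(parse_honuge(798))   # (1998, 7)
print(parse_honuge(4720))  # (2020, 47)
```

Returning the year and week as separate integers keeps the result easy to compare with dates from the other registers.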

lab_forsker: Laboratoriedatabasens forskertabel

Variables and their descriptions within the lab_forsker register.
variable_name  english_description
patient_cpr    Pseudonymised social security number
samplingdate   Date of sampling
analysiscode   NPU code of analysis
value          Numerical result of analysis

Simulated example of what the data looks like for the lab_forsker register.
patient_cpr   samplingdate  analysiscode  value
862573096712  20180527      NPU46024      178.81536
866795446259  20110306      NPU27300      16.82538
289845396907  20160324      NPU14914      177.74120
564144128607  20170128      NPU27300      153.17906

Getting access to data

The data described above are available through Statistics Denmark and the Danish Health Data Authority. Researchers must be affiliated with an approved research institute in Denmark, and fees apply. Information on how to gain access to the data can be found at https://www.dst.dk/en/TilSalg/Forskningsservice.