This document describes the data sources required by the OSDC algorithm and gives a brief overview of each source and what it looks like. The final section explains how to gain access to these data.

The algorithm uses these Danish registers as input data sources:

Danish registers used in the OSDC algorithm.

| Register | Abbreviation | Years |
| --- | --- | --- |
| CPR-registerets befolkningstabel | bef | 1968 - present |
| Laegemiddelstatistikregisteret | lmdb | 1995 - present |
| Landspatientregisterets administrationstabel (LPR2) | lpr_adm | 1977 - 2018 |
| Landspatientregisterets diagnosetabel (LPR2) | lpr_diag | 1977 - 2018 |
| Landspatientregisterets kontakttabel (LPR3) | kontakter | 2019 - present |
| Landspatientregisterets diagnosetabel (LPR3) | diagnoser | 2019 - present |
| Sygesikringsregisteret | sysi | 1990 - 2005 |
| Sygesikringsregisteret | sssy | 2005 - present |
| Laboratoriedatabasens forskertabel | lab_forsker | 2011 - present |

A future revision of the algorithm may also use the Danish Medical Birth Register to extend the period of valid inclusions further back in time than is possible using obstetric codes from the National Patient Register.

Data required from registers

The following is a list of the variables required from specific registers in order for the package to classify diabetes status:

| Register | Variable |
| --- | --- |
| CPR-registerets befolkningstabel (bef) | pnr |
| CPR-registerets befolkningstabel (bef) | koen |
| CPR-registerets befolkningstabel (bef) | foed_dato |
| Laegemiddelstatistikregisteret (lmdb) | pnr |
| Laegemiddelstatistikregisteret (lmdb) | eksd |
| Laegemiddelstatistikregisteret (lmdb) | atc |
| Laegemiddelstatistikregisteret (lmdb) | volume |
| Laegemiddelstatistikregisteret (lmdb) | apk |
| Laegemiddelstatistikregisteret (lmdb) | indo |
| Laegemiddelstatistikregisteret (lmdb) | name |
| Laegemiddelstatistikregisteret (lmdb) | vnr |
| Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | pnr |
| Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | recnum |
| Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | d_inddto |
| Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | c_spec |
| Landspatientregisterets diagnosetabel (LPR2) (lpr_diag) | recnum |
| Landspatientregisterets diagnosetabel (LPR2) (lpr_diag) | c_diag |
| Landspatientregisterets diagnosetabel (LPR2) (lpr_diag) | c_diagtype |
| Landspatientregisterets kontakttabel (LPR3) (kontakter) | cpr |
| Landspatientregisterets kontakttabel (LPR3) (kontakter) | dw_ek_kontakt |
| Landspatientregisterets kontakttabel (LPR3) (kontakter) | dato_start |
| Landspatientregisterets kontakttabel (LPR3) (kontakter) | hovedspeciale_ans |
| Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | dw_ek_kontakt |
| Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | diagnosekode |
| Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | diagnosetype |
| Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | senere_afkraeftet |
| Sygesikringsregisteret (sysi) | pnr |
| Sygesikringsregisteret (sysi) | barnmak |
| Sygesikringsregisteret (sysi) | speciale |
| Sygesikringsregisteret (sysi) | honuge |
| Sygesikringsregisteret (sssy) | pnr |
| Sygesikringsregisteret (sssy) | barnmak |
| Sygesikringsregisteret (sssy) | speciale |
| Sygesikringsregisteret (sssy) | honuge |
| Laboratoriedatabasens forskertabel (lab_forsker) | patient_cpr |
| Laboratoriedatabasens forskertabel (lab_forsker) | samplingdate |
| Laboratoriedatabasens forskertabel (lab_forsker) | analysiscode |
| Laboratoriedatabasens forskertabel (lab_forsker) | value |
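
Before running the algorithm, it can help to check that each input table carries the variables listed above. The following is a minimal sketch in Python, used here purely for illustration: it is not part of the osdc package, and the names `REQUIRED_VARIABLES` and `missing_variables` are hypothetical. The comparison is case-insensitive because variable-name case can vary in real data.

```python
# Required variables per register, copied from the table above.
REQUIRED_VARIABLES = {
    "bef": ["pnr", "koen", "foed_dato"],
    "lmdb": ["pnr", "eksd", "atc", "volume", "apk", "indo", "name", "vnr"],
    "lpr_adm": ["pnr", "recnum", "d_inddto", "c_spec"],
    "lpr_diag": ["recnum", "c_diag", "c_diagtype"],
    "kontakter": ["cpr", "dw_ek_kontakt", "dato_start", "hovedspeciale_ans"],
    "diagnoser": ["dw_ek_kontakt", "diagnosekode", "diagnosetype",
                  "senere_afkraeftet"],
    "sysi": ["pnr", "barnmak", "speciale", "honuge"],
    "sssy": ["pnr", "barnmak", "speciale", "honuge"],
    "lab_forsker": ["patient_cpr", "samplingdate", "analysiscode", "value"],
}


def missing_variables(register, column_names):
    """Return the required variables that are absent from column_names.

    Hypothetical helper: names are compared in lower case.
    """
    present = {name.lower() for name in column_names}
    return [name for name in REQUIRED_VARIABLES[register] if name not in present]


# A bef table with upper-case names that lacks the date of birth.
missing = missing_variables("bef", ["PNR", "KOEN"])
```

An empty list means the table has every variable the corresponding register requires.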

Expected data structure

This section describes how the data sources listed in the tables above are expected to look when they are input into the OSDC algorithm. We try to mimic, as closely as possible, how the raw data looks within Statistics Denmark. Since registers are often stored on a per-year basis, we don’t expect a year variable in the data itself. If you’ve processed the data so that it has a year variable, you will likely need to use a split-apply-combine approach when using the osdc package. We internally convert all variable names to lower case, so we present them here in lower case, but case may vary between data sources (and even between years in the same data source) in real data.
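
As an illustration of the split-apply-combine idea, the sketch below groups rows by a processed-in year variable, lower-cases the variable names, drops the year, applies a per-year step, and concatenates the results. It is written in Python with made-up rows; the function names and the `year` column name are assumptions, and the identity function stands in for whatever per-year processing you would actually run.

```python
from collections import defaultdict


def split_by_year(rows, year_key="year"):
    """Group rows by year, lower-casing names and dropping the year variable."""
    groups = defaultdict(list)
    for row in rows:
        row = {name.lower(): value for name, value in row.items()}
        year = row.pop(year_key)
        groups[year].append(row)
    return dict(groups)


def split_apply_combine(rows, apply_one_year, year_key="year"):
    """Run apply_one_year on each year's rows and concatenate the results."""
    groups = split_by_year(rows, year_key)
    combined = []
    for year in sorted(groups):
        combined.extend(apply_one_year(groups[year]))
    return combined


# Made-up rows with a YEAR variable and upper-case names.
rows = [
    {"PNR": "052484763211", "KOEN": 1, "YEAR": 2001},
    {"PNR": "295879519622", "KOEN": 2, "YEAR": 2000},
    {"PNR": "473656557183", "KOEN": 2, "YEAR": 2001},
]

# The identity function is a placeholder for the real per-year step.
result = split_apply_combine(rows, lambda year_rows: year_rows)
```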

A small note about the National Patient Register: it contains several tables and types of data. The algorithm uses only the hospital diagnosis data, which are contained in four tables: one pair of related tables used before 2019 (LPR2) and another pair used from 2019 onward (LPR3). The LPR2 to LPR3 equivalents are lpr_adm to kontakter and lpr_diag to diagnoser. Most of the variables have equivalents as well. One exception is the specialty variable: c_spec in LPR2 corresponds to hovedspeciale_ans in LPR3, but the values in hovedspeciale_ans are coded as literal specialty names, whereas c_spec contains padded integer codes.
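
The correspondence between the two formats can be written down as a simple name mapping. The sketch below is an assumption based on matching the English descriptions of the variables in this document; it is not taken from the osdc source. It deliberately leaves c_spec under its own name, since its values are coded differently from hovedspeciale_ans and no translation table is given here.

```python
# LPR2 name -> LPR3 name, paired by their matching descriptions (assumption).
LPR2_TO_LPR3 = {
    "pnr": "cpr",
    "recnum": "dw_ek_kontakt",
    "d_inddto": "dato_start",
    "c_diag": "diagnosekode",
    "c_diagtype": "diagnosetype",
    # c_spec is not renamed: its padded integer codes are not directly
    # comparable with the literal specialty names in hovedspeciale_ans.
}


def rename_lpr2_row(row):
    """Rename LPR2 variables to their LPR3 counterparts; values are untouched."""
    return {LPR2_TO_LPR3.get(name, name): value for name, value in row.items()}


# A row shaped like the simulated lpr_adm example.
lpr_adm_row = {
    "pnr": "093472546495",
    "recnum": "538559535929635949",
    "d_inddto": "19860419",
    "c_spec": "94",
}
renamed = rename_lpr2_row(lpr_adm_row)
```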

At Statistics Denmark, these tables are provided as separate files for each calendar year before 2019 (LPR2 format) and a single file containing all data from 2019 onward (LPR3 format). Within each format, the two tables can be joined on the recnum variable (LPR2 data) or the dw_ek_kontakt variable (LPR3 data).
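
A minimal sketch of that join, in Python with made-up rows: it assumes each key identifies one row in the administrative or contact table and possibly several rows in the diagnosis table. The helper name `join_on` is hypothetical, and the short record ids are invented for readability.

```python
def join_on(key, contacts, diagnoses):
    """Inner-join diagnosis rows to their contact row on a shared key."""
    contact_by_key = {row[key]: row for row in contacts}
    return [
        {**contact_by_key[row[key]], **row}
        for row in diagnoses
        if row[key] in contact_by_key
    ]


# LPR2-style tables; the same call works for LPR3 with key "dw_ek_kontakt".
lpr_adm = [
    {"pnr": "093472546495", "recnum": "1", "d_inddto": "19860419", "c_spec": "94"},
]
lpr_diag = [
    {"recnum": "1", "c_diag": "DO663D", "c_diagtype": "A"},
    {"recnum": "1", "c_diag": "DY00", "c_diagtype": "B"},
    {"recnum": "2", "c_diag": "DM821", "c_diagtype": "A"},  # no matching contact
]

joined = join_on("recnum", lpr_adm, lpr_diag)
```

Each joined row carries the person identifier and contact date alongside one diagnosis code; diagnosis rows without a matching contact are dropped.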

bef: CPR-registerets befolkningstabel

Variables and their descriptions within the bef register.

| variable_name | english_description |
| --- | --- |
| pnr | Pseudonymised social security number |
| koen | Sex |
| foed_dato | Date of birth |

Simulated example of what the data looks like for the bef register.

| koen | pnr | foed_dato |
| --- | --- | --- |
| 2 | 295879519622 | 20190425 |
| 1 | 052484763211 | 19600411 |
| 2 | 473656557183 | 19360804 |
| 1 | 539882494700 | 20081206 |
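
Two details in the simulated example matter when reading such data: identifiers such as pnr can begin with a zero, so they are safer kept as text, and dates appear as eight-digit strings. The Python sketch below assumes the YYYYMMDD layout suggested by the example and a comma-separated input, which is a hypothetical file format chosen for illustration.

```python
import csv
import io
from datetime import datetime


def parse_bef_row(row):
    """Keep pnr as text and parse foed_dato, assumed to be YYYYMMDD."""
    return {
        "pnr": row["pnr"],  # text, so leading zeros survive
        "koen": int(row["koen"]),
        "foed_dato": datetime.strptime(row["foed_dato"], "%Y%m%d").date(),
    }


# Two rows from the simulated example, written as CSV for illustration.
raw = "koen,pnr,foed_dato\n2,295879519622,20190425\n1,052484763211,19600411\n"
bef = [parse_bef_row(row) for row in csv.DictReader(io.StringIO(raw))]
```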

lmdb: Laegemiddelstatistikregisteret

Variables and their descriptions within the lmdb register.

| variable_name | english_description |
| --- | --- |
| pnr | Pseudonymised social security number |
| eksd | Date of purchase |
| atc | ATC code (fully specified) |
| volume | Number of defined daily doses (DDD) in package |
| apk | Number of packages purchased |
| indo | Indication code |
| name | Drug retail name |
| vnr | Item code |

Simulated example of what the data looks like for the lmdb register.

| volume | pnr | eksd | atc | apk | indo | name | vnr |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3.165065 | 817240734192 | 20210225 | G01AF11 | 8.970398 | 5237153 | ciprofloxacin and tinidazole | 649387 |
| 6.424760 | 058116293902 | 19981231 | J01MB04 | 5.536929 | 3448203 | alteplase | 278156 |
| 3.367881 | 905420250800 | 20121121 | A10BJ06 | 8.405381 | 4602414 | isosorbide mononitrate | 211135 |
| 5.964583 | 875350759905 | 19980806 | M02AX06 | 6.336097 | 2681008 | oxolamine | 681431 |
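
Because volume is the number of doses per package and apk is the number of packages, the doses in one purchase are their product. The Python sketch below shows that arithmetic together with a prefix filter on the ATC code, using A10 (the ATC group for drugs used in diabetes) as the example prefix. The helper names are hypothetical, and the sketch is not a statement of how the OSDC algorithm itself selects purchases.

```python
def total_ddd(purchase):
    """Doses in one purchase: doses per package times number of packages."""
    return float(purchase["volume"]) * float(purchase["apk"])


def has_atc_prefix(purchase, prefix):
    """True if the purchase's ATC code starts with prefix (case-insensitive)."""
    return purchase["atc"].upper().startswith(prefix.upper())


# Two rows shaped like the simulated lmdb example.
purchases = [
    {"pnr": "905420250800", "atc": "A10BJ06", "volume": 3.367881, "apk": 8.405381},
    {"pnr": "058116293902", "atc": "J01MB04", "volume": 6.424760, "apk": 5.536929},
]

a10_purchases = [p for p in purchases if has_atc_prefix(p, "A10")]
doses = [total_ddd(p) for p in a10_purchases]
```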

lpr_adm: Landspatientregisterets administrationstabel (LPR2)

Variables and their descriptions within the lpr_adm register.

| variable_name | english_description |
| --- | --- |
| pnr | Pseudonymised social security number |
| recnum | Record id number |
| d_inddto | Date of admission or initial contact |
| c_spec | Specialty code of department |

Simulated example of what the data looks like for the lpr_adm register.

| c_spec | pnr | recnum | d_inddto |
| --- | --- | --- | --- |
| 94 | 093472546495 | 538559535929635949 | 19860419 |
| 70 | 467398454443 | 886601825091626796 | 19980105 |
| 20 | 735024785049 | 219357575157702871 | 19990719 |
| 75 | 710489216379 | 762737789078403798 | 20160924 |

lpr_diag: Landspatientregisterets diagnosetabel (LPR2)

Variables and their descriptions within the lpr_diag register.

| variable_name | english_description |
| --- | --- |
| recnum | Record id number |
| c_diag | Diagnosis code |
| c_diagtype | Diagnosis type |

Simulated example of what the data looks like for the lpr_diag register.

| c_diagtype | recnum | c_diag |
| --- | --- | --- |
| A | 459652251500708444 | DO663D |
| B | 055918541881968999 | DF1970 |
| A | 694686254086961241 | DM821 |
| B | 253484852508781779 | DY00 |

kontakter: Landspatientregisterets kontakttabel (LPR3)

Variables and their descriptions within the kontakter register.

| variable_name | english_description |
| --- | --- |
| cpr | Pseudonymised social security number |
| dw_ek_kontakt | Record id number |
| dato_start | Date of admission or initial contact |
| hovedspeciale_ans | Specialty of department |

Simulated example of what the data looks like for the kontakter register.

| cpr | dw_ek_kontakt | dato_start | hovedspeciale_ans |
| --- | --- | --- | --- |
| 479497165788 | 701164834656651184 | 19910430 | Børne- og ungdomspsykiatri |
| 592500805436 | 184973744127885455 | 20020815 | Hæmatologi |
| 255670621038 | 718615486876865290 | 20121101 | Samfundsmedicin |
| 493041743629 | 974782023363670879 | 20061218 | Klinisk biokemi |

diagnoser: Landspatientregisterets diagnosetabel (LPR3)

Variables and their descriptions within the diagnoser register.

| variable_name | english_description |
| --- | --- |
| dw_ek_kontakt | Record id number |
| diagnosekode | Diagnosis code |
| diagnosetype | Diagnosis type |
| senere_afkraeftet | Was the diagnosis retracted later? |

Simulated example of what the data looks like for the diagnoser register.

| dw_ek_kontakt | diagnosekode | diagnosetype | senere_afkraeftet |
| --- | --- | --- | --- |
| 295967218205190103 | DQ600 | B | Nej |
| 419136150597320977 | DL022 | B | Ja |
| 733999403212878558 | DF252 | B | Ja |
| 263706881734175834 | DI252B | A | Nej |
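
The senere_afkraeftet variable makes it possible to set aside diagnoses that were later retracted. The Python sketch below assumes the flag takes the values "Ja" and "Nej", as in the simulated example; the helper name is hypothetical, and whether and how the OSDC algorithm applies such a filter is not described in this document.

```python
def keep_unretracted(diagnoses):
    """Drop rows whose senere_afkraeftet flag is 'Ja' (case-insensitive)."""
    return [
        row for row in diagnoses
        if row["senere_afkraeftet"].strip().lower() != "ja"
    ]


# Rows from the simulated diagnoser example.
diagnoser = [
    {"dw_ek_kontakt": "295967218205190103", "diagnosekode": "DQ600",
     "diagnosetype": "B", "senere_afkraeftet": "Nej"},
    {"dw_ek_kontakt": "419136150597320977", "diagnosekode": "DL022",
     "diagnosetype": "B", "senere_afkraeftet": "Ja"},
    {"dw_ek_kontakt": "733999403212878558", "diagnosekode": "DF252",
     "diagnosetype": "B", "senere_afkraeftet": "Ja"},
    {"dw_ek_kontakt": "263706881734175834", "diagnosekode": "DI252B",
     "diagnosetype": "A", "senere_afkraeftet": "Nej"},
]

kept = keep_unretracted(diagnoser)
```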

sysi: Sygesikringsregisteret

Variables and their descriptions within the sysi register.

| variable_name | english_description |
| --- | --- |
| pnr | Pseudonymised social security number |
| barnmak | Was the service provided to the patient’s child? |
| speciale | Billing code of the service (fully specified) |
| honuge | Year and week of service |

Simulated example of what the data looks like for the sysi register.

| pnr | barnmak | speciale | honuge |
| --- | --- | --- | --- |
| 838243007095 | 0 | 36617 | 9219 |
| 071881273675 | 0 | 40682 | 9945 |
| 526914425412 | 1 | 22352 | 9430 |
| 077731816708 | 0 | 42123 | 0309 |

sssy: Sygesikringsregisteret

Variables and their descriptions within the sssy register.

| variable_name | english_description |
| --- | --- |
| pnr | Pseudonymised social security number |
| barnmak | Was the service provided to the patient’s child? |
| speciale | Billing code of the service (fully specified) |
| honuge | Year and week of service |

Simulated example of what the data looks like for the sssy register.

| pnr | barnmak | speciale | honuge |
| --- | --- | --- | --- |
| 188473633425 | 0 | 00932 | 0946 |
| 018986974306 | 0 | 89492 | 2017 |
| 385864780067 | 0 | 94916 | 1010 |
| 469537648934 | 0 | 14487 | 1830 |
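
In the simulated sysi and sssy examples, honuge appears as four digits that read as a two-digit year followed by a two-digit week. The Python sketch below converts such a value to a calendar date under three stated assumptions: the layout is YYWW, weeks follow ISO 8601 numbering, and two-digit years from 90 to 99 belong to the 1900s while all others belong to the 2000s (chosen because the registers start in 1990). The function name is hypothetical.

```python
from datetime import date


def parse_honuge(honuge):
    """Return the Monday of the week encoded as YYWW (assumed layout)."""
    text = str(honuge).zfill(4)  # restore a leading zero lost on import
    two_digit_year, week = int(text[:2]), int(text[2:])
    # Assumed pivot: 90-99 -> 1990s, everything else -> 2000s.
    year = 1900 + two_digit_year if two_digit_year >= 90 else 2000 + two_digit_year
    return date.fromisocalendar(year, week, 1)


# Values taken from the simulated examples.
week_starts = [parse_honuge(value) for value in ["9219", "0309", "2017"]]
```

`date.fromisocalendar` raises `ValueError` for a week number that does not exist in the given ISO year, which gives a cheap validity check on the input.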

lab_forsker: Laboratoriedatabasens forskertabel

Variables and their descriptions within the lab_forsker register.

| variable_name | english_description |
| --- | --- |
| patient_cpr | Pseudonymised social security number |
| samplingdate | Date of sampling |
| analysiscode | NPU code of analysis |
| value | Numerical result of analysis |

Simulated example of what the data looks like for the lab_forsker register.

| patient_cpr | samplingdate | analysiscode | value |
| --- | --- | --- | --- |
| 995011179485 | 20160906 | NPU59463 | 55.462586 |
| 430295258937 | 20220301 | NPU29771 | 126.565916 |
| 217144925484 | 20140603 | NPU84242 | 122.893255 |
| 010499706603 | 20150403 | NPU03835 | 7.892775 |
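
Laboratory results are identified by their analysiscode, so a typical first step is to keep only the rows for the analyses of interest and convert value to a number. The Python sketch below does this for a caller-supplied set of codes; the helper name is hypothetical, and the code used in the example is simply one that occurs in the simulated data, not a statement of which analyses the OSDC algorithm uses.

```python
def select_results(rows, analysis_codes):
    """Keep rows whose analysiscode is in analysis_codes; value becomes a float."""
    wanted = {code.upper() for code in analysis_codes}
    return [
        {**row, "value": float(row["value"])}
        for row in rows
        if row["analysiscode"].upper() in wanted
    ]


# Rows from the simulated lab_forsker example, with values as text.
lab_forsker = [
    {"patient_cpr": "995011179485", "samplingdate": "20160906",
     "analysiscode": "NPU59463", "value": "55.462586"},
    {"patient_cpr": "010499706603", "samplingdate": "20150403",
     "analysiscode": "NPU03835", "value": "7.892775"},
]

selected = select_results(lab_forsker, ["NPU03835"])
```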

Getting access to data

The above data are available through Statistics Denmark and the Danish Health Data Authority. Researchers must be affiliated with an approved research institute in Denmark, and fees apply. Information on how to gain access to the data can be found at https://www.dst.dk/en/TilSalg/Forskningsservice.