This document describes the data sources required by the OSDC
algorithm and gives a brief overview of each source and what it
might look like. The final section contains information on how to
gain access to these data.
The algorithm uses these Danish registers as input data sources:
Danish registers used in the OSDC algorithm.

| Register | Abbreviation | Years |
|---|---|---|
| CPR-registerets befolkningstabel | bef | 1968 - present |
| Laegemiddelstatistikregisteret | lmdb | 1995 - present |
| Landspatientregisterets administrationstabel (LPR2) | lpr_adm | 1977 - 2018 |
| Landspatientregisterets diagnosetabel (LPR2) | lpr_diag | 1977 - 2018 |
| Landspatientregisterets kontakttabel (LPR3) | kontakter | 2019 - present |
| Landspatientregisterets diagnosetabel (LPR3) | diagnoser | 2019 - present |
| Sygesikringsregisteret | sysi | 1990 - 2005 |
| Sygesikringsregisteret | sssy | 2005 - present |
| Laboratoriedatabasens forskertabel | lab_forsker | 2011 - present |

In a future revision, the algorithm may also use the Danish Medical
Birth Register to extend the period of valid inclusions further back
in time than is possible using obstetric codes from the National
Patient Register.
Expected data structure
This section describes what the data sources are expected to look like
when they are input into the OSDC algorithm. We try to mimic as closely
as possible how the raw data looks within Statistics Denmark. Since
registers are often stored on a per-year basis, we don’t expect a year
variable in the data itself. If you’ve processed the data so that it has
a year variable, you will likely need to use a split-apply-combine
approach when using the osdc package. We internally convert all variable
names to lower case, so we present them here in lower case, but case
may vary between data sources (and even between years in the same data
source) in real data.
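
As a rough illustration of the split-apply-combine idea, the sketch below (in Python, standard library only) lower-cases the variable names, splits rows on a year variable, processes each year separately, and combines the results. The name `year`, the helper functions, and the placeholder processing step are all assumptions made for this example; they are not part of the osdc package.

```python
from collections import defaultdict


def lowercase_names(row):
    """Return a copy of the row with all variable names in lower case."""
    return {name.lower(): value for name, value in row.items()}


def split_apply_combine(rows, process_one_year, year_name="year"):
    """Split rows by year, process each year separately, then combine."""
    by_year = defaultdict(list)
    for row in rows:
        row = lowercase_names(row)
        # Drop the year variable so each group resembles a per-year file.
        by_year[row.pop(year_name)].append(row)

    combined = []
    for year in sorted(by_year):
        combined.extend(process_one_year(by_year[year]))
    return combined


# Made-up rows in the style of the bef register, with an added year
# variable and inconsistent name casing.
rows = [
    {"PNR": "715894021914", "KOEN": 1, "FOED_DATO": "19480312", "YEAR": 2021},
    {"PNR": "186184788482", "KOEN": 2, "FOED_DATO": "20111108", "YEAR": 2020},
    {"pnr": "569614759560", "koen": 1, "foed_dato": "19310105", "year": 2021},
]

# Hypothetical placeholder step: return each year's rows unchanged.
result = split_apply_combine(rows, lambda year_rows: year_rows)
```

In practice the placeholder would be replaced by whatever per-year processing you need before combining the results.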
A small note about the National Patient Register: it contains several
tables and types of data. The algorithm uses only the hospital diagnosis
data, which is contained in four registers: a pair of related registers
used before 2019 (LPR2) and a pair used from 2019 onward (LPR3). The
LPR2 to LPR3 equivalents are lpr_adm to kontakter and lpr_diag to
diagnoser. Most of the variables have equivalents as well. One exception
is that, while c_spec is the LPR2 equivalent of hovedspeciale_ans in
LPR3, the specialty values in hovedspeciale_ans are coded as literal
specialty names, unlike the padded integer codes that c_spec contains.
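
One way to picture these equivalents is as a simple name mapping. The sketch below is an assumption-laden illustration rather than osdc code: the pairs are inferred from the matching variable descriptions given for each register in this document, and c_spec is deliberately left out because its values are coded differently from those in hovedspeciale_ans.

```python
# Assumed LPR2 -> LPR3 name pairs, matched on their descriptions in this
# document. c_spec is omitted on purpose: renaming it would mix padded
# integer codes with literal specialty names in one variable.
LPR2_TO_LPR3 = {
    "pnr": "cpr",
    "recnum": "dw_ek_kontakt",
    "d_inddto": "dato_start",
    "c_diag": "diagnosekode",
    "c_diagtype": "diagnosetype",
}


def rename_to_lpr3(row):
    """Rename LPR2 variable names to their assumed LPR3 equivalents."""
    return {LPR2_TO_LPR3.get(name, name): value for name, value in row.items()}


# Made-up row in the style of the lpr_adm register.
lpr2_row = {
    "pnr": "977591011522",
    "recnum": "906527079023828789",
    "d_inddto": "20130411",
    "c_spec": "06",
}
lpr3_style_row = rename_to_lpr3(lpr2_row)
```

Only the names change here; the values are passed through untouched.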
At Statistics Denmark, these tables are provided as a mix of separate
files for each calendar year prior to 2019 (in LPR2 format) and a single
file containing all the data from 2019 onward (LPR3 format). Within each
format, the two tables can be joined using either the recnum variable
(LPR2 data) or the dw_ek_kontakt variable (LPR3 data).
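
The join described above can be sketched as follows. This is a minimal standard-library Python illustration with made-up rows (the record ids were chosen so that one pair matches); the `join_on` helper is hypothetical and not part of the osdc package.

```python
def join_on(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key variable."""
    left_by_key = {}
    for row in left_rows:
        left_by_key.setdefault(row[key], []).append(row)

    joined = []
    for right in right_rows:
        for left in left_by_key.get(right[key], []):
            joined.append({**left, **right})
    return joined


# LPR2: join the administrative and diagnosis tables on recnum.
lpr_adm = [
    {"pnr": "977591011522", "recnum": "906527079023828789",
     "d_inddto": "20130411", "c_spec": "06"},
]
lpr_diag = [
    {"recnum": "906527079023828789", "c_diag": "E8769", "c_diagtype": "A"},
    {"recnum": "444280801566183664", "c_diag": "94312", "c_diagtype": "B"},
]
lpr2 = join_on(lpr_adm, lpr_diag, key="recnum")

# LPR3: join the contact and diagnosis tables on dw_ek_kontakt.
kontakter = [
    {"cpr": "445785927048", "dw_ek_kontakt": "645923013048394355",
     "dato_start": "20130915", "hovedspeciale_ans": "Psykiatri"},
]
diagnoser = [
    {"dw_ek_kontakt": "645923013048394355", "diagnosekode": "DF317",
     "diagnosetype": "B", "senere_afkraeftet": "Nej"},
]
lpr3 = join_on(kontakter, diagnoser, key="dw_ek_kontakt")
```

Diagnosis rows without a matching record id are dropped by this inner join; whether that is the right behaviour depends on your use case.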
bef: CPR-registerets befolkningstabel

Variables and their descriptions within the bef register.

| Variable | Description |
|---|---|
| pnr | Pseudonymised social security number |
| koen | Sex |
| foed_dato | Date of birth |

Simulated example of what the data looks like for the bef register.

| koen | pnr | foed_dato |
|---|---|---|
| 1 | 715894021914 | 19480312 |
| 2 | 186184788482 | 20111108 |
| 1 | 569614759560 | 19310105 |
| 2 | 671114564239 | 19561228 |

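
The dates in the simulated examples appear as eight-digit values in year-month-day order. Assuming real data follows the same layout (an assumption, not something this document guarantees), a value such as foed_dato could be parsed like this; `parse_register_date` is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_register_date(value):
    """Parse an eight-digit YYYYMMDD value (string or integer) into a date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# Values taken from the simulated bef example.
born_first = parse_register_date("19480312")
born_second = parse_register_date(20111108)
```

An invalid value, such as a month of 13, raises a `ValueError` rather than being silently accepted.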
lmdb: Laegemiddelstatistikregisteret

Variables and their descriptions within the lmdb register.

| Variable | Description |
|---|---|
| pnr | Pseudonymised social security number |
| eksd | Date of purchase |
| atc | ATC code (fully specified) |
| volume | Number of daily standard doses (DDD) in package |
| apk | Number of packages purchased |
| indo | Indication code |
| name | Drug retail name |
| vnr | Item code |

Simulated example of what the data looks like for the lmdb register.

|  | pnr | eksd | atc |  | indo | name | vnr |
|---|---|---|---|---|---|---|---|
| 1.262771 | 072431269871 | 20030707 | A10BK01 | 9.115327 | 3874536 | silymarin | 214334 |
| 9.933694 | 501466021832 | 20130107 | A10BK01 | 5.398791 | 3476304 | docetaxel | 614423 |
| 2.224667 | 401113835050 | 20000723 | J07AP01 | 3.497905 | 3118254 | sulfaisodimidine | 738880 |
| 8.817253 | 277471556351 | 20220709 | A10BK01 | 3.012397 | 5445094 | enflurane | 075769 |

lpr_adm: Landspatientregisterets administrationstabel (LPR2)

Variables and their descriptions within the lpr_adm register.

| Variable | Description |
|---|---|
| pnr | Pseudonymised social security number |
| recnum | Record id number |
| d_inddto | Date of admission or initial contact |
| c_spec | Specialty code of department |

Simulated example of what the data looks like for the lpr_adm register.

| c_spec | pnr | recnum | d_inddto |
|---|---|---|---|
| 06 | 977591011522 | 906527079023828789 | 20130411 |
| 78 | 853967828101 | 410845599596979597 | 19861231 |
| 93 | 611478061504 | 854214110630761577 | 20240228 |
| 65 | 809745288639 | 530441469672734767 | 20160617 |

lpr_diag: Landspatientregisterets diagnosetabel (LPR2)

Variables and their descriptions within the lpr_diag register.

| Variable | Description |
|---|---|
| recnum | Record id number |
| c_diag | Diagnosis code |
| c_diagtype | Diagnosis type |

Simulated example of what the data looks like for the lpr_diag register.

| c_diagtype | recnum | c_diag |
|---|---|---|
| B | 996899850200054922 | 65318 |
| B | 444280801566183664 | 94312 |
| A | 698845059228061496 | E8769 |
| B | 713472898421125476 | 24502 |

kontakter: Landspatientregisterets kontakttabel (LPR3)

Variables and their descriptions within the kontakter register.

| Variable | Description |
|---|---|
| cpr | Pseudonymised social security number |
| dw_ek_kontakt | Record id number |
| dato_start | Date of admission or initial contact |
| hovedspeciale_ans | Specialty of department |

Simulated example of what the data looks like for the kontakter register.

| cpr | dw_ek_kontakt | dato_start | hovedspeciale_ans |
|---|---|---|---|
| 445785927048 | 645923013048394355 | 20130915 | Blandet medicin og kirurgi |
| 485315300759 | 722426513916023838 | 19770131 | Psykiatri |
| 606692062808 | 977867012383313133 | 19900514 | Neurologi |
| 974444048558 | 374005535351814173 | 20111029 | Kirurgi |

diagnoser: Landspatientregisterets diagnosetabel (LPR3)

Variables and their descriptions within the diagnoser register.

| Variable | Description |
|---|---|
| dw_ek_kontakt | Record id number |
| diagnosekode | Diagnosis code |
| diagnosetype | Diagnosis type |
| senere_afkraeftet | Was the diagnosis retracted later? |

Simulated example of what the data looks like for the diagnoser register.

| dw_ek_kontakt | diagnosekode | diagnosetype | senere_afkraeftet |
|---|---|---|---|
| 824190727403572180 | DF317 | B | Ja |
| 362995308374931290 | DI098C | B | Nej |
| 320478152080848701 | DR670AB | B | Nej |
| 288739488282692166 | DJ649 | A | Nej |

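
In the simulated example, senere_afkraeftet holds the Danish values Ja (yes) and Nej (no). The sketch below shows one way to read that flag and set aside retracted diagnoses. It assumes real data uses the same two values, and it does not describe how the OSDC algorithm itself treats retracted diagnoses; `was_retracted` is a hypothetical helper.

```python
def was_retracted(row):
    """Return True if the diagnosis was later retracted ('Ja')."""
    return row["senere_afkraeftet"].strip().lower() == "ja"


# Rows taken from the simulated diagnoser example.
diagnoser = [
    {"dw_ek_kontakt": "824190727403572180", "diagnosekode": "DF317",
     "diagnosetype": "B", "senere_afkraeftet": "Ja"},
    {"dw_ek_kontakt": "362995308374931290", "diagnosekode": "DI098C",
     "diagnosetype": "B", "senere_afkraeftet": "Nej"},
    {"dw_ek_kontakt": "320478152080848701", "diagnosekode": "DR670AB",
     "diagnosetype": "B", "senere_afkraeftet": "Nej"},
    {"dw_ek_kontakt": "288739488282692166", "diagnosekode": "DJ649",
     "diagnosetype": "A", "senere_afkraeftet": "Nej"},
]

# Keep only diagnoses that were not retracted later.
not_retracted = [row for row in diagnoser if not was_retracted(row)]
```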
sysi: Sygesikringsregisteret

Variables and their descriptions within the sysi register.

| Variable | Description |
|---|---|
| pnr | Pseudonymised social security number |
| barnmak | Was the service provided to the patient’s child? |
| speciale | Billing code of the service (fully specified) |
| honuge | Week and year of service |

Simulated example of what the data looks like for the sysi register.

| pnr | barnmak | speciale | honuge |
|---|---|---|---|
| 126727999466 | 0 | 76255 | 2992 |
| 545136589935 | 0 | 57526 | 1696 |
| 972085259165 | 1 | 04711 | 798 |
| 772714851027 | 0 | 56582 | 592 |

sssy: Sygesikringsregisteret

Variables and their descriptions within the sssy register.

| Variable | Description |
|---|---|
| pnr | Pseudonymised social security number |
| barnmak | Was the service provided to the patient’s child? |
| speciale | Billing code of the service (fully specified) |
| honuge | Week and year of service |

Simulated example of what the data looks like for the sssy register.

| pnr | barnmak | speciale | honuge |
|---|---|---|---|
| 474550758751 | 0 | 00166 | 1208 |
| 888864864492 | 1 | 45582 | 4720 |
| 631240179696 | 0 | 84058 | 4215 |
| 378459921780 | 0 | 38628 | 2311 |

lab_forsker: Laboratoriedatabasens forskertabel

Variables and their descriptions within the lab_forsker register.

| Variable | Description |
|---|---|
| patient_cpr | Pseudonymised social security number |
| samplingdate | Date of sampling |
| analysiscode | NPU code of analysis |
| value | Numerical result of analysis |

Simulated example of what the data looks like for the lab_forsker register.

| patient_cpr | samplingdate | analysiscode | value |
|---|---|---|---|
| 862573096712 | 20180527 | NPU46024 | 178.81536 |
| 866795446259 | 20110306 | NPU27300 | 16.82538 |
| 289845396907 | 20160324 | NPU14914 | 177.74120 |
| 564144128607 | 20170128 | NPU27300 | 153.17906 |

Getting access to data
The data described above are available through Statistics Denmark and
the Danish Health Data Authority. Researchers must be affiliated with an
approved research institute in Denmark, and fees apply. Information on
how to gain access to the data can be found at https://www.dst.dk/en/TilSalg/Forskningsservice.