Principles
These are the guiding principles for this package:
- Functionality is as agnostic to data format as possible (e.g. can be used with SQL or Arrow connections, in a data.table format, or as a data.frame).
- Functions have consistent inputs and outputs (e.g. a function takes the same kind of input and returns the same kind of output, regardless of specific conditions).
- Functions have predictable outputs based on inputs (e.g. if an input is a data frame, the output is a data frame).
- Functions have consistent naming based on their action.
- Functions have limited additional arguments.
- Functions are agnostic to the casing of input variables (upper or lower case); all internal variables and all output variables are lower case.
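As a rough illustration of the casing principle, the sketch below lower-cases all column names before any further processing. It is written in Python purely for illustration and is not the package's implementation; the function name `lowercase_columns` and the sample row are made up.

```python
def lowercase_columns(rows):
    """Return copies of the rows with every column name lower-cased."""
    return [{name.lower(): value for name, value in row.items()} for row in rows]


# Made-up example row with upper-case column names (hypothetical values).
rows = [{"PNR": "id_001", "ATC": "A10BA02"}]
normalised = lowercase_columns(rows)
```

Normalising names once at the boundary means every later step can refer to lower-case names only, whatever casing the input used.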
Use cases
We make these assumptions about how this package will be used, based on our experiences and expectations for use cases:
- Used entirely within the servers of Statistics Denmark or the Danish Health Data Authority, since that is where their data are kept.
- Used by researchers within or affiliated with Danish research institutions.
- Used specifically within a Danish register-based context.
Below is a set of “narratives” or “personas” with associated needs that this package aims to fulfill:
- “As a researcher, …”
  - “… I want to determine which registers and variables to request from Statistics Denmark and the Danish Health Data Authority, so that I am certain I will be able to classify the diabetes status of individuals in the registers.”
  - “… I want to easily and simply create a dataset that contains data on diabetes status in my population, so that I can begin conducting my research that involves persons with diabetes without having to tinker with coding the correct algorithm to classify them.”
  - “… I want to be informed early and clearly whether my data fit the required data types and values, so that I can correct any issues without extensive debugging of the code and/or data.”
Core functionality
This is the list of functionality we aim to have in the osdc package:
- Classify individuals' type 1 and type 2 diabetes status and create a data frame with that information and the date of onset of diabetes.
- Provide helper functions to check and process individual registers for the variables required to enter into the classifier.
- Provide a list of required variables and registers in order to calculate diabetes status.
- Provide validation helper functions to check that variables match what the algorithm expects.
- Provide a common and easily accessible standard for determining diabetes status within the context of research using Danish registers.
Classifier algorithm
A more complete description of the classifier is found in Anders Aasted Isaksen’s PhD thesis as well as the validation paper (1). The description below is a concise version of those documents.
The algorithm for classifying individuals with diabetes is described below. It first classifies individuals as having diabetes, then checks whether each of those individuals might have type 1 diabetes. Those who do not meet the type 1 criteria are classified as having type 2 diabetes.
Initial diabetes classification is defined as the second occurrence of any of the listed inclusion events. Wherever possible, all available data for each event are used, except for the purchases of glucose-lowering drugs, since the data on obstetric diagnoses necessary to censor glucose-lowering drug purchases are only complete from 1997 onwards. Inclusion criteria are:
- HbA1c measurements of ≥48 mmol/mol.
- Hospital diagnoses of diabetes.
- Diabetes-specific services received from a podiatrist.
- Purchase of glucose-lowering drugs.
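The "second occurrence" rule can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the events are assumed to have already had exclusions applied, and each event is assumed to count separately even when two fall on the same date.

```python
from collections import defaultdict
from datetime import date

HBA1C_THRESHOLD = 48  # mmol/mol, from the inclusion criteria above


def is_hba1c_inclusion(value_mmol_mol):
    """True if an HbA1c measurement counts as an inclusion event."""
    return value_mmol_mol >= HBA1C_THRESHOLD


def second_event_dates(events):
    """Map each person to the date of their second inclusion event.

    `events` is an iterable of (person_id, event_date) pairs pooled across
    all event types; people with fewer than two events are left out.
    """
    by_person = defaultdict(list)
    for person_id, event_date in events:
        by_person[person_id].append(event_date)
    return {
        person_id: sorted(dates)[1]
        for person_id, dates in by_person.items()
        if len(dates) >= 2
    }


# Made-up events: person "a" has three events, person "b" only one.
events = [
    ("a", date(2010, 3, 1)),
    ("a", date(2009, 1, 15)),
    ("b", date(2011, 7, 20)),
    ("a", date(2012, 5, 5)),
]
classified = second_event_dates(events)
```

Person "b" has a single event and is therefore not classified in this sketch.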
Exclusions are:
- HbA1c samples:
  - Samples taken during pregnancies, as elevated values could be due to gestational diabetes mellitus.
- Glucose-lowering drugs:
  - Brand drugs for weight loss, e.g. Saxenda.
  - Purchases during pregnancies, as these could be treatment for gestational diabetes mellitus.
  - Metformin for women below age 40, as that could be a treatment for polycystic ovary syndrome.
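The drug-purchase exclusions above could be expressed as a single filter, sketched here in Python. This is an illustration under assumptions rather than the package's code: the function and parameter names are hypothetical, pregnancy periods are assumed to be supplied as (start, end) date pairs, and how weight-loss brands are identified is left to the caller via a flag.

```python
from datetime import date

METFORMIN_ATC = "A10BA02"  # ATC code for metformin


def age_on(birth_date, on_date):
    """Age in whole years on a given date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


def keep_purchase(atc, purchase_date, is_woman, birth_date,
                  pregnancy_periods=(), is_weight_loss_brand=False):
    """Return False if a purchase matches any of the listed exclusions."""
    if is_weight_loss_brand:
        return False
    # Assumed representation: inclusive (start, end) dates for each pregnancy.
    if any(start <= purchase_date <= end for start, end in pregnancy_periods):
        return False
    if atc == METFORMIN_ATC and is_woman and age_on(birth_date, purchase_date) < 40:
        return False
    return True
```

For example, a metformin purchase by a woman aged 29 is dropped, while the same purchase at age 40 is kept.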
Classifying type 1 diabetes
Diabetes type is classified as either T1D or T2D based on patterns of purchases of insulin drugs (including analogues) and hospital primary diagnoses of T1D and T2D.
Classification as T1D requires an individual to fulfill either of the following criteria:
- Must have purchased only insulin drugs and never any other type of glucose-lowering drugs, and have at least one diagnosis of T1D.
- Must have a majority of T1D diagnoses from endocrinological departments (or from other medical departments, in the absence of contacts with endocrinological departments), and a purchase of insulin within 180 days after onset of diabetes, with insulin contributing at least two thirds of all defined daily doses of glucose-lowering drugs purchased.
In populations generated on a fixed index date (such as the cross-sectional studies associated with the dissertation cited above), individuals classified as T1D cases must also have purchased insulin drugs in the year before the index date.
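The two T1D criteria can be sketched as a small decision function. The Python below is a hedged illustration, not the package's implementation: all names are hypothetical, the per-person inputs are assumed to be computed beforehand, and "a majority of T1D diagnoses" is interpreted here as T1D primary diagnoses outnumbering T2D primary diagnoses, which is an assumption. The fixed-index-date rule is not covered.

```python
def pick_diagnosis_counts(endocrinology, other_medical):
    """Prefer counts from endocrinological departments; fall back to other
    medical departments only when there are no endocrinological contacts.

    Each argument is a (t1d_count, t2d_count) pair.
    """
    return endocrinology if sum(endocrinology) > 0 else other_medical


def is_t1d(only_insulin, t1d_diagnoses, t2d_diagnoses,
           insulin_within_180_days, insulin_ddd, total_ddd):
    """Return True if either T1D criterion is fulfilled."""
    # Criterion 1: only insulin ever purchased and at least one T1D diagnosis.
    criterion_1 = only_insulin and t1d_diagnoses >= 1
    # Criterion 2: majority T1D diagnoses, early insulin, insulin >= 2/3 of doses.
    majority_t1d = t1d_diagnoses > t2d_diagnoses  # assumed interpretation
    insulin_share_ok = total_ddd > 0 and insulin_ddd * 3 >= total_ddd * 2
    criterion_2 = majority_t1d and insulin_within_180_days and insulin_share_ok
    return criterion_1 or criterion_2


# Made-up example: no endocrinology contacts, so other departments are used.
t1d_count, t2d_count = pick_diagnosis_counts((0, 0), (3, 1))
example = is_t1d(False, t1d_count, t2d_count, True, 200, 300)
```

Anyone classified with diabetes who does not satisfy this function would be classified as T2D, following the description above.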
Data required from registers
The following is a list of the variables required from specific registers in order for the package to classify diabetes status:
Register | Variable |
---|---|
CPR-registerets befolkningstabel (bef) | pnr |
CPR-registerets befolkningstabel (bef) | koen |
CPR-registerets befolkningstabel (bef) | foed_dato |
Laegemiddelstatistikregisteret (lmdb) | pnr |
Laegemiddelstatistikregisteret (lmdb) | eksd |
Laegemiddelstatistikregisteret (lmdb) | atc |
Laegemiddelstatistikregisteret (lmdb) | volume |
Laegemiddelstatistikregisteret (lmdb) | apk |
Laegemiddelstatistikregisteret (lmdb) | indo |
Laegemiddelstatistikregisteret (lmdb) | name |
Laegemiddelstatistikregisteret (lmdb) | vnr |
Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | pnr |
Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | recnum |
Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | d_inddto |
Landspatientregisterets administrationstabel (LPR2) (lpr_adm) | c_spec |
Landspatientregisterets diagnosetabel (LPR2) (lpr_diag) | recnum |
Landspatientregisterets diagnosetabel (LPR2) (lpr_diag) | c_diag |
Landspatientregisterets diagnosetabel (LPR2) (lpr_diag) | c_diagtype |
Landspatientregisterets kontakttabel (LPR3) (kontakter) | cpr |
Landspatientregisterets kontakttabel (LPR3) (kontakter) | dw_ek_kontakt |
Landspatientregisterets kontakttabel (LPR3) (kontakter) | dato_start |
Landspatientregisterets kontakttabel (LPR3) (kontakter) | hovedspeciale_ans |
Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | dw_ek_kontakt |
Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | diagnosekode |
Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | diagnosetype |
Landspatientregisterets diagnosetabel (LPR3) (diagnoser) | senere_afkraeftet |
Sygesikringsregisteret (sysi) | pnr |
Sygesikringsregisteret (sysi) | barnmak |
Sygesikringsregisteret (sysi) | speciale |
Sygesikringsregisteret (sysi) | honuge |
Sygesikringsregisteret (sssy) | pnr |
Sygesikringsregisteret (sssy) | barnmak |
Sygesikringsregisteret (sssy) | speciale |
Sygesikringsregisteret (sssy) | honuge |
Laboratoriedatabasens forskertabel (lab_forsker) | patient_cpr |
Laboratoriedatabasens forskertabel (lab_forsker) | samplingdate |
Laboratoriedatabasens forskertabel (lab_forsker) | analysiscode |
Laboratoriedatabasens forskertabel (lab_forsker) | value |
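
A table like the one above lends itself to an early check that a dataset contains the variables it needs. The following Python sketch is illustrative only and is not the package's API: `REQUIRED_VARIABLES` copies just three of the registers from the table, and `missing_variables` is a hypothetical helper name.

```python
# Subset of the table above (three registers only, for brevity).
REQUIRED_VARIABLES = {
    "bef": ["pnr", "koen", "foed_dato"],
    "lmdb": ["pnr", "eksd", "atc", "volume", "apk", "indo", "name", "vnr"],
    "lab_forsker": ["patient_cpr", "samplingdate", "analysiscode", "value"],
}


def missing_variables(register, columns):
    """List the required variables that are absent from `columns`.

    Column names are compared case-insensitively.
    """
    if register not in REQUIRED_VARIABLES:
        raise ValueError(f"Unknown register: {register!r}")
    present = {column.lower() for column in columns}
    return [name for name in REQUIRED_VARIABLES[register] if name not in present]


# Made-up example: a `bef` extract that lacks the date of birth.
missing = missing_variables("bef", ["PNR", "KOEN"])
```

Reporting every missing variable at once, rather than failing on the first one, lets a researcher fix a data request in a single pass.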