Introduction
This vignette describes the function conventions and function flow of the osdc package. The function convention sections go over how we name functions and how we structure them in terms of input and output. The function flow describes the functions within the package, both internal and user-facing, which data sources they rely on, and how they are connected to each other. First, the functions for classifying diabetes status are presented, followed by the functions for classifying the diabetes type.
Function conventions
The below conventions are ideals only, to be used as a guidelines to help with development and understanding of the code. They are not hard rules.
Naming
- First word is an action verb, later words are objects or conditions.
- Exclusion criteria are prefixed with
exclude_
. - Inclusion criteria are prefixed with
include_
. - Helpers that get or extract a condition (e.g., “pregnancy” or “date
of visit”) are prefixed with
get_
. - Helpers that drop or keep a specific condition are prefixed with
drop_
orkeep_
(e.g., “first visit date to maternal care for pregnancy after 40 weeks”). These types of helpers likely are contained in theget_
functions. - Helpers that join registers or output of other functions are
prefixed with
join_
.
Input and output
- Few arguments, with one or two core required argument.
-
include_
functions take a register as the first argument.- One input register database at a time.
-
exclude_
functions can take a register as the first argument or take the output from aninclude_
function. - Second argument can be an output data from another function.
Function flow
The osdc package contains one main function that classifies
individuals into those with either type 1 or type 2 diabetes using the
Danish registers: classify_diabetes()
. This function
classifies those with diabetes (type 1 or 2) based on the Danish
registers described in the vignette("design")
and
vignette("data-sources")
. All data sources are used as
input for this function. The specific inclusion and exclusion details
are also described in the vignette("design")
.
This results in the functionality flow for classifying diabetes status seen below. This flow can be divided into two sections: extracting the diabetes population and classifying diabetes type which we will detail in the following sections.
All functions take a data.frame
type object as input and
outputs the same type of object as the input object (a
data.frame
type). For instance, if the input is a
data.table
object, the output will also be a
data.table
.
Population extraction
In the following sections, we describe the functions used to extract the diabetes population from the Danish registers. The functions are divided into inclusion and exclusion events, and the final diagnosis date is calculated based on these events.
Inclusion events
HbA1c tests above the diagnosis cut-off value (48 mmol/mol or 6.5%)
The function include_hba1c()
uses
lab_forsker
as the input data to extract all events of
HbA1c tests above the diagnosis cut-off value.
Since the HbA1c diagnosis cut-off value depends on the kind of test that is used, the inclusion event is defined as follows:
- For HbA1c IFCC (NPU03835), we include values >= 6.5 %.
- For HbA1c DCCT (NPU27300), we include values >= 48 mmol/mol.
name | logic |
---|---|
hba1c | (analysiscode == ‘NPU27300’ AND value >= 48) OR (analysiscode == ‘NPU03835’ AND value >= 6.5) |
Hospital diagnosis of diabetes
The function include_diabetes_diagnoses()
uses the
hospital contacts from LPR2 and 3 to include all dates of diabetes
diagnoses. Diabetes diagnoses from both ICD 8 and ICD 10 are
included.
This function contains two helper functions:
keep_diabetes_icd10()
keep_diabetes_icd8()
Exclusion events
HbA1c tests and GLD purchases during pregnancy
The function exclude_pregnancy()
uses diagnoses from
LPR2 or LPR3 as input and is used to exclude both HbA1c tests and GLD
purchases during pregnancy.
Internally, this relies on the function
get_pregnancy_dates()
that contains the following three
helper functions:
-
calculate_pregnancy_index_date_for_mc_visits_wo_end_date()
(this might be removed with the inclusion of the birth register) -
get_pregnancy_end_dates()
: Keep maternal care visits with an end date and drop visits between 40 weeks before end date and 12 weeks after end date. -
get_maternal_care_visit_dates_without_end_date()
: Uses the output fromget_pregnancy_end_dates()
which identifies maternal care visits with end dates to derive maternal care visits without end dates. below.
Glucose-lowering brand drugs for weight loss
The function exclude_wld_purchases()
uses lmdb as input
and excludes the brand drugs Saxenda and Wegovy.
Metformin purchases for women below age 40
The function exclude_potential_pcos()
as input to
exclude all purchases of metformin by women below age 40 (i.e., <= 39
years old) at the date of purchase. It relies on bef
as
input.
This function contains two helper functions:
keep_women()
drop_age_40_below()
Get diagnosis date
The function get_diagnosis_date()
combines the outputs
from the inclusion and exclusion functions to get the final diagnosis
date. Initially, it drops the first inclusion and exclusion events from
the function outputs with the helper drop_first_event()
, so
that only those with two or more events are kept. This is then used to
assign an initial diagnosis according to OSDC. Then, all the outputs are
joined together with join_diagnosis_dates()
.
Finally, the dates outside of the data coverage period are dropped
with drop_diagnosis_dates_outside_coverage()
to end with a
final diagnosis date. For details on this censoring based on periods
with insufficient data coverage, see the
vignette("design")
.
Classifying the diabetes type
The next step of the OSDC algorithm classifies individuals from the
extracted diabetes population as having either T1D or T2D. As described
in the vignette("design")
, individuals not classified as
T1D cases are classified as T2D cases.
The output is a data.frame
that includes one row per
individual in the diabetes population: one column with their PNR, two
columns with inclusion dates (one “stable” date and one “raw” date - see
the vignette("design")
for an elaboration on what that
entails), and one column with the diabetes type.
Type 1 classification
The details for the classification of type 1 diabetes is described in
vignette("design")
. To classify whether an individual has
T1D, the OSDC algorithm includes the following criteria:
-
get_t1d_primary_diagnosis()
, which relies on the hospital diagnoses extracted fromlpr_diag
(LPR2) anddiagnoser
(LPR3) in the previous steps. -
get_only_insulin_purchases()
which relies on the GLD purchases from Lægemiddelsdatabasen to get patients where all GLD purchases are insulin only. -
get_majority_of_t1d_diagnoses()
(as compared to T2D diagnoses) which again relies on primary hospital diagnoses from LPR. -
get_insulin_purchase_within_180_days()
which relies on both diagnosis from LPR and GLD purchases from Lægemiddelsdatabasen. -
get_insulin_is_two_thirds_of_gld_doses
which relies on the GLD purchases from Lægemiddelsdatabasen.
Note the following hierarchy in first function above: First, the function checks whether the individual has primary diagnoses from endocrinological specialty. If that’s the case for a given person, the check of whether they have a majority of T1D primary diagnoses are based on data from endocrinological specialty. If that’s not the case, the check will be based on primary diagnoses from medical specialties.
Type 2 classification
As described in the vignette("design")
, individuals not
classified as type 1 cases are classified as type 2 cases.
Output
The output of the OSDC algorithm is a data.frame
which
includes four columns:
- PNR: The pseudonymised social security number of individuals in the diabetes population (one row per individual)
- stable_inclusion_date: The stable inclusion date (i.e., the raw date mutated so only individuals included in the time-period where data coverage is sufficient to make incident cases reliable)1
- raw_inclusion_date: The raw inclusion date (i.e., the date of the second inclusion event as described in the Extracting the diabetes population section above)
- diabetes_type The classified diabetes type
For an example, see below.
PNR | stable_inclusion_date | raw_inclusion_date | diabetes_type |
---|---|---|---|
0000000001 | 2020-01-01 | 2020-01-01 | T1D |
0000000004 | NULL | 1995-04-19 | T2D |
The individuals 0000000001
and 0000000004
have been classified as having diabetes (T1D
and
T2D
, respectively). 0000000004
is classified
as having type 1 diabetes (T1D) with an inclusion date of
2020-01-01
. Since this date is within a time-period of
sufficient data coverage, the column stable_inclusion_date
is populated with the same date as raw_inclusion_date
.
The individual in the second row, 0000000004
is
classified as having type 2 diabetes T2D
with an inclusion
date of 1995-19-04
. Since 1995 is within a time-period of
insufficient data coverage, stable_inclusion_date
is
NULL
. However, raw_inclusion_date
still
contains the inclusion date of this individual.