The function flow describes the functions within the package, both internal and user-facing, which data sources they rely on, and how they are connected to each other. First, the functions for classifying diabetes status are presented, followed by the functions for classifying the diabetes type.
Function flow
The function flow for classifying diabetes status is shown below. It can be divided into two parts: extracting the diabetes population and classifying the diabetes type, each of which is detailed in the following sections.
Population extraction
In the following sections, we describe the functions used to extract the diabetes population from the Danish registers. The functions are divided into inclusion and exclusion events, and the final diagnosis date is calculated based on these events.
Inclusion events
- prepare_lpr2(): See ?prepare_lpr2 for more information.
- prepare_lpr3(): See ?prepare_lpr3 for more information.
include_diabetes_diagnosis()
#' Include diabetes diagnoses from LPR2 and LPR3.
#'
#' Uses the hospital contacts from LPR2 and LPR3 to include all dates of diabetes
#' diagnoses to use for inclusion, as well as additional information needed to classify diabetes
#' type. Diabetes diagnoses from both ICD-8 and ICD-10 are included.
#'
#' This output is passed to the `join_inclusions()` function, where the
#' `dates` variable is used for the final step of the inclusion process.
#' The counts of diabetes type-specific primary diagnoses (the four columns
#' prefixed `n_` listed under the return value) are carried over for the
#' subsequent classification of diabetes type, initially as inputs to the
#' `get_t1d_primary_diagnosis()` and `get_majority_of_t1d_diagnoses()`
#' functions.
#'
#' @param lpr2 The output from `prepare_lpr2()`.
#' @param lpr3 The output from `prepare_lpr3()`.
#'
#' @return The same type as the input data, default as a [tibble::tibble()],
#' with the following columns and up to two rows per individual:
#'
#' - `pnr`: The personal identification variable.
#' - `dates`: The dates of the first and second hospital diabetes diagnosis.
#' - `n_t1d_endocrinology`: The number of type 1 diabetes-specific primary
#' diagnosis codes from endocrinology departments.
#' - `n_t2d_endocrinology`: The number of type 2 diabetes-specific primary
#' diagnosis codes from endocrinology departments.
#' - `n_t1d_medical`: The number of type 1 diabetes-specific primary
#' diagnosis codes from medical departments.
#' - `n_t2d_medical`: The number of type 2 diabetes-specific primary
#' diagnosis codes from medical departments.
#' - `has_lpr_diabetes_diagnosis`: A logical variable that acts as a helper
#' indicator for use in later functions.
#'
#' @keywords internal
#' @inherit algorithm seealso
#'
#' @examples
#' register_data <- simulate_registers(c("lpr_diag", "lpr_adm", "diagnoser", "kontakter"))
#' include_diabetes_diagnosis(
#' lpr2 = prepare_lpr2(register_data$lpr_diag, register_data$lpr_adm),
#' lpr3 = prepare_lpr3(register_data$diagnoser, register_data$kontakter)
#' )
include_diabetes_diagnosis <- function(lpr2, lpr3) {
  # Combine and process the two inputs
  lpr2 |>
    # `join_by()` takes bare or quoted column names, not the `.data` pronoun.
    dplyr::full_join(lpr3, by = dplyr::join_by("pnr")) |>
    dplyr::select(
      "pnr",
      "dates" = "date"
      # n_t1d_endocrinology =
      # n_t2d_endocrinology =
      # n_t1d_medical =
      # n_t2d_medical =
    ) |>
    dplyr::mutate(has_lpr_diabetes_diagnosis = TRUE)
}
include_podiatrist_services()
See ?include_podiatrist_services for more information.
include_hba1c()
See ?include_hba1c for more information.
include_gld_purchases()
See ?include_gld_purchases for more information.
Exclusion events
exclude_potential_pcos()
See ?exclude_potential_pcos for more information.
exclude_pregnancy()
#' Exclude any pregnancy events that could be gestational diabetes.
#'
#'
#' The function `exclude_pregnancy()` takes the combined outputs from
#' `prepare_lpr2()`, `prepare_lpr3()`, `include_hba1c()`, and
#' `exclude_potential_pcos()` and uses diagnoses from LPR2 or LPR3 to
#' exclude both elevated HbA1c tests and GLD purchases during pregnancy, as
#' these may be due to gestational diabetes, rather than type 1 or type 2
#' diabetes. The aim is to identify pregnancies based on diagnosis codes
#' specific to pregnancy-ending events (e.g. live births or miscarriages),
#' and then use the dates of these events to remove inclusion events in the
#' preceding months that may be related to gestational diabetes (e.g.
#' elevated HbA1c tests or purchases of glucose-lowering drugs during
#' pregnancy).
#'
#' After these exclusion functions have been applied, the output serves as
#' inputs to two sets of functions:
#'
#' 1. The censored HbA1c and GLD data are passed to the
#' `join_inclusions()` function for the final step of the inclusion
#' process.
#' 2. The censored GLD data are passed to the
#' `get_only_insulin_purchases()`,
#' `get_insulin_purchases_within_180_days()`, and
#' `get_insulin_is_two_thirds_of_gld_doses()` helper functions for the
#' classification of diabetes type.
#'
#' @param excluded_pcos Output from `exclude_potential_pcos()`.
#' @param pregnancy_dates Output from `get_pregnancy_dates()`.
#' @param included_hba1c Output from `include_hba1c()`.
#'
#' @returns The same type as the input data, default as a [tibble::tibble()].
#' Has the same data as the output of `exclude_potential_pcos()` (the
#' `excluded_pcos` input), plus a helper logical variable `no_pregnancy` that
#' is used in later functions.
#' @keywords internal
#' @inherit algorithm seealso
#'
#' @examples
#' register_data <- simulate_registers(c(
#' "lmdb", "bef", "lpr_diag", "lpr_adm",
#' "diagnoser", "kontakter", "lab_forsker"
#' ))
#' lpr2 <- prepare_lpr2(register_data$lpr_diag, register_data$lpr_adm)
#' lpr3 <- prepare_lpr3(register_data$diagnoser, register_data$kontakter)
#' register_data$lmdb |>
#'   include_gld_purchases() |>
#'   exclude_potential_pcos(register_data$bef) |>
#'   exclude_pregnancy(
#'     get_pregnancy_dates(lpr2, lpr3),
#'     include_hba1c(register_data$lab_forsker)
#'   )
exclude_pregnancy <- function(excluded_pcos, pregnancy_dates, included_hba1c) {
  # Filter using the algorithm for pregnancy
  excluded_pcos |>
    # Join the pregnancy event dates and the elevated HbA1c tests by `pnr`.
    # `join_by()` takes bare or quoted column names, not the `.data` pronoun.
    dplyr::full_join(pregnancy_dates, by = dplyr::join_by("pnr")) |>
    dplyr::full_join(included_hba1c, by = dplyr::join_by("pnr")) |>
    # Filtering here...
    dplyr::mutate(no_pregnancy = TRUE)
}
#' Simple function to get only the pregnancy event dates.
#'
#' @param lpr2 Output from `prepare_lpr2()`.
#' @param lpr3 Output from `prepare_lpr3()`.
#'
#' @returns The same type as the input data, default as a [tibble::tibble()].
#' @keywords internal
#' @inherit algorithm seealso
#'
#' @examples
#' register_data <- simulate_registers(c("lpr_diag", "lpr_adm", "diagnoser", "kontakter"), 100)
#' lpr2 <- prepare_lpr2(register_data$lpr_diag, register_data$lpr_adm)
#' lpr3 <- prepare_lpr3(register_data$diagnoser, register_data$kontakter)
#' get_pregnancy_dates(lpr2, lpr3)
get_pregnancy_dates <- function(lpr2, lpr3) {
  # Filter using the algorithm for pregnancy
  lpr2 |>
    dplyr::full_join(lpr3, by = dplyr::join_by("pnr")) |>
    # Use the `.data` pronoun and quoted names, as in the other functions.
    dplyr::filter(.data$has_pregnancy_events) |>
    dplyr::select(
      "pnr",
      "pregnancy_event_date" = "date"
    )
}
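As a rough illustration of the censoring idea described for exclude_pregnancy(), the sketch below removes inclusion events that fall within a window before a pregnancy-ending event. It is not the package's implementation: the toy data, the `window_days` value, and the variable names are all assumptions made for this example, and it assumes at most one pregnancy-ending event per person.

```r
# Hypothetical toy data: candidate inclusion events and pregnancy-ending events.
events <- data.frame(
  pnr = c("001", "001", "002"),
  date = as.Date(c("2015-03-01", "2016-11-15", "2015-03-01"))
)
pregnancy_dates <- data.frame(
  pnr = "001",
  pregnancy_event_date = as.Date("2015-06-01")
)

# ASSUMPTION: placeholder window length, not the value used by the package.
window_days <- 280

# Attach the pregnancy-ending date (if any) to each event.
candidates <- merge(events, pregnancy_dates, by = "pnr", all.x = TRUE)

# Days from the event to the pregnancy-ending date (NA if no pregnancy).
days_before <- as.numeric(candidates$pregnancy_event_date - candidates$date)
during_pregnancy <- !is.na(days_before) &
  days_before >= 0 &
  days_before <= window_days

# Keep only events that are not within the window before a pregnancy end.
censored <- candidates[!during_pregnancy, c("pnr", "date")]
print(censored)
```

Events for people without any pregnancy-ending event are always kept, since their `days_before` value is missing.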
Joining inclusions and exclusions
join_inclusions()
#' Join included events.
#'
#' This function joins the outputs from all the inclusion and exclusion
#' functions, by `pnr` and `dates`. Input datasets:
#'
#' - `included_diabetes_diagnoses`: Dates are the first and second hospital diabetes diagnosis.
#' - `included_podiatrist_services`: Dates are the first and second diabetes-specific podiatrist record.
#' - `hba1c_censored_pregnancy`: Dates are the first and second elevated HbA1c test results (after censoring potential gestational diabetes).
#' - `gld_censored_pcos_pregnancy`: Dates are the first and second purchase of a glucose-lowering drug (after censoring potential polycystic ovary syndrome and gestational diabetes).
#'
#' @param included_diabetes_diagnoses Output from [include_diabetes_diagnoses()].
#' @param included_podiatrist_services Output from [include_podiatrist_services()].
#' @param hba1c_censored_pregnancy Output from [exclude_pregnancy()] when given `hba1c` data.
#' @param gld_censored_pcos_pregnancy Output from [exclude_pregnancy()] when given `gld_censored_pcos` data.
#'
#' @returns The same type as the input data, default as a [tibble::tibble()],
#' with the joined columns from the output of [include_diabetes_diagnoses()],
#' [include_podiatrist_services()] and [exclude_pregnancy()]. There will be
#' 1-8 rows per `pnr`.
#' @keywords internal
#' @inherit algorithm seealso
join_inclusions <- function(
  included_diabetes_diagnoses,
  included_podiatrist_services,
  hba1c_censored_pregnancy,
  gld_censored_pcos_pregnancy
) {
  # Combine the outputs from the inclusion and exclusion events
  purrr::reduce(
    list(
      included_diabetes_diagnoses,
      included_podiatrist_services,
      hba1c_censored_pregnancy,
      gld_censored_pcos_pregnancy
    ),
    # This joins *only* by pnr and dates. If datasets have the same column
    # names, they will be renamed to differentiate them.
    # TODO: We may need to ensure that no two datasets have the same columns.
    # `join_by()` takes bare or quoted column names, not the `.data` pronoun.
    \(x, y) dplyr::full_join(x, y, by = dplyr::join_by("pnr", "dates"))
  )
}
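To make the join pattern concrete, here is a small sketch of reducing a list of data frames with full joins on `pnr` and `dates`. The toy data and the helper column names other than `has_lpr_diabetes_diagnosis` are made up for illustration, and only three inputs are used to keep it short.

```r
library(dplyr)
library(purrr)

# Hypothetical toy inputs; the values are invented for this example.
diagnoses <- tibble(
  pnr = c("001", "001"),
  dates = as.Date(c("2010-01-01", "2011-06-01")),
  has_lpr_diabetes_diagnosis = TRUE
)
podiatrist <- tibble(
  pnr = c("001", "002"),
  dates = as.Date(c("2010-01-01", "2012-03-15")),
  has_podiatrist_services = TRUE # hypothetical column name
)
hba1c <- tibble(
  pnr = "002",
  dates = as.Date("2013-09-30"),
  has_elevated_hba1c = TRUE # hypothetical column name
)

# Full-join the inputs pairwise, matching rows on both pnr and dates.
joined <- reduce(
  list(diagnoses, podiatrist, hba1c),
  \(x, y) full_join(x, y, by = join_by(pnr, dates))
)
print(joined)
```

Rows that share both `pnr` and `dates` across inputs collapse into one row, while every other event keeps its own row with missing values in the columns from the other inputs.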
Create inclusion dates
#' Create the final diagnosis date based on all the inclusion event types.
#'
#' The function `create_inclusion_dates()` takes the output from `join_inclusions()`
#' and defines the final diagnosis date based on all the inclusion event types.
#' Keeps only those with 2 or more recorded inclusion events, regardless of the
#' type of these events (e.g. two elevated HbA1c tests will lead to inclusion as
#' well as one elevated HbA1c test followed by a purchase of glucose-lowering
#' drugs).
#'
#' @param inclusions Output from [join_inclusions()].
#' @param stable_inclusion_start_date The date from when the inclusion events
#' from all sources are considered more 'stable' (e.g. time after the change
#' in how medication drugs are labeled and how doctors actually regularly
#' input the new change into the database).
#'
#' @returns The same type as the input data, default as a [tibble::tibble()],
#' along with the `purchase_date`, `atc`, `contained_doses` columns from
#' `exclude_pregnancy()`, and the `n_t1d_endocrinology`,
#' `n_t2d_endocrinology`, `n_t1d_medical`, and `n_t2d_medical` columns from
#' `include_diabetes_diagnoses()`. It also creates two new columns:
#'
#' - `raw_inclusion_date`: Date of inclusion, which is the second
#' earliest recorded event.
#' - `stable_inclusion_date`: Date of inclusion of individuals included
#' at least one year after the incorporation of inclusions based on
#' glucose-lowering drug data (1998 onwards when using National Patient
#' Register data for censoring of gestational diabetes). Limits the
#' included cohort to only individuals with a valid date of inclusion
#' (and thereby valid age at inclusion & duration of diabetes).
#'
#' @keywords internal
#' @inherit algorithm seealso
create_inclusion_dates <- function(inclusions, stable_inclusion_start_date = "1998-01-01") {
  inclusions |>
    # TODO: May need to consider more efficient ways than using group by.
    dplyr::group_by(.data$pnr) |>
    # Drop earliest date so only those with two or more events are included.
    dplyr::filter(.data$dates != min(.data$dates, na.rm = TRUE)) |>
    dplyr::mutate(
      # Earliest date in the rows for each individual.
      raw_inclusion_date = min(.data$dates, na.rm = TRUE),
      stable_inclusion_date = dplyr::if_else(
        .data$raw_inclusion_date < lubridate::as_date(stable_inclusion_start_date),
        NA,
        .data$raw_inclusion_date
      )
    ) |>
    dplyr::ungroup() |>
    dplyr::select(
      "pnr",
      "raw_inclusion_date",
      "stable_inclusion_date",
      # From `exclude_pregnancy()` via the GLD purchases
      # TODO: this might need to be renamed in a previous step, rather than here.
      "purchase_date" = "date",
      "atc",
      "contained_doses",
      # From `include_diabetes_diagnoses()`
      "n_t1d_endocrinology",
      "n_t2d_endocrinology",
      "n_t1d_medical",
      "n_t2d_medical"
    )
}
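The "second earliest event" rule can be seen on a toy example. The sketch below is a simplified illustration with invented data: it collapses to one row per person with `summarise()`, which is a choice made for this example only, whereas the function above keeps one row per remaining event.

```r
library(dplyr)

# Hypothetical inclusion events; the values are invented for this example.
events <- tibble(
  pnr = c("001", "001", "001", "002", "003", "003"),
  dates = as.Date(c(
    "1996-05-01", "1997-02-10", "2001-08-20", # three events
    "2005-01-01",                             # a single event
    "1999-03-01", "2000-07-04"                # two events
  ))
)

inclusion_dates <- events |>
  group_by(pnr) |>
  # Dropping the earliest date removes people with only one event.
  filter(dates != min(dates, na.rm = TRUE)) |>
  # The earliest remaining date is the second earliest event.
  summarise(raw_inclusion_date = min(dates, na.rm = TRUE), .groups = "drop") |>
  mutate(
    stable_inclusion_date = if_else(
      raw_inclusion_date < as.Date("1998-01-01"),
      as.Date(NA),
      raw_inclusion_date
    )
  )
print(inclusion_dates)
```

Person "002" is dropped because they have a single event, person "001" gets a raw inclusion date before 1998 and therefore a missing stable inclusion date, and person "003" gets the same date in both columns.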
Classifying the diabetes type
The next step of the OSDC algorithm classifies individuals from the extracted diabetes population as having either T1D or T2D. As described in the vignette("design"), individuals not classified as T1D cases are classified as T2D cases.
As the diabetes type classification incorporates an evaluation of the time from diagnosis/inclusion to the first subsequent purchase of insulin, the get_diabetes_type() function has to take the date of diagnosis and all purchases of GLD (after censoring) as inputs. In addition, information on diabetes type-specific primary diagnoses from hospitals is also required.
Thus, the function takes the following inputs from get_inclusion_date(), exclude_pregnancy(), and include_diabetes_diagnoses():
- From get_inclusion_date(): Information on the date of diagnosis of diabetes:
  - pnr
  - raw_inclusion_date
  - stable_inclusion_date
- From exclude_pregnancy(): Information on historic GLD purchases:
  - pnr: identifier variable
  - date: dates of all purchases of GLD
  - atc: type of drug
  - contained_doses: defined daily doses of drug contained in purchase
- From include_diabetes_diagnoses(): Information on diabetes type-specific primary diagnoses from hospitals:
  - pnr: identifier variable
  - n_t1d_endocrinology: number of type 1 diabetes-specific primary diagnosis codes from endocrinological departments
  - n_t2d_endocrinology: number of type 2 diabetes-specific primary diagnosis codes from endocrinological departments
  - n_t1d_medical: number of type 1 diabetes-specific primary diagnosis codes from medical departments
  - n_t2d_medical: number of type 2 diabetes-specific primary diagnosis codes from medical departments
For each pnr number, several helper functions are applied to these inputs to extract additional information from the censored GLD data and diagnoses to use for classification of diabetes type. All of these return a single value (TRUE, otherwise FALSE) for each individual:
- get_only_insulin_purchases():
  - Inputs passed from exclude_pregnancy(): atc
  - Outputs: only_insulin_purchases = TRUE if every purchase has an atc starting with “A10A” (that is, no non-insulin GLD purchases are present)
- get_insulin_purchases_within_180_days():
  - Inputs passed from exclude_pregnancy(): date & atc
  - Inputs passed from get_inclusion_date(): raw_inclusion_date
  - Outputs: TRUE if any purchases with atc starting with “A10A” have a date between 0 and 180 days after raw_inclusion_date
- get_insulin_is_two_thirds_of_gld_doses():
  - Inputs passed from exclude_pregnancy(): contained_doses & atc
  - Outputs: TRUE if the sum of contained_doses of rows with atc starting with “A10A” (except “A10AE5”) is at least twice the sum of contained_doses of rows with atc starting with “A10B” or “A10AE5”
- get_any_t1d_primary_diagnoses():
  - Inputs passed from include_diabetes_diagnoses(): n_t1d_endocrinology & n_t1d_medical
  - Outputs: TRUE if the combined sum of the inputs is 1 or above
- get_type_diagnoses_from_endocrinology():
  - Inputs passed from include_diabetes_diagnoses(): n_t1d_endocrinology & n_t2d_endocrinology
  - Outputs: type_diagnoses_from_endocrinology = TRUE if the combined sum of the inputs is 1 or above
- get_type_diagnosis_majority():
  - Inputs passed from include_diabetes_diagnoses(): n_t1d_endocrinology, n_t2d_endocrinology, n_t1d_medical & n_t2d_medical
  - Inputs passed from get_type_diagnoses_from_endocrinology(): type_diagnoses_from_endocrinology
  - Outputs: TRUE if type_diagnoses_from_endocrinology == TRUE and n_t1d_endocrinology is above n_t2d_endocrinology. Also TRUE if type_diagnoses_from_endocrinology == FALSE and n_t1d_medical is above n_t2d_medical
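The three GLD-based helper outputs can be sketched for a single individual in base R. This is an illustration with invented purchase data, not the package's code; it reads only_insulin_purchases as "every purchase is insulin" and uses the ATC prefixes named in the list above.

```r
# Hypothetical GLD purchases for one individual (values are made up).
purchases <- data.frame(
  date = as.Date(c("2010-02-01", "2010-05-01", "2011-01-10")),
  atc = c("A10AB01", "A10BA02", "A10AE56"),
  contained_doses = c(300, 100, 50)
)
raw_inclusion_date <- as.Date("2010-01-15")

is_a10a <- startsWith(purchases$atc, "A10A")
is_a10ae5 <- startsWith(purchases$atc, "A10AE5")

# TRUE only if every purchase has an ATC code starting with "A10A".
only_insulin_purchases <- all(is_a10a)

# TRUE if any "A10A" purchase falls 0 to 180 days after inclusion.
days_after_inclusion <- as.numeric(purchases$date - raw_inclusion_date)
insulin_purchases_within_180_days <- any(
  is_a10a & days_after_inclusion >= 0 & days_after_inclusion <= 180
)

# Insulin doses ("A10A" except "A10AE5") must be at least twice the doses
# of "A10B" or "A10AE5", i.e. at least two thirds of the total.
insulin_doses <- sum(purchases$contained_doses[is_a10a & !is_a10ae5])
other_doses <- sum(
  purchases$contained_doses[startsWith(purchases$atc, "A10B") | is_a10ae5]
)
insulin_is_two_thirds_of_gld_doses <- insulin_doses >= 2 * other_doses
```

With these toy values, the individual has a non-insulin purchase, an insulin purchase 17 days after inclusion, and 300 insulin doses against 150 other doses, which is exactly the two-thirds boundary.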
get_diabetes_type() evaluates all the outputs from the helper functions to define diabetes type for each individual. Diabetes type is classified as “T1D” if:
- only_insulin_purchases == TRUE & any_t1d_primary_diagnoses == TRUE
- Or only_insulin_purchases == FALSE & any_t1d_primary_diagnoses == TRUE & type_diagnosis_majority == TRUE & insulin_is_two_thirds_of_gld_doses == TRUE & insulin_purchases_within_180_days == TRUE
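The two conditions can be written as a single logical expression. The sketch below applies them to a made-up table of helper outputs, one row per individual; it illustrates the rule as stated here and is not the package's get_diabetes_type() implementation.

```r
# Hypothetical helper outputs, one row per individual (values are made up).
helper_outputs <- data.frame(
  pnr = c("001", "002", "003", "004"),
  only_insulin_purchases = c(TRUE, FALSE, FALSE, TRUE),
  any_t1d_primary_diagnoses = c(TRUE, TRUE, TRUE, FALSE),
  type_diagnosis_majority = c(FALSE, TRUE, TRUE, TRUE),
  insulin_is_two_thirds_of_gld_doses = c(FALSE, TRUE, TRUE, TRUE),
  insulin_purchases_within_180_days = c(FALSE, TRUE, FALSE, TRUE)
)

# T1D if either of the two conditions holds.
is_t1d <- with(
  helper_outputs,
  (only_insulin_purchases & any_t1d_primary_diagnoses) |
    (!only_insulin_purchases &
      any_t1d_primary_diagnoses &
      type_diagnosis_majority &
      insulin_is_two_thirds_of_gld_doses &
      insulin_purchases_within_180_days)
)

# Everyone not classified as T1D is classified as T2D.
helper_outputs$diabetes_type <- ifelse(is_t1d, "T1D", "T2D")
print(helper_outputs[, c("pnr", "diabetes_type")])
```

Note that both conditions require at least one T1D primary diagnosis, so person "004" is classified as T2D even though all their GLD purchases are insulin.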
get_diabetes_type() returns a data.frame with one row per pnr number and four columns: pnr, stable_inclusion_date, raw_inclusion_date & diabetes_type. This is the final product of the OSDC algorithm. See the vignette("design") for more detail on the two inclusion dates and their intended use-cases.
Type 1 classification
The details of the classification of type 1 diabetes are described in vignette("design"). To classify whether an individual has T1D, the OSDC algorithm includes the following criteria:
- get_t1d_primary_diagnosis(), which relies on the hospital diagnoses extracted from lpr_diag (LPR2) and diagnoser (LPR3) in the previous steps.
- get_only_insulin_purchases(), which relies on the GLD purchases from Lægemiddeldatabasen to get patients where all GLD purchases are insulin only.
- get_majority_of_t1d_diagnoses() (as compared to T2D diagnoses), which again relies on primary hospital diagnoses from LPR.
- get_insulin_purchases_within_180_days(), which relies on both diagnoses from LPR and GLD purchases from Lægemiddeldatabasen.
- get_insulin_is_two_thirds_of_gld_doses(), which relies on the GLD purchases from Lægemiddeldatabasen.
Note the following hierarchy in get_majority_of_t1d_diagnoses(): first, the function checks whether the individual has any primary diagnoses from the endocrinological specialty. If so, the check of whether they have a majority of T1D primary diagnoses is based on data from the endocrinological specialty. If not, the check is based on primary diagnoses from medical specialties.
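The hierarchy can be illustrated with a short base R sketch. The counts below are invented, and the code shows the rule as described in this section rather than the package's implementation.

```r
# Hypothetical diagnosis counts, one row per individual (values are made up).
diagnoses <- data.frame(
  pnr = c("001", "002", "003"),
  n_t1d_endocrinology = c(3, 0, 0),
  n_t2d_endocrinology = c(1, 0, 2),
  n_t1d_medical = c(0, 2, 5),
  n_t2d_medical = c(4, 1, 0)
)

# Does the individual have any type-specific diagnoses from endocrinology?
type_diagnoses_from_endocrinology <- with(
  diagnoses,
  (n_t1d_endocrinology + n_t2d_endocrinology) >= 1
)

# Use endocrinology counts when available, otherwise fall back to medical.
type_diagnosis_majority <- with(
  diagnoses,
  ifelse(
    type_diagnoses_from_endocrinology,
    n_t1d_endocrinology > n_t2d_endocrinology,
    n_t1d_medical > n_t2d_medical
  )
)
print(data.frame(pnr = diagnoses$pnr, type_diagnosis_majority))
```

Person "003" shows why the order matters: their medical department counts favour T1D, but because they have endocrinology diagnoses, only those are compared and the majority check is FALSE.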