#' Create the final diagnosis date based on all the inclusion event types.
#'
#' The function `create_inclusion_dates()` takes the output from `join_inclusions()`
#' and defines the final diagnosis date based on all the inclusion event types.
#' Keeps only those with 2 or more recorded inclusion events, regardless of the
#' type of these events (e.g. two elevated HbA1c tests will lead to inclusion as
#' well as one elevated HbA1c test followed by a purchase of glucose-lowering
#' drugs).
#'
#' @param inclusions Output from [join_inclusions()].
#' @param stable_inclusion_start_date The date from which the inclusion events
#' from all sources are considered 'stable' (e.g. the time after a change in
#' how drugs are labelled, once doctors routinely enter data into the
#' database under the new labelling).
#'
#' @returns The same type as the input data, by default a [tibble::tibble()],
#' with the `purchase_date` and `atc` columns from `exclude_pregnancy()` and
#' the `n_t1d_endocrinology`, `n_t2d_endocrinology`, `n_t1d_medical`, and
#' `n_t2d_medical` columns from `include_diabetes_diagnoses()`. It also
#' creates two new columns:
#'
#' - `raw_inclusion_date`: Date of inclusion, which is the second
#'   earliest recorded event.
#' - `stable_inclusion_date`: Date of inclusion for individuals included
#'   at least one year after inclusions based on glucose-lowering drug data
#'   were incorporated (1998 onwards when using National Patient Register
#'   data to censor gestational diabetes). It is `NA` when
#'   `raw_inclusion_date` is earlier than `stable_inclusion_start_date`, so
#'   a non-missing value identifies individuals with a valid date of
#'   inclusion (and thereby a valid age at inclusion and duration of
#'   diabetes).
#'
#' @keywords internal
#' @inherit algorithm seealso
create_inclusion_dates <- function(inclusions, stable_inclusion_start_date = "1998-01-01") {
  inclusions |>
    # Drop earliest date so only those with two or more events are included.
    dplyr::filter(.data$dates != min(.data$dates, na.rm = TRUE), .by = "pnr") |>
    dplyr::mutate(
      # Earliest date in the rows for each individual.
      raw_inclusion_date = min(.data$dates, na.rm = TRUE),
      stable_inclusion_date = dplyr::if_else(
        .data$raw_inclusion_date < lubridate::as_date(stable_inclusion_start_date),
        NA,
        .data$raw_inclusion_date
      ),
      .by = "pnr"
    ) |>
    dplyr::select(
      "pnr",
      "raw_inclusion_date",
      "stable_inclusion_date",
      # From `exclude_pregnancy()` via the GLD purchases
      # TODO: this might need to be renamed in a previous step, rather than here.
      "purchase_date" = "date",
      "atc",
      # From `include_diabetes_diagnoses()`
      "n_t1d_endocrinology",
      "n_t2d_endocrinology",
      "n_t1d_medical",
      "n_t2d_medical"
    )
}
This document describes the flow of functions and objects within the package, specifically within the main exposed function classify_diabetes(). It shows the data sources and how they enter or are used in the function, as well as how the different internal functions and logic are connected to each other. A high-level overview of the flow is shown in the diagram below.
The sections below are split into functions for inclusion and exclusion as well as functions for determining the final diagnosis date and eventual classification of type 1 and type 2 diabetes.
Inclusion events
- prepare_lpr2(): See ?prepare_lpr2 for more information.
- prepare_lpr3(): See ?prepare_lpr3 for more information.
include_diabetes_diagnoses()
See ?include_diabetes_diagnoses for more information.
include_podiatrist_services()
See ?include_podiatrist_services for more information.
include_hba1c()
See ?include_hba1c for more information.
include_gld_purchases()
See ?include_gld_purchases for more information.
Exclusion events
exclude_potential_pcos()
See ?exclude_potential_pcos for more information.
exclude_pregnancy()
See ?exclude_pregnancy for more information.
Joining inclusions and exclusions
join_inclusions()
See ?join_inclusions for more information.
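Once the inclusions are joined, create_inclusion_dates() sets the inclusion date to each individual's second-earliest event and blanks it out in the stable column when it falls before the stable start date. The following is a minimal sketch of that rule on made-up data, not the package's implementation. The `events` tibble, the `stable_start` variable, and the use of `summarise()` (to get one row per person, where the function itself uses `mutate()` and keeps all remaining rows) are assumptions made for illustration. It assumes dplyr 1.1.0 or later for the `.by` argument.

```r
library(dplyr)

# Hypothetical toy input: one row per inclusion event per person.
events <- tibble(
  pnr = c("A", "A", "A", "B", "C", "C"),
  dates = as.Date(c(
    "1996-03-01", "1997-06-15", "1999-01-10", # A: three events
    "2001-05-20",                             # B: a single event
    "2000-02-02", "2003-08-30"                # C: two events
  ))
)

# Assumed cut-off, matching the default of `stable_inclusion_start_date`.
stable_start <- as.Date("1998-01-01")

result <- events |>
  # Drop each person's earliest date. A person with only one event has no
  # rows left after this step and so is not included.
  filter(dates != min(dates, na.rm = TRUE), .by = "pnr") |>
  # The earliest remaining date is the second-earliest event overall.
  summarise(raw_inclusion_date = min(dates, na.rm = TRUE), .by = "pnr") |>
  mutate(
    # Keep the date only when it is on or after the cut-off.
    stable_inclusion_date = if_else(
      raw_inclusion_date < stable_start,
      as.Date(NA),
      raw_inclusion_date
    )
  )

print(result)
```

In this toy data, person A is included on 1997-06-15 but has a missing stable date because that falls before 1998, person B is dropped for having a single event, and person C is included on 2003-08-30 in both columns. One consequence of filtering on inequality with the minimum is that all events sharing a person's earliest date are dropped together.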