This document describes how functions and objects flow through the package, specifically within the main exposed function classify_diabetes(). It shows which data sources the function uses and where they enter it, as well as how the internal functions and logic connect to each other. The diagram below gives a high-level overview of this flow.

Flow of functions, as well as their required input registers, for classifying diabetes status using the osdc package. Light blue and orange boxes represent filtering functions (inclusion and exclusion events, respectively).

The sections below are split into functions for inclusion and exclusion as well as functions for determining the final diagnosis date and eventual classification of type 1 and type 2 diabetes.

Inclusion events

include_diabetes_diagnoses()

See ?include_diabetes_diagnoses for more information.

include_podiatrist_services()

See ?include_podiatrist_services for more information.

include_hba1c()

See ?include_hba1c for more information.

include_gld_purchases()

See ?include_gld_purchases for more information.

Exclusion events

exclude_potential_pcos()

See ?exclude_potential_pcos for more information.

exclude_pregnancy()

See ?exclude_pregnancy for more information.

Joining inclusions and exclusions

join_inclusions()

See ?join_inclusions for more information.

Create inclusion dates

#' Create the final diagnosis date based on all the inclusion event types.
#'
#' The function `create_inclusion_dates()` takes the output from `join_inclusions()`
#' and defines the final diagnosis date based on all the inclusion event types.
#' It keeps only individuals with two or more recorded inclusion events,
#' regardless of the type of these events (e.g. two elevated HbA1c tests lead
#' to inclusion, as does one elevated HbA1c test followed by a purchase of
#' glucose-lowering drugs).
#'
#' @param inclusions Output from [join_inclusions()].
#' @param stable_inclusion_start_date The date from which the inclusion events
#'    from all sources are considered more 'stable' (e.g. the time after a
#'    change in how drugs are labeled, once doctors regularly enter the new
#'    labels into the database).
#'
#' @returns The same type as the input data, default as a [tibble::tibble()],
#'   along with the `purchase_date` and `atc` columns from
#'   `exclude_pregnancy()`, and the `n_t1d_endocrinology`,
#'   `n_t2d_endocrinology`, `n_t1d_medical`, and `n_t2d_medical` columns from
#'   `include_diabetes_diagnoses()`. It also creates two new columns:
#'
#'   - `raw_inclusion_date`: Date of inclusion, which is the second
#'      earliest recorded event.
#'   - `stable_inclusion_date`: Date of inclusion of individuals included
#'      at least one year after the incorporation of inclusions based on
#'      glucose-lowering drug data (1998 onwards when using National Patient
#'      Register data for censoring of gestational diabetes). Limits the
#'      included cohort to only individuals with a valid date of inclusion
#'      (and thereby valid age at inclusion & duration of diabetes).
#'
#' @keywords internal
#' @inherit algorithm seealso
create_inclusion_dates <- function(inclusions, stable_inclusion_start_date = "1998-01-01") {
  inclusions |>
    # Drop earliest date so only those with two or more events are included.
    dplyr::filter(.data$dates != min(.data$dates, na.rm = TRUE), .by = "pnr") |>
    dplyr::mutate(
      # Earliest remaining date for each individual. Because the earliest date
      # was dropped above, this is the second earliest date overall.
      raw_inclusion_date = min(.data$dates, na.rm = TRUE),
      stable_inclusion_date = dplyr::if_else(
        .data$raw_inclusion_date < lubridate::as_date(stable_inclusion_start_date),
        NA,
        .data$raw_inclusion_date
      ),
      .by = "pnr"
    ) |>
    dplyr::select(
      "pnr",
      "raw_inclusion_date",
      "stable_inclusion_date",

      # From `exclude_pregnancy()` via the GLD purchases
      # TODO: this might need to be renamed in a previous step, rather than here.
      "purchase_date" = "date",
      "atc",

      # From `include_diabetes_diagnoses()`
      "n_t1d_endocrinology",
      "n_t2d_endocrinology",
      "n_t1d_medical",
      "n_t2d_medical"
    )
}
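
The date logic above can be illustrated on toy data. The following is a minimal sketch, not the package's implementation: the `toy_inclusions` data, the `stable_start` variable, and the use of `summarise()` (instead of `mutate()` followed by `select()`) are assumptions made to keep the example short. It only assumes that each row is one inclusion event with a `pnr` and a `dates` column, and that dplyr 1.1.0 or later is installed (for the `.by` argument).

```r
library(dplyr)

# Hypothetical toy data: one row per inclusion event.
toy_inclusions <- tibble(
  pnr = c("A", "A", "A", "B", "C", "C"),
  dates = as.Date(c(
    "1996-03-01", "1997-06-15", "1999-01-10", # A: three events
    "2001-05-05",                             # B: a single event
    "2000-02-02", "2003-08-20"                # C: two events
  ))
)

# Hypothetical stand-in for `stable_inclusion_start_date`.
stable_start <- as.Date("1998-01-01")

toy_result <- toy_inclusions |>
  # Drop each individual's earliest date, so individuals with only one
  # event (here "B") are removed entirely.
  filter(dates != min(dates, na.rm = TRUE), .by = pnr) |>
  # The earliest remaining date is the second earliest date overall.
  summarise(raw_inclusion_date = min(dates, na.rm = TRUE), .by = pnr) |>
  mutate(
    # Dates before the stable period are set to missing.
    stable_inclusion_date = if_else(
      raw_inclusion_date < stable_start,
      as.Date(NA),
      raw_inclusion_date
    )
  )

print(toy_result)
```

In this sketch, individual "A" gets a `raw_inclusion_date` of 1997-06-15 but a missing `stable_inclusion_date`, because the second event falls before 1998. Individual "C" gets 2003-08-20 for both columns, and "B" is dropped. One thing to keep in mind with this filter: if several events share an individual's earliest date, all of them are dropped, not just one.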