Register-based research at DST

Who is this guide for, and what will you learn?

Published

July 2, 2026

This guide is for anyone working with register-based research at DST.

Important

Assumptions: The examples assume your registers are stored as parquet and recommend fastreg for reading a register by name (read_register("bef")). If your data is still in SAS, convert it once (see Phase 4 - Convert SAS to parquet). Not using fastreg? The same parquet files open with open_dataset() (arrow), which is shown alongside throughout.


Where do you start?

New to register research?

Start at phase 1 and follow the phases in order - a little R experience makes the start easier.

Start at phase 1

Already know R?

Skip the introductory phases and go straight to the server, files and extractions.

Jump to phase 3

Working on DARTER?

Project-specific setup steps and guidance.

DARTER - overview and pipeline

Tip

Looking for something specific? Use the search box in the top right - it searches across the entire guide.


The phases of the guide

The guide is organised into 14 phases. The roadmap shows the natural path from planning to export; the table below gives a quick overview. You don’t have to read everything in order: the look-up pages (functions, pitfalls, overview of registers, etc.) can be consulted as you go.

%%{init: {
  "themeVariables": {
    "fontSize": "24px"
  },
  "flowchart": {
    "nodeSpacing": 23,
    "rankSpacing": 40
  }
}}%%
flowchart LR
    S1["①  Get started<br>1 · 2 · 3"] --> S2["②  Get data<br>4 · 5 · 6 · 7 · 8"] --> S3["③  Build the study<br>9 · 10"] --> S4["④  Extract variables & assemble<br>11 · 12"] --> S5["⑤  Analyse & finish<br>13 · 14"]
    O1["Functions: overview"] ~~~ O2["Format tables"] ~~~ O3["DST pitfalls"] ~~~ O4["Overview of registers"] ~~~ O5["Good code practice"]
    O6["Algorithms & special packages"] ~~~ O7["Learning resources"] ~~~ O8["DARTER - Project 708421"]
    classDef stage fill:#eaf2fb,stroke:#4a78b5,stroke-width:1px,color:#173a5e;
    classDef ref fill:#f6f6f6,stroke:#aaaaaa,color:#555555;
    class S1,S2,S3,S4,S5 stage
    class O1,O2,O3,O4,O5,O6,O7,O8 ref

| Phase | Contents |
|---|---|
| 1 - Plan your study | Research question, key concepts and data model |
| 2 - R: the bare essentials | The minimum of R you need to get going |
| 3 - Log in to DST | Access to the server and the first overview |
| 4 - File types and loading | Parquet and SAS - formats and conversion |
| 5 - Extracting data step by step | The universal extraction pattern: read_register/open_dataset → filter → collect |
| 6 - First extraction | Your first real extraction with synthetic data |
| 7 - Inspect your data | Check structure, types and distributions before analysis |
| 8 - Find your registers | Find the right registers for exposure, outcome and covariates |
| 9 - Understand LPR | LPR2/LPR3 and ICD codes |
| 10 - Build your study population | Cohort, index date, in/exclusion, censoring and special designs |
| 11 - Extract variables | Extract outcomes, socioeconomics and comorbidity - filtered to the cohort |
| 12 - Assemble and prepare the dataset | Joins, pivots and handling missing data |
| 13 - Analysis | Tables, figures, regression, time-to-event, rates and sensitivity analyses |
| 14 - Export and repatriation | Get your results safely out of DST |
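
The extraction pattern named under phase 5 (open → filter → collect) can be sketched as follows. This is a minimal illustration using arrow and dplyr only, not the guide's own code: the data are synthetic, and the column names (person_id, birth_year, sex) and the folder name are hypothetical stand-ins rather than real register columns. With fastreg, read_register("bef") would take the place of the open_dataset() step.

```r
library(arrow)
library(dplyr)

# Synthetic stand-in for a register; column names are hypothetical.
bef_demo <- data.frame(
  person_id  = c("A1", "A2", "A3", "A4"),
  birth_year = c(1950L, 1985L, 1992L, 1978L),
  sex        = c("F", "M", "F", "M")
)

# Write the demo data as a parquet file in a temporary folder.
demo_dir <- file.path(tempdir(), "bef_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_parquet(bef_demo, file.path(demo_dir, "part-0.parquet"))

# 1. Open lazily: no rows are loaded into R yet.
bef <- open_dataset(demo_dir)

# 2. filter()/select() are still lazy and are evaluated by arrow.
# 3. collect() runs the query and returns the result as a data frame.
born_from_1980 <- bef %>%
  filter(birth_year >= 1980L) %>%
  select(person_id, birth_year) %>%
  collect()

print(born_from_1980)
```

Filtering before collect() is the point of the pattern: only the rows and columns that survive the filter are brought into memory, which matters when a register is far larger than this four-row example.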

Look-ups - use as you go

These pages are not part of the numbered guide - jump here when you have a specific question.

| Page | Use it when you want to… |
|---|---|
| Functions: overview | look up a single function - what do filter(), collect(), %>%, left_join() do? |
| Format tables | translate codes to text (municipality, education and employment codes) with DST’s SAS format files |
| DST pitfalls | check the errors that most often cost time and give uninformative messages |
| Overview of registers | find the confirmed column names, types and join keys for each register |
| Good code practice | structure, naming and reproducible code |
| Algorithms & special packages | ready-made algorithms (OSDC, NMI) to derive variables |
| Learning resources | find courses, books and references for R, epidemiology and statistics |