Register-based research at DST

Who is this guide for, and what will you learn?

Published

July 21, 2026

This guide is for anyone working with register-based research at DST.

Assumptions

The examples assume your registers are stored as parquet, and the guide recommends fastreg for reading a register by name (read_register("bef")). If your data is still in SAS, convert it once - see Phase 4 - Convert SAS to parquet. Not using fastreg? The same parquet files open with open_dataset() from arrow, which is shown alongside throughout.
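As a minimal sketch of what "stored as parquet" means in practice, the example below writes a small synthetic table to parquet and opens it with open_dataset() from arrow. The folder, column names and values are made up for illustration and do not reflect the real register. The fastreg call is shown only as a comment, in the form given above.

```r
library(arrow)

# Synthetic stand-in for a register; the columns and values are invented.
bef_demo <- data.frame(
  pnr  = c("0001", "0002", "0003"),
  koen = c(1L, 2L, 2L),
  year = c(2020L, 2020L, 2021L)
)

# Write it as parquet in a temporary folder (hypothetical location).
register_dir <- file.path(tempdir(), "bef_demo")
dir.create(register_dir, showWarnings = FALSE)
write_parquet(bef_demo, file.path(register_dir, "bef.parquet"))

# open_dataset() reads the file metadata only; no rows are loaded yet.
bef <- open_dataset(register_dir)

# Column names are available without pulling any data into memory.
bef_names <- bef$schema$names

# With fastreg, the guide's equivalent would be:
# bef <- read_register("bef")
```

On the server you would point open_dataset() at the folder where your project's parquet files actually live, rather than at a temporary folder.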

Where do you start?

New to register research?

Start at phase 1 and follow the phases in order - a little R experience makes the start easier.

→ Start at phase 1

Already know R?

Skip the introductory phases and go straight to the server, files and extractions.

→ Jump to phase 3

Working on DARTER?

Project-specific setup steps and guidance.

→ DARTER - overview and pipeline

Looking for something specific? Use the search box in the top right - it searches across the entire guide.

The phases of the guide

The guide consists of 14 phases. The roadmap shows the natural path through, from planning to export; the table below gives a quick overview. You don’t have to read everything in order - the look-up pages (functions, pitfalls, overview of registers, etc.) can be visited as you go.

%%{init: {
  "themeVariables": {
    "fontSize": "24px"
  },
  "flowchart": {
    "nodeSpacing": 23,
    "rankSpacing": 40
  }
}}%%
flowchart LR
    S1["①  Get started<br>1 · 2 · 3"] --> S2["②  Get data<br>4 · 5 · 6 · 7 · 8"] --> S3["③  Build the study<br>9 · 10"] --> S4["④  Extract variables & assemble<br>11 · 12"] --> S5["⑤  Analyse & finish<br>13 · 14"]
    O1["Functions: overview"] ~~~ O2["Format tables"] ~~~ O3["DST pitfalls"] ~~~ O4["Overview of registers"] ~~~ O5["Good code practice"]
    O6["Algorithms & special packages"] ~~~ O7["Learning resources"] ~~~ O8["DARTER - Project 708421"]
    classDef stage fill:#eaf2fb,stroke:#4a78b5,stroke-width:1px,color:#173a5e;
    classDef ref fill:#f6f6f6,stroke:#aaaaaa,color:#555555;
    class S1,S2,S3,S4,S5 stage
    class O1,O2,O3,O4,O5,O6,O7,O8 ref

Phase	Contents
1 - Plan your study	Research question, key concepts and data model
2 - R: the bare essentials	The minimum of R you need to get going
3 - Log in to DST	Access to the server and the first overview
4 - File types and loading	Parquet and SAS - formats and conversion
5 - Extracting data step by step	The universal extraction pattern: read_register/open_dataset → filter → collect
6 - First extraction	Your first real extraction with synthetic data
7 - Inspect your data	Check structure, types and distributions before analysis
8 - Find your registers	Find the right registers for exposure, outcome and covariates
9 - Understand LPR	LPR2/LPR3 and ICD codes
10 - Build your study population	Cohort, index date, in/exclusion, censoring and special designs
11 - Extract variables	Extract outcomes, socioeconomics and comorbidity - filtered to the cohort
12 - Assemble and prepare the dataset	Joins, pivots and handling missing data
13 - Analysis	Tables, figures, regression, time-to-event, rates and sensitivity analyses
14 - Export and repatriation	Get your results safely out of DST
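The extraction pattern named in phase 5 (open → filter → collect) can be sketched as follows. This is an illustration on synthetic data, assuming the arrow and dplyr packages are installed; the folder, column names and values are invented. With fastreg, the first step would be read_register() instead of open_dataset().

```r
library(arrow)
library(dplyr)

# Synthetic stand-in for a register (made-up columns and values).
demo_dir <- file.path(tempdir(), "demo_register")
dir.create(demo_dir, showWarnings = FALSE)
write_parquet(
  data.frame(
    pnr  = c("0001", "0002", "0003", "0004"),
    year = c(2019L, 2020L, 2020L, 2021L),
    kom  = c(101L, 751L, 101L, 461L)
  ),
  file.path(demo_dir, "part-0.parquet")
)

extract <- open_dataset(demo_dir) %>%  # 1. open lazily, nothing is loaded yet
  filter(year == 2020L) %>%            # 2. filter while the data is still on disk
  collect()                            # 3. only now pull the matching rows into memory
```

Filtering before collect() keeps memory use down, because only the rows that pass the filter are read into R.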

Look-ups - use as you go

These pages are not part of the numbered guide - jump here when you have a specific question.

Page	Use it when you want to…
Functions: overview	look up a single function - what do `filter()`, `collect()`, `%>%`, `left_join()` do?
Format tables	translate codes to text (municipality, education and employment codes) with DST’s SAS format files
DST pitfalls	check the errors that most often cost time and give uninformative messages
Overview of registers	find the confirmed column names, types and join keys for each register
Good code practice	structure, naming and reproducible code
Algorithms & special packages	ready-made algorithms (OSDC, NMI) to derive variables
Learning resources	find courses, books and references for R, epidemiology and statistics