%%{init: {
"themeVariables": {
"fontSize": "24px"
},
"flowchart": {
"nodeSpacing": 23,
"rankSpacing": 40
}
}}%%
flowchart LR
S1["① Get started<br>1 · 2 · 3"] --> S2["② Get data<br>4 · 5 · 6 · 7 · 8"] --> S3["③ Build the study<br>9 · 10"] --> S4["④ Extract variables & assemble<br>11 · 12"] --> S5["⑤ Analyse & finish<br>13 · 14"]
O1["Functions: overview"] ~~~ O2["Format tables"] ~~~ O3["DST pitfalls"] ~~~ O4["Overview of registers"] ~~~ O5["Good code practice"]
O6["Algorithms & special packages"] ~~~ O7["Learning resources"] ~~~ O8["DARTER - Project 708421"]
classDef stage fill:#eaf2fb,stroke:#4a78b5,stroke-width:1px,color:#173a5e;
classDef ref fill:#f6f6f6,stroke:#aaaaaa,color:#555555;
class S1,S2,S3,S4,S5 stage
class O1,O2,O3,O4,O5,O6,O7,O8 ref
Register-based research at DST
Who is this guide for, and what will you learn?
This guide is for anyone doing register-based research at DST (Statistics Denmark).
Assumptions: The examples assume your registers are stored as parquet, and they use fastreg to read a register by name (read_register("bef")). If your data is still in SAS, convert it once - see Phase 4 - Convert SAS to parquet. Not using fastreg? The same parquet files can be opened with open_dataset() (arrow), which is shown alongside throughout.
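As a rough illustration of the arrow route mentioned above, the sketch below writes a tiny synthetic table to parquet, opens it with open_dataset(), filters it and collects the result. The folder name, the table bef_demo and the column birth_year are made up for the example and do not reflect the real BEF register. The fastreg call is left out because it needs access to the actual register files.

```r
library(arrow)
library(dplyr)

# Synthetic stand-in for a register (hypothetical columns, not real BEF content)
bef_demo <- data.frame(
  pnr        = c("0001", "0002", "0003", "0004"),
  birth_year = c(1950L, 1975L, 1990L, 2005L)
)

# Write the table as parquet into a temporary folder
demo_dir <- file.path(tempdir(), "bef_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_parquet(bef_demo, file.path(demo_dir, "bef_demo.parquet"))

# Open lazily, filter before loading, then collect into memory
born_before_2000 <- open_dataset(demo_dir) %>%
  filter(birth_year < 2000L) %>%
  collect()

print(born_before_2000)
```

The filter runs before collect(), so only the matching rows are loaded into memory; with real registers that ordering is what keeps the extraction manageable.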
Where do you start?
New to register research?
Start at phase 1 and follow the phases in order - a little R experience makes the start easier.
Already know R?
Skip the introductory phases and go straight to the server, files and extractions.
Working on DARTER?
Project-specific setup steps and guidance.
Looking for something specific? Use the search box in the top right - it searches across the entire guide.
The phases of the guide
The guide consists of 14 phases. The roadmap shows the natural path through them, from planning to export; the table below gives a quick overview. You don’t have to read everything in order - the look-up pages (functions, pitfalls, overview of registers, etc.) can be consulted as you go.
| Phase | Contents |
|---|---|
| 1 - Plan your study | Research question, key concepts and data model |
| 2 - R: the bare essentials | The minimum of R you need to get going |
| 3 - Log in to DST | Access to the server and the first overview |
| 4 - File types and loading | Parquet and SAS - formats and conversion |
| 5 - Extracting data step by step | The universal extraction pattern: read_register/open_dataset → filter → collect |
| 6 - First extraction | Your first real extraction with synthetic data |
| 7 - Inspect your data | Check structure, types and distributions before analysis |
| 8 - Find your registers | Find the right registers for exposure, outcome and covariates |
| 9 - Understand LPR | LPR2/LPR3 and ICD codes |
| 10 - Build your study population | Cohort, index date, inclusion/exclusion, censoring and special designs |
| 11 - Extract variables | Extract outcomes, socioeconomics and comorbidity - filtered to the cohort |
| 12 - Assemble and prepare the dataset | Joins, pivots and handling missing data |
| 13 - Analysis | Tables, figures, regression, time-to-event, rates and sensitivity analyses |
| 14 - Export and repatriation | Get your results safely out of DST |
Look-ups - use as you go
These pages are not part of the numbered guide - jump here when you have a specific question.
| Page | Use it when you want to… |
|---|---|
| Functions: overview | look up a single function - what do filter(), collect(), %>%, left_join() do? |
| Format tables | translate codes to text (municipality, education and employment codes) with DST’s SAS format files |
| DST pitfalls | check the errors that most often cost time and give uninformative messages |
| Overview of registers | find the confirmed column names, types and join keys for each register |
| Good code practice | check conventions for structure, naming and reproducible code |
| Algorithms & special packages | use ready-made algorithms (OSDC, NMI) to derive variables |
| Learning resources | find courses, books and references for R, epidemiology and statistics |