DST pitfalls
10 errors that cost time and produce uninformative or no error messages
This page collects the errors that most frequently catch new users of DST registers. What they have in common: the error messages are either confusing, or there is no error message at all - the result is just silently wrong.
1. dodsaars vs dodsaasg - use the correct death register
There are two registers with similar names:
| Register | Contains | Used for |
|---|---|---|
| dodsaars | Individual death registrations with precise date of death (d_dodsdto) | Censoring at death |
| dodsaasg | Cause-of-death classification | Only for analysis of cause of death |
dodsaasg does not have the date of death in the correct format and is not the authoritative source for individual death dates.
Check dodsaars coverage in your project guide. dodsaars does not necessarily cover your entire study period - in project 708421 it covers only ~1970–2001 (as of June 2026), and post-2001 deaths require a separate extraction. Other projects may have different coverage.
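A quick sanity check is to compare the range of observed death dates with your study period. The sketch below uses an invented data frame (toy_deaths) and assumed study dates; in a real project the dates would come from d_dodsdto in your own dodsaars extract.

```r
# Toy stand-in for a collected dodsaars extract (all values are invented)
toy_deaths <- data.frame(
  pnr = c("a", "b", "c"),
  d_dodsdto = as.Date(c("1975-03-02", "1988-11-20", "2001-06-30"))
)

# Assumed study period, for illustration only
study_start <- as.Date("1995-01-01")
study_end   <- as.Date("2020-12-31")

# Earliest and latest death date actually present in the extract
coverage <- range(toy_deaths$d_dodsdto)

# TRUE only if the observed death dates span the whole study period
covers_study <- coverage[1] <= study_start && coverage[2] >= study_end

if (!covers_study) {
  message("Death dates observed from ", coverage[1], " to ", coverage[2],
          "; study period runs to ", study_end, " - check the project guide.")
}
```

A last observed death long before the end of the study period is a hint rather than proof of missing coverage, so treat the project guide as the authority.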
# CORRECT - replace "path/to/dodsaars/" with your project's parquet path
# DARTER: read_register("dodsaars") %>% rename_with(tolower)
library(arrow)
library(dplyr)
death <- open_dataset("path/to/dodsaars/") %>%
  rename_with(tolower) # check coverage in your project guide
death_person <- death %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>% # only the cohort's pnr's (cohort_pnrs = your vector of cohort ids)
  select(pnr, death_date = d_dodsdto) %>% # d_dodsdto is the confirmed column
  collect()
# WRONG - do not use dodsaasg for censoring dates
3. rename_with(tolower) must be called on each register
Raw column names vary by register and year: PNR, pnr, Pnr, V_CPR. If you forget it, semi_join(..., by = "pnr") fails with “Column pnr not found” - even though the column is there, just under a different capitalisation.
The rule: every open_dataset() or read_register() call is followed immediately by %>% rename_with(tolower), as the first step in your pipe. See Extracting data step by step for explanation and example.
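A minimal sketch of the rule on two invented in-memory tables (reg_a and reg_b are hypothetical names, not DST registers). It assumes dplyr is installed and shows that the join key only becomes predictable once both sides are lower-cased.

```r
library(dplyr)

# Toy tables with differently capitalised id columns (invented data)
reg_a <- tibble(PNR = c("1", "2", "3"), x = 1:3)
reg_b <- tibble(Pnr = c("2", "3"), y = c("u", "v"))

# Lower-casing every column name gives both tables the same key name
reg_a <- reg_a %>% rename_with(tolower)
reg_b <- reg_b %>% rename_with(tolower)

# The join on "pnr" now finds the column in both tables
matched <- reg_a %>% semi_join(reg_b, by = "pnr")
```

Note that tolower only changes case: a column called V_CPR becomes v_cpr, not pnr, and still needs an explicit rename().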
4. Date columns are not always in Date format
DST registers store dates in multiple formats - they look the same but behave differently.
| Format | Example | What class() returns | What to do |
|---|---|---|---|
| Date | 2020-05-15 | "Date" | Nothing - can be used directly |
| Character | "2020-05-15" | "character" | as.Date(column) |
| Datetime | "2020-05-15 14:32:00" | "POSIXct" | as.Date(column) to get only the date part |
| SAS integer | 21990 | "numeric" | as.Date(column, origin = "1960-01-01") |
The rule: always check class() on a date column before using it in calculations.
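The conversions in the table can be tried on toy values in base R. The sketch below uses invented values, one per storage format, all encoding 15 May 2020; the variable names are illustrative only.

```r
# One invented value per storage format, all meaning 15 May 2020
as_char <- "2020-05-15"
as_dttm <- as.POSIXct("2020-05-15 14:32:00", tz = "UTC")
as_sas  <- 22050  # days since 1960-01-01, the SAS date origin

target <- as.Date("2020-05-15")

d_char <- as.Date(as_char)                        # character -> Date
d_dttm <- as.Date(as_dttm, tz = "UTC")            # datetime  -> Date (time part dropped)
d_sas  <- as.Date(as_sas, origin = "1960-01-01")  # SAS day count -> Date

# class() on a datetime returns two elements ("POSIXct" and "POSIXt"),
# so inherits() is a safer test than comparing class() to a single string
is_datetime <- inherits(as_dttm, "POSIXct")
```

Passing tz explicitly when converting a datetime avoids a date shift around midnight if the column is stored in a different time zone than you expect.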
class(lpr_a_kontakt$kont_starttidspunkt) # "POSIXct" - datetime, not Date
# Fix:
lpr_a_kontakt <- lpr_a_kontakt %>%
  mutate(date = as.Date(kont_starttidspunkt))
class(bef$foed_dag) # "Date" - can be used directly
5. BEF is a status snapshot - not a live register
BEF is a status register: it records the composition of the population at a given reference time - not continuously. DST’s reference time is ultimo (typically 31 December for an annual snapshot). Since 2008, BEF is also delivered quarterly (March, June, September, December).
**"aar == 2020 = 1 January 2020" is a project convention.** In many projects BEF snapshots are renamed so that aar == 2020 conventionally refers to the population composition as of 1 January 2020 - but this does not follow from DST’s delivery naming. Confirm the convention in your project guide.
See DST’s official BEF documentation: statistikdokumentation/befolkningen
This means that a person who dies in June 2020 still appears in the 2020 BEF snapshot.
# ERROR: do not use BEF to check "alive on a specific date"
bef_2020 <- bef %>%
  filter(aar == 2020) # includes everyone in the 2020 snapshot
                      # - including those who die during 2020
# CORRECT: combine with dodsaars to exclude deaths
deaths <- open_dataset("path/to/dodsaars/") %>% # DARTER: read_register("dodsaars")
  rename_with(tolower) %>%
  semi_join(tibble(pnr = cohort_pnrs), by = "pnr") %>%
  select(pnr, d_dodsdto) %>%
  collect()
# bef_data is your collected BEF extract; index_date is the Date you check against
bef_alive <- bef_data %>%
  left_join(deaths, by = "pnr") %>%
  filter(is.na(d_dodsdto) | d_dodsdto > index_date) # alive at index date
6. The “a” in lpr_a_diagnose does not mean A-type diagnoses
The table is called lpr_a_diagnose - the “a” refers to “analysis model” (the LPR_A series introduced in 2025). It does not mean the table only contains A-type (action) diagnoses.
The table contains all diagnosis types: A (action), B (secondary diagnosis) and G (underlying condition). You still need to filter on diag_kode_type:
lpr_a_diagnose %>%
  filter(diag_kode_type %in% c("A", "B")) %>% # still necessary
  ...
7. Categorical codes are not consistent across registers
The same variable can have different coding in different registers - different type (numeric vs. character), different values, or both.
In practice you extract demographic variables (sex, age) from BEF and rarely need to compare the same variable in another register. But if you do, always check with table() and class() before using the variable:
table(register_a$koen) # what are the actual values and types?
class(register_a$koen)
table(register_b$koen)
class(register_b$koen)
8. !! (bang-bang) forgotten in lazy evaluation
When filtering with a local R vector inside a DuckDB query, you must use !!. Without it, DuckDB looks for a column with that name - and fails silently or with a confusing message.
# Example: a year list against bef (the principle applies to any local R vector)
my_years <- c(2018, 2019, 2020) # local R vector (years, here as an example)
# WRONG - DuckDB looks for a column called "my_years"
bef %>% filter(aar %in% my_years) # error or wrong result
# CORRECT - !! tells DuckDB: "use the local R vector"
bef %>% filter(aar %in% !!my_years)
!! is necessary for all local R objects used inside filter(), mutate() etc. on lazy DuckDB connections - typically code or year lists (%in% !!codes, >= !!min_date). If instead you filter on pnr against the whole cohort, use semi_join(tibble(pnr = cohort_pnrs), by = "pnr"): it takes a local table directly and needs no !!. See Functions guide for full explanation.
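What !! does can be inspected without any database connection. The sketch below assumes the rlang package is installed and uses rlang::expr() to capture the filter condition with and without !!. It illustrates only the injection step, not how a particular backend reacts to an un-injected name.

```r
library(rlang)

my_years <- c(2018, 2019, 2020)  # local R vector

# Without !!: the captured expression still refers to the *name* my_years,
# which a database backend may try to resolve as a column
without_bang <- expr(aar %in% my_years)

# With !!: the values of the vector are injected into the expression
with_bang <- expr(aar %in% !!my_years)

# Names each expression still depends on
vars_without <- all.vars(without_bang)  # aar and my_years
vars_with    <- all.vars(with_bang)     # aar only

print(with_bang)
```

After injection the only name left to resolve is the column aar, which is why the query no longer depends on whether the backend can see your local objects.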
9. nmi_count ≠ nmi_score
These two variables are not the same and are not interchangeable:
| Variable | What it is | Source |
|---|---|---|
| nmi_score | Weighted comorbidity score - Nordic Multimorbidity Index (Kristensen et al., Clin Epidemiol 2022). 50 predictors with individual weights; for example, lung cancer counts 19 points and type 2 diabetes counts 2. | See NMI page |
| nmi_count | Simple count of the number of chronic conditions (out of 33 possible) a person has been diagnosed with | Calculated separately |
If you use nmi_count in your regression model instead of nmi_score, you are adjusting for something different than you think - and you get no error message.
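A toy illustration of why the two are not interchangeable. Only the two weights quoted above (19 and 2) come from the text; the people are invented, and using the same two conditions for both measures is a simplifying assumption (the real variables are built from different lists).

```r
# Only these two weights are taken from the text; the real index has 50 predictors
weights <- c(lung_cancer = 19, type2_diabetes = 2)

# Two invented people, each with exactly one condition (1 = has the condition)
conditions <- rbind(
  person_1 = c(lung_cancer = 1, type2_diabetes = 0),
  person_2 = c(lung_cancer = 0, type2_diabetes = 1)
)

toy_count <- rowSums(conditions)                # simple count of conditions
toy_score <- as.vector(conditions %*% weights)  # weighted sum of conditions
```

Both people have a count of 1, but their weighted scores differ by a factor of almost ten, so a model adjusted for the count treats them as equally ill.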
10. Immortal time bias - exposure defined using the future
No error message, no warning - just an effect estimate that looks too good. Immortal time bias arises when a person is given follow-up time during which they could not, by construction, have had the outcome. It is the classic register mistake, because register data let you define groups retrospectively, looking back at what eventually happened.
A concrete example. Question: does bariatric surgery lower mortality in people with type 2 diabetes? You take everyone diagnosed with T2D in 2010, split them into a surgery group (had surgery at some point during follow-up) and a no-surgery group, and start counting follow-up for everyone at the diagnosis date.
The trap: to land in the surgery group, a person had to survive long enough to be operated on. Say the average wait from diagnosis to surgery is 2 years. Those 2 years are immortal: anyone who died in that window never reached surgery and so fell into the no-surgery group instead. You have handed the surgery group ~2 years of guaranteed-alive person-time and labelled it “surgery” time.
| Group | Deaths | Person-years | Rate (per 1000 py) |
|---|---|---|---|
| Surgery (immortal time counted as surgery time) | 30 | 12,000 | 2.5 |
| Surgery (time correctly aligned) | 30 | 8,000 | 3.8 |
| No surgery | 50 | 13,000 | 3.8 |
The true rates are essentially identical (3.8 after rounding) - surgery does nothing. But counting the 4,000 immortal person-years as surgery time drops its rate to 2.5 and makes surgery look about 35% protective (rate ratio 2.5 / 3.85 ≈ 0.65). The “effect” is an artefact of the misaligned time zero, not of surgery.
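The arithmetic behind the table can be reproduced in a few lines of base R. The numbers are those from the table; the group labels and column names are illustrative only.

```r
# Deaths and person-years from the table above
rates <- data.frame(
  group        = c("surgery_misaligned", "surgery_aligned", "no_surgery"),
  deaths       = c(30, 30, 50),
  person_years = c(12000, 8000, 13000)
)

# Mortality rate per 1000 person-years
rates$rate_per_1000 <- 1000 * rates$deaths / rates$person_years

# Rate ratio of each group against the no-surgery group
reference <- rates$rate_per_1000[rates$group == "no_surgery"]
rates$rate_ratio <- rates$rate_per_1000 / reference

print(rates)
```

With the immortal person-years counted as surgery time the rate ratio falls well below 1; with time correctly aligned it sits close to 1, consistent with surgery having no effect in this example.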
The fix: align time zero. Eligibility, exposure assignment and start of follow-up must coincide.
- Risk-set (incidence-density) matching: start each person’s follow-up at the moment they become exposed, and assign each comparator the same index date (the core rule in Comparison cohort).
- Treat exposure as time-varying: the person contributes unexposed person-time until surgery, then exposed time after - never exposed time before they were exposed (see Time-varying variables).
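The time-varying approach can be sketched on three invented people, with time measured in days since diagnosis. All column names and values below are hypothetical; a real analysis would typically split follow-up into intervals for a survival model, but the bookkeeping is the same.

```r
# Invented follow-up data; time zero is the diagnosis date for everyone
fu <- data.frame(
  id          = 1:3,
  surgery_day = c(730, NA, 400),     # NA = never operated
  end_day     = c(2000, 500, 1500),  # death or end of follow-up
  died        = c(0, 1, 1)
)

# Operated during follow-up (NA surgery_day evaluates to FALSE here)
operated <- !is.na(fu$surgery_day) & fu$surgery_day < fu$end_day

# Time before surgery (or all of it, if never operated) is unexposed
fu$unexposed_days <- ifelse(operated, fu$surgery_day, fu$end_day)

# Only time after surgery counts as exposed
fu$exposed_days <- ifelse(operated, fu$end_day - fu$surgery_day, 0)

# A death is attributed to the exposure state at the time of death
fu$death_exposed   <- as.integer(fu$died == 1 & operated)
fu$death_unexposed <- as.integer(fu$died == 1 & !operated)
```

Every day of follow-up is counted exactly once, and the waiting time before surgery lands in the unexposed column, which removes the immortal person-time from the surgery group.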
Related: defining a baseline covariate from post-index information is the same error in disguise (e.g. the OSDC diabetes type, see OSDC). Whenever a variable is built from the future, ask whether you are conditioning on the person having survived to see it. Background: Hernán & Robins, What If, §3.6 (target trial, time zero).
11. The most common error messages and what they mean
R’s error messages are short and technical - here are the ones you most often encounter in a DST workflow, translated into what they actually mean:
| Error message | Typical cause | Solution |
|---|---|---|
| Error: Column 'pnr' not found | rename_with(tolower) is missing | Add %>% rename_with(tolower) immediately after read_register() - see pitfall 3 |
| Error: object 'my_list' not found | !! missing in filter() on a lazy connection | Write filter(aar %in% !!my_list) - see pitfall 8 |
| Error: could not find function "read_register" | library(fastreg) missing | Add library(fastreg) at the top of the script |
| non-numeric argument to binary operator | Date column is character, not Date | mutate(date = as.Date(date)) - see pitfall 4 |
| Error in filter.default(...) | Filtering on a lazy object without %>% | Switch to %>% - see the pipe |
| Error: Can't convert ... to ... | Join on columns of different type (e.g. numeric vs. character) | Use mutate(pnr = as.character(pnr)) to match types |
| object of type 'closure' is not subsettable | A variable name overwrites a function (e.g. data <- ...) | Use a unique variable name - avoid data, df, c as object names |
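The last row is easy to reproduce. The sketch below assumes a fresh R session in which no object called data has been created, so the name resolves to the built-in data() function.

```r
# No object named `data` exists here, so `data` refers to the function
# utils::data(); subsetting a function with $ raises an error
result <- tryCatch(
  data$pnr,
  error = function(e) conditionMessage(e)
)

# Show the message R produced
print(result)
```

The same error appears when the line that was meant to create your data frame failed or was never run, which is why distinctive object names make the real problem easier to spot.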
The fastest debugging flow - what to do step by step when you see a red error message - is described in Phase 7 - Seeing a red error message?.