Initial setup after acceptance

Published

December 13, 2024

In order to access and analyze the UK Biobank data, we need to work on the Research Analysis Platform (RAP). The easiest way that we’ve found is to do your work through Git and GitHub on the SDCA GitHub Organization. We’ll explain the reasons in the Working on the RAP page. We expect everyone working on the Steno UK Biobank project to use Git and use Steno’s GitHub Organization account to store their project. We will set up the project for you on the Organization account, so you can start working on your project sooner.

Steps to take before using the RAP

Note

Most of these steps only need to be done once, except for one step that we will indicate below.

Tip

Some of these tasks can be really difficult to understand what’s going on, and that’s ok and totally normal. The very start of a project is often one of the most difficult stages of a project. But if you follow these tasks, you’ll have a solid foundation for doing your work within the special environment of the UK Biobank RAP.

The very first tasks you’ll need to do is to install the ukbAid package. Open RStudio and in the Console run these code:

# Install the pak package if you haven't already.
# install.packages("pak")
pak::pak("steno-aarhus/ukbAid")

After that, we need to make sure that your computer has Git configured properly. In your Console, type out the below code, replacing my name (Luke) and my email with your own:

ukbAid::proj_setup_git_config("Luke W. Johnston", "lwjohnst@gmail.com")

You should get an output showing your user.email and user.name.

Since we will be connecting to GitHub often through RStudio, we need a way to authenticate to GitHub that we are who we say we are. We do that in a relatively easy way by creating a thing called a Personal Access Token (PAT) in GitHub so that Git on your computer knows how to connect your project to GitHub. While in your RStudio project, in the Console type out this function:

usethis::create_github_token()

This will send you to your GitHub account and create a basic PAT for you. Change the token’s description to something like “For UKB project”. Set the “Expiry date” to 90 days (this is a good security feature). Scroll down and click the create token button, which will change pages and you’ll be shown a string of letters starting with ghp_. Copy this token and save it somewhere safe, preferably in a password manager. This token acts a bit like your password but is safer to use than your password. Once you’ve saved it somewhere, go back to RStudio and than run this function in the Console:

gitcreds::gitcreds_set()

Paste the token into the prompt in the Console. If it asks to replace an existing one, select the “yes” option. Doing this is a bit like using the 2FA message with the temporary passcode you get sent whenever you have to open your work’s email or when you use MitID or NemID (in Denmark).

Note

The function above is the one you will need to run more than once (aside from the usethis::creat_github_token() every 90 days). Every time you open RStudio (or start your computer), you need to run this function and give R the PAT so that it can connect to GitHub securely.

Important

Why do we need to do this on our computer? Because whenever you connect to GitHub through RStudio (like uploading and downloading your changes), GitHub needs you to authenticate that it is you and not someone else. In order to do that, you need to give GitHub a password to do that. In the past, you could use your same password connected to your GitHub account, but the problem with this method is that your password gets sent over the Internet many times, which increases the risk that someone will maliciously obtain your password. So instead, we use a temporary, restricted-access token that we can easily create and delete. This token only has limited access to your account, so it is safer to send over the Internet, and it can be very easy to delete without affecting your account.

Next, you will need to download (“clone” in Git language) the project repository that was created for you. In the Console, type out this, replacing my (Luke’s) project abbreviated name (mesh) with your project abbreviated name. You can find your GitHub project on SDCA’s account, if you don’t remember your project name.

usethis::create_from_github("steno-aarhus/mesh")

From here, go to the Desktop or wherever you created the project and open up the RStudio .Rproj file so that the project starts up in RStudio. In this project will be the files you need to get started on the project, and especially at this stage, the doc/protocol.qmd and the data-raw/project-variables.csv files. You’ll be working on these files before beginning to use the RAP and doing analyses.

Tip

Most likely, you are new to Git and GitHub. Git is a very powerful way of managing your files and your projects, but it also requires some major conceptual rewiring of how you work with files and computers. It takes some time to learn! Check out the section on Git in the Introduction to Reproducible Research in R course to learn more about using it.

You’ve just downloaded your GitHub project onto your computer. Unlike Dropbox or OneDrive, the files in your project on your computer don’t automatically synchronize with those on GitHub. You have to do it manually. Whenever you use Git and save your changes to the Git history, you need to “Push” (upload) your changes to your project files to GitHub. The diagram below shows how it conceptually looks like:

graph LR;
    Local -->|Push| GitHub
    GitHub -->|Pull| Local
    UKB-RAP -->|Push| GitHub
    GitHub -->|Pull| UKB-RAP

The “Local” is your own computer. Whenever you “push” to GitHub, it means it will upload your file changes (like synchronizing in Dropbox). Whenever you “pull” from GitHub, it takes any changes made on GitHub and downloads them to your “Local” computer.

When you work on the UKB-RAP, you will “pull” (download) your project from GitHub. As you work on it and save changes in your Git history, you “push” (upload) to GitHub often in order to keep your changes backed up. Then, when you get to the paper writing stage, you can pull your results from GitHub to your “Local” so you can work without getting charged.

Note

Why do we do it this way? For one, it is honestly the easiest that we could think of because the UKB RAP is a special environment that requires special steps to work in. Plus, using Git and GitHub makes it easier to have others (like me) collaborate on your project and help you out. So hypothetically, if you need help, I (Luke) could download your project from GitHub and make changes there. Conceptually it would look like:

graph TD;
    UKB-RAP -->|Push| GitHub
    GitHub -->|Pull| UKB-RAP
    You -->|Push| GitHub
    GitHub -->|Pull| You
    Luke -->|Push| GitHub
    GitHub -->|Pull| Luke

That way it makes it super easy for me (or others) to help out.

As you work on your project, specifically the protocol and selecting the variables for your project from the data-raw/project-variables.csv list, you’ll use Git to save the changes made and push up to GitHub. Note that there are some variables in the list of variables we have that don’t exist in the RAP database, for instance like date of birth. This may be due to privacy concerns, so instead you would have to use year of birth (p34) to determine their age. Once your protocol has been reviewed and uploaded to Zenodo, you’re now ready to start doing the data analysis on the RAP.

But before doing anything else, complete the tasks in the TODO.md file, which will direct you to fill in details in the README.md. After you’ve done the TODO items, start working towards writing the protocol and analysis plan before beginning your work in the UKB RAP.