Skip to Tutorial Content

Available Datasets

We have built some toy datasets into this shiny tutorial. Run the code below to see all three of them (vitals, labs, and sdoh)

## Rows: 4
## Columns: 6
## $ mrn       <int> 1, 3, 4, 5
## $ visit_id  <int> 123456, 133456, 133412, 133419
## $ sbp_mm_hg <int> 104, 84, 92, 152
## $ dbp_mm_hg <int> 66, 51, 60, 110
## $ hr_bpm    <int> 112, 114, 110, 88
## $ rr_bpm    <int> 24, 28, 22, 13
## Rows: 4
## Columns: 6
## $ mrn      <int> 1, 2, 4, 6
## $ visit_id <chr> "123456", "133123", "133412", "133417"
## $ wbc_e9_L <dbl> 18.2, 14.3, 10.9, 11.9
## $ hgb_g_dL <dbl> 8.7, 9.4, 11.1, 12.1
## $ hct_pct  <dbl> 25.3, 28.1, 33.5, 36.5
## $ plt_k_uL <int> 188, 226, 284, 313
## Rows: 4
## Columns: 7
## $ mrn          <chr> "1", "2", "3", "7"
## $ census_tract <dbl> 6037702002, 36047061600, 36061015002, 36061016900
## $ svi_theme1   <dbl> 0.0081, 0.2345, 0.0183, 0.0133
## $ svi_theme2   <dbl> 0.112, 0.172, 0.101, 0.091
## $ svi_theme3   <dbl> 0.3014, 0.3834, 0.3321, 0.1621
## $ svi_theme4   <dbl> 0.0701, 0.1911, 0.0634, 0.0724
## $ svi_overall  <dbl> 0.4886, 0.9310, 0.5148, 0.3388

Examine each dataset. Think about how you might acquire each dataset from your local medical records. Think about how you would possibly want to merge them into a single dataset that contains vital signs, lab results, and social determinants of health (by census tract) with one row for each patient. Think about what the unique key is to match the data between data sets. Check the data types, and make sure that these match between data sets. The join won’t work if these are different data types (hint - you may have to fix these before you merge) This will take some joining.

Building a left_join

Start by thinking about how to join the vitals to the labs dataset. Use the vitals as the base (x) dataset, and join the matching rows in vitals (the new or y dataset) to these records

Exercise 1

Write the R code required to left-join the base dataset (x, vitals) to the new dataset (y, labs).

vitals %>% 
  something_here() %>% 
  ---(labs)
vitals %>% 
  mutate(visit_id = as.character(visit_id)) %>% 
  ---(labs)
vitals %>% 
  mutate(visit_id = as.character(visit_id)) %>% 
  left_join(labs)

Note that:

  1. All of the rows in the x dataset (vitals) are retained
  2. Only the matching data from the y dataset (labs) are joined
  3. If there are rows in vitals that are not matched in labs, the values for these columns (lab values) in these rows (those without matches in labs) are listed as NA
  4. Note that rows (patient mrns) that are only in labs and not in vitals do not appear in the final joined dataset.

Exercise 2

Write the R code required to right join the new dataset (y, sdoh) after a bit of data wrangling, to the base dataset (x, vitals). This keeps the dataset vitals as the base dataset, which will retain all of its rows.

sdoh %>% 
  select(-census_tract) %>% 
  mutate(something) %>% 
  ---(vitals)
sdoh %>% 
  select(-census_tract) %>% 
  mutate(mrn = as.integer(mrn)) %>% 
  ---(vitals)
sdoh %>% 
  select(-census_tract) %>% 
  mutate(mrn = as.integer(mrn)) %>% 
  right_join(vitals)

Note that:

  1. All of the rows in the x dataset (vitals) are retained
  2. Only the matching data from the y dataset (sdoh) are joined
  3. If there are rows in vitals that are not matched in sdoh, the values for these columns (sdoh values) in these rows (those without matches in sdoh) are listed as NA
  4. Note that rows (patient mrns) that are only in sdoh and not in vitals do not appear in the final joined dataset.

Exercise 3

Write the R code required to do an inner_join to retain only the records that have matches for the unique key from both the x (labs) and y (vitals) datasets.

labs %>% 
  mutate(something = as.integer(something)) %>% 
  ---(vitals)
labs %>% 
  mutate(visit_id = as.integer(visit_id)) %>% 
  inner_join(vitals)

Note that:

  1. Only the matching rows in the x dataset (labs) are retained
  2. Only the matching data from the y dataset (vitals) are joined
  3. There are no NA values
  4. Note that rows (patient mrns) that are only in labs, or only in vitals, do not appear in the final joined dataset.

Exercise 4

Write the R code required to do a full_join to retain all records from both the x (vitals) and y (sdoh) datasets, even when there are not matches.

vitals %>% 
  mutate(something = as.character(something)) %>% 
  ---(sdoh)
vitals %>% 
  mutate(mrn = as.character(mrn)) %>% 
 full_join(sdoh)

Note that:

  1. All rows in the x dataset (vitals) are retained
  2. All rows from the y dataset (sdoh) are joined
  3. There are lots of NA values in every unmatched row

Exercise 5: Building up a Dataset

Write the R code required to do two sequential left_joins to start with vitals, then add labs, then add sdoh to your dataset. Note that you will have to mutate some data types to make this sequential merge work.

vitals %>% 
  mutate(something = as.character(something)) %>% 
  left_join(labs) %>% 
  mutate(something = as.character(something)) %>% 
  ---(sdoh)
vitals %>% 
  mutate(visit_id = as.character(visit_id)) %>% 
  left_join(labs) %>% 
  mutate(mrn = as.character(mrn)) %>% 
  left_join(sdoh)

Note that:

  1. All rows in the x dataset (vitals) are retained
  2. Matching rows from the y dataset (vital) are joined
  3. Matching rows from the second y dataset (sdoh) are joined.
  4. There are lots of NA values in every unmatched row

Tutorial