
Data Wrangling

Please note: you are strongly encouraged to

  • google dplyr functions
  • phone a friend
  • share screens
  • ask questions

Exercise

Edit the code below to:

  • filter the prostate data to rows with PreopPSA > 5
  • filter to rows with PVol > 50 cc
  • select TVol, T.Stage, AA, and TimeToRecurrence
  • group_by AA
  • summarize the means of TVol, T.Stage, and TimeToRecurrence as mean_tvol, mean_tstage, and mean_time
prostate %>% 
  filter(---) %>% 
  filter(---) %>% 
  select(---) %>% 
  group_by(---) %>% 
  summarize(---)

Solution:

prostate %>% 
  filter(PreopPSA > 5) %>% 
  filter(PVol > 50) %>% 
  select(TVol, T.Stage, AA, TimeToRecurrence) %>% 
  group_by(AA) %>% 
  summarize(mean_tvol = mean(TVol, na.rm = TRUE),
            mean_tstage = mean(T.Stage, na.rm = TRUE),
            mean_time = mean(TimeToRecurrence, na.rm = TRUE))
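
To see what each step of a pipeline like this does, it can help to run it on a tiny table where the answer can be checked by eye. The sketch below is only an illustration: the toy data frame and its values are made up (they are not rows from the prostate data), and it puts both conditions in a single filter() call, which keeps the same rows as two chained filter() calls.

```r
library(dplyr)

# Hypothetical toy data, NOT taken from the prostate dataset.
toy <- tibble(
  PreopPSA         = c(4, 6, 8, 10, 12),
  PVol             = c(60, 40, 55, 70, 80),
  AA               = c(0, 0, 1, 1, 0),
  TimeToRecurrence = c(10, 20, NA, 30, 50)
)

toy_summary <- toy %>%
  # Comma-separated conditions are combined with AND,
  # the same as filter(PreopPSA > 5) %>% filter(PVol > 50).
  filter(PreopPSA > 5, PVol > 50) %>%
  group_by(AA) %>%
  # na.rm = TRUE drops the missing value before averaging;
  # without it, the mean for AA == 1 would be NA.
  summarize(n = n(),
            mean_time = mean(TimeToRecurrence, na.rm = TRUE))

toy_summary
```

Only the last three toy rows pass both filters, so the AA == 1 group has two rows but its mean is computed from the single non-missing time.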

Recoding with cut

Recode the variable Age into a new variable, age_decade, using the {base} function cut(). Use the breaks argument to set the decade boundaries.

prostate %>% 
  mutate(age_decade = ---) %>% 
  select(Age, age_decade)

Solution:

prostate %>% 
  mutate(age_decade = cut(Age, breaks = seq(10, 100, 10))) %>% 
  select(Age, age_decade)
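
One detail worth checking when recoding with cut() is which side of each interval is closed. By default cut() uses right = TRUE, so an age of exactly 40 lands in (30,40] rather than with the forties. The sketch below uses a small made-up vector of ages (not values from the prostate data) to compare the default with right = FALSE; whether the exercise intends one or the other is an assumption left to the reader.

```r
# Hypothetical ages chosen to sit on and around decade boundaries
# (not taken from the prostate data).
ages <- c(39, 40, 41, 50, 59, 60)
decade_breaks <- seq(10, 100, 10)

# Default: right = TRUE, so intervals look like (30,40]
# and an age of exactly 40 is grouped with the thirties.
right_closed <- cut(ages, breaks = decade_breaks)

# right = FALSE closes the left side instead: [40,50) holds 40 up to (not including) 50.
left_closed <- cut(ages, breaks = decade_breaks, right = FALSE)

# Side-by-side comparison of the two codings.
data.frame(ages, right_closed, left_closed)
```

If age_decade is meant to read as "forties", "fifties", and so on, the right = FALSE version is probably closer to that intent.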

Recoding with {santoku} chop

Google the R package {santoku} and its chop() function (with the breaks argument). Use this to create age_decade from the Age variable.

prostate %>% 
  mutate(age_decade = ---) %>% 
  select(Age, age_decade)

Solution:

prostate %>% 
  mutate(age_decade = chop(Age, breaks = seq(10, 100, 10))) %>% 
  select(Age, age_decade)
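
As a rough comparison with cut(), the sketch below runs chop() on a small made-up vector of ages (not values from the prostate data). It assumes the {santoku} package is installed and relies on chop()'s documented default of left-closed intervals, so an age of exactly 40 should fall in the interval that starts at 40. The exact label text may differ between santoku versions, so no output is shown here.

```r
library(santoku)

# Hypothetical ages on and around decade boundaries
# (not taken from the prostate data).
ages <- c(39, 40, 41, 50, 59, 60)

# chop() closes intervals on the left by default,
# the opposite of cut()'s default (right = TRUE).
chopped <- chop(ages, breaks = seq(10, 100, 10))

# Compare with base cut() using right = FALSE to get the same grouping.
base_cut <- cut(ages, breaks = seq(10, 100, 10), right = FALSE)

data.frame(ages, chopped, base_cut)
```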

Dplyr Tutorial