Data Wrangling
Please note: you are strongly encouraged to
- google dplyr functions
- phone a friend
- share screens
- ask questions
Exercise
Edit the code below to:
- filter the prostate data for PreopPSA > 5
- filter for PVol > 50 cc
- select TVol, T.Stage, AA, and TimeToRecurrence
- group by AA
- summarize the group means as mean_tvol, mean_tstage, and mean_time
prostate %>%
  filter(---) %>%
  filter(---) %>%
  select(---) %>%
  group_by(---) %>%
  summarize(---)
# One possible solution
prostate %>%
  filter(PreopPSA > 5) %>%
  filter(PVol > 50) %>%
  select(TVol, T.Stage, AA, TimeToRecurrence) %>%
  group_by(AA) %>%
  # na.rm = TRUE drops missing values before averaging
  summarize(mean_tvol = mean(TVol, na.rm = TRUE),
            mean_tstage = mean(T.Stage, na.rm = TRUE),
            mean_time = mean(TimeToRecurrence, na.rm = TRUE))
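To check what each verb contributes, it can help to run the same pipeline on a tiny data frame where the answer can be worked out by hand. The sketch below assumes {dplyr} is installed and uses a made-up data frame called toy; its values are invented for illustration and are not taken from the prostate data. Only the column names are borrowed from the exercise.

```r
library(dplyr)

# Hypothetical toy data: invented values, NOT from the prostate dataset
toy <- data.frame(
  PreopPSA = c(4, 6, 8, 10, 12),
  PVol     = c(60, 40, 55, 70, 80),
  TVol     = c(1, 2, 2, 3, NA),
  AA       = c(0, 0, 0, 1, 1)
)

toy_summary <- toy %>%
  filter(PreopPSA > 5) %>%   # drops row 1 (PreopPSA = 4)
  filter(PVol > 50) %>%      # drops row 2 (PVol = 40)
  select(TVol, AA) %>%       # keep only the columns needed downstream
  group_by(AA) %>%           # one group per value of AA
  summarize(mean_tvol = mean(TVol, na.rm = TRUE))  # NA in row 5 is ignored

toy_summary
```

Three rows survive both filters, so the summary has one row per AA group, and the missing TVol value does not turn the group mean into NA because of na.rm = TRUE.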
Recoding with cut
Recode the variable Age into a new variable, age_decade, using the {base} function cut(). Use the breaks argument to set the decade boundaries.
prostate %>%
  mutate(age_decade = ---) %>%
  select(Age, age_decade)
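# One possible solution. cut() builds right-closed intervals by default,
# e.g. (30,40]; add right = FALSE if you want [30,40) instead.
prostate %>%
  mutate(age_decade = cut(Age, breaks = seq(10, 100, 10))) %>%
  select(Age, age_decade)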
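One detail worth checking when recoding into decades is which side of each interval is closed. The base R sketch below uses a small made-up vector, ages (not from the prostate data), to compare the default behaviour of cut() with right = FALSE.

```r
# Hypothetical ages chosen to sit on and around decade boundaries
ages <- c(39, 40, 41, 50)

# Default (right = TRUE): intervals are right-closed, so 40 lands in (30,40]
right_closed <- cut(ages, breaks = seq(10, 100, 10))

# right = FALSE: intervals are left-closed, so 40 lands in [40,50)
left_closed <- cut(ages, breaks = seq(10, 100, 10), right = FALSE)

data.frame(ages, right_closed, left_closed)
```

With the default, a 40-year-old is grouped with people in their thirties. If age_decade should mean "40 to 49", right = FALSE is probably the closer match.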
Recoding with {santoku} chop
Look up the R package {santoku} and its chop() function, in particular the breaks argument. Use chop() to create age_decade from the Age variable.
prostate %>%
  mutate(age_decade = ---) %>%
  select(Age, age_decade)
# One possible solution
library(santoku)  # provides chop()
prostate %>%
  mutate(age_decade = chop(Age, breaks = seq(10, 100, 10))) %>%
  select(Age, age_decade)
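For comparison with cut(), the sketch below runs chop() on a small made-up vector, ages (not from the prostate data). It assumes {santoku} is installed and relies on chop() producing left-closed intervals by default (its left argument defaults to TRUE), so a boundary value such as 40 is grouped with 41 rather than with 39.

```r
library(santoku)

# Hypothetical ages chosen to sit on and around a decade boundary
ages <- c(39, 40, 41)

# chop() with default settings: left-closed intervals
decade_chop <- chop(ages, breaks = seq(10, 100, 10))

# cut() with default settings: right-closed intervals
decade_cut <- cut(ages, breaks = seq(10, 100, 10))

# Side by side: 40 joins 41 under chop(), but joins 39 under cut()
data.frame(ages,
           chop = as.character(decade_chop),
           cut  = as.character(decade_cut))
```

The two functions can therefore assign the same Age to different decades when it falls exactly on a break, which is worth keeping in mind when comparing the two age_decade columns.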