Objectives
By the end of this assignment, you should:
read_csv
summarize, mutate, group_by)

This assignment is due Thursday, September 23 at noon. Please turn in your .html AND .Rmd files on Canvas. Your .Rmd file should knit without error before you turn in the assignment.
To get started, you’ll need to make a .Rmd document. You can start from the template from the previous assignment and modify it as appropriate (including title, name, etc.). This assignment focuses on data from a recent paper examining the role that songs play in soothing infants (Bainbridge & Bertolo et al., 2021). Begin by reading the paper so that you broadly understand the question it is trying to answer and the methods the authors used (note that the methods are described in detail at the end of the paper). For all the questions requiring code, you should use tidyverse functions.
In this assignment, we’ll focus on the heart rate data. You can download a lightly cleaned version of their heart rate data here:
bb2021 <- read_csv("https://raw.githubusercontent.com/mllewis/cumulative-science/master/static/data/bb_2021_hr_clean.csv")

(If you’re curious, you can explore all their raw data in the repository associated with the paper.)
There are seven variables in the data and each variable is described below. The first six rows of the data frame are also displayed below.
1, .8 seconds after the trial started would be coded as 2, etc.).

bb2021 %>%
  summarize(youngest = min(age),
            oldest = max(age),
            total = n())
first_df <- bb2021 %>%
  filter(obs_num == 1, trial_id == 1)
nrow(first_df)
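The filter above keeps one row per participant only if every participant has a first observation in a first trial. As a cross-check, here is a minimal sketch that counts unique participant IDs directly with n_distinct(). The toy tibble and its values are hypothetical and only mimic the column names used in this assignment.

```r
library(dplyr)

# Hypothetical toy data (not from the paper) with the same column names
toy <- tibble(
  participant_id = c("p1", "p1", "p1", "p2", "p2"),
  trial_id       = c(1, 1, 2, 1, 1),
  obs_num        = c(1, 2, 1, 1, 2),
  age            = c(3.5, 3.5, 3.5, 7.2, 7.2)
)

# Approach used above: keep the first observation of the first trial
first_toy <- toy %>%
  filter(obs_num == 1, trial_id == 1)
n_by_filter <- nrow(first_toy)

# Cross-check: count unique participant IDs directly
n_by_distinct <- toy %>%
  summarize(n = n_distinct(participant_id)) %>%
  pull(n)
```

If the two counts disagree on the real data, some participant is probably missing a first observation or a first trial.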
first_df %>%
  group_by(age_cat) %>%
  summarize(n = n())
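The group_by() plus summarize(n = n()) pattern above can also be written with dplyr's count(). A minimal sketch on hypothetical toy data follows; the age_cat labels here are made up and may not match the labels in the real data.

```r
library(dplyr)

# Hypothetical toy data: one row per participant
toy <- tibble(
  participant_id = c("p1", "p2", "p3", "p4"),
  age_cat        = c("younger", "older", "younger", "younger")
)

# Long form, as in the answer above
by_summarize <- toy %>%
  group_by(age_cat) %>%
  summarize(n = n())

# Shorthand: count() groups, tallies, and ungroups in one step
by_count <- toy %>%
  count(age_cat)
```

Both versions should give one row per age category with the number of rows in a column called n.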
first_df %>%
  arrange(age)
first_df %>%
  arrange(-age)

group_by and ungroup).

bb2021 %>%
  group_by(participant_id, trial_id) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  summarize(mean = mean(n))

bb2021 %>%
  group_by(trial_type) %>%
  summarize(n = n())

hr_round that is the heart rate value rounded to the nearest hundredth (use the function round()).

bb2021 <- bb2021 %>%
  mutate(hr_round = round(zhr_pt, 2))

hr_round using the geom, geom_histogram. Be sure to add an appropriate title to your plot.

bb2021 %>%
  ggplot(aes(x = hr_round)) +
  geom_histogram() +
  ggtitle("Heart Rate Distribution")

participant_means.

participant_means <- bb2021 %>%
  group_by(participant_id, trial_type) %>%
  summarize(mean = mean(hr_round))

## `summarise()` has grouped output by 'participant_id'. You can override using the `.groups` argument.
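The message above appears because summarize() removes only the last grouping variable, so the result is still grouped by participant_id. Here is a minimal sketch of making that choice explicit with the .groups argument. The toy values and the trial_type labels are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with the same column names as bb2021
toy <- tibble(
  participant_id = c("p1", "p1", "p1", "p1", "p2", "p2"),
  trial_type     = c("lullaby", "lullaby", "non-lullaby",
                     "non-lullaby", "lullaby", "non-lullaby"),
  hr_round       = c(0.10, 0.30, -0.20, 0.00, 0.50, 0.25)
)

# .groups = "drop" returns an ungrouped data frame and silences the message
toy_means <- toy %>%
  group_by(participant_id, trial_type) %>%
  summarize(mean = mean(hr_round), .groups = "drop")

# Check whether any grouping remains on the result
still_grouped <- is_grouped_df(toy_means)
```

Dropping the groups here means later summaries are not silently computed within participant.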
Use participant_means to create a violin plot showing the distribution of heart rates in the lullaby and non-lullaby conditions. Your plot should be a simplified version of Figure 2a in the paper with (a) two violins, (b) each violin a different color, and (c) points showing the underlying data. (Hint: the order in which you add geoms to your plot matters!)

ggplot(participant_means, aes(x = trial_type, y = mean, color = trial_type)) +
  geom_violin() +
  geom_point()

Use participant_means to calculate the overall means in the lullaby and non-lullaby conditions. Save this to a new data frame called condition_means, and plot the two means as a bar plot with a different color for each bar. (Hint: use geom_bar(stat = "identity").)

condition_means <- participant_means %>%
  group_by(trial_type) %>%
  summarize(mean = mean(mean))

ggplot(condition_means, aes(x = trial_type, y = mean, fill = trial_type)) +
  geom_bar(stat = "identity")

trial_means <- bb2021 %>%
  group_by(trial_type, trial_id) %>%
  summarise(mean = mean(zhr_pt),
            n = n(),
            .groups = "keep")

ggplot(trial_means, aes(x = trial_id, y = mean, color = trial_type)) +
  geom_line() +
  geom_point() + # aes(size = n)
  ggtitle("Mean heart rate by trial number")

# Jonathan
bb2021 %>%
  group_by(trial_type, trial_id) %>%
  summarize(hr_means = mean(zhr_pt), num_trial = n_distinct(participant_id)) %>%
  ggplot(mapping = aes(x = trial_id, y = hr_means, color = trial_type, size = num_trial)) +
  geom_point() +
  geom_line(size = 0.5) +
  ggtitle("Mean heart rate by trial number") +
  ylab("mean")

# Nora
trial_means <- bb2021 %>%
  group_by(trial_id, trial_type) %>%
  summarize(mean_hr = mean(zhr_pt), total_trials = n_distinct(participant_id))

ggplot(trial_means, mapping = aes(x = trial_id, y = mean_hr, color = trial_type)) +
  geom_point(mapping = aes(size = total_trials)) +
  geom_line() +
  ggtitle("Mean Heart Rate Change Per Trial #")

# Emily (Visualizing the densities)
participant_means %>%
  ggplot(mapping = aes(mean, color = trial_type)) +
  geom_freqpoly() +
  ggtitle("Frequency of Mean Heart Rates by Condition")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
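The stat_bin() message above is a reminder that the default of 30 bins is arbitrary. Here is a minimal sketch of setting binwidth explicitly in geom_freqpoly(). The width of 0.25 and the toy values are assumptions chosen for illustration, not values recommended by the assignment.

```r
library(ggplot2)

# Hypothetical per-participant means (not from the paper)
toy_means <- data.frame(
  trial_type = rep(c("lullaby", "non-lullaby"), each = 5),
  mean       = c(-0.6, -0.3, -0.1, 0.2, 0.4, -0.2, 0.0, 0.3, 0.5, 0.9)
)

# An explicit binwidth replaces the default of 30 bins and silences the message
p <- ggplot(toy_means, aes(x = mean, color = trial_type)) +
  geom_freqpoly(binwidth = 0.25) +
  ggtitle("Frequency of Mean Heart Rates by Condition")
```

The same binwidth argument works in geom_histogram(); it is worth trying a few widths to see how much the shape depends on the choice.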
# Iris (Comparison across ages)
bb2021_by_age <- bb2021 %>%
  group_by(age_cat, trial_type) %>%
  summarise(age_mean = mean(zhr_pt))

## `summarise()` has grouped output by 'age_cat'. You can override using the `.groups` argument.
ggplot(bb2021_by_age, mapping = aes(x = age_cat, y = age_mean, color = trial_type)) +
  geom_col(fill = "white") +
  ggtitle("Mean heart rate by age and trial type") +
  xlab("age (month)") +
  ylab("mean heart rate")

# Bethany (outlier customization)
participant_means %>%
  ggplot(mapping = aes(x = trial_type, y = mean)) +
  geom_boxplot(outlier.colour = 'purple', outlier.size = 2, aes(fill = trial_type)) +
  scale_fill_brewer(palette = 'Set3', name = 'Trial Condition') +
  ggtitle(label = 'Average Participant Means by Trial Condition',
          subtitle = 'The mean heart rate across all trials for each participant based on trial condition') +
  xlab('Trial Condition') +
  ylab('Mean Heart Rate')
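Several of the plots above combine two geoms, and the violin plot hint notes that the order of geoms matters: ggplot2 draws layers in the order they are added, so later layers sit on top. Here is a minimal sketch of the two orderings, using hypothetical toy values rather than the real participant means.

```r
library(ggplot2)

# Hypothetical per-participant means (not from the paper)
toy_means <- data.frame(
  trial_type = rep(c("lullaby", "non-lullaby"), each = 6),
  mean       = c(-0.5, -0.3, -0.2, 0.0, 0.1, 0.4,
                 -0.1, 0.1, 0.2, 0.4, 0.6, 0.8)
)

# Violin first, points second: the points are drawn on top of the violins
p_points_on_top <- ggplot(toy_means, aes(x = trial_type, y = mean, color = trial_type)) +
  geom_violin() +
  geom_point()

# Points first, violin second: the violin's fill may cover the points
p_points_underneath <- ggplot(toy_means, aes(x = trial_type, y = mean, color = trial_type)) +
  geom_point() +
  geom_violin()
```

Printing both plots side by side is a quick way to see the difference.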