Objectives
By the end of this assignment, you should be able to use the following tidyverse functions: read_csv, summarize, mutate, and group_by.
This assignment is due Thursday, September 23 at noon. Please turn in your .html AND .Rmd files on Canvas. Your .Rmd file should knit without an error before you turn in the assignment.
To get started, you’ll need to make a .Rmd document. You can start from the template used in the previous assignment and modify it as appropriate (including title, name, etc.). This assignment focuses on data from a recent paper examining the role that songs play in soothing infants (Bainbridge & Bertolo et al., 2021). Before writing any code, give the paper a read so that you broadly understand the question it is trying to answer and the methods the authors used (the methods are described in detail at the end of the paper). For all questions requiring code, you should use tidyverse functions.
In this assignment, we’ll focus on the heart rate data. You can download a lightly cleaned version of their heart rate data here:
library(tidyverse)
bb2021 <- read_csv("https://raw.githubusercontent.com/mllewis/cumulative-science/master/static/data/bb_2021_hr_clean.csv")
(if you’re curious, you can explore all their raw data by going to the repository associated with the paper, here).
There are seven variables in the data and each variable is described below. The first six rows of the data frame are also displayed below.
1, .8 seconds after the trial started would be coded as 2, etc.).
bb2021 %>%
  summarize(youngest = min(age),
            oldest = max(age),
            total = n())
[a]
first_df <- bb2021 %>%
  filter(obs_num == 1, trial_id == 1)
nrow(first_df)
[b]
first_df %>%
  group_by(age_cat) %>%
  summarize(n = n())
[c]
first_df %>%
  arrange(age)
[d]
first_df %>%
  arrange(-age)
group_by and ungroup).
bb2021 %>%
  group_by(participant_id, trial_id) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  summarize(mean = mean(n))
bb2021 %>%
  group_by(trial_type) %>%
  summarize(n = n())
Create a new variable called hr_round that is the heart rate value rounded to the nearest hundredth (use the function round()).
bb2021 <- bb2021 %>%
  mutate(hr_round = round(zhr_pt, 2))
Plot hr_round using the geom geom_histogram. Be sure to add an appropriate title to your plot.
bb2021 %>%
  ggplot(aes(x = hr_round)) +
  geom_histogram() +
  ggtitle("Heart Rate Distribution")
participant_means.
participant_means <- bb2021 %>%
  group_by(participant_id, trial_type) %>%
  summarize(mean = mean(hr_round))
## `summarise()` has grouped output by 'participant_id'. You can override using the `.groups` argument.
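The message above is informational, not an error: after summarizing over two grouping variables, the result stays grouped by the first one. As a minimal sketch on made-up data (the tibble `toy` and its values are hypothetical and are not taken from the paper), the `.groups` argument makes that choice explicit and silences the message:

```r
library(dplyr)

# Hypothetical toy data standing in for bb2021 (values are made up)
toy <- tibble(
  participant_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  trial_type     = c("lullaby", "lullaby", "non-lullaby", "non-lullaby",
                     "lullaby", "lullaby", "non-lullaby", "non-lullaby"),
  hr_round       = c(0.10, 0.30, -0.20, 0.00, 0.50, 0.70, 0.20, 0.40)
)

toy_means <- toy %>%
  group_by(participant_id, trial_type) %>%
  # .groups = "drop" returns an ungrouped data frame and suppresses the message
  summarize(mean = mean(hr_round), .groups = "drop")

toy_means
```

Dropping the groups here has the same effect as calling ungroup() after summarize(); which one to use is a matter of taste.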
Use participant_means to create a violin plot showing the distribution of heart rates in the lullaby and non-lullaby conditions. Your plot should be a simplified version of Figure 2a in the paper with (a) two violins, (b) each violin a different color, and (c) points showing the underlying data. (hint: the order that you add geoms to your plot matters!).
ggplot(participant_means, aes(x = trial_type, y = mean, color = trial_type)) +
  geom_violin() +
  geom_point()
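To see why the order of geoms matters: ggplot2 draws layers in the order they are added, so later geoms sit on top of earlier ones. The sketch below uses a small made-up data frame (`toy_means` and its values are hypothetical) and adds the violin first so that the points are drawn over it:

```r
library(ggplot2)

# Hypothetical participant-level means (values are made up)
toy_means <- data.frame(
  trial_type = rep(c("lullaby", "non-lullaby"), each = 5),
  mean       = c(-0.4, -0.2, -0.1, 0.0, 0.2, -0.1, 0.0, 0.1, 0.3, 0.5)
)

# Violin first, points second: the points end up on top of the violins
p <- ggplot(toy_means, aes(x = trial_type, y = mean, color = trial_type)) +
  geom_violin() +
  geom_point()

p
```

Swapping the two geom lines would draw the violins over the points, which can hide data once the violins are filled.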
Use participant_means to calculate the overall means in the lullaby and non-lullaby conditions. Save this to a new dataframe called condition_means, and plot the two means as a bar plot of different colors. (hint: use geom_bar(stat = "identity")).
condition_means <- participant_means %>%
  group_by(trial_type) %>%
  summarize(mean = mean(mean))
ggplot(condition_means, aes(x = trial_type, y = mean, fill = trial_type)) +
  geom_bar(stat = "identity")
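By default geom_bar() counts rows, which is why the hint sets stat = "identity" to plot the values already stored in the data. ggplot2 also provides geom_col() as a shorthand for the same thing. A minimal sketch on made-up values (`toy_condition_means` is hypothetical, not the real condition means):

```r
library(ggplot2)

# Hypothetical condition means (values are made up)
toy_condition_means <- data.frame(
  trial_type = c("lullaby", "non-lullaby"),
  mean       = c(0.2, 0.4)
)

# Bar heights taken directly from the `mean` column
p_bar <- ggplot(toy_condition_means,
                aes(x = trial_type, y = mean, fill = trial_type)) +
  geom_bar(stat = "identity")

# geom_col() draws the same bars without the stat argument
p_col <- ggplot(toy_condition_means,
                aes(x = trial_type, y = mean, fill = trial_type)) +
  geom_col()

p_col
```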
trial_means <- bb2021 %>%
  group_by(trial_type, trial_id) %>%
  summarise(mean = mean(zhr_pt),
            n = n(),
            .groups = "keep")
ggplot(trial_means, aes(x = trial_id, y = mean, color = trial_type)) +
  geom_line() +
  geom_point() + # aes(size = n)
  ggtitle("Mean heart rate by trial number")
# Jonathan
bb2021 %>%
  group_by(trial_type, trial_id) %>%
  summarize(hr_means = mean(zhr_pt), num_trial = n_distinct(participant_id)) %>%
  ggplot(mapping = aes(x = trial_id, y = hr_means, color = trial_type, size = num_trial)) +
  geom_point() +
  geom_line(size = 0.5) +
  ggtitle("Mean heart rate by trial number") +
  ylab("mean")
# Nora
trial_means <- bb2021 %>%
  group_by(trial_id, trial_type) %>%
  summarize(mean_hr = mean(zhr_pt), total_trials = n_distinct(participant_id))
ggplot(trial_means, mapping = aes(x = trial_id, y = mean_hr, color = trial_type)) +
  geom_point(mapping = aes(size = total_trials)) +
  geom_line() +
  ggtitle("Mean Heart Rate Change Per Trial #")
# Emily (Visualizing the densities)
participant_means %>%
  ggplot(mapping = aes(mean, color = trial_type)) +
  geom_freqpoly() +
  ggtitle("Frequency of Mean Heart Rates by Condition")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
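The message above appears because no bin size was given, so ggplot2 falls back to 30 bins. As a rough sketch on made-up data (`toy_means` is hypothetical, and the binwidth of 0.1 is an arbitrary choice for illustration rather than a recommended value), setting binwidth explicitly removes the message and makes the binning a deliberate decision:

```r
library(ggplot2)

# Hypothetical participant-level means (values are made up)
toy_means <- data.frame(
  trial_type = rep(c("lullaby", "non-lullaby"), each = 5),
  mean       = c(-0.4, -0.2, -0.1, 0.0, 0.2, -0.1, 0.0, 0.1, 0.3, 0.5)
)

# binwidth is in the units of the x variable; 0.1 is an assumed value
p_freq <- ggplot(toy_means, aes(mean, color = trial_type)) +
  geom_freqpoly(binwidth = 0.1) +
  ggtitle("Frequency of Mean Heart Rates by Condition")

p_freq
```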
# Iris (Comparison across ages)
bb2021_by_age <- bb2021 %>%
  group_by(age_cat, trial_type) %>%
  summarise(age_mean = mean(zhr_pt))
## `summarise()` has grouped output by 'age_cat'. You can override using the `.groups` argument.
ggplot(bb2021_by_age, mapping = aes(x = age_cat, y = age_mean, color = trial_type)) +
  geom_col(fill = "white") +
  ggtitle("Mean heart rate by age and trial type") +
  xlab("age (month)") +
  ylab("mean heart rate")
# Bethany (outlier customization)
participant_means %>%
  ggplot(mapping = aes(x = trial_type, y = mean)) +
  geom_boxplot(outlier.colour = 'purple', outlier.size = 2, aes(fill = trial_type)) +
  scale_fill_brewer(palette = 'Set3', name = 'Trial Condition') +
  ggtitle(label = 'Average Participant Means by Trial Condition', subtitle = 'The mean heart rate across all trials for each participant based on trial condition') +
  xlab('Trial Condition') +
  ylab('Mean Heart Rate')