Objectives
By the end of this assignment, you should:
This assignment is due Thursday, October 14th at noon. Please submit both your .html and .Rmd files on Canvas. Your .Rmd file should knit without errors before you turn in the assignment.
In the first few exercises, we’ll return to the data from Experiment 1A of Zettersten & Lupyan (2020) that you worked with in Assignment 4, and that we’ve talked about in lecture. Our goal will be to estimate the means in the two conditions, and quantify our certainty about these estimates using confidence intervals.
Load zl_exp1a from Assignment 4.
Take zl_exp1a and modify it so it has a single row for each subject, condition, and block number. Accuracy should be summarized for each subject as the proportion of trials categorized correctly. Save the new data frame as zl_exp1a_by_subject.
Plot a histogram of zl_exp1a_by_subject with a separate facet for each unique combination of condition and block number (make sure to set the binwidth to an appropriate value).
Use zl_exp1a_by_subject to calculate a point estimate of the mean and a 95% confidence interval for each combination of condition and block number.
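The summarizing, plotting, and interval steps above can be sketched on made-up data. This is only an illustration, not the expected solution: the toy data frame, its column names (subject, condition, block_num, is_right), and the condition labels are all assumptions, and the interval uses the normal approximation (mean plus or minus 1.96 standard errors), which may differ from the method covered in lecture.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for zl_exp1a. All column names and values are hypothetical.
set.seed(1)
toy_trials <- expand.grid(subject = 1:20, block_num = 1:2, trial = 1:12) %>%
  mutate(
    condition = if_else(subject <= 10, "high", "low"),  # placeholder labels
    is_right = rbinom(n(), size = 1, prob = 0.7)         # 1 = correct trial
  )

# One row per subject, condition, and block: proportion of correct trials.
toy_by_subject <- toy_trials %>%
  group_by(subject, condition, block_num) %>%
  summarize(prop_correct = mean(is_right), .groups = "drop")

# Histogram with one facet per condition-by-block combination.
toy_hist <- ggplot(toy_by_subject, aes(x = prop_correct)) +
  geom_histogram(binwidth = 0.1) +
  facet_grid(condition ~ block_num)

# Point estimate and normal-approximation 95% CI for each combination.
toy_ci <- toy_by_subject %>%
  group_by(condition, block_num) %>%
  summarize(
    mean_acc = mean(prop_correct),
    se_acc = sd(prop_correct) / sqrt(n()),
    ci_lower = mean_acc - 1.96 * se_acc,
    ci_upper = mean_acc + 1.96 * se_acc,
    .groups = "drop"
  )
```

With small samples, a t-based multiplier such as qt(0.975, df = n - 1) in place of 1.96 gives a slightly wider interval.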
The next exercises concern the ManyBabies dataset we discussed in lab. The study replicates an experiment designed to test whether infants prefer infant-directed speech (IDS) to adult-directed speech (ADS). As in the previous exercises, our goal will be to estimate the means in the two conditions and quantify our certainty about these estimates using confidence intervals.
The dataset we’ll be working with in this assignment is a subset of the full ManyBabies dataset. It contains the data for 6 replication attempts from 6 different labs. The data (in tidy format) are located at the following address. Each row corresponds to the data from one subject.
https://raw.githubusercontent.com/mllewis/cumulative-science/master/static/data/many_babies_data.csv
Each of the variables (columns) is described below.
lab - unique identifier for the lab
subid - unique identifier for the subject
age_days - age of infant in days
condition - whether the infant heard IDS or ADS speech
mean_looking_time - how long infant looked at screen when speech was playing (seconds)
Start by reading the data into R. Spend a few minutes exploring the data to understand its structure using functions like glimpse and summary.
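A minimal sketch of the read-and-explore step, assuming the readr and dplyr packages are installed. So that it runs offline, it first writes a tiny made-up file with the columns described above; the object names (toy_path, toy_mb) and all values are invented for illustration.

```r
library(readr)
library(dplyr)

# Write a small made-up CSV with the columns the assignment describes.
# Every value here is invented.
toy_path <- tempfile(fileext = ".csv")
write_csv(
  tibble(
    lab = c("lab_a", "lab_a", "lab_b", "lab_b"),
    subid = c("s1", "s2", "s3", "s4"),
    age_days = c(200, 215, 340, 352),
    condition = c("IDS", "ADS", "IDS", "ADS"),
    mean_looking_time = c(8.1, 6.4, 9.3, 7.7)
  ),
  toy_path
)

# read_csv() accepts a local path or a URL string, so the assignment's
# address can be passed in place of toy_path to load the real data.
toy_mb <- read_csv(toy_path, show_col_types = FALSE)

glimpse(toy_mb)                # column names, types, and first values
summary(toy_mb)                # per-column summaries
count(toy_mb, lab, condition)  # rows per lab and condition
```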
kable(<DATAFRAME>) from the knitr package.
geom_density rather than geom_histogram. geom_density plots frequencies as a smoothed probability distribution rather than raw counts. Set the alpha parameter in geom_density so that you can see both overlapping distributions.
babylabprinceton (central tendency and dispersion).
ids_by_lab
ids_by_lab with condition on the x-axis and the data from each lab as a separate facet. As in your distribution plots above, each condition should be represented with a different color.
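The tools named above (kable, geom_density with alpha, and a faceted plot with condition on the x-axis) can be sketched on simulated data. This is a hedged illustration, not the intended answer: the toy data, the object names, and the choice to summarize each lab with a mean and a normal-approximation 95% interval are assumptions, since the exercise text does not spell out what ids_by_lab should contain.

```r
library(dplyr)
library(ggplot2)
library(knitr)

# Simulated looking times for two hypothetical labs. Values are invented.
set.seed(2)
toy_mb <- tibble(
  lab = rep(c("lab_a", "lab_b"), each = 40),
  condition = rep(rep(c("IDS", "ADS"), each = 20), times = 2),
  mean_looking_time = rnorm(80, mean = if_else(condition == "IDS", 8, 7), sd = 2)
)

# Overlapping densities: alpha below 1 keeps both conditions visible.
toy_density <- ggplot(toy_mb, aes(x = mean_looking_time, fill = condition)) +
  geom_density(alpha = 0.5)

# Mean and normal-approximation 95% CI per lab and condition (an assumption).
toy_by_lab <- toy_mb %>%
  group_by(lab, condition) %>%
  summarize(
    mean_lt = mean(mean_looking_time),
    se_lt = sd(mean_looking_time) / sqrt(n()),
    ci_lower = mean_lt - 1.96 * se_lt,
    ci_upper = mean_lt + 1.96 * se_lt,
    .groups = "drop"
  )

# Print the summary as a formatted table in the knitted document.
kable(toy_by_lab, digits = 2)

# Condition on the x-axis, one facet per lab, one color per condition.
toy_ci_plot <- ggplot(toy_by_lab, aes(x = condition, y = mean_lt, color = condition)) +
  geom_pointrange(aes(ymin = ci_lower, ymax = ci_upper)) +
  facet_wrap(~lab)
```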