Assignment 5: Estimation and Data Wrangling

Objectives

By the end of this assignment, you should:

understand the difference between a point estimate and a confidence interval
understand how to create a confidence interval of a mean from raw data
understand how to plot and interpret a confidence interval of a mean

This assignment is due Thursday, October 14th at noon. Please turn in your .html AND .Rmd files on Canvas. Your .Rmd file should knit without errors before you turn in the assignment.

library(tidyverse)
library(knitr)
library(janitor)

In the first few exercises, we’ll return to the data from Experiment 1A of Zettersten & Lupyan (2020) that you worked with in Assignment 4, and that we’ve talked about in lecture. Our goal will be to estimate the means in the two conditions, and quantify our certainty about these estimates using confidence intervals.

To start, you’ll need to read in the data from Zettersten & Lupyan and recreate the data frame called zl_exp1a from Assignment 4.

[a] Take the data frame called zl_exp1a and modify it so it has a single row for each subject, condition, and block number. Accuracy should be summarized for each subject as the proportion of trials categorized correctly. Save the new data frame as zl_exp1a_by_subject.
[b] Plot a histogram of zl_exp1a_by_subject with a separate facet for each unique combination of condition and block number (make sure to set the binwidth to an appropriate value).

DATA_PATH <- "https://osf.io/a4dzb/download"
zl_data <- read_csv(DATA_PATH)

zl_clean <- zl_data %>%
  clean_names() %>%
  select(experiment, subject, age, condition, block_num, is_right)

zl_exp1a <- zl_clean %>%
  filter(experiment == "1A") 

zl_exp1a_by_subject <- zl_exp1a %>%
  group_by(subject, condition, block_num) %>%
  summarize(prop_right = sum(is_right)/n())

zl_exp1a_by_subject %>%
  ggplot(aes(x = prop_right)) +
  geom_histogram(bins = 10) +
  facet_grid(condition ~ block_num) +
  xlab("Proportion correct categorizations") +
  theme_classic(base_size = 12)

[a] Use zl_exp1a_by_subject to calculate a point estimate (mean) and 95% confidence interval for each combination of condition and block number.
[b] Recreate the plot that you made in Assignment 4, Exercise 2d, adding the confidence intervals you just calculated in part a.
[c] Do the values in your plot match the one in the paper (Fig. 4A)?

means_by_condition_with_ci_t <- zl_exp1a_by_subject %>%
  group_by(condition, block_num) %>%
  summarize(mean = mean(prop_right),
            sd = sd(prop_right),
            n = n()) %>%
  mutate(ci_range_95 = qt(1 - (0.05 / 2), n - 1) * (sd/sqrt(n)),
         ci_lower = mean - ci_range_95,
         ci_upper = mean + ci_range_95)

## `summarise()` has grouped output by 'condition'. You can override using the `.groups` argument.

ggplot(means_by_condition_with_ci_t, 
       aes(x = block_num, y = mean, color = condition)) +
  geom_pointrange(aes(ymin = ci_lower, ymax = ci_upper)) +
  ylim(.5,1) +
  scale_color_manual(values = c('red', "blue")) +
  geom_line() +
  theme_classic(base_size = 12)

[c] The error bars are wider in this plot than in the original. This is because the original plot (Fig. 4A) shows ranges that correspond to standard errors rather than 95% confidence intervals.
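To see why standard-error bars are narrower than 95% confidence intervals, the sketch below compares the two half-widths for one set of summary statistics. The values of n and sd are hypothetical and chosen only for illustration; they are not taken from the paper or from zl_exp1a_by_subject.

```r
# Hypothetical summary statistics (illustrative values, not from the paper)
n  <- 25
sd <- 0.15

se      <- sd / sqrt(n)                # standard error of the mean
t_crit  <- qt(1 - (0.05 / 2), n - 1)   # critical t value for a 95% CI
ci_half <- t_crit * se                 # half-width of the 95% CI

# The CI half-width is t_crit (a little over 2 here) times the standard error
c(se = se, ci_half = ci_half, ratio = ci_half / se)
```

Because the critical t value is larger than 1 for any sample size, a 95% confidence interval built this way is always wider than a range of plus or minus one standard error.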

[a] What is the point estimate of the mean for the third block in the high nameability condition?
[b] What is the confidence interval of the mean for the third block in the high nameability condition?
[c] How should we interpret the range of values given by the confidence interval? In other words, what does this range mean?

[a] The point estimate is 0.905.
[b] The confidence interval is [0.841, 0.969].
[c] If we ran our experiment 100 times and computed a 95% confidence interval for each one, we would expect about 95 of the confidence intervals to cover the population value. A single confidence interval gives you a set of plausible values for the population mean.
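The repeated-experiment interpretation in [c] can be checked with a small simulation. This is a minimal sketch that assumes a normally distributed population with a known mean; the population mean, population sd, sample size, number of simulated experiments, and seed are all hypothetical values chosen for illustration.

```r
set.seed(1)          # arbitrary seed so the sketch is reproducible
true_mean <- 0.9     # hypothetical population mean
true_sd   <- 0.1     # hypothetical population sd
n         <- 25      # hypothetical sample size per experiment
n_sims    <- 2000    # number of simulated experiments

# For each simulated experiment, build a 95% CI and record whether it
# contains the true population mean
covers <- replicate(n_sims, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  half <- qt(1 - (0.05 / 2), n - 1) * (sd(x) / sqrt(n))
  (mean(x) - half) <= true_mean && true_mean <= (mean(x) + half)
})

mean(covers)   # proportion of intervals that contain the true mean
```

Under these assumptions the proportion should land close to 0.95, although it will rarely equal 0.95 exactly in any finite set of simulated experiments.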

Compare the confidence intervals in the low versus high nameability conditions.

[a] How are they different (qualitatively)?
[b] Explain why they are different despite having the same number of participants in each condition. Make reference to the equations we talked about in class defining a confidence interval.

[a] The confidence intervals in the low nameability condition are wider.
[b] The low nameability condition has higher variance/standard deviation. This leads to a larger confidence interval because the width of the confidence interval depends on the standard deviation (qt(1 - (0.05 / 2), n - 1) * (sd/sqrt(n))).
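Written as an equation, the R expression in [b] corresponds to the following, where \bar{x} is the sample mean, s is the sample standard deviation, and n is the number of subjects. This is a restatement of the code used in this assignment; the notation used in class may differ.

```latex
\[
\text{CI}_{95\%} = \bar{x} \pm t_{0.975,\; n-1} \cdot \frac{s}{\sqrt{n}}
\]
```

When n is the same in both conditions, the critical value t_{0.975, n-1} and the \sqrt{n} term are identical, so any difference in interval width comes from s.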

The next exercises concern the ManyBabies dataset we discussed in lab. The study replicates an experiment designed to test whether infants have a preference for infant-directed speech (IDS) compared to adult-directed speech (ADS). As in the previous exercises, our goal will be to estimate the means in the two conditions, and quantify our certainty about these estimates using confidence intervals.

The dataset we’ll be working with in this assignment is a subset of the full ManyBabies dataset. It contains the data for 6 replication attempts from 6 different labs. The data (in tidy format) are located at the following address. Each row corresponds to the data from one subject in one condition.

https://raw.githubusercontent.com/mllewis/cumulative-science/master/static/data/many_babies_data.csv

Each of the variables (columns) is described below.

lab - unique identifier for the lab
subid - unique identifier for the subject
age_days - age of infant in days
condition - whether the infant heard IDS or ADS speech
mean_looking_time - how long infant looked at screen when speech was playing (seconds).

Start by reading the data into R. Spend a few minutes exploring the data to understand its structure using functions like glimpse and summary.

many_babies_data <- read_csv("https://raw.githubusercontent.com/mllewis/cumulative-science/master/static/data/many_babies_data.csv")

Explain why these experiments are “replications”.

Piper: To replicate an experiment means to produce the same results from the original population, but with a new data set. These experiments qualify to be considered “replications” because we are changing the experimenter, data, analyst, code, estimate, and claim, while using similar methods and overall experimental design.

What is the predicted pattern for the mean looking times in the IDS versus the ADS conditions? Let’s call this pattern the “effect”.

Mean looking times in the IDS conditions are predicted to be longer than those in the ADS conditions.

How many babies were run in each lab? You can print out a nicely formatted version of this dataframe in your knitted html file using the function kable(<DATAFRAME>) from the knitr package.

many_babies_data %>%
  distinct(subid, lab) %>%
  count(lab) %>%
  kable()

lab	n
babylabnijmegen	55
babylabparisdescartes1	16
babylabplymouth	34
babylabpotsdam	32
babylabprinceton	14
infantllmadison	87

[a] Plot a histogram of looking time data. Show each condition as a different fill and the data in each lab in a separate facet.
[b] It’s a little hard to see the difference between the conditions with the data plotted as histograms. Make a second plot that uses geom_density rather than geom_histogram. geom_density plots frequencies as a smoothed probability distribution rather than raw counts. Set the alpha parameter in geom_density so that you can see both overlapping distributions.

many_babies_data %>%
  ggplot(aes(x = mean_looking_time, fill = condition)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~lab)+
  xlab("Mean Looking Time") +
  ggtitle("Looking time by lab and condition") +
  theme_classic()

many_babies_data %>%
  ggplot(aes(x = mean_looking_time, fill = condition)) +
  geom_density(alpha = .5) +
  facet_wrap(~lab) +
  xlab("Mean Looking Time") +
  ggtitle("Looking time by lab and condition") +
  theme_classic()

Look at the distribution plots you made in the previous exercises.

[a] Describe in words the distributions for IDS and ADS looking times for the lab named babylabprinceton (central tendency and dispersion).
[b] Compare the central tendencies in the two conditions across labs. What is the pattern in the results across the different labs?

[a] The central tendency is lower for ADS compared to IDS. IDS has greater dispersion than ADS.
[b] There is a trend for IDS to be higher compared to ADS.

Let’s quantify our certainty in the pattern of results across labs using confidence intervals. Calculate a 95% confidence interval of the mean looking time for each condition in each lab. Save it to a data frame called ids_by_lab.

ids_by_lab <- many_babies_data %>%
  group_by(lab, condition) %>%
  summarize(mean = mean(mean_looking_time),
            sd = sd(mean_looking_time),
            n = n()) %>%
  mutate(ci_range_95 =  qt(1 - (0.05 / 2), n - 1) *
           (sd/sqrt(n)), 
         ci_lower = mean - ci_range_95,
         ci_upper = mean + ci_range_95)

## `summarise()` has grouped output by 'lab'. You can override using the `.groups` argument.

kable(ids_by_lab)

lab	condition	mean	sd	n	ci_range_95	ci_lower	ci_upper
babylabnijmegen	ADS	6.878308	2.321326	55	0.6275423	6.250765	7.505850
babylabnijmegen	IDS	7.535700	2.876603	55	0.7776548	6.758045	8.313355
babylabparisdescartes1	ADS	4.449689	1.664878	16	0.8871508	3.562539	5.336840
babylabparisdescartes1	IDS	5.613009	1.608897	16	0.8573207	4.755689	6.470330
babylabplymouth	ADS	6.103096	1.609800	34	0.5616856	5.541411	6.664782
babylabplymouth	IDS	6.915758	2.291181	34	0.7994310	6.116327	7.715189
babylabpotsdam	ADS	7.240376	2.811162	32	1.0135321	6.226844	8.253908
babylabpotsdam	IDS	8.221106	3.620444	32	1.3053093	6.915796	9.526415
babylabprinceton	ADS	5.907301	2.221359	14	1.2825746	4.624726	7.189875
babylabprinceton	IDS	8.164119	3.735444	14	2.1567814	6.007338	10.320900
infantllmadison	ADS	5.739549	2.386168	87	0.5085614	5.230988	6.248110
infantllmadison	IDS	7.387775	2.828367	87	0.6028068	6.784968	7.990582

[a] Plot the point estimates and confidence intervals in ids_by_lab with condition on the x-axis and the data from each lab as a separate facet. As in your distribution plots above, each condition should be represented with a different color.
[b] Interpret the plots. What is the pattern within individual labs? Describe and explain your level of certainty about these claims (in words).
[c] In which lab is the effect largest? In which lab is the effect smallest? (no need to calculate anything, just “eyeball” it).

ggplot(ids_by_lab, aes(x = condition, y = mean, color = condition)) +
  geom_pointrange(aes(ymin = ci_lower, ymax = ci_upper)) +
  facet_wrap(~ lab) +
  ggtitle("Mean looking time by lab and condition") +
  theme_classic()

[b] Tze Ling: Overall, the labs all found a higher point estimate for mean looking time in the IDS than ADS condition. This would suggest that there is an effect that results in longer looking times in response to IDS than ADS. However, the level of certainty about the effect is relatively low. With the exception of infantllmadison, the confidence intervals of the two conditions overlap substantially, which means that it is plausible there is no effect present (difference between means = 0). The lack of overlap between the confidence intervals found by infantllmadison allows for relatively higher levels of certainty about the effect found.

Raina: The pattern within individual labs seems to be that the mean looking time in the IDS condition is higher than in the ADS condition. However, I am not very certain about this pattern since there is a great deal of overlap between the confidence intervals for the ADS and IDS conditions in most of the labs. This makes it a little more challenging to be certain that the difference between the two conditions is significant. The only lab where there seems to be no overlap between the confidence intervals is the infantllmadison lab, where we can be more certain of the claim that the mean looking time is higher in the IDS condition compared to the ADS condition.

[c] The effect appears to be largest in the “infantllmadison” or the “babylabprinceton” lab, and smallest in the “babylabnijmegen” lab.
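The eyeballed answers in [b] and [c] can be cross-checked against the ids_by_lab table. The sketch below copies the means and the relevant interval bounds from that table into plain vectors (so it runs without the data file) and computes the IDS minus ADS difference and an overlap flag per lab. The object names labs, ads_mean, ids_mean, ads_upper, ids_lower, and effects are introduced here for illustration.

```r
# Means and CI bounds copied from the ids_by_lab table
labs <- c("babylabnijmegen", "babylabparisdescartes1", "babylabplymouth",
          "babylabpotsdam", "babylabprinceton", "infantllmadison")
ads_mean  <- c(6.878308, 4.449689, 6.103096, 7.240376, 5.907301, 5.739549)
ids_mean  <- c(7.535700, 5.613009, 6.915758, 8.221106, 8.164119, 7.387775)
ads_upper <- c(7.505850, 5.336840, 6.664782, 8.253908, 7.189875, 6.248110)
ids_lower <- c(6.758045, 4.755689, 6.116327, 6.915796, 6.007338, 6.784968)

effects <- data.frame(
  lab        = labs,
  difference = ids_mean - ads_mean,     # IDS minus ADS
  # The IDS mean is higher in every lab, so the intervals overlap
  # exactly when the IDS lower bound is at or below the ADS upper bound
  ci_overlap = ids_lower <= ads_upper
)

effects[order(effects$difference), ]    # smallest to largest difference
```

With these table values, the difference is smallest for babylabnijmegen and largest for babylabprinceton, and infantllmadison is the only lab whose two intervals do not overlap.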

[a] The confidence intervals for the lab “infantllmadison” are smaller than for the lab “babylabprinceton”. Explain one reason why that might be.
[b] The confidence intervals for “babylabparisdescartes1” are smaller than for “babylabpotsdam”. Explain one reason why that might be.

[a] “infantllmadison” used a much larger sample as compared to “babylabprinceton”.
[b] Even though “babylabparisdescartes1” used a smaller sample, the mean looking times were less varied than those collected by “babylabpotsdam”.
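Both answers follow from the fact that the interval half-width depends on the standard deviation and the sample size together. As a rough check, the sketch below recomputes the half-widths for the ADS rows of “babylabparisdescartes1” and “babylabpotsdam” from the sd and n values in the ids_by_lab table. The helper ci_half_width is a hypothetical name introduced here, and the last value is a counterfactual rather than something observed in the data.

```r
# Half-width of a 95% CI for a mean, from the sample sd and sample size
# (hypothetical helper, mirrors the ci_range_95 calculation)
ci_half_width <- function(sd, n) {
  qt(1 - (0.05 / 2), n - 1) * (sd / sqrt(n))
}

# sd and n copied from the ADS rows of the ids_by_lab table
paris   <- ci_half_width(sd = 1.664878, n = 16)
potsdam <- ci_half_width(sd = 2.811162, n = 32)

# Counterfactual: Potsdam's sample size combined with Paris's smaller sd
potsdam_with_paris_sd <- ci_half_width(sd = 1.664878, n = 32)

c(paris = paris, potsdam = potsdam,
  potsdam_with_paris_sd = potsdam_with_paris_sd)
```

The first two values should reproduce the ci_range_95 column for those rows. The third shows that the larger Potsdam sample would give the narrower interval if the two labs had the same sd, so the wider Potsdam interval is driven by its greater variability.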

Assignment 5: Estimation and Data Wrangling - SOLUTIONS

Modern Research Methods