This exam is due Thursday, October 21st at noon. Please submit both your .html and .Rmd files on Canvas. You may use class materials (slides, notes, and readings) and the internet as resources to complete your exam. However, you must complete the exam on your own. You should not discuss the exam with anyone other than Professor Lewis or Roderick.


  1. [a] What does the term “cumulative science” mean? [1 paragraph]
    [b] Why might researchers fail to make cumulative scientific progress? [1 paragraph]

    Your responses should make reference to examples we’ve discussed in class and the course readings.


  1. Suppose you were interested in estimating how many words a child knows (their vocabulary size). Describe three different variables that you could measure to estimate a child’s vocabulary size (each variable should be of a different type). For each variable, give a one-sentence description of the variable, the variable type, and one example value of that variable with units.


  1. Look at this spreadsheet.

    [a] List 5 things about the data that are not tidy.
    [b] What are the observations in this dataset?

    If it’s helpful, you can learn more about this dataset here.


  1. The first 12 columns of the data in Question 3 are provided at the link below. Use these data to reproduce the following two plots [a-b].

DATA_PATH <- "https://raw.githubusercontent.com/mllewis/cumulative-science/master/static/data/tidy_UN_MigrantStockTotal_2019.csv"



  1. Go to the dataisugly subreddit and find one plot that you think is a particularly severe violation of the plotting guidelines we discussed in class.

    [a] Provide a link to that plot, and describe what you take to be the main message of the plot.
    [b] Describe 5 things you would change in the plot to make it better at conveying this message.


  1. Explain what “p-hacking” is, and why a researcher might do it.


  1. Rhoda is a researcher in the psychology department at CMU. She hypothesizes that children will be more likely to show the mutual exclusivity effect in word learning if they are asked to find the novel object in infant-directed speech rather than adult-directed speech. Rhoda runs a study testing this hypothesis with a sample of children from the local nursery school.

    [a] Explain what it would mean to reproduce this study.
    [b] Explain what it would mean to replicate this study.


  1. [a] How fast can ten-year-olds type? To estimate this value, researchers conducted a study in which they asked ten-year-olds to type “Alice in Wonderland” by Lewis Carroll. They calculated the mean number of words typed per minute, and repeated this study 99 more times with new participants for each study. Figure (a) below shows point estimates and confidence intervals for each of the 100 studies. The dashed line shows the underlying population value. Estimate the confidence level of the intervals shown in the plot.
    [b] Researchers completed this study 100 more times, shown in Figure (b) below. The confidence intervals are wider here than in Figure (a). List two reasons this might be.


  1. Recall from class that we claimed that our replication of Zettersten and Lupyan (2020) was successful. However, the mean percentage of correct trials for the high nameability condition in our replication was 75%, which lies outside the 95% confidence interval for the same measure as estimated by Zettersten and Lupyan (M=84.0%, 95% CI=[78.6%, 89.4%]). Is this a problem for our claim? Why or why not?


  1. Imagine we conducted a replication of Bainbridge and Bertolo, et al. (2021; from Assignment 2) and got an effect size estimate of -.4 with a confidence interval of [-.5, -.3]. What would this mean?


  1. This question uses the data from the Many Babies project from Assignment 5.

    [a] Calculate effect sizes for the replications in two of the labs. Specifically, calculate an effect size for the replication experiment conducted by the “babylabnijmegen” lab and an effect size for the replication experiment conducted by the “infantllmadison” lab.
    [b] Plot the two effect sizes.
    [c] Which lab had the larger effect size? State the point estimate and confidence interval for the effect size for that lab.
    [d] Interpret the effect sizes. Explain what it means to have a large effect size in this experiment. Would you guess that these two effect sizes are statistically different from each other? Why or why not?


  1. How are the functions filter and distinct similar? How are they different? Use data from the Many Babies project to demonstrate your answer. Your answer should involve both code and a clear explanation.



The following questions concern the set of studies reported in “Gender stereotypes about intellectual ability emerge early and influence children’s interests” by Bian, Leslie, and Cimpian (2017). You can find additional information about the study methods in the Supplementary Materials. In addition, there is an OSF repository that contains the raw data from the studies reported in the paper. The repository includes a document explaining each of the variables. To complete the questions about this dataset, you’ll need to read the paper and supplementary materials to understand the design of Study 1 and Study 2.


  1. Sort the data from Study 1 so that children who have the highest value for stereo are at the top. Show the first 7 rows of this dataset.


When you look at the data in Study 1, you might notice that the values for the variables gender and trait are numbers. Coding qualitative variables with numbers is not ideal: it’s not obvious what the numbers correspond to, so it’s easy to make errors when interpreting the data. The code below recodes the variables gender and trait so that the values are interpretable strings rather than numbers. Make sure to use the code below to fix the variables gender and trait in your data frame.

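library(tidyverse)  # provides %>%, mutate(), as_factor(), and fct_recode()

# Convert the numeric codes to factors, then relabel each level with a readable name
lian_data_tidy <- lian_data %>%
  mutate(gender = as_factor(gender),
         gender = fct_recode(gender, "boy" = "1", "girl" = "2"),
         trait = as_factor(trait),
         trait = fct_recode(trait, "nice" = "0", "smart" = "1"))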
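
If you want to confirm what the recoding does before applying it to the real data, the same steps can be tried on a small made-up data frame. This is only a sketch: `toy_data` and its values are hypothetical and are not taken from the OSF repository, and it assumes the tidyverse is installed.

```r
library(tidyverse)

# Hypothetical toy rows for illustration only; the real lian_data comes from the OSF repository.
toy_data <- tibble(gender = c(1, 2, 2), trait = c(0, 1, 0))

# Same recoding pattern as above: numeric codes become factors with readable labels.
toy_tidy <- toy_data %>%
  mutate(gender = fct_recode(as_factor(gender), "boy" = "1", "girl" = "2"),
         trait = fct_recode(as_factor(trait), "nice" = "0", "smart" = "1"))

# Tabulate the recoded values to check that every code received a label.
count(toy_tidy, gender, trait)
```

Counting the recoded columns is a quick way to spot any code that was left unlabeled.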


  1. How many children participated in the experiment in each age group?


  1. Calculate the standard error of the mean for the dependent variable (mean proportion of times children linked the target trait to their own gender) for girls on the “nice” trait task.


  1. Recreate Figure 1A and Figure 1B from the paper. Include error bars that are 95% confidence intervals (rather than standard errors).
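
As a reminder of how a 95% confidence interval differs from a standard error, here is a minimal sketch on made-up numbers. The vector `scores` is hypothetical (not data from the paper), and the interval uses the normal approximation (mean plus or minus 1.96 standard errors), which is an assumption; other ways of building the interval may be preferred in your answer.

```r
# Hypothetical proportions for one group; not values from the paper.
scores <- c(0.50, 0.75, 0.25, 1.00, 0.50, 0.75)

n <- length(scores)
m <- mean(scores)

# Standard error of the mean: sample standard deviation divided by sqrt(n).
se <- sd(scores) / sqrt(n)

# Normal-approximation 95% interval: mean plus or minus 1.96 standard errors.
ci_lower <- m - 1.96 * se
ci_upper <- m + 1.96 * se
```

Error bars drawn from `ci_lower` to `ci_upper` are roughly twice as long as bars drawn at one standard error on each side of the mean.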