Objectives

The primary objective of this assignment is to give you more practice with:

You should also:

This assignment is due Thursday, September 30th at noon. Please submit both your .html and .Rmd files on Canvas. Your .Rmd file should knit without errors before you turn in the assignment.

This assignment concerns a dataset from an experiment that tested whether 2- to 4-year-old children could learn new words by exclusion (Lewis, Cristiano, Lake, Kwan & Frank, 2020).

There were two conditions. In the critical condition, children saw two objects: one whose label the child knew (e.g., a ball) and one whose label the child did not know (e.g., tongs). The experimenter then asked the child to point to the novel object by saying, e.g., “Can you find the tongs?” If the child assumes that each object has only one name, they should infer that the new label refers to the tongs and not the ball. This phenomenon is called “Mutual Exclusivity” in the literature (Markman & Wachtel, 1988), because children are thought to assume that a new label is mutually exclusive with an old one. Let’s call this condition the “Novel-Familiar” condition, or NF.

In the control condition, children again saw two objects, but this time the child knew a label for both of them (e.g., a ball and a cup). The experimenter then asked the child to point to one of the objects by saying, e.g., “Can you find the ball?” Let’s call this condition the “Familiar-Familiar” condition, or FF.

Each child completed 7 trials: 4 in the NF condition and 3 in the FF condition. On each trial, we recorded which object was the correct choice and whether the child pointed to it. We also measured two variables for each child: their age and their performance on a separate vocabulary test.

Each variable in the dataset is described below:

Here is the path to a lightly cleaned version of the dataset:

DATA_PATH <- "https://raw.githubusercontent.com/mllewis/cumulative-science/master/static/data/tidy_me_data.csv"



  1. Load the data frame and save it to a variable called me_data. Use the glimpse() function to determine:
    [a] how many observations there are in the data frame,
    [b] the variable type of sub_id, and
    [c] the variable type of target_object.
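
As a reminder of how to read glimpse() output, here is a minimal sketch on a made-up data frame. The data frame `toy` and its columns `id` and `score` are hypothetical and are not part of me_data.

```r
library(dplyr)

# Toy data frame; the column names and values are hypothetical
toy <- tibble(
  id = c("s1", "s1", "s2"),
  score = c(0.5, 1, 0.25)
)

# glimpse() prints the number of rows and columns, followed by one line
# per column giving its type abbreviation (e.g. <chr>, <dbl>) and its
# first few values
glimpse(toy)
```

The row count answers "how many observations" questions, and the type abbreviation next to each column name tells you how R has stored that variable.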


  1. [a] Use slice() to print rows 1 and 3 from me_data.
    [b] Use arrange() and slice() to print 7 rows from the first trial (where trial_num is 1).
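
If you have not combined these two verbs before, the sketch below shows the general pattern on a made-up data frame. The names `toy`, `trial`, and `item` are hypothetical.

```r
library(dplyr)

# Toy data frame; the column names and values are hypothetical
toy <- tibble(
  trial = c(3, 1, 2, 1),
  item = c("a", "b", "c", "d")
)

# slice() selects rows by position; here, the 1st and 3rd rows
rows_1_and_3 <- toy %>% slice(c(1, 3))

# arrange() sorts rows by a column; slicing afterwards takes rows
# from the top of the sorted data frame
first_two_sorted <- toy %>%
  arrange(trial) %>%
  slice(1:2)

print(rows_1_and_3)
print(first_two_sorted)
```

Because slice() works on row positions, what it returns depends on how the data frame is ordered at that point in the pipeline.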


  1. [a] How many children participated in our experiment?
    [b] How many children participated in our experiment who were at least three-and-a-half years of age?


  1. How many individual trials were there where the target object was “balloon”, “apple” or “guitar”?
    [a] Use group_by() to answer this question.
    [b] Use count() to answer this question.
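
The two approaches give the same counts, as the sketch below illustrates on a made-up data frame. The names `toy`, `fruit`, and `wanted` are hypothetical.

```r
library(dplyr)

# Toy data frame; the column name and values are hypothetical
toy <- tibble(
  fruit = c("apple", "pear", "apple", "kiwi", "pear", "apple")
)
wanted <- c("apple", "pear")

# Approach 1: group the rows, then count the rows in each group
by_group <- toy %>%
  filter(fruit %in% wanted) %>%
  group_by(fruit) %>%
  summarise(n = n())

# Approach 2: count() does the grouping and counting in a single step
by_count <- toy %>%
  filter(fruit %in% wanted) %>%
  count(fruit)

print(by_group)
print(by_count)
```

Note that %in% tests membership in a set of values, which is more compact than chaining several == comparisons with |.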


  1. For each child, calculate the proportion of trials they got correct in each condition. Save it to a data frame called subject_means.
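
One common way to get a proportion from a 0/1 column is to take its mean within each group. The sketch below shows this on a made-up data frame; the names `toy`, `person`, `block`, and `hit` are hypothetical, and the sketch assumes correctness is coded as 0 or 1.

```r
library(dplyr)

# Toy data frame; the column names and values are hypothetical
toy <- tibble(
  person = c("p1", "p1", "p1", "p1", "p2", "p2"),
  block = c("x", "x", "y", "y", "x", "y"),
  hit = c(1, 0, 1, 1, 0, 1)
)

# The mean of a 0/1 column is the proportion of 1s.
# Grouping by two variables gives one row per combination.
toy_means <- toy %>%
  group_by(person, block) %>%
  summarise(prop_hit = mean(hit), .groups = "drop")

print(toy_means)
```

Setting .groups = "drop" returns an ungrouped data frame, which avoids surprises in later summarise() calls.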


  1. Use the subject_means data frame to calculate the mean proportion correct by condition. Plot the result as a bar plot. Include the following things:

Which condition are children better at?


  1. Do children get better at the NF trials as they get older? Create a plot that shows mean performance at each age group (in years) on only NF trials.


  1. Make a version of the previous plot that shows performance on the NF trials at each age group for each target object. Use facet_wrap(). You’ll need to create a new data frame like subject_means_with_years but one that also includes the variable target_object. Call the new data frame subject_means_with_years_obj.


  1. Using me_data, make a new variable called scaled_vocabulary_score that ranges from 0 to 1, rather than 0 to 100.
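
Rescaling a column is a job for mutate(). Here is a minimal sketch on a made-up data frame; the names `toy`, `pct`, and `prop` are hypothetical.

```r
library(dplyr)

# Toy data frame; the column name and values are hypothetical
toy <- tibble(pct = c(0, 25, 100))

# mutate() adds a new column computed from existing ones;
# dividing by 100 maps the range 0-100 onto 0-1
toy_scaled <- toy %>%
  mutate(prop = pct / 100)

print(toy_scaled)
```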


  1. Use me_data to plot the distribution of children’s scaled_vocabulary_score. To do this, you’ll need a data frame with only one row per child. Use geom_histogram().
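
One possible way to collapse repeated rows into one row per person is distinct(). The sketch below uses a made-up data frame; the names `toy`, `person`, `trial`, and `height` are hypothetical, and the approach assumes the measure has the same value on every row for a given person.

```r
library(dplyr)

# Toy data frame with several rows per person; names are hypothetical
toy <- tibble(
  person = c("p1", "p1", "p2", "p2", "p2"),
  trial = c(1, 2, 1, 2, 3),
  height = c(100, 100, 110, 110, 110)
)

# distinct() keeps the unique combinations of the listed columns,
# so a per-person measure repeated on every trial collapses to one row
one_row_each <- toy %>%
  distinct(person, height)

print(one_row_each)
```

A data frame in this shape can then be passed to ggplot() so that each person contributes to the histogram only once.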


  1. Do older children have higher vocabularies? Recreate the plot below:



  1. Recreate the plot below, where each point corresponds to an individual child.


  1. What other questions could we ask of this data?
    [a] Pose an analytical question that could be answered with this data set.
    [b] Make a clear, beautiful plot that helps answer this question. If appropriate, use multiple geoms in your plot. Use a geom other than geom_bar, geom_violin, geom_boxplot, or geom_histogram.
    [c] Interpret your plot.

For inspiration, check out the R ggplot gallery.