3.2 Solutions

Reload the Muslim prejudice data:

library(tidyverse) # load first: mutate() and case_when() come from the tidyverse

load("prejudice.RData")

# create new outcome variable
table <- mutate(
  table, # call the dataset
  outcome = # new variable called 'outcome'
    case_when(
      is.na(srw_therm_1_h_1) ~ # for those rows where srw_therm_1_h_1 is missing
        srw_therm_2_h_1, # new variable will take value of srw_therm_2_h_1
      is.na(srw_therm_2_h_1) ~ # for those rows where srw_therm_2_h_1 is missing
        srw_therm_1_h_1 # new variable will take value of srw_therm_1_h_1
    )
)
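
The case_when() call above picks whichever of the two thermometer items is not missing. As a minimal sketch, the same merge can be written with dplyr's coalesce(), which returns the first non-missing value across its arguments. The toy data frame below is hypothetical, and the sketch assumes each respondent answered exactly one of the two items. If a respondent had answered both, coalesce() would keep the first value, whereas the case_when() version would return NA.

```r
library(dplyr)

# hypothetical toy data: each respondent answered exactly one thermometer item
toy <- data.frame(
  srw_therm_1_h_1 = c(70, NA, 45, NA),
  srw_therm_2_h_1 = c(NA, 55, NA, 90)
)

toy <- mutate(
  toy,
  outcome = coalesce(srw_therm_1_h_1, srw_therm_2_h_1) # first non-missing value
)

toy$outcome
```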

3.2.1 Exercise 1

In the Muslim prejudice study, the researchers explain:

Regarding Muslim Americans, media and political actors often promote misperceptions, particularly about Muslims and terrorism… As a result, I evaluated the treatment’s robustness to an environment in which fear of terrorism was present by randomly assigning half of the respondents to a question priming terrorism threats.

This means that this terrorism priming is another treatment variable whose average effect we can estimate through a difference in means. Turn the variable terrorism into a factor variable whose value is “no priming” if it is currently 0 and “primed” if it is currently 1.

We can do this using the factor() function from week 2. It is also possible using mutate() and case_when().

# just using factor()
table$terrorism <- factor(table$terrorism, # specify the variable to make categorical
                          labels = c("no priming", # label corresponding to 0
                                    "primed"), # label corresponding to 1
                          levels = c(0, 1)) # previous levels of variable

# or using mutate() and case_when()
# table <- mutate(table,
#                terrorism = factor(case_when(
#                  terrorism == 0 ~ "no priming",
#                  terrorism == 1 ~ "primed"
#                )))
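
When recoding, it is worth checking that each old value received the intended label. The sketch below uses a hypothetical toy vector in place of the real terrorism variable and cross-tabulates the original values against the recoded factor.

```r
# hypothetical toy vector standing in for the 0/1 terrorism indicator
x <- c(0, 1, 1, 0, 1)

x_factor <- factor(x,
                   levels = c(0, 1), # existing values of the variable
                   labels = c("no priming", "primed")) # labels in the same order

# cross-tabulate old and new values to confirm the mapping
table(original = x, recoded = x_factor)
```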

3.2.2 Exercise 2

Delete all rows with missing values on the outcome variable in your original dataset (called table unless you changed it).

This is easily done with drop_na().

table <- drop_na(table,
                 outcome) # variable to determine missingness
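
To see how many rows such a step removes, we can count the missing values first. The sketch below uses a hypothetical toy data frame and a base R subset that should be equivalent to drop_na() applied to a single variable.

```r
# hypothetical toy data with two missing outcomes
toy <- data.frame(outcome = c(60, NA, 75, NA, 50),
                  terrorism = c(0, 1, 0, 1, 1))

n_missing <- sum(is.na(toy$outcome)) # number of rows that will be dropped

# base R equivalent of drop_na(toy, outcome)
toy_complete <- toy[!is.na(toy$outcome), ]

# rows removed should equal the number of missing outcomes
nrow(toy) - nrow(toy_complete) == n_missing
```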

3.2.3 Exercise 3

Visualise the distribution of outcome conditional on terrorism.

A conditional distribution plot, such as the one we made in week 2, works well here:

gg_priming <- ggplot(data = table, # tell R the dataset to work from
                     mapping = aes(x = outcome, # put the feeling thermometer on the x axis
                                   group = terrorism, # group observations by priming condition
                                   colour = terrorism)) + # allow colour to vary by priming condition
  geom_density() + # create the 'density' geom (the curve, essentially a smoothed histogram)
  labs(x = "Muslim American Feeling Thermometer",
       y = "Density",
       title = "Conditional distribution of feelings towards Muslim Americans",
       subtitle = "Feelings conditional on priming misperceptions about terrorism") + # clearer labels
  theme_minimal() # use a cleaner, minimal theme
  
gg_priming

The distributions appear to be very similar, although the most positive feelings seem to be slightly more likely when terrorism misperceptions are not brought to mind.

3.2.4 Exercise 4

Estimate the difference in means, or average treatment effect, of terrorism on the outcome.

# separate two groups
treatment_group <- filter(table, terrorism == "primed")
control_group <- filter(table, terrorism == "no priming")

difference_in_means <-
  mean(treatment_group$outcome) -
  mean(control_group$outcome)

difference_in_means

[1] -1.851343

The difference in means is small (about 1.85 points on the feeling thermometer), but it is in the direction the author expected: those in the control group – whose misperceptions about Muslim involvement in terror attacks were not primed – have a slightly larger mean.
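
As an alternative sketch, the two group means can be computed in one step with group_by() and summarise() rather than by filtering into separate datasets. The toy data and the names group_means and toy_difference are hypothetical.

```r
library(dplyr)

# hypothetical toy data
toy <- data.frame(
  terrorism = factor(c("no priming", "no priming", "primed", "primed")),
  outcome = c(60, 70, 50, 70)
)

# one row per group, with the mean outcome in each
group_means <- summarise(group_by(toy, terrorism),
                         mean_outcome = mean(outcome))

# treatment mean minus control mean
toy_difference <-
  group_means$mean_outcome[group_means$terrorism == "primed"] -
  group_means$mean_outcome[group_means$terrorism == "no priming"]

toy_difference
```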

3.2.5 Exercise 5

Estimate the standard error of the difference in means.

We use the formula again:

\[ \sqrt{ \frac{ s^2_{Y_{x=0}} }{ n_{x=0} } + \frac{ s^2_{Y_{x=1}} }{ n_{x=1} }} \]

And apply this in R:

# squared standard deviation in control group
s2ycontrol <- sd(control_group$outcome)^2

# n in control group
ncontrol <- nrow(control_group)

# squared standard deviation in treatment group
s2ytreatment <- sd(treatment_group$outcome)^2

# n in treatment group
ntreatment <- nrow(treatment_group)

# calculation
se_difference_in_means <-
  sqrt( # wrap it all in a square root
    (s2ycontrol/ncontrol) + # control group
      (s2ytreatment/ntreatment) # treatment group
  )

se_difference_in_means

[1] 0.9268306

The standard error of the difference in means is approximately 0.93.
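
Because the same calculation recurs whenever we compare two groups, it can be wrapped in a small function. This is only a sketch: se_diff_means() is a hypothetical helper, not a function from any package, and it assumes the inputs contain no missing values. It uses var(), which is identical to sd()^2.

```r
# hypothetical helper: standard error of a difference in means
se_diff_means <- function(y_control, y_treatment) {
  sqrt(var(y_control) / length(y_control) + # control group
         var(y_treatment) / length(y_treatment)) # treatment group
}

# hypothetical toy outcomes
toy_control <- c(60, 70, 80, 50)
toy_treatment <- c(55, 65, 45, 75)

toy_se <- se_diff_means(toy_control, toy_treatment)
toy_se
```

With the real data, the call would be se_diff_means(control_group$outcome, treatment_group$outcome).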

3.2.6 Exercise 6

Calculate the 95 percent confidence interval of the difference in means.

# upper bound
difference_in_means + 1.96*se_difference_in_means

[1] -0.03475478

# lower bound 
difference_in_means - 1.96*se_difference_in_means

[1] -3.667931

The 95% confidence interval does not include zero, so we can reject the null hypothesis of no difference in means at the 5% significance level.

3.2.7 Exercise 7

Calculate the 99 percent confidence interval of the difference in means. What conclusions can you draw from the result?

# upper bound
difference_in_means + 2.58*se_difference_in_means

[1] 0.5398802

# lower bound 
difference_in_means - 2.58*se_difference_in_means

[1] -4.242566

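The 99% confidence interval includes zero, so we cannot reject the null hypothesis of no difference in means at the 1% significance level. The effect of priming is therefore statistically significant at the 5% level but not at the 1% level.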
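
The multipliers 1.96 and 2.58 used in the last two exercises are quantiles of the standard normal distribution. As a brief sketch, they can be recovered with qnorm(). The names conf_levels and critical_values are hypothetical.

```r
conf_levels <- c(0.95, 0.99)

# two-sided interval: leave (1 - level) / 2 in each tail
critical_values <- qnorm(1 - (1 - conf_levels) / 2)

round(critical_values, 2)
```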

3.2.8 Exercise 8

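Confirm your results (within errors of rounding) using the t.test() function, first setting conf.level = 0.95 and then conf.level = 0.99.

Note that t.test() subtracts the mean of the second factor level (primed) from that of the first (no priming), so its estimate and interval bounds have the opposite sign to the manual calculations above. It also uses the t distribution rather than the multipliers 1.96 and 2.58, which accounts for the small differences in the bounds.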

t.test(outcome ~ terrorism,
       data = table,
       conf.level = 0.95)


    Welch Two Sample t-test

data:  outcome by terrorism
t = 1.9975, df = 3577.2, p-value = 0.04585
alternative hypothesis: true difference in means between group no priming and group primed is not equal to 0
95 percent confidence interval:
 0.03417332 3.66851238
sample estimates:
mean in group no priming     mean in group primed 
                63.03863                 61.18729

t.test(outcome ~ terrorism,
       data = table,
       conf.level = 0.99)


    Welch Two Sample t-test

data:  outcome by terrorism
t = 1.9975, df = 3577.2, p-value = 0.04585
alternative hypothesis: true difference in means between group no priming and group primed is not equal to 0
99 percent confidence interval:
 -0.5372892  4.2399749
sample estimates:
mean in group no priming     mean in group primed 
                63.03863                 61.18729
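
The t statistic in this output is the difference in means divided by its standard error, so it can be checked against the values from Exercises 4 and 5. The sketch below types those reported values in by hand. The p-value line uses a normal approximation, which is an assumption made here for simplicity; t.test() itself uses the t distribution, so its p-value will differ slightly.

```r
# values reported in Exercises 4 and 5
difference_in_means <- -1.851343
se_difference_in_means <- 0.9268306

# t statistic: estimate divided by its standard error
t_stat <- difference_in_means / se_difference_in_means
round(abs(t_stat), 4) # compare with t = 1.9975 in the output

# two-sided p-value under a normal approximation (an assumption)
p_value <- 2 * pnorm(-abs(t_stat))
p_value
```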