3.2 Solutions
Reload the Muslim prejudice data and recreate the outcome variable:
library(tidyverse) # loads dplyr (mutate, case_when), tidyr (drop_na) and ggplot2

# create new outcome variable
table <- mutate(
  table, # call the dataset
  outcome = # new variable called 'outcome'
    case_when(
      # for those rows where srw_therm_1_h_1 is missing,
      # the new variable takes the value of srw_therm_2_h_1
      is.na(srw_therm_1_h_1) ~ srw_therm_2_h_1,
      # for those rows where srw_therm_2_h_1 is missing,
      # the new variable takes the value of srw_therm_1_h_1
      is.na(srw_therm_2_h_1) ~ srw_therm_1_h_1
    )
)
3.2.1 Exercise 1
In the Muslim prejudice study, the researchers explain:
Regarding Muslim Americans, media and political actors often promote misperceptions, particularly about Muslims and terrorism… As a result, I evaluated the treatment’s robustness to an environment in which fear of terrorism was present by randomly assigning half of the respondents to a question priming terrorism threats.
This means that terrorism priming is another treatment variable whose average effect we can estimate through a difference in means. Turn the variable terrorism into a factor variable whose value is “no priming” if it is currently 0 and “primed” if it is currently 1.
We can do this using the factor() function from week 2. It is also possible using mutate() and case_when().
# just using factor()
table$terrorism <- factor(table$terrorism, # specify the variable to make categorical
labels = c("no priming", # label corresponding to 0
"primed"), # label corresponding to 1
levels = c(0, 1)) # previous levels of variable
# or using mutate() and case_when()
# table <- mutate(table,
# terrorism = factor(case_when(
# terrorism == 0 ~ "no priming",
# terrorism == 1 ~ "primed"
# )))
3.2.2 Exercise 2
Delete all rows with missing values on the outcome variable in your original dataset (called table unless you changed it).
This is easily done with drop_na().
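As a minimal sketch of how drop_na() behaves, the example below uses a small made-up data frame (toy is a hypothetical name, not part of the study data) and assumes the tidyr package is installed. Naming the column restricts the check to that column, so rows with missing values elsewhere are kept.

```r
library(tidyr) # provides drop_na()

# hypothetical toy data, not the study data
toy <- data.frame(
  outcome = c(50, NA, 75, NA),
  age     = c(30, 41, NA, 25)
)

# drop only the rows where 'outcome' is missing
toy_complete <- drop_na(toy, outcome)
toy_complete

# the same call applied to the exercise data would be:
# table <- drop_na(table, outcome)
```

Two rows remain in toy_complete, including the one with a missing age, because only outcome was named.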
3.2.3 Exercise 3
Visualise the distribution of outcome conditional on terrorism.
A conditional distribution plot, such as the one we made in week 2, works well here:
gg_priming <- ggplot(data = table, # tell R the dataset to work from
mapping = aes(x = outcome, # put the feeling thermometer outcome on the x axis
group = terrorism, # group observations by priming status
colour = terrorism)) + # allow colour to vary by priming status
geom_density() + # create the 'density' geom (the curve, essentially a smoothed histogram)
labs(x = "Muslim American Feeling Thermometer",
y = "Density",
title = "Conditional distribution of feelings towards Muslim Americans",
subtitle = "Feelings conditional on priming misperceptions about terrorism") + # clearer labels
theme_minimal() # use a minimal theme (plain background, light grid lines)
gg_priming
The distributions appear to be very similar, although the most positive feelings seem to be slightly more likely when terrorism misperceptions are not brought to mind.
3.2.4 Exercise 4
Estimate the difference in means, or average treatment effect, of terrorism on the outcome.
# separate two groups
treatment_group <- filter(table, terrorism == "primed")
control_group <- filter(table, terrorism == "no priming")
difference_in_means <-
mean(treatment_group$outcome) -
mean(control_group$outcome)
difference_in_means
[1] -1.851343
The difference in means is small, about -1.85 points on the feeling thermometer, but is in the direction the author expected: those in the control group, whose misperceptions about Muslim involvement in terror attacks were not primed, have a slightly higher mean.
3.2.5 Exercise 5
Estimate the standard error of the difference in means.
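We again use the formula for the standard error of a difference in means, which combines each group's sample variance of the outcome (\(s^2\)) and number of observations (\(n\)):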
\[ \sqrt{ \frac{ s^2_{Yx=0} }{ n_{x=0} } + \frac{ s^2_{Yx=1} }{ n_{x=1} }} \]
And apply this in R:
# squared standard deviation in control group
s2ycontrol <- sd(control_group$outcome)^2
# n in control group
ncontrol <- nrow(control_group)
# squared standard deviation in treatment group
s2ytreatment <- sd(treatment_group$outcome)^2
# n in treatment group
ntreatment <- nrow(treatment_group)
# calculation
se_difference_in_means <-
sqrt( # wrap it all in a square root
(s2ycontrol/ncontrol) + # control group
(s2ytreatment/ntreatment) # treatment group
)
se_difference_in_means
[1] 0.9268306
The standard error of the difference in means is approximately 0.93.
3.2.6 Exercise 6
Calculate the 95 percent confidence interval of the difference in means.
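One way to obtain the interval is to add and subtract 1.96 standard errors from the difference in means. This sketch assumes the normal approximation with a critical value of 1.96, which is consistent with the printed bounds below. The two estimates are copied from the output of Exercises 4 and 5 so that the snippet runs on its own; in a live session the objects difference_in_means and se_difference_in_means already exist. The names upper_95 and lower_95 are illustrative.

```r
# estimates copied from Exercises 4 and 5 (as printed there)
difference_in_means <- -1.851343
se_difference_in_means <- 0.9268306

# upper bound of the 95% confidence interval
upper_95 <- difference_in_means + 1.96 * se_difference_in_means
# lower bound of the 95% confidence interval
lower_95 <- difference_in_means - 1.96 * se_difference_in_means

upper_95
lower_95
```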
[1] -0.03475478
[1] -3.667931
The 95% confidence interval, from about -3.67 to -0.03, does not include zero, so we can reject the null hypothesis of no difference in means at the 95% confidence level.
3.2.7 Exercise 7
Calculate the 99 percent confidence interval of the difference in means. What conclusions can you draw from the result?
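The same approach works here with a larger critical value. The sketch below assumes a rounded critical value of 2.58, which is consistent with the printed bounds below; qnorm(0.995) gives the unrounded value of roughly 2.576. As before, the estimates are copied from Exercises 4 and 5 so the snippet is self-contained, and the names upper_99 and lower_99 are illustrative.

```r
# estimates copied from Exercises 4 and 5 (as printed there)
difference_in_means <- -1.851343
se_difference_in_means <- 0.9268306

# upper bound of the 99% confidence interval
upper_99 <- difference_in_means + 2.58 * se_difference_in_means
# lower bound of the 99% confidence interval
lower_99 <- difference_in_means - 2.58 * se_difference_in_means

upper_99
lower_99
```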
[1] 0.5398802
[1] -4.242566
The 99% confidence interval, from about -4.24 to 0.54, includes zero, so we cannot reject the null hypothesis at the 99% confidence level. The evidence for a priming effect therefore depends on the confidence level chosen.
3.2.8 Exercise 8
Confirm your results (within errors of rounding) using the t.test() function, first setting conf.level = 0.95 and then conf.level = 0.99.
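A minimal sketch of the two calls is shown here on simulated data, since the study data are not reproduced in this section. The data frame toy and its values are made up for illustration; only the variable names outcome and terrorism mirror the exercise. With the real data, the calls would use data = table instead.

```r
set.seed(1) # make the simulated example reproducible

# hypothetical toy data, not the study data
toy <- data.frame(
  outcome = c(rnorm(50, mean = 63, sd = 28), rnorm(50, mean = 61, sd = 28)),
  terrorism = factor(rep(c("no priming", "primed"), each = 50))
)

# Welch two-sample t-test with a 95% confidence interval
test_95 <- t.test(outcome ~ terrorism, data = toy, conf.level = 0.95)
# the same test with a 99% confidence interval
test_99 <- t.test(outcome ~ terrorism, data = toy, conf.level = 0.99)

test_95$conf.int
test_99$conf.int
```

Note that t.test() subtracts the second factor level from the first (no priming minus primed), so its interval has the opposite sign to one computed as treatment minus control. It also uses the t distribution rather than a fixed critical value, which accounts for small differences from a hand calculation.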
Welch Two Sample t-test
data: outcome by terrorism
t = 1.9975, df = 3577.2, p-value = 0.04585
alternative hypothesis: true difference in means between group no priming and group primed is not equal to 0
95 percent confidence interval:
0.03417332 3.66851238
sample estimates:
mean in group no priming mean in group primed
63.03863 61.18729
Welch Two Sample t-test
data: outcome by terrorism
t = 1.9975, df = 3577.2, p-value = 0.04585
alternative hypothesis: true difference in means between group no priming and group primed is not equal to 0
99 percent confidence interval:
-0.5372892 4.2399749
sample estimates:
mean in group no priming mean in group primed
63.03863 61.18729
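As a rough cross-check, the t statistic reported above should equal the difference in means divided by its standard error. The sketch below uses the values printed in Exercises 4 and 5 and flips the sign of the difference, because the t.test() output compares no priming with primed; t_statistic is an illustrative name.

```r
# estimates copied from Exercises 4 and 5 (as printed there)
difference_in_means <- -1.851343
se_difference_in_means <- 0.9268306

# control minus treatment, divided by the standard error
t_statistic <- -difference_in_means / se_difference_in_means
t_statistic
```

The result agrees with the reported t = 1.9975 to within rounding.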