Chapter 3 T-test for Difference in Means and Hypothesis Testing

3.1 Seminar – in class exercises

We’ll be using data on constituency-level results of the 2016 referendum on membership of the EU. Load in the data and familiarise yourself with the codebook below. This time we are loading a .rds file, so we use readRDS(). Also, load the tidyverse.

# load in brexit data
brexit <- readRDS(url('https://github.com/QMUL-SPIR/Public_files/blob/master/datasets/BrexitResults.rds?raw=true'))

library(tidyverse)

If you want to save this file to your computer once you have downloaded it here, you can do so using saveRDS().

saveRDS(brexit,
        file = "BrexitResults.rds")
Variable Description
pano Press Association constituency number.
ConstituencyName Name of the Constituency according to the Electoral Commission.
BrexitVote Estimated share of votes for Brexit in the EU referendum in each constituency (Hanretty, 2017).
London 0 if the constituency is located outside of London and 1 if it is inside. It’s a factor variable.
Country Scotland, England or Wales.
Region Regional divisions of Great Britain.
Winner15 Party that won in the constituency in 2015.
Con15 Share of votes for the Conservatives in 2015.
Lab15 Share of votes for Labour in 2015.
LD15 Share of votes for the Liberal Democrats in 2015.
SNP15 Share of votes for the Scottish National Party in 2015.
PC15 Share of votes for Plaid Cymru in 2015.
UKIP15 Share of votes for UKIP in 2015.
Green15 Share of votes for the Greens in 2015.
Other15 Share of votes for other parties in 2015.
Majority15 Difference between the winner and the second party in each constituency.
Turnout15 Turnout in each constituency in 2015.
BornUK % of people in the constituency born in the UK (Census 2011).
PercFemale % of women in the constituency (Census 2011).
PopulationDensity Average number of people per square mile in each constituency (Census 2011).

We can get summary statistics of each variable in the dataset by using the summary() function over the dataset.

summary(brexit)
      pano       ConstituencyName     BrexitVote              London   
 Min.   :  1.0   Length:632         Min.   :20.54   Out of London:559  
 1st Qu.:165.8   Class :character   1st Qu.:45.39   London       : 73  
 Median :327.5   Mode  :character   Median :53.74                      
 Mean   :327.5                      Mean   :52.07                      
 3rd Qu.:488.2                      3rd Qu.:60.14                      
 Max.   :650.0                      Max.   :74.96                      
                                                                       
     Country                Region                       Winner15  
 England :533   South East     : 84   Conservative           :330  
 Scotland: 59   North West     : 75   Labour                 :232  
 Wales   : 40   London         : 73   Scottish National Party: 56  
                West Midlands  : 59   Liberal Democrat       :  8  
                Scotland       : 59   Plaid Cymru            :  3  
                East of England: 58   (Other)                :  2  
                (Other)        :224   NA's                   :  1  
     Con15            Lab15             LD15             SNP15      
 Min.   : 4.673   Min.   : 4.506   Min.   : 0.7497   Min.   :33.79  
 1st Qu.:22.174   1st Qu.:17.700   1st Qu.: 2.9757   1st Qu.:44.93  
 Median :41.021   Median :31.280   Median : 4.5944   Median :50.94  
 Mean   :36.662   Mean   :32.350   Mean   : 7.8217   Mean   :50.16  
 3rd Qu.:50.845   3rd Qu.:44.389   3rd Qu.: 8.5784   3rd Qu.:55.64  
 Max.   :65.876   Max.   :81.301   Max.   :51.4909   Max.   :61.91  
 NA's   :1        NA's   :1        NA's   :1         NA's   :573    
      PC15            UKIP15          Green15           Other15       
 Min.   : 3.506   Min.   : 1.033   Min.   : 0.8982   Min.   : 0.1179  
 1st Qu.: 5.686   1st Qu.: 9.807   1st Qu.: 2.6407   1st Qu.: 0.4416  
 Median : 9.780   Median :13.923   Median : 3.6148   Median : 0.7820  
 Mean   :12.792   Mean   :13.488   Mean   : 4.1832   Mean   : 1.5540  
 3rd Qu.:14.099   3rd Qu.:17.284   3rd Qu.: 4.7401   3rd Qu.: 1.3355  
 Max.   :43.932   Max.   :44.432   Max.   :41.8300   Max.   :64.4733  
 NA's   :592      NA's   :18       NA's   :64        NA's   :253      
   Majority15         Turnout15         BornUK        PercFemale   
 Min.   : 0.06315   Min.   :51.26   Min.   :40.73   Min.   :46.95  
 1st Qu.:12.85206   1st Qu.:63.00   1st Qu.:86.42   1st Qu.:50.57  
 Median :24.35840   Median :67.09   Median :92.48   Median :50.98  
 Mean   :24.25556   Mean   :66.38   Mean   :88.15   Mean   :50.93  
 3rd Qu.:34.58712   3rd Qu.:70.33   3rd Qu.:95.42   3rd Qu.:51.39  
 Max.   :72.33029   Max.   :81.94   Max.   :98.02   Max.   :53.14  
                                                                   
 PopulationDensity  
 Min.   :  0.05572  
 1st Qu.:  2.63869  
 Median :  9.59660  
 Mean   : 20.22218  
 3rd Qu.: 31.29706  
 Max.   :146.38461  
                    

3.1.1 T-test (one sample hypothesis test)

A knowledgeable friend tells you that the average population of women in British constituencies is 52%. You would like to know whether she is right. To that end, you assume that her claim is the population mean. Unfortunately, you don’t have access to the data we have here, so you go and collect your own data, on 400 randomly sampled Britsh constituencies.

brexit_400 <- sample_n(brexit, 400)

You then estimate the mean of PercFemale in your sample. If the difference is large enough, so that it is unlikely that it could have occurred by chance alone, you can probably reject your friend’s claim.

So, first you take the mean of the PercFemale variable. Bear in mind, your answer will be slightly different.

mean(brexit_400$PercFemale)
[1] 50.91976

Your friend’s claim is quite close. Substantively, the difference between your friend’s claim and your estimate is small, but you could still find that the difference is statistically significant (i.e. that it’s a noticeable systematic difference).

Because we do not have information on all constituencies, your 50.92 is an estimate and the true population mean – the population here would be all constituencies – may be 52% as your friend claims. You can test this statistically.

In statistics jargon: you would like to test whether your estimate is statistically different from the 52% figure (the null hypothesis) suggested by your friend. Put differently, you would like to know the probability that you estimate 50.92 if the true mean of all constituencies is 52%.

Recall that the standard error of the mean (which is the estimate of the true standard deviation of the population mean) is estimated as:

\[ \frac{s_{Y}}{\sqrt{n}} \]

Before you estimate the standard error, you need to get \(n\) (the number of observations). You can save the number of observations in an object that you call n.

n <- length(brexit_400$PercFemale)
n
[1] 400

With the function length(brexit_400$PercFemale) you get all observations in the data. Now, take the standard error of the mean.

# standard error calculation
se.y_bar <- sd(brexit_400$PercFemale) / sqrt(n)

We know that 1 standard error is one average deviation from the population mean. The sampling distribution is approximately normal. 95 percent of the observations under the normal distribution are within 2 (1.96, to be precise) standard deviations of the mean.

You can construct the confidence interval within which the population mean lies with 95 percent probability in the following way. First, take the mean of PercFemale. That’s the sample mean and not the population mean. Second, go 1.96 standard errors to the left of it. This is the lower bound of the confidence interval. Third, go 1.96 standard deviations to the right of the sample mean. That is the upper bound of the confidence interval.

The 95 percent confidence interval around the sample means gives the interval within which the population mean lies with 95 percent probability.

You want to know what the population mean is, so you want the confidence interval to be as narrow as possible. The narrower the confidence interval, the more precise your estimate of the population mean. For instance, saying the population mean of is between 48% and 52% is more precise than saying the population mean is between 45% and 55%. Both would give the same mean estimate of 50%, though.

You construct the confidence interval with the standard error. That means, the smaller the standard error, the more precise the estimate. The formula for the 95 percent confidence interval is:

\[ \bar{Y} \pm 1.96 \times SE(\bar{Y}) \]

You can now construct your confidence interval. Your sample is large enough to assume that the sampling distribution is approximately normal. So, go \(1.96\) standard deviations to the left and to the right of the mean to construct your \(95\%\) confidence interval. Remember that your results will be slightly different.

# lower bound
lb <- mean(brexit_400$PercFemale) - 1.96 * se.y_bar
# upper bound
ub <- mean(brexit_400$PercFemale) + 1.96 * se.y_bar
# results (the population mean lies within this interval with 95% probability)
lb # lower bound
[1] 50.83722
mean(brexit_400$PercFemale) # sample mean
[1] 50.91976
ub # upper bound
[1] 51.00231

You can make this look a little more like a table like so:

ci <- cbind(lower_bound = lb, 
            mean = mean(brexit_400$PercFemale), 
            upper_bound = ub)
ci
     lower_bound     mean upper_bound
[1,]    50.83722 50.91976    51.00231

The cbind() function stands for column-bind and creates a \(1\times3\) matrix.

So you are \(95\%\) confident that the average proportion of females in UK constituencies is between 50.84% and 51% . You can see that you are quite certain about your estimate and can be quite confident in rejecting your friend’s claim.

A different way of describing our finding is to emphasize the logic of (hypothetical) repeated sampling. In a process of repeated sampling you can expect that the confidence interval that we calculate for each sample will include the true population value \(95\%\) of the time.

3.1.1.1 The t value

You can now estimate the t value. Recall that your friend claimed that the population mean was 52%. This is the null hypothesis that you wish to falsify. You estimated something else in your data, namely 50.9197645. The t value is the difference between your estimate (the result you got by looking at data) and the population mean under the null hypothesis, divided by the standard error of the mean.

\[ \frac{ \bar{Y} - \mu_0 } {SE(\bar{Y})} \]

Where \(\bar{Y}\) is the mean in your data, \(\mu_0\) is the population mean under the null hypothesis and \(SE(\bar{Y})\) is the standard error of the mean.

Let’s compute this. Your results will be slightly different.

t.value <- (mean(brexit_400$PercFemale) - 52) / se.y_bar

t.value
[1] -25.6506

Look at the formula until you understand what is going on. In the numerator you take the difference between your estimate and the population mean under the null hypothesis. In expectation that difference should be 0 — assuming that the null hypothesis is true. The larger that difference, the less likely it is that the null hypothesis is true.

Dividing by the standard error transforms the units of the difference into standard deviations. Before transforming the units, the difference was in the units of the variable itself. By dividing by the standard error, you have normalised the variable. Its units are now standard deviations from the mean.

Assume that the null hypothesis is true. In expectation the difference between your estimate in the data and the population mean should be 0 standard deviations. The more standard deviations your estimate is away from the population mean under the null hypothesis, the less likely it is that the null hypothesis is true.

Within 1.96 standard deviations from the mean lie 95 percent of all observations. That means, it is very unlikely that the null hypothesis is true, if the difference that we estimated is further than 1.96 standard deviations from the mean. How unlikely? For this, you would need the p value, for the exact probability. However, the probability is less than 5 percent if the estimated difference is more than 1.96 standard deviations from the population mean under the null hypothesis.

Back to the t value. We estimated a t value of -25.6506021. That means that a sample estimate of 50.9197645 is 25.6506021 standard deviations below the population mean under the null hypothesis — which is 52%.

This t value suggests that your sample estimate would be 25.6506021 standard deviations below the population mean under the null. That is very unlikely. You can very confidently reject the null hypothesis in this case. Your friend is very unlikely to be right, based on your findings.

3.1.1.2 The p value

Let’s estimate the precise p-value by calculating how likely it would be to observe a t-statistic of -25.6506021 from a t-distribution with n - 1 (399) degrees of freedom.

The function pt() can be used to get the p-value associated with a given t score. However, t-scores can be negative, which confuses the calculation. Because we are dealing with symmetric, two-tailed distributions, we can account for this by just taking the absolute value using abs(). This removes the negative sign. Then, we set lower.tail = FALSE to get the probability to the right – that is, of all values above – of our t value. We then multiply this by two to get our two-tailed p value. That is, we get the probability that we see a t.value in the right tail or in the left tail of the distribution.

# calculate p value
p.value <- 2* pt(abs(t.value), df = (n-1), lower.tail = FALSE)

p.value
[1] 1.987977e-86

Our p-value is so small that R isn’t even displaying it as a decimal number, but using scientific notation instead, the shorten the output. This number is very close to zero. It is extremely unlikely that our sample would give us the estimated mean that it gave us if the true population mean were 52%, as your friend said.

Let’s verify our result using the the t-test function t.test(). The syntax of the function is:

t.test(formula, mu, alt, conf.level)

Lets have a look at the arguments.

Arguments Description
formula Here, we input the vector that we calculate the mean of. For the one-sample t test, in our example, this is the mean of PercFemale. Below we talk about two-sample tests.
mu Here, we set the null hypothesis. The null hypothesis is that the true population mean is 52%. Thus, we set mu = 52.
alt There are two alternatives to the null hypothesis that the difference in means is zero. The difference could either be smaller or it could be larger than zero. To test against both alternatives, we set alt = "two.sided".
conf.level Here, we set the level of confidence that we want in rejecting the null hypothesis. Common confidence intervals are: 95% (0.95), 99% (0.99), and 99.9% (0.999)—they correspond to alpha levels of 0.05, 0.01 and 0.001 respectively.
t.test(brexit_400$PercFemale, 
       mu = 52, 
       alt = "two.sided",
       conf.level = 0.95) 

    One Sample t-test

data:  brexit_400$PercFemale
t = -25.651, df = 399, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 52
95 percent confidence interval:
 50.83697 51.00256
sample estimates:
mean of x 
 50.91976 

The results are similar. This p value might look quite different, but that’s because 2.2e-16 is the smallest value R’s statistical calculations provide. Beyond this, it is of no use to us to know how small a number is, because this number is already so close to 0. We can conclude that you are able to reject the null hypothesis suggested by your friend that the population mean is equal to 52% with considerable confidence.

3.1.1.3 Critical Values

In social sciences, we usually operate with an alpha level of 0.05. That means, we reject the null hypothesis if the p value is smaller than 0.05. Or put differently, we reject the null hypothesis if the 95 percent confidence interval does not include the population mean under the null hypothesis — which is always the case if our estimate is further than two standard errors from the mean under null hypothesis (usually 0).

We said earlier that the critical value is 1.96 for an alpha level of 0.05. That is true in large samples where the distribution of the t value follows a normal distribution. 95 percent of all observations are within 1.96 standard deviations of the mean.

The green area under the curve covers 95 percent of all observations. There are 2.5 percent in each tail. We reject the null hypothesis if our estimate is in the tails of the distribution. It must be further than 1.96 standard deviations from the mean. But how did we know that 95 percent of the area under the curve is within 1.96 standard deviations from the mean?

Separate the curve in your mind into 3 pieces. The left tail covers 2.5 percent of the area under the curve. The central green section covers 95 percent and the right tail again 2.5 percent. Now think of these as cumulative probabilities. The left tail ends at 2.5 percent cumulative probability. The green area ends at 97.5 percent cumulative probability and the right tail ends at 100 percent.

The critical value is were the left tail ends or the right tail starts (looking at the curve from left to right). Let’s get the value where the cumulative probability is 2.5 percent—where the left tail ends.

qnorm(0.025, mean = 0, sd = 1)
[1] -1.959964

If you look at the x-axis of our curve that is indeed where the left tail ends. We add a red dot to our graph to highlight it.

Now, let’s get the critical value of where the right tail starts. That is at the cumulative probability of 97.5 percent.

qnorm(0.975, mean = 0, sd = 1)
[1] 1.959964

As you can see, this is the same number, only positive instead of negative. That’s always the case because the normal distribution is symmetrical. Let’s add that point in blue to our graph.

This is how we get the critical value for the 95 percent confidence interval. In the past you would have had to look up critical values in critical values tables at the end of statistics textbooks (you can find the tables in Kellstedt and Whitten, for example).

As you can see our red and blue dots are the borders of the green area, the 95 percent interval around the mean. You can get the critical values for any other interval (e.g., the 99 percent interval) similar to what we did just now.

We now do the same for the t distribution. In the t distribution, the critical value depends on the shape of the t distribution which is characterised by its degrees of freedom. Let’s consider a t distribution with 5 degrees of freedom.

Although, it looks like a standard normal distribution, it is not. The t with 5 degrees of freedom has fatter tails. We show this by overlaying the t with a standard normal distribution.

The red area is the difference between the standard normal distribution and the t distribution with 5 degrees of freedom.

The tails are fatter and that means that the probabilities of getting a value somewhere in the tails is larger. Let’s calculate the critical value for a t distribution with 5 degrees of freedom.

# value for cumulative probability 95 percent in the t distribution with 5 degrees of freedom
qt(0.975, df = 5)
[1] 2.570582

See how much larger that value is than 1.96. Under a t distribution with 5 degrees of freedom 95 percent of the observations around the mean are within the interval from negative 2.5705818 to positive 2.5705818.

Let’s illustrate that.

Remember the critical values for the t distribution are always more extreme or similar to the critical values for the standard normal distribution. If the t distribution has few degrees of freedom, the critical values (for the same percentage area around the mean) are much more extreme. If the t distribution has many degrees of freedom, the critical values are very similar.

3.1.2 T-test for difference in means

Sometimes, rather than simply assessing whether a mean differs from our hypothesis, we want to establish whether two groups have significantly different means. For example, we could be interested in whether there is a difference in average Brexit vote share between constituencies inside and outside of London.

We want to get the mean of BrexitVote for each location of constituency individually. This can be done by subsetting.

# mean in London
mean_london <- mean(brexit$BrexitVote[brexit$London == "London"])

# mean out of London
mean_nolondon <- mean(brexit$BrexitVote[brexit$London == "Out of London"])

# or use filter()!
mean_london <- mean(filter(brexit, London == "London")$BrexitVote)
mean_nolondon <- mean(filter(brexit, London == "Out of London")$BrexitVote)

# or, even simpler, do it in two stages
london <- filter(brexit, London == "London") # subset to just London
not_london <- filter(brexit, London == "Out of London") # subset to just out of London

mean_london <- mean(london$BrexitVote) # get mean in London
mean_nolondon <- mean(not_london$BrexitVote) # get mean out of London

Here, the ‘magrittr pipe’ could make this code more efficient. If you want to find out more, see Going further with the tidyverse: dplyr, magrittr and ggplot2.

Simply subtracting one from the other gives us the difference in means.

# difference in means

mean_london - mean_nolondon
[1] -13.43873

To establish whether this difference in means is a meaningful, systematic or ‘statistically significant’ difference, the t-test is the appropriate test statistic. Our interval-level dependent variable is BrexitVote, which is the percentage of people who voted to leave. Let’s look at the distribution of this variable.

brexit_hist <- ggplot(data = brexit, aes(BrexitVote)) +
  geom_histogram() +
  labs(x = "Constituency-level percentage voting leave",
       y = "Frequency",
       title = "Distribution of constituency-level Brexit vote share") +
  theme_minimal()

brexit_hist
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The percentage voting leave in different constituencies is quite clustered in 50-60%, but lots of constituencies also had leave vote shares lower than 50%, and some had vote shares higher than 60%.

Our binary independent variable is London which tells us whether a constituency is in London or not. Let’s look at how it is distributed.

london_bar <- ggplot(data = brexit, aes(London)) +
  geom_bar() +
  labs(x = "Constituency location",
       y = "Frequency",
       title = "Number of constituencies inside and outside of London") +
  theme_minimal()

london_bar

As we would expect, the vast majority of constituencies are not considered to be in London.

If the ggplot2 code is still unclear to you, the information in Going further with the tidyverse: dplyr, magrittr and ggplot2 should help.

We use formula syntax to see how our dependent variable (here BrexitVote) varies between the levels of our independent variable (here London). The dependent variable goes before a tilde (~) and the independent variable comes after. So what we are doing here is taking a continuous variable and comparing its means in different groups of a binary nominal variable to see if there is a significant, systematic difference between those groups.

t.test(BrexitVote ~ London, # formula y ~ x
       data = brexit, # dataset where the variables are found
       mu = 0, # difference under the null hypothesis
       alt = "two.sided",  # two sided test (difference in means could be smaller or larger than 0)
       conf.level = 0.95) # confidence interval

    Welch Two Sample t-test

data:  BrexitVote by London
t = 8.3366, df = 83.455, p-value = 1.338e-12
alternative hypothesis: true difference in means between group Out of London and group London is not equal to 0
95 percent confidence interval:
 10.23275 16.64471
sample estimates:
mean in group Out of London        mean in group London 
                   53.62497                    40.18624 

Let’s interpret the results you get from t.test(). The first line tells us which groups we are comparing. In our example: Do constituencies in London have different average vote shares for leave compared to those outside of London?

In the following line you see the t-value, the degrees of freedom and the p-value. Knowing the t-value and the degrees of freedom you can check in a table of t distributions how likely you were to observe this data, if the null-hypothesis were true. The p-value gives you this probability directly. For example, a p-value of 0.02 would mean that the probability of seeing this data given that there is no difference in average support for leave in the population, is 2%. Here the p-value is much smaller than this: 1.338e-12 = 0.000000000001338!

In the next line you see the 95% confidence interval because we specified conf=0.95. If you were to take 100 samples and in each you checked the means of the two groups, 95 times the difference in means would be within the interval you see there.

At the very bottom you see the means of the dependent variable by the two groups of the independent variable. These are the means that we estimated above. In our example, you see the mean Brexit vote share in constituencies in London, and in constituencies outside of London.

Furthermore, note that in the t test for the differences in means, the degrees of freedom depend on the variances in each group.

3.1.3 Extended example: difference in means as an average treatment effect

In some cases, a difference in means can be considered to be an average treatment effect. This means that the difference in means captures a change brought about, on average, by some treatment. For example, in a drugs trial, the average blood pressure in those who were randomly assigned to take the drug might be significantly lower than those in the ‘control group’, who were given some placebo.

Note that in the example we use here, the randomised experimental setting means that we can talk confidently about the difference in means as an ‘effect’ in a causal sense. This is not always the case, but frequently we will nonetheless refer to differences in means observed in normal survey data as average effects, too.

Consider the following example, recently published in the Journal of Experimental Political Science. The abstract desribes the study as follows:

Muslim Americans constitute one of the United States’ most vulnerable minority groups, facing frequent discrimination from both the public and the government. Despite this vulnerability, few studies evaluate interventions for reducing prejudice against Muslim Americans. Building from an insightful literature on the sources of prejudice against Muslim Americans, this paper tests whether attitudes can be improved with information countering misperceptions of the community as particularly foreign, threatening, and disloyal to the United States. The experimental treatment modestly improved attitudes…

By ‘modestly improved attitudes’, the researchers mean that the people who were given information ‘countering misperceptions of the community’ reported (slightly) more favourable attitudes towards Muslim Americans on average. There was a significant difference in means.

Download the dataset from this experiment here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GKW5Q5. The file you want is called Muslim_Prejudice_Data_SRW.tab. Click the Access File button and select ‘RData Format’ from the dropdown menu. Save the file in the same folder as your POLM083 R-Project, and name it prejudice.RData. You can then load the data into RStudio:

load("prejudice.RData")

This will load the dataset in as table. The rest of the code assumes you keep this name, but if you want to, you can overwrite it and then make the necessary adjustments to the code.

Whether or not people were ‘treated’ with the information is captured by the treatment_srw variable, but this currently has three levels where it should have two, because both 1 and 2 refer to treatment, where 0 refers to control. We can make this clearer using mutate(), with case_when(), to create a new variable. Take a look at the documentation for case_when if you want to understand how it works.

Also, these code blocks might make more sense after you have worked through Going further with the tidyverse: dplyr, magrittr and ggplot2.

# create new treatment indicator
table <- mutate(
  table, # call the dataset
  treatment = # new variable called just 'treatment'
    case_when(
      treatment_srw == 1 | treatment_srw == 2 ~ # for those rows where treatment_srw is 1 OR 2
        "treated", # new variable will take value 'treated'
      treatment_srw == 0 ~ # for those rows where treatment_srw is 0
        "control" # new variable will take value 'control'
    )
)

Similarly, our outcome variable is not ready. Instead, there are two variables we are interested in – srw_therm_1_h_1 and srw_therm_2_h_1 – which need to be merged into one. We can use the same approach here:

# create new outcome variable
table <- mutate(
  table, # call the dataset
  outcome = # new variable called 'outcome'
    case_when(
      is.na(srw_therm_1_h_1) ~ # for those rows where srw_therm_1_h_1 is missing
        srw_therm_2_h_1, # new variable will take value of srw_therm_2_h_1
      is.na(srw_therm_2_h_1) ~ # for those rows where srw_therm_2_h_1 is missing
        srw_therm_1_h_1 # new variable will take value of srw_therm_1_h_1
    )
)

Now we have everything we need. First, we can calculate the difference in means:

# separate two groups
treatment_group <- filter(table, treatment == "treated")
control_group <- filter(table, treatment == "control")

difference_in_means <-
  mean(treatment_group$outcome, na.rm = TRUE) - # use na.rm = TRUE to remove NAs from calculation
  mean(control_group$outcome, na.rm = TRUE)

difference_in_means
[1] 4.893189

The difference in means is approximately 4.89. But is this a significant difference? We know from the lecture that we can calculate the standard error of the difference in means using the formula

\[ \sqrt{ \frac{ s^2_{Yx=0} }{ n_{x=0} } + \frac{ s^2_{Yx=1} }{ n_{x=1} }} \] In our case, this can be rewritten as

\[ \sqrt{ \frac{ s^2_{Ycontrol} }{ n_{control} } + \frac{ s^2_{Ytreatment} }{ n_{treatment} }} \]

We can apply this to our data using R:

# first, remove missing values to avoid complications
control_group <- drop_na(control_group, # dataset
                         outcome) # variable to determine missingness
treatment_group <- drop_na(treatment_group, # dataset
                           outcome) # variable to determine missingness

# squared standard deviation in control group
s2ycontrol <- sd(control_group$outcome)^2

# n in control group
ncontrol <- nrow(control_group)

# squared standard deviation in treatment group
s2ytreatment <- sd(treatment_group$outcome)^2

# n in treatment group
ntreatment <- nrow(treatment_group)

# calculation
se_difference_in_means <-
  sqrt( # wrap it all in a square root
    (s2ycontrol/ncontrol) + # control group
      (s2ytreatment/ntreatment) # treatment group
  )

se_difference_in_means
[1] 1.004268

This is the standard error of our difference in means. Given our large sample size, we can base our confidence interval calculation on the normal distribution, meaning the 95% confidence interval will range from 1.96 standard errors above our estimated difference in means to 1.96 standard errors below it.

# upper bound
difference_in_means + 1.96*se_difference_in_means
[1] 6.861555
# lower bound 
difference_in_means - 1.96*se_difference_in_means
[1] 2.924823

We can see, then, that the difference in means is significant at the 95% level. The 95% confidence interval does not include zero. We can be 95% confident that exposure to the positive information countering misperceptions about Muslim Americans increased positive feelings towards this community by between roughly 2.9 and 6.7 points (on a 0-100 ‘feeling thermometer’).

We can confirm these results by running a simple two-sample t-test, as above, in R:

t.test(outcome ~ treatment,
       data = table)

    Welch Two Sample t-test

data:  outcome by treatment
t = -4.8724, df = 2239, p-value = 1.18e-06
alternative hypothesis: true difference in means between group control and group treated is not equal to 0
95 percent confidence interval:
 -6.862583 -2.923795
sample estimates:
mean in group control mean in group treated 
             58.85105              63.74423 

If you see negative numbers where you expected to see positive numbers in this output, that is just because the t-test function is calculating the difference in the opposite direction. It is subtracting the treatment mean from the control mean. Otherwise, the results are equivalent, within rounding errors.

3.1.4 Homework exercises

Load the new Muslim prejudice data:

load("prejudice.RData")

Run the following code to create the outcome variable again:

# create new outcome variable
table <- mutate(
  table, # call the dataset
  outcome = # new variable called 'outcome'
    case_when(
      is.na(srw_therm_1_h_1) ~ # for those rows where srw_therm_1_h_1 is missing
        srw_therm_2_h_1, # new variable will take value of srw_therm_2_h_1
      is.na(srw_therm_2_h_1) ~ # for those rows where srw_therm_2_h_1 is missing
        srw_therm_1_h_1 # new variable will take value of srw_therm_1_h_1
    )
)
  1. In the Muslim prejudice study, the researcher explains:

Regarding Muslim Americans, media and political actors often promote misperceptions, particularly about Muslims and terrorism… As a result, I evaluated the treatment’s robustness to an environment in which fear of terrorism was present by randomly assigning half of the respondents to a question priming terrorism threats.

This means that this terrorism priming is another treatment variable whose average effect we can estimate through a difference in means. Turn the variable terrorism into a factor variable whose value is “no priming” if it is currently 0 and “primed” if it is currently 1.

  1. Delete all rows with missing values on the outcome variable in your original dataset (called table unless you changed it).

  2. Visualise the distribution of outcome conditional on terrorism.

  3. Estimate the difference in means, or average treatment effect, of terrorism on the outcome.

  4. Estimate the standard error of the difference in means.

  5. Calculate the 95 percent confidence interval of the difference in means. What conclusions can you draw from the result?

  6. Calculate the 99 percent confidence interval of the difference in means. What conclusions can you draw from the result?

  7. Confirm your results (within errors of rounding) using the t.test() function, first setting conf.level = 0.95 and then conf.level = 0.99.