6.2 Solutions
library(tidyverse)
world.data <- read.csv("https://raw.githubusercontent.com/QMUL-SPIR/Public_files/master/datasets/QoG2012.csv")
6.2.1 Exercises
- Find the means of political stability in countries that (1) were former colonies, (2) were not former colonies. Hint: it will be useful to turn former_col into a factor variable for this, something we have done in previous seminars.
- Is the difference in means statistically significant at an alpha level of 0.05? And at 0.01?
- Check the claim that the true population mean of undp_hdi is 0.85, reporting and interpreting the t statistic and the p value.
- An angry citizen who wants to defund international development claims that countries that were former colonies have reached 75% of the level of wealth of countries that were not colonised. Assess this claim statistically.
6.2.1.1 Exercise 1
Find the means of political stability in countries that (1) were former colonies, (2) were not former colonies. Hint: it will be useful to turn former_col into a factor variable for this, something we have done in previous seminars.
# turn former_col into a factor
world.data$former_col <- factor(world.data$former_col)
# check it worked
str(world.data$former_col)
Factor w/ 2 levels "0","1": 1 1 2 1 2 2 1 2 1 1 ...
# tidyverse grouped means
colony_means <- # assign to object
  world.data %>% # pipe dataset
  group_by(former_col) %>% # group by whether countries were colonies
  summarise(mean = mean(wbgi_pse), # get mean of pol stab for each group
            n = n()) # also get number of observations in each group
colony_means
# A tibble: 2 x 3
former_col mean n
<fct> <dbl> <int>
1 0 0.286 72
2 1 -0.232 122
The average level of political stability in countries that were not colonised is 0.2858409. Mean political stability in countries that were colonised is -0.231612. The political stability variable, wbgi_pse, is an index: larger values correspond to more political stability. We see that political stability is higher, on average, in countries that were not colonised.
Looking at this difference, we might conclude that the legacy of colonialism is still visible today and manifests itself in lower political stability. Let’s investigate further to see whether the difference in means is statistically significant.
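As a cross-check on the grouped means, the same calculation can be sketched in base R with tapply(). The data frame below is a made-up toy example (its values are assumptions for illustration, not rows from the QoG data); only the column names mirror the ones used above.

```r
# Toy data (hypothetical values, NOT taken from QoG2012.csv)
toy <- data.frame(
  former_col = factor(c(0, 0, 1, 1, 0)),
  wbgi_pse   = c(0.5, 0.1, -0.3, -0.1, 0.3)
)

# Mean of wbgi_pse within each level of former_col.
# na.rm = TRUE guards against missing values, which would otherwise
# turn a group mean into NA.
group_means <- tapply(toy$wbgi_pse, toy$former_col, mean, na.rm = TRUE)

# Number of observations per group, as in the n column above
group_sizes <- table(toy$former_col)

group_means
group_sizes
```

Applied to world.data, the same call with world.data$wbgi_pse and world.data$former_col should reproduce the two means reported in the tibble.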
6.2.1.2 Exercise 2
Is the difference in means statistically significant at an alpha level of 0.05? And at 0.01?
This calls for a t test.
# filter dataset into colonies and non-colonies
colonies <- world.data %>%
  filter(former_col == 1)
non_colonies <- world.data %>%
  filter(former_col == 0)
# t test for difference in means
t.test(colonies$wbgi_pse,
       non_colonies$wbgi_pse,
       mu = 0,
       alternative = "two.sided")
Welch Two Sample t-test
data: colonies$wbgi_pse and non_colonies$wbgi_pse
t = -3.4674, df = 139.35, p-value = 0.0006992
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.8125053 -0.2224004
sample estimates:
mean of x mean of y
-0.2316120 0.2858409
# or
t.test(world.data$wbgi_pse ~ world.data$former_col,
       mu = 0,
       alternative = "two.sided")
Welch Two Sample t-test
data: world.data$wbgi_pse by world.data$former_col
t = 3.4674, df = 139.35, p-value = 0.0006992
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.2224004 0.8125053
sample estimates:
mean in group 0 mean in group 1
0.2858409 -0.2316120
# or
t.test(wbgi_pse ~ former_col,
       data = world.data,
       mu = 0,
       alternative = "two.sided")
Welch Two Sample t-test
data: wbgi_pse by former_col
t = 3.4674, df = 139.35, p-value = 0.0006992
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.2224004 0.8125053
sample estimates:
mean in group 0 mean in group 1
0.2858409 -0.2316120
As we can see, the difference is not only large but also statistically significant. The p value (0.0006992) is smaller than both the conventional alpha level of 0.05 and the stricter level of 0.01, so we reject the null hypothesis of no difference in means at either level. We can also look at the confidence interval for the difference in means (non-colonies minus colonies), which ranges from 0.2224004 to 0.8125053 and does not include 0. If we were to sample repeatedly and construct an interval from each sample, \(95\%\) of those intervals would include the true difference in population means. Or, more intuitively, we are \(95\%\) confident that the true difference in average political stability between the two groups lies within our interval.
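To see where the numbers in the test output come from, here is a minimal sketch that computes the Welch t statistic, degrees of freedom and two-sided p value by hand and compares them with t.test(). The helper welch_by_hand() and the two short vectors are hypothetical, made up for illustration; they are not part of the seminar code or the QoG data.

```r
# Welch two-sample t test computed step by step (hypothetical helper)
welch_by_hand <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  vx <- var(x) / nx                      # squared standard error of mean(x)
  vy <- var(y) / ny                      # squared standard error of mean(y)
  se <- sqrt(vx + vy)                    # standard error of the difference
  t_stat <- (mean(x) - mean(y)) / se     # difference in means over its SE
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  p <- 2 * pt(-abs(t_stat), df = df)     # two-sided p value
  c(t = t_stat, df = df, p = p)
}

# Toy vectors (made-up values, NOT the political stability data)
x <- c(-0.9, -0.4, 0.1, -0.6, 0.3, -0.2)
y <- c(0.5, 0.2, 0.9, -0.1, 0.6)

by_hand  <- welch_by_hand(x, y)
built_in <- t.test(x, y, mu = 0, alternative = "two.sided")

by_hand
built_in
```

The sign of t depends only on which group is entered first, which is why the first call above reports t = -3.4674 and the formula versions report t = 3.4674 with the same p value.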
6.2.1.3 Exercise 3
Check the claim that the true population mean of undp_hdi is 0.85, reporting and interpreting the t statistic and the p value.
Let’s estimate the mean from our sample.
summary(world.data$undp_hdi)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.2730 0.5390 0.7510 0.6982 0.8335 0.9560 19
Our estimate is 0.6982. The claim is that it is 0.85.
Null hypothesis: The true population mean of the human development index is 0.85. Alternative hypothesis: The true population mean is different from 0.85.
We pick an alpha level of 0.05 for our test.
t.test(world.data$undp_hdi,
       mu = 0.85,
       alternative = "two.sided")
One Sample t-test
data: world.data$undp_hdi
t = -11.139, df = 174, p-value < 0.00000000000000022
alternative hypothesis: true mean is not equal to 0.85
95 percent confidence interval:
0.6713502 0.7251298
sample estimates:
mean of x
0.69824
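The t statistic is -11.139, which means our sample mean lies about 11 standard errors below the claimed value of 0.85. The p-value is far lower than 0.05 and hence we reject the null hypothesis (that the population mean of undp_hdi is 0.85). Our confidence interval, 0.6713502 to 0.7251298, does not include 0.85 either. If we were to sample repeatedly and construct an interval from each sample, we would expect \(95\%\) of those intervals to contain the true population mean.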
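The repeated-sampling reading of a confidence interval can be checked with a small simulation. This is only a sketch under stated assumptions: the "true" mean and standard deviation below are invented, the data are drawn from a normal distribution, and the sample size of 175 simply matches the df = 174 reported above.

```r
set.seed(42)          # make the simulation reproducible

true_mean <- 0.7      # assumed population mean (invented for the simulation)
true_sd   <- 0.18     # assumed population standard deviation (invented)
n         <- 175      # sample size, matching df = 174 in the output above
n_sims    <- 2000     # number of repeated samples

# For each simulated sample, build a 95% confidence interval and record
# whether it contains the true mean.
covered <- replicate(n_sims, {
  x  <- rnorm(n, mean = true_mean, sd = true_sd)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Share of intervals that contain the true mean; should be close to 0.95
coverage <- mean(covered)
coverage
```

The population mean is fixed in every run; it is the interval that changes from sample to sample, which is why the \(95\%\) describes the procedure rather than any single interval.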
6.2.1.4 Exercise 4
An angry citizen who wants to defund international development claims that countries that were former colonies have reached 75% of the level of wealth of countries that were not colonised. Assess this claim statistically.
First, we drop observations with missing values on wdi_gdpc.
world.data <- drop_na(world.data, wdi_gdpc)
The null hypothesis is that there is no difference between the level of wealth in countries that were former colonies and 0.75 times the level of wealth in countries that were not former colonies. This is tricky to assess because the citizen’s claim is not actually in our data. Moreover, we don’t know the true level of wealth in countries that were not colonised; we only have an estimate. We have to manipulate our data to get there.
Filter the dataset into colonies and non-colonies and adjust the level of wealth in the group of countries that were not colonised down to the citizen’s claim. If the citizen is right, we should no longer find a difference in means between the two groups (“was colonised” and “not colonised”) after this adjustment.
# do the filter again because of missings removed
colonies <- world.data %>%
  filter(former_col == 1)
non_colonies <- world.data %>%
  filter(former_col == 0)
# adjust wealth in non-colonies
non_colonies <- non_colonies %>%
  mutate(angry_gdp = wdi_gdpc * 0.75)
# t test to check for significant difference
t.test(colonies$wdi_gdpc,
       non_colonies$angry_gdp,
       mu = 0,
       alternative = "two.sided")
Welch Two Sample t-test
data: colonies$wdi_gdpc and non_colonies$angry_gdp
t = -3.6218, df = 127.72, p-value = 0.0004206
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8832.37 -2591.29
sample estimates:
mean of x mean of y
6599.714 12311.544
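Clearly, we can reject the citizen’s claim. Our p value implies that, if there really were no difference between wealth in former colonies and 75% of wealth in non-colonies, the probability of seeing a difference at least as large as the one in our data would be about \(0.04\%\) (0.0004 times 100). This is far below our conventional alpha level of \(5\%\). Mean wealth in former colonies (6599.714) is well below the adjusted mean for non-colonies (12311.544).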
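The percentage can be recovered directly from the test output. The sketch below plugs the reported t statistic and degrees of freedom into the t distribution; the variable names are chosen here for illustration, and because the inputs are the rounded values printed above, the result only matches the reported p value approximately.

```r
# Values copied from the Welch test output above (rounded as printed)
t_stat <- -3.6218
df     <- 127.72

# Two-sided p value: probability of a t statistic at least this far from 0
# in either direction, if the null hypothesis were true
p_value <- 2 * pt(-abs(t_stat), df = df)

# Express as a percentage and compare against the alpha levels
p_percent <- 100 * p_value
reject_at_05 <- p_value < 0.05
reject_at_01 <- p_value < 0.01

p_value
p_percent
```

Both comparisons come out TRUE, so the claim is rejected at the 0.05 level and at the stricter 0.01 level.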