1.2 Solutions

1.2.1 Exercise 3

Calculate the square root of 1369 using the sqrt() function.

sqrt(1369)
[1] 37

1.2.2 Exercise 4

Square the number 13 using the ^ operator.

13^2
[1] 169

1.2.3 Exercise 5

What is the result of summing all numbers from 1 to 100?

# sequence of numbers from 1 to 100 in steps of 1
numbers_1_to_100 <- seq(from = 1, to = 100, by = 1)
# sum over the vector
result <- sum(numbers_1_to_100)
# print the result
result
[1] 5050

The result is 5050.
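As a cross-check, the same sum can be obtained with the colon shorthand and with the closed-form formula \(n(n+1)/2\). This is only a sketch; the names n, sum.colon and sum.formula are ours and are not part of the exercise.

```r
# the colon operator is shorthand for a sequence in steps of 1
n <- 100
sum.colon <- sum(1:n)

# closed-form formula for the sum of the first n integers
sum.formula <- n * (n + 1) / 2

# check that both approaches agree
sum.colon == sum.formula
```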

1.2.4 Exercise 6

Create the variable income with the values from our fake London sample in R.

# create the income variable using the c() function
income <- c(
  19395, 22698, 40587, 25705, 26292, 42150, 29609, 12349, 18131, 
  20543, 37240, 28598, 29007, 26106, 19441, 42869, 29978,  5333,
  32013, 20272, 14321, 22820, 14739, 17711, 18749)

1.2.5 Exercise 7

Describe London income using the appropriate measures of central tendency and dispersion.

We use the mean for the central tendency of income. The variable is interval scaled, and the mean is the appropriate measure of central tendency for interval scaled variables. Our income variable is also normally distributed. In most countries, however, income distributions are right skewed, which is why the central tendency of income is often described using the median instead.

When asked, e.g., in an exam, to describe the central tendency of an interval scaled variable, use the mean. You can also use the median if you tell us why.

# central tendency of income
mean(income)
[1] 24666.24
# dispersion
sd(income)
[1] 9467.383

Average income in our London sample is 24666.24. The average deviation from that value is 9467.38.
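If you prefer to report the median, it can be computed in the same way. The sketch below repeats the income values from Exercise 6 so that it runs on its own; the names median.income and mean.minus.median are ours.

```r
# income values copied from Exercise 6
income <- c(
  19395, 22698, 40587, 25705, 26292, 42150, 29609, 12349, 18131,
  20543, 37240, 28598, 29007, 26106, 19441, 42869, 29978,  5333,
  32013, 20272, 14321, 22820, 14739, 17711, 18749)

# the median is the middle value of the sorted data
median.income <- median(income)
median.income

# gap between mean and median; a positive gap means the mean lies above the median
mean.minus.median <- mean(income) - median.income
mean.minus.median
```

In this sample the median is 22820, somewhat below the mean of 24666.24.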

1.2.6 Exercise 8

Compute the standard deviation without using the sd() function.

We do this in several steps. First, we compute the mean.

mean.income <- sum(income) / length(income)

# let's print the mean
mean.income
[1] 24666.24

Second, we take the differences between each individual realisation of income and the mean of income. The result must be a vector with the same number of elements as the income vector.

# individual differences between each realisation of income and the mean of income
diffs.from.mean <- income - mean.income

# let's print the vector of differences
diffs.from.mean
 [1]  -5271.24  -1968.24  15920.76   1038.76   1625.76  17483.76   4942.76
 [8] -12317.24  -6535.24  -4123.24  12573.76   3931.76   4340.76   1439.76
[15]  -5225.24  18202.76   5311.76 -19333.24   7346.76  -4394.24 -10345.24
[22]  -1846.24  -9927.24  -6955.24  -5917.24

You may be surprised that this works. After all, income is a vector with 25 elements and mean.income is a scalar (only one value). R treats all variables as vectors, so it sees mean.income as a vector of length 1 and income as a vector of length 25. The shorter vector mean.income is recycled until it has the same length as income, with every element equal to the mean of income. If you did not understand this, don’t worry. The important thing is that it works.
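Recycling is easier to see on a small example. The vector x and the value m below are made up for illustration; the second computation writes out by hand what R does implicitly.

```r
# a short vector and a single value (a vector of length 1)
x <- c(10, 20, 30)
m <- 20

# R recycles m to match the length of x
recycled <- x - m

# the same computation with the recycling written out explicitly
explicit <- x - rep(m, times = length(x))

# both versions give the same vector
identical(recycled, explicit)
```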

Our next step is to square the differences from the mean.

# square each element in the diffs.from.mean vector
squared.diffs.from.mean <- diffs.from.mean^2

# print the squared vector
squared.diffs.from.mean
 [1]  27785971   3873969 253470599   1079022   2643096 305681864  24430876
 [8] 151714401  42709362  17001108 158099441  15458737  18842197   2072909
[15]  27303133 331340472  28214794 373774169  53974882  19309345 107023991
[22]   3408602  98550094  48375363  35013729

We squared each individual element in the vector. Therefore, our new variable squared.diffs.from.mean still has 25 elements.

Squaring a value does two things. First, all values in our vector become positive. Second, the increase grows with distance: values close to the mean become only somewhat larger, whereas values further from the mean become much larger.

# a vector of x values from negative 100 to positive 100
a <- seq(from = -100, to = 100, length.out = 200)

# the square of that vector
b <- a^2

We are taking individual differences from the mean. Hence, if a value is exactly at the mean, the difference is zero. The further the value is from the mean (in either direction), the larger the output value.

We will sum over the individual elements in the next step. Hence, values that are further from the mean have a larger impact on the sum than values that are closer to the mean.
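A small made-up example shows how strongly large deviations dominate once they are squared. The vector devs is hypothetical and unrelated to the income data.

```r
# four hypothetical deviations from a mean: two small, two large
devs <- c(-1, 1, -10, 10)

# squaring removes the sign and magnifies the large deviations
squared <- devs^2
squared

# share of the sum of squares contributed by each deviation
shares <- squared / sum(squared)
shares
```

The two large deviations are ten times the size of the small ones but contribute one hundred times as much to the sum.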

In the next step, we take the sum over our squared deviations from the mean.

# sum over squared deviations vector
sum.of.squared.deviations <- sum(squared.diffs.from.mean)

# print the sum
sum.of.squared.deviations
[1] 2151152127

By summing over all elements of a vector, we end up with a scalar. The sum is 2151152127.

We divide the sum of squared deviations by \(n-1\). Recall that \(n\) is the number of observations (elements in the vector) and \(-1\) is our sample adjustment.

# get the variance
var.income <- sum.of.squared.deviations / ( length(income) - 1 )

# print the variance
var.income
[1] 89631339

The average squared deviation from mean income is 89631339.

In the last step, we take the square root of the variance to return to our original units of income.

# get the standard deviation
sqrt(var.income)
[1] 9467.383

The average deviation from mean income in London (24666.24) is 9467.38.
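Once the individual steps are clear, they can be combined into a single expression and checked against sd(). The sketch below repeats the income values so that it runs on its own; the name sd.by.hand is ours.

```r
# income values copied from Exercise 6
income <- c(
  19395, 22698, 40587, 25705, 26292, 42150, 29609, 12349, 18131,
  20543, 37240, 28598, 29007, 26106, 19441, 42869, 29978,  5333,
  32013, 20272, 14321, 22820, 14739, 17711, 18749)

# all steps combined: differences, squares, sum, division by n - 1, square root
sd.by.hand <- sqrt(sum((income - mean(income))^2) / (length(income) - 1))
sd.by.hand

# compare with the built-in function, allowing for floating point rounding
all.equal(sd.by.hand, sd(income))
```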

1.2.7 Exercise 9

Create the variable married with the values from our fake sample. The rep() function used above might be useful.

married <- c(rep("married", times = 16),
             rep("unmarried", times = 9))
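The same vector can also be built with a single call to rep() by passing a vector to the times argument. This is an alternative sketch; the name married.alt is ours.

```r
# repeat "married" 16 times and "unmarried" 9 times in one call
married.alt <- rep(c("married", "unmarried"), times = c(16, 9))

# check the length and the counts per category
length(married.alt)
table(married.alt)
```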

1.2.8 Exercise 10

Describe the marriage status of our sample using appropriate measures of central tendency and dispersion.

This is a nominal variable, so we can assess its mode and the proportion in each category.

table(married)
married
  married unmarried 
       16         9 

The mode is ‘married’.

prop.table(table(married))
married
  married unmarried 
     0.64      0.36 

64% of the sample are married, and 36% are unmarried.
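With only two categories the mode is easy to read off the table. For variables with many categories, it can be extracted programmatically. A minimal sketch, where the names counts and mode.married are ours:

```r
# the married variable from Exercise 9
married <- c(rep("married", times = 16),
             rep("unmarried", times = 9))

# frequency of each category
counts <- table(married)

# which.max() returns the position of the largest count
mode.married <- names(counts)[which.max(counts)]
mode.married
```

Note that which.max() returns only the first maximum, so this approach reports a single category even if two categories are tied.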

1.2.9 Exercise 11

Create the variable education with the values from our fake sample.

education <- c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1, 2, 2, 1, 2, 3, 1, 3, 3, 1, 1, 1, 1, 3, 2, 3)

1.2.10 Exercise 12

Describe the education status of our fake sample using appropriate measures of central tendency and dispersion.

This is an ordinal variable so we can measure its central tendency using the median.

median(education)
[1] 2

Its dispersion is well described using the inter-quartile range.

quantile(education, 0.25)
25% 
  1 
quantile(education, 0.75)
75% 
  3
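Both quartiles can be requested in one call, and the IQR() function returns the distance between them directly. The sketch below repeats the education values so that it runs on its own; the names quartiles and iqr.education are ours.

```r
# education values copied from Exercise 11
education <- c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1, 2, 2, 1, 2, 3, 1, 3, 3, 1, 1, 1, 1, 3, 2, 3)

# first and third quartile in a single call
quartiles <- quantile(education, probs = c(0.25, 0.75))
quartiles

# the inter-quartile range is the third quartile minus the first
iqr.education <- IQR(education)
iqr.education
```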