5. Mean, Median, Mode & Sample Standard Deviation

Two videos to help you do your Blackboard Homework on Mean, Median, Mode, and Standard Deviation

Video 1 (15 minutes. 4 homework style questions with solutions): 
Question 1. Finding the median.
Question 2. A question about unimodal, bimodal, and uniform distributions.
Question 3. Finding the sample mean $\bar{x}$.
Question 4. Finding the sample standard deviation $S_x$ using the computer software program R.

 

 

Video 2 (9 minutes. 1 homework style question with the solution): 
Question 5. A question about finding the sample standard deviation $S_x$ by hand (without using a computer to do the calculation).

 

Note. Some of the mathematics might not display properly on your cell phone. If this is the case, try viewing in landscape mode, or better yet, on a regular computer screen.

Note. An R script for mean, median, and standard deviation is at the bottom of this page.

Notation

Suppose $s = \{s_1, s_2, \ldots, s_n\}$ is a sample of size $n$ taken from a population $S$, and $x$ is a random variable defined on $S$ (so that $x$ is also defined on $s$). To save space, we let:

$x_1 = x(s_1), x_2 = x(s_2), \ldots, x_n = x(s_n)$

and write

$x = x_1, x_2, \ldots, x_n$

instead of

$\{x(s_1), x(s_2), \ldots, x(s_n)\}$

Example. Suppose $s$ is the sample consisting of three students, Abe, Ben, and Chris, who are taking a statistics class together. Let $x$ be the random variable which gives each student's final exam score, and suppose that Abe scored 90, Ben scored 93, and Chris scored 99, so
x(Abe) = 90,
x(Ben) = 93,
x(Chris) = 99.
To save space, we can write

x = 90, 93, 99

especially if we don’t care who scored what.
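As a small illustration of this notation, here is a sketch in R (the language used in the script at the bottom of this page). It assumes a named vector is a reasonable stand-in for the random variable on the sample; the variable name `scores` is made up for this example.

```r
# A named vector keeps track of who scored what: x(Abe) = 90, and so on.
x <- c(Abe = 90, Ben = 93, Chris = 99)
print(x["Ben"])    # Ben's score, 93

# Dropping the names leaves just the data values 90, 93, 99,
# which is all we need if we don't care who scored what.
scores <- unname(x)
print(scores)
```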

Mean of a sample

The mean is the average.

If $x = x_1, x_2, \ldots, x_n$ then the (sample) mean of $x$ is:

$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

$\bar{x}$ is pronounced “x bar”. The bar on top signifies we are taking the mean of a sample.

Keep in mind that $\bar{x}$ is the mean of the random variable on a sample.

Question 1. Suppose $x = 90, 93, 99$. Find $\bar{x}$.

Answer.

$\bar{x} = \dfrac{90 + 93 + 99}{3} = \dfrac{282}{3} = 94$

Question 2. Suppose $y = -2, 6, 8, 0$. Find $\bar{y}$.

Answer.

$\bar{y} = \dfrac{-2 + 6 + 8 + 0}{4} = \dfrac{12}{4} = 3$
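The two answers above can be checked numerically. The following is a minimal sketch in R: it applies the formula for the sample mean directly and then compares the result with R's built-in `mean()`. The names `x_bar` and `y_bar` are chosen here just for illustration.

```r
# Check Questions 1 and 2 by applying the sample mean formula directly.
x <- c(90, 93, 99)
y <- c(-2, 6, 8, 0)

# Sum of the data values divided by the sample size n
x_bar <- sum(x) / length(x)
y_bar <- sum(y) / length(y)

print(x_bar)  # 94
print(y_bar)  # 3

# The built-in mean() gives the same results
print(mean(x))
print(mean(y))
```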

Standard Deviation of a Sample

A sample’s standard deviation measures, roughly, the average amount of variation in that sample, where the variation is measured relative to the sample mean $\bar{x}$.
If $x$ is a random variable with $x = x_1, x_2, \ldots, x_n$ then the sample standard deviation of $x$ is:

$S_x = \sqrt{\dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}}$

The $S$ in $S_x$ stands for “sample standard deviation” and the $x$ is the name of the random variable.

Question 3. Suppose $y = -2, 6, 8, 0$. Find $S_y$.

Answer.

We’ve already calculated:

$\bar{y} = 3$

in Question 2. Then, using the formula for sample standard deviation (with the random variable being $y$ instead of $x$):

$S_y = \sqrt{\dfrac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n-1}}$

we get:

$S_y = \sqrt{\dfrac{(-2 - 3)^2 + (6 - 3)^2 + (8 - 3)^2 + (0 - 3)^2}{4 - 1}}$

$= \sqrt{\dfrac{(-5)^2 + (3)^2 + (5)^2 + (-3)^2}{3}}$

$= \sqrt{\dfrac{25 + 9 + 25 + 9}{3}}$

$= \sqrt{\dfrac{68}{3}}$

$\approx 4.76$ (to two decimal places)
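The same calculation can be followed one step at a time in R. This is only a sketch of the hand calculation; the intermediate names (`deviations`, `squared`) are made up for this example. The last line compares the result with R's built-in `sd()`, which also divides by n - 1.

```r
# Recompute S_y for y = -2, 6, 8, 0 one step at a time.
y <- c(-2, 6, 8, 0)
n <- length(y)

y_bar <- mean(y)              # sample mean: 3
deviations <- y - y_bar       # -5, 3, 5, -3
squared <- deviations^2       # 25, 9, 25, 9
S_y <- sqrt(sum(squared) / (n - 1))

print(round(S_y, 2))          # 4.76

# Compare with R's built-in sample standard deviation
print(all.equal(S_y, sd(y)))  # TRUE
```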

Median

The median of $x = x_1, x_2, \ldots, x_n$ is the value $M_x$ for which half the data will be greater than $M_x$ and half the data will be less than $M_x$.

How to find the median depends on whether you have an even or an odd number of data values.

Question 4. Find the median of $x = 3, 1, 1, 7, 4$.

Answer. We order the data from smallest to largest:

$1, 1, 3, 4, 7$

and since we have an odd amount of data ($n = 5$) we select the value in the middle, which is 3. So, the median is 3. See the Figure below for how to do the calculation.

Question 5. Find the median of $x = 8, 3, 1, 1, 7, 4$.

Answer. We order the data from smallest to largest:

$1, 1, 3, 4, 7, 8$

and since we have an even amount of data ($n = 6$) the median is the value half way between the two middle values, 3 and 4, which is $(3 + 4)/2 = 3.5$. So, the median is 3.5. See the Figure below for how to do the calculation.

So, when we have an even amount of data, finding the median requires a little extra work: we have to calculate the midpoint of the two middle values by adding them and dividing by 2, as was done in the above Figure.
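The odd and even cases from Questions 4 and 5 can be sketched in R as follows. The function `median_by_hand` is a hypothetical helper written only to mirror the hand calculation; in practice R's built-in `median()` does the same job.

```r
# Median "by hand": sort the data, then take the middle value (odd n)
# or the midpoint of the two middle values (even n).
median_by_hand <- function(x) {  # hypothetical helper, for illustration only
  sorted <- sort(x)
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]
  } else {
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

print(median_by_hand(c(3, 1, 1, 7, 4)))     # 3
print(median_by_hand(c(8, 3, 1, 1, 7, 4)))  # 3.5
```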

Mode

For us, the mode will be the most frequent data value.

Example. The mode of $x = 8, 3, 1, 1, 7, 4$ is 1 because 1 is the most frequent data value; it occurs with frequency 2.

Example. The mode of $x = 1, 2, 3, 3, 3, 3, 4, 5, 5$ is 3 because 3 is the most frequent data value; it occurs with frequency 4.
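Base R does not have a built-in function for this kind of mode (its `mode()` function reports how an object is stored, not the most frequent value). A minimal sketch of the “most frequent data value” definition, using a hypothetical helper named `most_frequent`, might look like this:

```r
# Count how often each value occurs, then keep the value(s) with the top count.
most_frequent <- function(x) {  # hypothetical helper, for illustration only
  counts <- table(x)
  # names(counts) holds the distinct values as text; convert back to numbers
  as.numeric(names(counts)[as.vector(counts) == max(counts)])
}

print(most_frequent(c(8, 3, 1, 1, 7, 4)))           # 1
print(most_frequent(c(1, 2, 3, 3, 3, 3, 4, 5, 5)))  # 3
```

If several values tie for the highest frequency, this sketch returns all of them.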

The mode tries to capture the most frequent value a random variable takes on. However, how to do that is not always clear: see “Tricky Example 1”.

Tricky Example 1. The mode of

x = 1.01, 1.02, 1.03, 1.04, 2, 2

is 2, if we use the “most frequent data value” definition. On the other hand, the values

1.01, 1.02, 1.03, 1.04,

are all very close together, so maybe the mode should be their mean, which would be 1.025.

I won’t ask any questions like Tricky Example 1 on your homework or exams.

The expression “mode” is most commonly used to indicate values about which the random variable clusters: see “Tricky Example 2”.

Tricky Example 2.  If

x =   0.98, 0.99, 1.01, 1.02,      2, 3, 4,       5.49, 5.50, 5.51

we might say this data has two modes, one at x = 1.00 and one at x = 5.50.

I won’t ask any questions like Tricky Example 2 on your homework or exams.

Multimodal Data

When data has two modes we say it is bimodal. If it has one mode we say it is unimodal. Some data has no modes. When data has more than one mode we say it is multimodal.

I will ask questions about the following Figures, which illustrate what is meant by no modes, unimodal, and bimodal.

 

Using R to calculate the mean, median, and sample standard deviation

Question 6 (Do the calculation using R). If $x = 5, 4, 4, 11$, calculate

  • the mean of $x$, denoted by $\bar{x}$,
  • the median of $x$,
  • the sample standard deviation of $x$, denoted by $S_x$.

Answer to Question 6.

Copy and paste the following script into R:

# R script to calculate the mean, median, and sample standard deviation of x
x <- c(5, 4, 4, 11)
mean(x)
median(x)
sd(x)  # sd is the R command for sample standard deviation

R should output the answers: