16. Hypothesis Testing: One Sample t-test

Hypothesis Testing: One Sample t-test

This unit is about how to conduct a hypothesis test called the “one sample t-test”, which tests a claim about the mean of a population using the data from one sample.

The one sample t-test uses the t-distribution (with df = n-1). Recall that the t-distribution has one parameter, its “degrees of freedom” df and that the PDF of the t-distribution looks like the PDF of the standard normal distribution $z$, only it is a little flatter (see Figure below).

The above Figure compares t-distributions with different degrees of freedom to the standard normal distribution (the z-distribution).

For more about the t-distribution, see our unit on “confidence intervals and the t-distribution”. You will also find an embedded t-table in Part II of that page, which you can use online or download.
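As a quick numerical check of the comparison with the z-distribution, the sketch below uses R's built-in `qt` and `qnorm` to compare the upper 2.5% critical values of a few t-distributions with that of the standard normal distribution. The degrees of freedom (5, 15, 63) are chosen only for illustration.

```r
# Upper 2.5% critical values: t-distributions vs. the standard normal.
# The degrees of freedom below are illustrative choices.
df_values = c(5, 15, 63)
t_crit = qt(0.975, df_values)   # t value with 2.5% of the area to its right
z_crit = qnorm(0.975)           # same cut-off for the z-distribution
t_crit
z_crit
```

Each t critical value is larger than the z critical value, and the gap shrinks as df grows. This reflects the heavier tails of the t-distribution.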

Question 1.

Test the following claim, at an $\alpha = .05$ significance level, using the “one sample t-test”. Estimate the p-value.

Claim: the average weight of the adult male Eastern American Toad (Bufo a. americanus) is more than 25 grams (g).

Let $x(\text{adult male Eastern American Toad}) $ $= \text{ his weight in grams}$

Let $\bar{x}(\text{samples of n = 64 adult male Eastern American Toads})$ $ = \text{the mean weight of such samples in grams}$

Data: In a sample of 64 adult male Eastern American Toads the mean weight was $\bar{x} = 26.3 $ grams and the standard deviation was $S_x = 5.0 $ grams.

Solution to Question 1.

Important details on how to solve Question 1.

The t-test hypothesis test and p-value calculation starts off the same as when we do the exact binomial test.

We write the alternative and null hypothesis:

(claim) $H_A: \mu_x > 25 $ g
(null) $H_0: \mu_x = 25 $ g

Then we calculate the p-value, which is the maximum probability of making a type I error if we are willing to accept the claim as true based upon the statistics from the data in our sample.

In other words, to calculate the p-value, we assume that the claim is false by as little as possible (that is, we assume the null hypothesis $H_0$) and then we calculate the probability of getting a sample that is at least as likely to make us believe the claim is true as the one we actually got.

To do this calculation we apply the Central Limit Theorem (CLT), and then make use of the t-distribution. In particular, since $n \geq 30$ the CLT tells us that (approximately)

$$\bar{x} \sim N\left( \mu_x, \dfrac{\sigma_x}{\sqrt{n}} \right)$$

and so the z-transformation for $\bar{x}$ will be

$$z(\bar{x}) = \dfrac{\bar{x} - \mu_x}{\sigma_x/\sqrt{n}}$$

However, $\sigma_x$ is unknown. So, we will approximate $\sigma_x$ by $S_x$ and we will use the t-distribution (with n-1 degrees of freedom) instead of the z-distribution.

With these substitutions $z(\bar{x})$ becomes the t-statistic:

$$t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}$$

The t-statistic can be viewed as a random variable on samples of size $n$. If $\bar{x}$ is normally distributed (see CLT) then the t-statistic will follow the t-distribution with n - 1 degrees of freedom. We write $t(\bar{x})$ for convenience; technically speaking, the t-statistic as a formula is a function of the four variables $\bar{x}$, $S_x$, $\mu_x$, and $n$, not just of $\bar{x}$.

When we want to emphasize the degrees of freedom of a t-distribution we will write $t_{\text{df}}$.
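The claim that the t-statistic follows the $t_{n-1}$ distribution can be checked informally by simulation. The sketch below is only an illustration under assumed conditions: the population is taken to be normal with mean 25 and standard deviation 5, and the seed and the number of repetitions are arbitrary choices. It draws many samples of size 64 with the null hypothesis true and records how often the t-statistic lands in the upper 5% tail of $t_{63}$.

```r
# Simulation sketch: when H0 is true, how often does the t-statistic
# exceed the one-tailed 5% cut-off of the t-distribution?
# Assumed population (for illustration only): normal, mean 25, sd 5.
set.seed(1)                      # arbitrary seed, only for reproducibility
n = 64; mux = 25; sigma = 5; reps = 10000
t_stats = replicate(reps, {
  x = rnorm(n, mean = mux, sd = sigma)      # one simulated sample
  (mean(x) - mux) / (sd(x) / sqrt(n))       # its t-statistic
})
cutoff = qt(0.95, n - 1)         # t value with 5% of the area to its right
rejection_rate = mean(t_stats > cutoff)
rejection_rate                   # proportion of samples beyond the cut-off
```

If the t-statistic really follows $t_{63}$, the proportion should come out close to .05, which is the type I error rate we accept when testing at $\alpha = .05$.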

For Question 1:

$n = 64$, so degrees of freedom = df = n-1 = 63, so the t-statistic will follow the $t_{63}$ distribution (i.e., the t-distribution with 63 degrees of freedom).

$\bar{x} = 26.3$
$S_x = 5.0$
By the null hypothesis, $H_0$ we are assuming that $\mu_x = 25$

Substituting these values into the t-statistic formula:

$$t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}$$

we get

$t(26.3) = \dfrac{26.3 - 25}{5/\sqrt{64}} = \dfrac{1.3}{5/8} = 2.08$

and so

$\text{p-value} = P(\bar{x} \geq 26.3 \mid \mu_x = 25 \ g) $
$\approx P\left(z(\bar{x}) \geq z(26.3)\right)$
$\approx P\left(t(\bar{x}) \geq t(26.3)\right)$
$= P\left(t_{63} \geq t(26.3)\right)$
$= P\left(t_{63} \geq 2.08\right)$ Equation (1)

Note. In the above calculation, the first approximation sign $\approx$ is due to $\bar{x}$ being approximately normally distributed (see CLT) and the second approximation sign $\approx$ is due to using the t-distribution rather than the z-distribution. In subsequent calculations I won’t be so pedantic and I will just use equal signs.

Note. Usually, in probability calculations, I will just write $t$ rather than $t_{\text{df}}$ to save space.

Doing the probability calculation for Question 1 using R

We can use R to calculate $P\left(t \geq 2.08\right)$.

R’s

$$\text{pt}(t, \text{df})$$

command gives the cumulative area (probability) from $-\infty$ to $t$ for the t-distribution with df degrees of freedom. In other words,

$$\text{pt}(t, \text{df}) = P(t_\text{df} < t)$$
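Since `pt` returns a left-tail area, a right-tail area can be found either by subtracting from 1 or by using `pt`'s `lower.tail = FALSE` argument. The short sketch below shows both routes for the same t value; the variable names are only illustrative.

```r
# Two equivalent ways to get the right-tail area P(t_df >= t).
t_value = 2.08
dof = 63                                             # degrees of freedom
right_tail_a = 1 - pt(t_value, dof)                  # complement of the left tail
right_tail_b = pt(t_value, dof, lower.tail = FALSE)  # ask for the right tail directly
right_tail_a
right_tail_b
```

Both lines give the same area. The `lower.tail = FALSE` form avoids the subtraction, which can matter numerically when the tail area is extremely small.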

Continuing the calculation from Equation (1) above:

$\text{p-value} = P\left(t \geq 2.08\right) $
$= 1 - P(t < 2.08)$
$=1 - \text{pt}(2.08, 63)$
$=1- 0.9792002$
$= 0.02079977$

See Figure:

So, the p-value = .02079977, which is less than $\alpha = .05$. Hence, the data provides statistically significant evidence that the mean weight of all adult male Eastern American Toads is greater than 25 grams.

One way to report the above results (APA style) would be:

The mean weight of Eastern American Toads (M = 26.3, SD = 5.0) is greater than 25 grams, t(63) = 2.08, p = .0208.

In the above
M = the mean of the sample = $\bar{x}$ = 26.3
SD = the standard deviation of the sample = $S_x$ = 5.0
the 63 is the degrees of freedom (df = n-1)
the 2.08 is the t-statistic we calculated
the p is the p-value.

Note. In the APA reporting style, t(63) = 2.08 is shorthand for saying that we are using the t-distribution with 63 degrees of freedom and that the t-statistic’s value is 2.08. It doesn’t mean the t-statistic of 63 is 2.08.

Here is an R script to do the above calculation.

1 - pt(2.08, 63)
# End of Script

R returns the p-value for Question 1:

0.02079977
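When only summary statistics are given, as in Question 1, the whole calculation can be wrapped in a small function. The sketch below is one possible way to do it; `one_sample_t` is a hypothetical helper written for this page, not a function that comes with R.

```r
# Hypothetical helper (not part of base R): one sample t-test
# computed from summary statistics instead of raw data.
one_sample_t = function(xbar, Sx, n, mux, claim = c("greater", "less")) {
  claim = match.arg(claim)                       # direction of the claim H_A
  t_statistic = (xbar - mux) / (Sx / sqrt(n))    # t-statistic formula
  dof = n - 1                                    # degrees of freedom
  p_value = if (claim == "greater") {
    pt(t_statistic, dof, lower.tail = FALSE)     # right-tail area
  } else {
    pt(t_statistic, dof)                         # left-tail area
  }
  list(t_statistic = t_statistic, df = dof, p_value = p_value)
}

# Question 1: claim is mu > 25, with xbar = 26.3, Sx = 5.0, n = 64.
q1 = one_sample_t(xbar = 26.3, Sx = 5.0, n = 64, mux = 25, claim = "greater")
q1
```

The same helper can be reused for a claim of type "less than" by passing `claim = "less"` and the corresponding value of `mux`.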

Doing the probability calculation for Question 1 using the t-table

It is better to use R to calculate the p-value, because the t-table only lets us bracket the p-value between two values rather than find it exactly. Here is how to use the t-table to estimate the p-value.

\begin{align*}
\text{p-value} &= \max P(\text{making type I error if we are willing to accept the claim as true if }
\bar{x}(\text{sample}) \geq 26.3 \ g) \\ \\
&= P(\bar{x} \geq 26.3 \mid \mu_x = 25 \ g) \ \ \text{ (max prob of making type I error is when claim is false by as little as possible)} \\ \\
&= P(t(\bar{x}) \geq t(26.3) \mid \mu_x = 25 \ g), \ \ \ \ \text{calc t-statistic: } \
t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}
= \dfrac{26.3 - 25}{5/\sqrt{64}} = 2.08 \\ \\
&= P\left(t \geq 2.08\right), \ \ \ \ df = n - 1 = 63 \ \ (\text{look on line 63 of look up table, use one-tailed area}) \\ \\
& .01 < \underbrace{P\left(t \geq 2.08\right)}_{\text{p-value}} < .025. \ \ \ \ \text{So, p-value } < .05
\end{align*}

Notice that the p-value found by R, p-value = .02079977, is between .01 and .025, and so is in agreement with the results found by using the t-table.

See Figure below for how to use the t-table to estimate the p-value for Question 1.

Since $\text{p-value} < (\alpha = .05)$ the data provides statistically significant evidence that the claim is true.
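The t-table lookup can also be mimicked in R. As a rough sketch, the built-in `qt` function returns the t value that cuts off a given tail area, so it can rebuild the part of the df = 63 row that brackets the t-statistic. The list of one-tailed areas below is just a typical selection of table columns, chosen for illustration.

```r
# Rebuild part of a t-table row for df = 63 using qt().
one_tailed_areas = c(0.05, 0.025, 0.01, 0.005)   # typical table columns
critical_values = qt(one_tailed_areas, 63, lower.tail = FALSE)
names(critical_values) = one_tailed_areas
critical_values
# The t-statistic 2.08 lies between the critical values for the
# one-tailed areas .025 and .01, so the p-value is between .01 and .025.
```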

Question 2.

Test the following claim, at an $\alpha = .05$ significance level, using the t-test.

Estimate the p-value.

Claim: the average weight of the adult male Eastern American Toad is less than 27 grams (g).

Data: in a sample of 64 adult male Eastern American Toads the mean weight was $\bar{x} = 26.3$ grams and the standard deviation was $S_x = 5.0 $ grams.

Let

$x(\text{adult male Eastern American Toad}) $ $= \text{his weight in grams}$

$\bar{x}(\text{samples of n = 64 adult male Eastern American Toads})$ $= \text{ the mean weight of such samples in grams}$

Answer to Question 2.

(claim) $H_A: \mu_x < 27$ grams
(null) $H_0: \mu_x = 27$ grams

$\bar{x} = 26.3$
$S_x = 5.0$
$n = 64$, so degrees of freedom = df = n-1 = 63
By the null hypothesis $H_0$ we are assuming that $\mu_x = 27$.

Since $n = 64 \geq 30$, the CLT tells us that $\bar{x}$ will be (approximately) normal, and so we can use the t-test.

Substituting these values into the t-statistic formula:

$$t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}$$

we get

$t(26.3) = \dfrac{26.3 - 27}{5/\sqrt{64}} = \dfrac{-0.7}{5/8} = -1.12$

and so (with the help of R) we get:

$\text{p-value} = P(\bar{x} \leq 26.3 \mid \mu_x = 27 \ g) $
$= P\left(t(\bar{x}) \leq t(26.3)\right)$
$= P\left(t \leq t(26.3)\right)$
$= P\left(t \leq -1.12\right)$
$=\text{pt}(-1.12, 63)$ (because df = n – 1 = 64 – 1 = 63)
$= 0.1334827$

 pt(-1.12, 63) 
# End of R Script

See Figure:

Since the p-value = .1334827 is greater than $\alpha = .05$, the sample does not provide statistically significant evidence that the claim is true.

We can also use the t-table to solve Question 2.

Using the t-table to solve Question 2.

The t-distribution is symmetric about $t = 0$. So, if $a$ is a positive number, we have: $P\left(t \leq -a \right) = P\left(t \geq a \right)$. This allows us to use the t-table if the t-statistic is negative.
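The symmetry property is easy to confirm numerically. The sketch below compares the left-tail area below $-a$ with the right-tail area above $a$, using $a = 1.12$ and 63 degrees of freedom as in Question 2.

```r
# Check the symmetry of the t-distribution: P(t <= -a) = P(t >= a).
a = 1.12
dof = 63                                       # degrees of freedom
left_tail  = pt(-a, dof)                       # area to the left of -a
right_tail = pt(a, dof, lower.tail = FALSE)    # area to the right of a
left_tail
right_tail
all.equal(left_tail, right_tail)               # TRUE when the two areas agree
```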

\begin{align*}
\text{p-value} &= \max P(\text{making type I error if we are willing to accept the claim as true if }
\bar{x}(\text{sample}) \leq 26.3 \ g) \\ \\
&= P(\bar{x} \leq 26.3 \mid \mu_x = 27 \ g) \\
&= P(t(\bar{x}) \leq t(26.3) \mid \mu_x = 27 \ g), \ \ \ \ \text{calc t-statistic: } \
t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}
= \dfrac{26.3 - 27}{5/\sqrt{64}} = -1.12 \\ \\
&= P\left(t \leq -1.12\right), \ \ \text{the t-distribution is symmetric about t = 0, (same as the z-distribution)} \\ \\
&= P\left(t \geq 1.12\right),
\ \ \ \ df = n - 1 = 63 \ \ (\text{look on line 63 of look up table, use one-tailed area}) \\ \\
& .10 < \underbrace{P\left(t \geq 1.12\right)}_{\text{p-value}} < .15 \ \ \ \ \text{So, p-value } > .05
\end{align*}

See Figure below for how to use the t-table to estimate the p-value for Question 2.

Since $\text{p-value} > (\alpha = .05)$ the data fails to provide statistically significant evidence that the claim is true.

Question 3.

Claim: $\mu_x < 27.5$

Suppose the data is:

x = 25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 29, 29.

Note: n = 16.

Using the one sample t-test, estimate the p-value, and test the claim at the $\alpha = .05$ significance level. Assume that x is normally distributed.

Solution to Question 3.

$H_A: \mu_x < 27.5$ grams
$H_0: \mu_x = 27.5$ grams

In Question 3 we are told that $x$ is normally distributed so we can use the t-test even though $n = 16 < 30$.

Using R, I can find the mean $\bar{x}$, the standard deviation $S_x$ and the t-statistic:

 
x=c(25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 29, 29);
mux = 27.5;
#--------------------- no user input needed below this line
xbar = mean(x); xbar;
Sx = sd(x); Sx;
n = length(x); n;
t_statistic = (xbar - mux)/(Sx/sqrt(n));
t_statistic;
# End of Script

R outputs:

So,
$\bar{x} = 26.625$
$S_x = 1.360147$
t-statistic = $-2.573251$

Note.
$\text{t-statistic} $
$= t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}$
$t(26.625) = \dfrac{26.625 - 27.5}{1.360147/\sqrt{16}} = -2.573251 $

Now we can calculate the p-value (with some additional help from R):

$\text{p-value} = P(\bar{x} \leq 26.625 \mid \mu_x = 27.5 \ g) $
$= P\left(t(\bar{x}) \leq t(26.625)\right)$
$= P\left(t \leq t(26.625)\right)$
$= P\left(t \leq -2.573251\right)$
$=\text{pt}(-2.573251, 15)$ (because df = n – 1 = 16 – 1 = 15)
$= 0.01059851$

 pt(-2.573251, 15) 
# End of R Script

See Figure:

Since the p-value = .01059851 is less than $\alpha = .05$ the sample provides statistically significant evidence the claim is true.
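When the raw data is available, as in Question 3, R's built-in `t.test` function can be used to cross-check the step-by-step calculation. The sketch below assumes the same data and null value as above; `alternative = "less"` matches a claim of the form $\mu_x < 27.5$.

```r
# Cross-check Question 3 with R's built-in one sample t-test.
x = c(25, 25, 25, 25, 26, 26, 26, 26, 27, 27, 27, 27, 28, 28, 29, 29)
result = t.test(x, mu = 27.5, alternative = "less")
result$statistic   # the t-statistic
result$parameter   # the degrees of freedom, n - 1
result$p.value     # the one-tailed p-value
```

The three values should agree with the t-statistic, the degrees of freedom, and the p-value found by hand.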

Alternatively, we can use the t-table for Question 3.

$\text{p-value} = P(\bar{x} < 26.625 \mid \mu_x = 27.5)$
$= P(t(\bar{x}) < t(26.625))$
$= P(t < -2.573251)$
$= P(t > 2.573251)$
where the last equality follows from the symmetry of the t-distribution.

To finish the p-value calculation we use the t-distribution table to approximate $P(t > 2.573251)$, which is the p-value.

Since df = n - 1 = 15, we look on line 15 of the t-distribution look-up table and we see that $t = 2.573251$ is between 2.131 and 2.602. Then we go up to the one-tailed areas to see that

$$ .01 < \underbrace{P(t > 2.573251)}_{\text{p-value}} < .025$$

See Figure below for how to use the t-table to estimate the p-value for Question 3.

So, the p-value is less than .05, and the sample provides statistically significant evidence that the claim is true. This agrees with the answer we got using R for the entire calculation.

Question 4.

Calculate the p-value and test the following claim at an $\alpha = .05$ significance level.

Claim: The average daily commute (round trip) for New York City residents is over 2 hours (120 minutes).

Data: A sample of 36 New Yorkers had the following round trip commute time in minutes:

110, 110, 110, 110, 110, 110,
130, 130, 130, 130,130, 130,
135, 135, 135, 135, 135, 135,
134, 130, 129, 122, 105, 105,
120, 122, 123, 124, 119, 118,
105, 115, 125, 125, 135, 145

Suggestion. You can copy and paste the above data into R.

Answer to Question 4.

$H_A: \mu_x > 120$ minutes
$H_0: \mu_x = 120$ minutes

Since $n = 36 \geq 30$, the CLT tells us that $\bar{x}$ will be (approximately) normal, and so we can use the t-test.

We use R to calculate the mean, standard deviation, the t-statistic, and the p-value.

x = c(110, 110, 110, 110, 110, 110,
130, 130, 130, 130,130, 130,
135, 135, 135, 135, 135, 135,
134, 130, 129, 122, 105, 105,
120, 122, 123, 124, 119, 118,
105, 115, 125, 125, 135, 145);
mux = 120; 
#---- No user input needed below this line ----
xbar = mean(x); xbar;
Sx = sd(x); Sx;
n = length(x); n;
t_statistic = (xbar - mux)/(Sx/sqrt(n));
t_statistic;
"p-value for claim of type mu greater than"; 
1 - pt(t_statistic, n - 1);
# End of R Script

Here are the details of the calculations done by the above R script to solve Question 4.

The above R script outputs:
$\bar{x} = 123.6389$
$S_x = 10.80781$

Then:

$\text{t-statistic} $
$= t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}$
$t(123.6389) = \dfrac{123.6389 - 120}{10.80781/\sqrt{36}} = 2.020144 $

Finally:
$\text{p-value} = P(\bar{x} \geq 123.6389 \mid \mu_x = 120 \ \text{minutes}) $
$= P\left(t(\bar{x}) \geq t(123.6389)\right)$
$= P\left(t \geq t(123.6389)\right)$
$= P\left(t \geq 2.020144\right)$
$=1 - \text{pt}(2.020144, 35)$ (because df = n - 1 = 36 - 1 = 35)
$= 1 - 0.9744629$
$= 0.02553709$

Since the p-value = .02553709 is less than $\alpha = .05$ the sample provides statistically significant evidence the claim is true.

Question 5.

Calculate the p-value and test the following claim at the $\alpha = .05$ significance level.

Claim: The average daily commute (round trip) for New York City residents is under 125 minutes.

Data: A sample of 36 New Yorkers had the following round trip commute time in minutes:

110, 110, 110, 110, 110, 110,
130, 130, 130, 130,130, 130,
135, 135, 135, 135, 135, 135,
134, 130, 129, 122, 105, 105,
120, 122, 123, 124, 119, 118,
105, 115, 125, 125, 135, 145

Suggestion. You can copy and paste the above data into R.

Answer to Question 5.

$H_A: \mu_x < 125$ minutes
$H_0: \mu_x = 125$ minutes

Since $n = 36 \geq 30$, the CLT tells us that $\bar{x}$ will be (approximately) normal, and so we can use the t-test.

We use R to calculate the mean, standard deviation, the t-statistic, and the p-value.

x = c(110, 110, 110, 110, 110, 110,
130, 130, 130, 130,130, 130,
135, 135, 135, 135, 135, 135,
134, 130, 129, 122, 105, 105,
120, 122, 123, 124, 119, 118,
105, 115, 125, 125, 135, 145);
mux = 125; 
#---- No user input needed below this line ----
xbar = mean(x); xbar;
Sx = sd(x); Sx;
n = length(x); n;
t_statistic = (xbar - mux)/(Sx/sqrt(n));
t_statistic;
"p-value for claim of type mu less than"; 
pt(t_statistic, n - 1);
# End of R Script

Here are the details of the calculations done by the above R script to solve Question 5.

The above R script outputs:
$\bar{x} = 123.6389$
$S_x = 10.80781$

Then:

$\text{t-statistic} $
$= t(\bar{x}) = \dfrac{\bar{x} - \mu_x}{S_x/\sqrt{n}}$
$t(123.6389) = \dfrac{123.6389 - 125}{10.80781/\sqrt{36}} = -0.7556265 $

Finally:
$\text{p-value} = P(\bar{x} \leq 123.6389 \mid \mu_x = 125 \ \text{minutes}) $
$= P\left(t(\bar{x}) \leq t(123.6389)\right)$
$= P\left(t \leq t(123.6389)\right)$
$= P\left(t \leq -0.7556265\right)$
$=\text{pt}( -0.7556265, 35)$ (because df = n – 1 = 36 – 1 = 35)
$= 0.2274644$

Since the p-value = .2274644 is greater than $\alpha = .05$ the sample does not provide statistically significant evidence the claim is true.
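Questions 4 and 5 use the same commute data with claims in opposite directions, so they make a convenient pair for checking with R's built-in `t.test` function. In the sketch below only the `mu` and `alternative` arguments change between the two tests; the names `q4` and `q5` are just labels chosen for this example.

```r
# Cross-check Questions 4 and 5 with R's built-in one sample t-test.
x = c(110, 110, 110, 110, 110, 110,
      130, 130, 130, 130, 130, 130,
      135, 135, 135, 135, 135, 135,
      134, 130, 129, 122, 105, 105,
      120, 122, 123, 124, 119, 118,
      105, 115, 125, 125, 135, 145)
q4 = t.test(x, mu = 120, alternative = "greater")   # claim: mu > 120
q5 = t.test(x, mu = 125, alternative = "less")      # claim: mu < 125
c(q4$statistic, p_value = q4$p.value)
c(q5$statistic, p_value = q5$p.value)
```

The t-statistics and p-values should match the values worked out step by step for each question.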

Professor McCarthy Statistics

Mat 150 and Mat 150.5 BMCC
