T-Tests for Two Samples

Two Independent Samples

\boxed{ \text{Assume } \sigma_1 = \sigma_2 \text{ if } \frac{ \max(s_1, s_2) }{ \min(s_1, s_2) } < 2 }

\boxed{ \begin{array}{cc} \begin{aligned} \sigma_1 = \sigma_2 &: \begin{cases} T &= \frac{\bar{X}_1 - \bar{X}_2}{SE_{\text{pooled}}} \\~\\ df^* &= n_1 + n_2 - 2 \end{cases} \\~\\ \sigma_1 \ne \sigma_2 &: \begin{cases} T &= \frac{\bar{X}_1 - \bar{X}_2}{SE_{\text{unpooled}}} \\~\\ df^* &= \min(n_1 - 1, n_2 - 1) \end{cases} \end{aligned} & \begin{aligned} SE_{\text{unpooled}} &= \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } \\ s^2_{\text{pooled}} &= \frac{ (n_1 - 1)s_1^2 + (n_2 - 1) s_2^2 }{ n_1 + n_2 - 2 } \\ SE_{\text{pooled}} &= \sqrt{ s^2_{\text{pooled}} \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } \end{aligned} \end{array} }

\boxed{ \text{Reject $H_0$ if: } \begin{cases} H_1 : \mu_1 \ne \mu_2 \leftrightarrow \mu_1 - \mu_2 \ne 0 &\quad |T| > t_{\alpha / 2 , df^*} &\quad \text{(two-tailed)} \\ H_1 : \mu_1 > \mu_2 \leftrightarrow \mu_1 - \mu_2 > 0 &\quad T > t_{\alpha , df^*} &\quad \text{(right-tailed)} \\ H_1 : \mu_1 < \mu_2 \leftrightarrow \mu_1 - \mu_2 < 0 &\quad T < - t_{\alpha , df^*} &\quad \text{(left-tailed)} \end{cases} }
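The boxed formulas can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names (`sd_ratio_below_two`, `two_sample_t`, `reject_h0`) are made up for this sketch, and the critical value is passed in from a t-table rather than computed. The usage lines plug in the summary statistics of the soybean example from these notes.

```python
import math


def sd_ratio_below_two(s1, s2):
    """Rule of thumb from the notes: assume sigma_1 = sigma_2 if max(s)/min(s) < 2."""
    return max(s1, s2) / min(s1, s2) < 2


def two_sample_t(mean1, s1, n1, mean2, s2, n2, pooled):
    """Return (T, df) for H0: mu_1 - mu_2 = 0, computed from summary statistics."""
    if pooled:
        # Pooled variance: weighted average of the two sample variances.
        s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
        se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        se = math.sqrt(s1**2 / n1 + s2**2 / n2)
        df = min(n1 - 1, n2 - 1)  # conservative df used in the notes
    return (mean1 - mean2) / se, df


def reject_h0(t_stat, t_crit, tail):
    """Decision rule; t_crit is t_{alpha/2,df} for 'two', t_{alpha,df} otherwise."""
    if tail == "two":
        return abs(t_stat) > t_crit
    if tail == "right":
        return t_stat > t_crit
    if tail == "left":
        return t_stat < -t_crit
    raise ValueError("tail must be 'two', 'right', or 'left'")


# Soybean summary statistics from the notes, unpooled with conservative df.
t_stat, df = two_sample_t(21.34, 1.22, 13, 22.54, 1.35, 10, pooled=False)
decision = reject_h0(t_stat, 2.262, "two")  # 2.262 = t_{0.025,9} from a t-table
print(round(t_stat, 3), df, decision)
```

Note that the one-tailed rules compare the signed T, not |T|, which is why `reject_h0` only takes the absolute value in the two-tailed branch.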

On Notation:
We use subscripts to denote which population a measurement is from.
e.g., \mu_i, \sigma_i, \bar{X}_i, etc.

Why?

Unpooled:

Hypotheses: \begin{aligned} H_0 : \mu_1 = \mu_2 \quad &\text{vs} \quad H_1: \mu_1 \ne \mu_2 \\ &\downarrow \\ H_0 : \mu_1 - \mu_2 = 0 \quad &\text{vs} \quad H_1: \mu_1 - \mu_2 \ne 0 \end{aligned}

Test Statistic: \begin{aligned} T &= \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{SE_{(\bar{X}_1 - \bar{X}_2)}} \\ &= \frac{\bar{X}_1 - \bar{X}_2}{SE_{(\bar{X}_1 - \bar{X}_2)}} \text{ under $H_0$} \end{aligned}

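The standard error comes from the variance of a difference. Because the two samples are independent, the covariance between \bar{X}_1 and \bar{X}_2 is zero, so the variances add: \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}. Estimating each \sigma_i^2 with s_i^2 gives SE_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} } = SE_{\text{unpooled}}.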

Degrees of Freedom:

We went with the conservative df^* = \min(n_1 - 1, n_2 - 1), but there are other options, such as the Satterthwaite approximation.
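The notes name Satterthwaite without giving a formula. The sketch below assumes the commonly used Welch-Satterthwaite approximation, df = (v_1 + v_2)^2 / (v_1^2/(n_1 - 1) + v_2^2/(n_2 - 1)) with v_i = s_i^2 / n_i, and compares it with the conservative df on the soybean summary statistics from these notes. The function name is hypothetical.

```python
def welch_satterthwaite_df(s1, n1, s2, n2):
    """Approximate df for the unpooled two-sample t statistic."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Soybean summary statistics from the notes.
df_ws = welch_satterthwaite_df(1.22, 13, 1.35, 10)
df_conservative = min(13 - 1, 10 - 1)
df_pooled = 13 + 10 - 2

# The approximation lands between the conservative and the pooled df.
print(df_conservative, round(df_ws, 2), df_pooled)
```

A larger df means a smaller critical value, so the conservative choice makes it harder to reject H_0 than the approximation does.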

Pooled:

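If the two sample SDs are close enough that we can assume \sigma_1 = \sigma_2, we can simplify by pooling: s^2_{\text{pooled}} is a weighted average of the two sample variances (not the SDs), with weights n_1 - 1 and n_2 - 1.

Under that assumption, the test statistic follows a t distribution with n_1 + n_2 - 2 degrees of freedom exactly, rather than approximately with a conservative df.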
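To make the weighted-average reading concrete, the following sketch rewrites the pooled variance as w_1 s_1^2 + w_2 s_2^2 with w_i = (n_i - 1)/(n_1 + n_2 - 2) and checks that it matches the boxed formula. It uses the soybean summary statistics from these notes purely as illustrative inputs; the variable names are assumptions of this sketch.

```python
import math

# Soybean summary statistics from the notes.
s1, n1 = 1.22, 13
s2, n2 = 1.35, 10

# Weights proportional to each sample's degrees of freedom; they sum to 1.
w1 = (n1 - 1) / (n1 + n2 - 2)
w2 = (n2 - 1) / (n1 + n2 - 2)

var_weighted = w1 * s1**2 + w2 * s2**2
var_formula = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

se_pooled = math.sqrt(var_formula * (1 / n1 + 1 / n2))

# A weighted average always lies between the two values being averaged.
print(s1**2, var_weighted, s2**2)
print(se_pooled)
```

Because the larger sample gets the larger weight, the pooled variance is pulled toward the variance of whichever group has more observations.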

Example: Soybean Data

These are the stem lengths of soybeans under different fertilizers.

F1	20.2	22.9	23.3	20.0	19.4	22.0	22.1	22.0	21.9	21.5	19.7	21.5	20.9
F2	22.2	23.5	21.5	23.5	22.4	23.8	22.4	20.4	21.0	24.7

Q: Is there a significant difference between the two fertilizers?

Let \mu_i: pop. mean of stem lengths of soybean with fertilizer i, i = 1,2

H_0: \mu_1 - \mu_2 = 0
H_1: \mu_1 - \mu_2 \ne 0
\alpha = 0.05

We can get:

\bar{X}_1 = 21.34
s_1 = 1.22
n_1 = 13
\bar{X}_2 = 22.54
s_2 = 1.35
n_2 = 10

Further:

\bar{X}_1 - \bar{X}_2 = -1.2
SE_{\bar{X}_1 - \bar{X}_2} = 0.5447
df^* = min(13-1, 10-1) = 9
t_{\alpha / 2,df^*} = t_{0.025,9} = 2.262

From this, we can calculate T

\begin{aligned} T &= \frac{\bar{X}_1 - \bar{X}_2}{SE_{(\bar{X}_1 - \bar{X}_2)}} \\ &= \frac{21.34 - 22.54}{0.5447} \\ &= -2.203 \end{aligned}

We can see |T| = 2.203 < 2.262 = t_{0.025,9}, so we fail to reject H_0 at \alpha = 0.05.

\boxed{ \text{C.I. for } \mu_1 - \mu_2 \text{ contains } 0 \leftrightarrow \text{Fail to reject $H_0$} }
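The soybean example and the confidence-interval rule can be checked from the raw data with the short script below. It is a sketch that uses only the Python standard library and takes the critical value t_{0.025,9} = 2.262 from the notes as a constant. Because it works at full precision instead of with the rounded summary statistics, its T (about -2.21) differs slightly from the hand-calculated -2.203; the conclusion is the same.

```python
import math
import statistics

f1 = [20.2, 22.9, 23.3, 20.0, 19.4, 22.0, 22.1, 22.0, 21.9, 21.5, 19.7, 21.5, 20.9]
f2 = [22.2, 23.5, 21.5, 23.5, 22.4, 23.8, 22.4, 20.4, 21.0, 24.7]

n1, n2 = len(f1), len(f2)
diff = statistics.mean(f1) - statistics.mean(f2)

# Unpooled standard error; statistics.variance is the sample variance (n - 1).
se = math.sqrt(statistics.variance(f1) / n1 + statistics.variance(f2) / n2)
t_stat = diff / se

df = min(n1 - 1, n2 - 1)  # conservative df, as in the notes
T_CRIT = 2.262            # t_{0.025,9}, taken from the notes

# 95% confidence interval for mu_1 - mu_2.
margin = T_CRIT * se
ci = (diff - margin, diff + margin)

reject = abs(t_stat) > T_CRIT
ci_contains_zero = ci[0] <= 0 <= ci[1]
print(f"T = {t_stat:.3f}, df = {df}, reject H0: {reject}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), contains 0: {ci_contains_zero}")
```

The two-tailed test and the interval use the same critical value and standard error, so the interval contains 0 exactly when the test fails to reject.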

Two Paired Samples

\boxed{ \text{Paired: } \begin{cases} H_0 : \mu_d = 0 \text{ (state the hypotheses in terms of } \mu_d \text{)} \\ d_i = x_{1i} - x_{2i} \quad \forall i = 1, \dots, n \\ \text{compute } \bar{d}, s_d, SE_{\bar{d}} \\ SE_{\bar{d}} = s_d / \sqrt{n} \\ T = \frac{\bar{d}}{SE_{\bar{d}}} \sim t(n-1) \end{cases} }

\boxed{ \text{Reject $H_0$ if: } \begin{cases} H_1 : \mu_d \ne 0 &\quad |T| > t_{\alpha / 2 , n - 1} &\quad \text{(two-tailed)} \\ H_1 : \mu_d > 0 &\quad T > t_{\alpha , n - 1} &\quad \text{(right-tailed)} \\ H_1 : \mu_d < 0 &\quad T < - t_{\alpha , n - 1} &\quad \text{(left-tailed)} \end{cases} }

Example: Weight Loss Data

The compound m-Faminol (mFAM) is thought to affect appetite and food intake in humans. Nine moderately obese women were given mFAM in a double-blind, placebo-controlled experiment.

Some took mFAM for two weeks, nothing for two weeks (the washout period), and then a placebo for two weeks.
Some took the placebo for two weeks, nothing for two weeks, and then mFAM for two weeks.

The weight loss in kg for each woman was recorded under each condition. Assume the weight losses are normally distributed.

Woman	1	2	3	4	5	6	7	8	9	mean	sd
mFAM	1.1	1.3	0.8	1.7	1.4	0.1	0.5	1.6	-0.5	0.8889	0.7373
placebo	0.5	-0.3	0.6	0.3	0.7	-0.2	0.6	0.9	-1.5	0.1778	0.7463

Q: Does mFAM have an effect? (\alpha = 0.05)

Let \mu_d = \mu_{\text{mFAM}} - \mu_{\text{placebo}}, with H_0 : \mu_d = 0 vs H_1 : \mu_d \ne 0

Now we will calculate d_i:

Woman	1	2	3	4	5	6	7	8	9
mFAM (x_{1i})	1.1	1.3	0.8	1.7	1.4	0.1	0.5	1.6	-0.5
placebo (x_{2i})	0.5	-0.3	0.6	0.3	0.7	-0.2	0.6	0.9	-1.5
d_i = x_{1i} - x_{2i}	0.6	1.6	0.2	1.4	0.7	0.3	-0.1	0.7	1.0

We were told the weight losses are normally distributed, so we treat the differences d_i as normal and can apply the paired t-test to d.

We can calculate \bar{d} = 0.711 and s_d = 0.553 to get SE_{\bar{d}}:

\begin{aligned} SE_{\bar{d}} &= \frac{ s_d }{ \sqrt{n} } \\ &= \frac{ 0.553 }{ \sqrt{9} } \\ &= 0.1844 \end{aligned}

Now we can get T and the critical value t for the two-tailed test:

\begin{aligned} T &= \frac{\bar{d}}{SE_{\bar{d}}} \\ &= \frac{0.7111}{0.1844} \\ &\approx 3.856 \end{aligned}

t_{\alpha / 2, n-1} = t_{0.025, 8} = 2.306

Because |T| = 3.856 > 2.306 = t_{0.025,8}, we reject H_0 at \alpha = 0.05: the data suggest that mFAM has an effect on weight loss.

\boxed{ \text{Paired C.I.: } \bar{d} \pm t_{\alpha / 2, n-1} SE_{\bar{d}} } \\ \small\textit{fail to reject $H_0$ if the interval contains 0}
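As a check on the weight-loss example, the paired test and the paired confidence interval can be reproduced from the raw table. The script below is a sketch using only the Python standard library; the critical value t_{0.025,8} = 2.306 is taken from the notes rather than computed.

```python
import math
import statistics

mfam = [1.1, 1.3, 0.8, 1.7, 1.4, 0.1, 0.5, 1.6, -0.5]
placebo = [0.5, -0.3, 0.6, 0.3, 0.7, -0.2, 0.6, 0.9, -1.5]

# Paired differences d_i = x_{1i} - x_{2i}, one per woman.
d = [x1 - x2 for x1, x2 in zip(mfam, placebo)]
n = len(d)

d_bar = statistics.mean(d)
s_d = statistics.stdev(d)    # sample SD (divides by n - 1)
se_d = s_d / math.sqrt(n)
t_stat = d_bar / se_d

T_CRIT = 2.306               # t_{0.025,8}, taken from the notes

# 95% paired confidence interval for mu_d.
margin = T_CRIT * se_d
ci = (d_bar - margin, d_bar + margin)

reject = abs(t_stat) > T_CRIT
ci_contains_zero = ci[0] <= 0 <= ci[1]
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, SE = {se_d:.4f}, T = {t_stat:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```

Here the interval lies entirely above 0, which agrees with rejecting H_0 in the two-tailed test.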