4.2 The \(Z\) and \(t\) tests

In this section, we will cover the \(Z-\)test for testing the sample mean when the population standard deviation is known, as well as the \(t-\)test for cases when the population standard deviation is unknown.

Additionally, we will delve into various applications of these tests, including testing proportions, comparing means when the variances are known, testing the equality of two proportions, and performing two-sample \(t-\)tests when the variances are unknown (equal or unequal).

4.2.1 The \(Z-\)test: Testing for the sample mean when \(\sigma\) is known

Suppose we have a random sample \(X_1, X_2, \dots, X_n\) from a normal distribution \(N(\mu, \sigma^2)\), where \(\mu\) is an unknown mean, but \(\sigma\) is a known standard deviation. We assume that the parameter space is the real line, i.e., \(\mu \in \mathcal{P} = \mathbb{R}\). The null hypothesis assumes that \(\mu = c\). The sample mean \(\overline{X}\) serves as an estimator of \(\mu\), but we need a principled way to judge whether the observed value of \(\overline{X}\) is consistent with the hypothesis \(\mu = c\).

To answer this question, we introduce the concept of a null distribution. We treat the empirical values of the sample \(X_1, X_2, \dots, X_n\) as fixed, and we consider an i.i.d. sample \(Y_1, Y_2, \dots, Y_{n}\) drawn from \(N(c, \sigma^2)\), the distribution specified by the null hypothesis. The \(Y_j\) variables mimic the sampling procedure, which is a common device in tests of significance. The distribution of a statistic computed from the \(Y_j\) is called its null distribution.

We then calculate \(\Prob{Y \geq X}\), shorthand for \(\Prob{\overline{Y} \geq \overline{X}}\), where \(\overline{Y}\) is viewed as a random variable and \(\overline{X}\) is the observed sample average. The statistic \(\overline{X}\), calculated from the observed data, is known as the “test statistic.” The probability \(\Prob{Y \geq X}\) describes how likely it is for the test statistic to be at least as large as the observed value when the null hypothesis holds. This probability can be computed precisely because the distribution of \(\overline{Y}\) is known exactly. Specifically, if the null hypothesis is true, then

\[ Z = \sqrt{n}\left(\frac{\overline{Y}-c}{\sigma}\right) \sim N(0,1) \]

As you have seen earlier, we compute the \(p-\)value using the following expression:

\[\begin{align*} \Prob{Y \geq X} & = \Prob{\sqrt{n}\left(\frac{\overline{Y}-c}{\sigma}\right) \geq \sqrt{n}\left(\frac{\overline{X}-c}{\sigma}\right)} \\ & = \Prob{Z \geq \sqrt{n}\left(\frac{\overline{X}-c}{\sigma}\right)} \end{align*}\]

This probability is called the \(p-\)value, and it allows us to decide between the hypotheses \(H_0: \mu = c\) vs. \(H_A: \mu > c\).

In practice, before taking a sample, a “significance level” \(\alpha \in (0, 1)\) is typically selected. If the \(p-\)value \(\Prob{Y \geq X} < \alpha\), we reject the null hypothesis \(\mu = c\) because the sample average is significantly larger than the assumed mean \(c\). On the other hand, if the \(p-\)value \(\Prob{Y \geq X} \geq \alpha\), we consider the sample mean consistent with the assumption \(\mu = c\), and the null hypothesis is not rejected.
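The computation is short enough to carry out directly in R. Below is a minimal sketch for the right-tailed test \(H_0: \mu = c\) vs. \(H_A: \mu > c\); the data x and the values c0 and sigma are hypothetical, chosen only for illustration:

z_test_p <- function(x, c0, sigma) {
  # Standardized test statistic: sqrt(n) * (xbar - c0) / sigma
  z <- sqrt(length(x)) * (mean(x) - c0) / sigma
  pnorm(z, lower.tail = FALSE)  # right-tailed p-value: P(Z >= z)
}

x <- c(10.4, 9.8, 10.6, 10.9, 10.1)  # hypothetical data
z_test_p(x, c0 = 10, sigma = 0.8)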

Example 4.3 Suppose a programmer is developing an app to identify faces based on digital photographs taken from social media. She wants to ensure that the app has an accuracy rate of more than 90% in the long run. She takes a random sample of 500 such photos, and her app correctly identifies the faces in 462 photos, resulting in a success rate of 92.4%. The programmer hopes that this indicates the app performs better than the 90% threshold in the long run. However, it is also possible that the long-term success rate is only 90%, and the app happened to perform above this bar in the 500-photo sample. What does a \(Z-\)test say about the null hypothesis that the app is only 90% accurate (compared to an alternate hypothesis that the app is more than 90% accurate with a significance level of \(\alpha = 0.05\))?

The random variables in question are modeled by a Bernoulli distribution, since the app either makes a correct identification or not. Under the null hypothesis, \(Y_1, Y_2, \dots, Y_{500} \sim \Ber{0.9}\) and are independent. Although the sample proportion does not precisely follow a normal distribution, the Central Limit Theorem implies that the standardized quantity

\[\frac{Y_1 + \cdots + Y_{500} - 450}{\sqrt{45}}\]

should have an approximate standard normal distribution (\(N(0, 1)\)). Therefore,

\[\begin{align*} \mathbb{P}\left( \frac{Y_1 + \cdots + Y_{500}}{500} \geq \frac{X_1 + \cdots + X_{500}}{500} \right) & = \Prob{ \frac{Y_1 + \cdots + Y_{500} - 450}{\sqrt{45}} \geq \frac{X_1 + \cdots + X_{500} - 450}{\sqrt{45}} } \\ & \approx \Prob{Z \geq \frac{462-450}{\sqrt{45}}} \approx 0.03 \end{align*}\]

Since 0.03 is less than \(\alpha = 0.05\), the null hypothesis is rejected, and we conclude that the data support the claim that the long-run success rate of the app is greater than 90%.
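This \(p-\)value is a one-line computation in R; the numbers 462, 450, and 45 are taken from the example above, and the exact value is about 0.037:

# P(Z >= (462 - 450)/sqrt(45)) under the null hypothesis
pnorm((462 - 450) / sqrt(45), lower.tail = FALSE)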

The example above focuses on the right-hand tail of the normal curve; that is, it tests a null hypothesis \(\mu = c\) against an alternate hypothesis \(\mu > c\). However, it is also possible to perform tests on the left-hand tail (testing a null hypothesis \(\mu = c\) against an alternate hypothesis \(\mu < c\)) and two-tailed tests (testing a null hypothesis \(\mu = c\) against an alternate hypothesis \(\mu \neq c\)):

  • (Left-tailed) \(H_0: \mu = c\) vs. \(H_A: \mu < c\)
  • (Two-tailed) \(H_0: \mu = c\) vs \(H_A: \mu \neq c\)

To perform these tests, we modify the \(p-\)value computation accordingly:

For \(H_0: \mu = c\) vs \(H_A: \mu < c\), we compute the probability:

\[\begin{equation*} \mathbb{P}\left(\frac{\sqrt{n}(\overline{Y}-c)}{\sigma} \leq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}\right) \end{equation*}\]

If the \(p-\)value is less than \(\alpha\), we reject the null hypothesis \(\mu = c\) in favor of the alternative hypothesis \(\mu < c\).

For \(H_0: \mu = c\) vs \(H_A: \mu \neq c\), we compute the probability:

\[\begin{equation*} \mathbb{P}\left(\left|\frac{\sqrt{n}(\overline{Y}-c)}{\sigma}\right| \geq \left|\frac{\sqrt{n}(\overline{X}-c)}{\sigma}\right|\right) \end{equation*}\]

If the \(p-\)value is less than \(\alpha\), we reject the null hypothesis \(\mu = c\) in favor of the alternative hypothesis \(\mu \neq c\).
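The two variants differ from the right-tailed case only in which tail probability is computed. A minimal sketch, reusing the hypothetical x, c0, and sigma from the earlier sketch:

x <- c(10.4, 9.8, 10.6, 10.9, 10.1)  # hypothetical data, as before
c0 <- 10; sigma <- 0.8
z_obs <- sqrt(length(x)) * (mean(x) - c0) / sigma
pnorm(z_obs)            # left-tailed p-value:  P(Z <= z_obs)
2 * pnorm(-abs(z_obs))  # two-tailed p-value:   P(|Z| >= |z_obs|)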

4.2.1.1 Testing Proportions

A common scenario when working with proportions is testing whether a population proportion equals a specific value, such as testing \(p=0.5\).

To perform such a test, we sample \(X_1, X_2, \dots, X_n \iidd \Ber{p}\) random variables and test the hypotheses: \(H_0: p=0.5\) vs \(H_A: p \not= 0.5\)

We can use the Binomial Central Limit Theorem 2.1 to deduce that \[ \frac{\sqrt{n}(\overline{X}-p)}{\sqrt{p(1-p)}} \overset{d}{\to} Z \sim N(0, 1) \]
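A quick simulation illustrates this convergence; the choices of n, p, and the number of replications are arbitrary:

set.seed(42)  # for reproducibility
n <- 1000; p <- 0.5
# 10000 replications of the standardized sample proportion
z <- replicate(10000, {
  x_bar <- rbinom(1, n, p) / n  # one sample proportion
  sqrt(n) * (x_bar - p) / sqrt(p * (1 - p))
})
c(mean(z), sd(z))  # close to 0 and 1, as N(0, 1) predicts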

In R, the prop.test() function allows us to conduct a \(Z-\)test for proportions. This function works as follows:

  • It computes the \(p-\)value by evaluating the probability \(\Prob{ |Z| \geq \frac{\left| n\overline{X} - 0.5\,n \right| - 0.5}{\sqrt{n \cdot 0.5(1-0.5)}} }\), where \(Z\) is a standard normal random variable and the \(-0.5\) in the numerator is a continuity correction.
  • It also calculates a confidence interval using the standard normal distribution: it determines the region of \(p\) where the inequality \(\left| \frac{\sqrt{n}(\overline{X}-p)}{\sqrt{p(1-p)}} \right| < z_{\frac{\alpha}{2}}\) holds, with \(\Prob{ Z > z_{\frac{\alpha}{2}} } = \frac{\alpha}{2}\). (See Figure 4.1).

Figure 4.1: Two-tailed Test

For example, consider the test with observed proportion \(\frac{43}{100}\). Let’s perform the \(Z-\)test using R:

prop.test(43, 100)
#> 
#>  1-sample proportions test with continuity correction
#> 
#> data:  43 out of 100, null probability 0.5
#> X-squared = 1.69, df = 1, p-value = 0.1936
#> alternative hypothesis: true p is not equal to 0.5
#> 95 percent confidence interval:
#>  0.3326536 0.5327873
#> sample estimates:
#>    p 
#> 0.43

By interpreting the \(p-\)value and examining the confidence interval provided by prop.test(), we can make conclusions about the hypothesis being tested.
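As a check, the statistic and \(p-\)value reported above can be reproduced by hand from the continuity-corrected formula (using the counts 43 and 100 from the run above):

n <- 100; successes <- 43; p0 <- 0.5
# Continuity-corrected Z statistic
z <- (abs(successes - n * p0) - 0.5) / sqrt(n * p0 * (1 - p0))
z^2            # 1.69, the X-squared value reported by prop.test()
2 * pnorm(-z)  # 0.1936, the two-sided p-value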

4.2.1.2 Implementing the \(Z-\)test in R

To perform a \(Z-\)test and calculate the confidence interval for a given dataset x in R, you can use the following function:

z_test_ci <- function(x, mu = 0, sigma = 1, alpha = 0.95) {
  # Note: `alpha` here is the confidence level (default 95%),
  # not the significance level.
  z_critical <- qnorm((1 - alpha) / 2, lower.tail = FALSE) # two-sided critical value
  sd_x <- sigma / sqrt(length(x)) # standard error of the mean
  # One-sided (upper-tail) p-value for H_A: true mean > mu
  p_value <- pnorm((mean(x) - mu) / sd_x, lower.tail = FALSE)

  cat(
    100 * alpha,
    "% Confidence Interval: (",
    mean(x) - z_critical * sd_x,
    ", ",
    mean(x) + z_critical * sd_x,
    ")\n",
    sep = ""
  )
  cat("p-value: ", p_value, sep = "")
}

To test a dataset x against a hypothesized mean mu with known standard deviation sigma, you can call the function z_test_ci(x, mu, sigma).

Let’s consider an example:

x <- c(75, 76, 73, 75, 74, 73, 76, 73, 79)
z_test_ci(x, mu = 76, sigma = 1.5)
#> 95% Confidence Interval: (73.90891, 75.86887)
#> p-value: 0.9868659

The function z_test_ci() calculates the confidence interval and provides the \(p-\)value for the right-tailed \(Z-\)test. You can interpret the results to draw conclusions about the hypothesis being tested; here, for instance, the large \(p-\)value gives no evidence that the true mean exceeds 76.

4.2.2 The \(t-\)test: Test for sample mean when \(\sigma\) is unknown

As in the case of confidence intervals, when \(\sigma\) is unknown and estimated from the sample standard deviation \(S\), an adjustment must be made by using the \(t-\)distribution.

Suppose we have a random sample \(X_1, X_2, \dots, X_n\) from a normal distribution \(N(\mu, \sigma^2)\), where both \(\mu\) and \(\sigma\) are unknown.

Again, we are interested in testing the hypotheses:

  • \(H_0: \mu = c\) vs \(H_A: \mu > c\)
  • \(H_0: \mu = c\) vs \(H_A: \mu < c\)
  • \(H_0: \mu = c\) vs \(H_A: \mu \neq c\)

The test statistic in this case is the \(t-\)statistic, given by: \[ t = \frac{\overline{X} - c}{\frac{s}{\sqrt{n}}} \]

where \(\overline{X}\) is the sample mean, \(s\) is the sample standard deviation, and \(n\) is the sample size.

The \(t-\)statistic follows a \(t-\)distribution with \(n-1\) degrees of freedom.

We compute the probability:

\[\begin{equation*} \mathbb{P}\left(T \geq \frac{\overline{X} - c}{\frac{s}{\sqrt{n}}}\right) \end{equation*}\]

or

\[\begin{equation*} \mathbb{P}\left(T \leq \frac{\overline{X} - c}{\frac{s}{\sqrt{n}}}\right) \end{equation*}\]

or

\[\begin{equation*} \mathbb{P}\left(|T| \geq \left|\frac{\overline{X} - c}{\frac{s}{\sqrt{n}}}\right|\right) \end{equation*}\]

depending on the alternative hypothesis.

If the \(p-\)value is less than \(\alpha\), we reject the null hypothesis in favor of the alternative hypothesis.

Keep in mind that in all cases, it’s necessary to assume that the samples are independent and representative of the population.
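Before turning to the built-in function, here is a minimal sketch of these computations, reusing the sample x from the earlier \(Z-\)test example and the null mean \(c = 74\):

x <- c(75, 76, 73, 75, 74, 73, 76, 73, 79)
c0 <- 74                                            # hypothesized mean
t_obs <- (mean(x) - c0) / (sd(x) / sqrt(length(x))) # t-statistic
df <- length(x) - 1
pt(t_obs, df, lower.tail = FALSE)  # right-tailed p-value
pt(t_obs, df)                      # left-tailed p-value
2 * pt(-abs(t_obs), df)            # two-tailed p-value: 0.2118, matching t.test() below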

The function t.test() can be used to perform a \(t-\)test in R.

t.test(x, mu = 74)
#> 
#>  One Sample t-test
#> 
#> data:  x
#> t = 1.3571, df = 8, p-value = 0.2118
#> alternative hypothesis: true mean is not equal to 74
#> 95 percent confidence interval:
#>  73.37848 76.39930
#> sample estimates:
#> mean of x 
#>  74.88889

Here we have used the same data x as before and only specified the mean and not the standard deviation.

4.2.3 Applications

4.2.3.1 Test for Equality of Means When Variances are Known

In certain cases, we may need to compare two populations. One common scenario is testing for equality of means when the variances are known. Consider two random variables, \(X\) and \(Y\), following normal distributions:

\[\begin{align*} X &\sim N(\mu_1, \sigma_1^2) \\ Y &\sim N(\mu_2, \sigma_2^2) \end{align*}\]

where both \(\sigma_1\) and \(\sigma_2\) are known.

The hypothesis can be formulated as follows:

\(H_0: \mu_1 = \mu_2\) vs \(H_A: \mu_1 \neq \mu_2\)

This is equivalent to:

\(H_0: \mu_1 - \mu_2 = 0\) vs \(H_A: \mu_1 - \mu_2 \neq 0\)

To test this hypothesis, we take independent and identically distributed samples: \(X_1, X_2, \dots, X_{n_1}\) from \(X\) and \(Y_1, Y_2, \dots, Y_{n_2}\) from \(Y\).

Intuitively, we may want to check if \(\overline{X} - \overline{Y}\) is close to \(0\) or not.

Under the assumptions, we have:

\[ \overline{X} - \overline{Y} \sim N \left(\mu_1-\mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \right) \]

We can standardize this quantity using \(Z \sim N(0,1)\). Fix a significance level \(\alpha \in (0,1)\). If:

\[ \mathbb{P}\left( |Z| \geq \frac{ \left|\overline{X} - \overline{Y} \right| }{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }} \right) < \alpha \]

then we reject \(H_0\).
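A minimal sketch of this two-sample \(Z-\)test; the samples and the known standard deviations below are hypothetical values chosen for illustration:

two_sample_z <- function(x, y, sigma1, sigma2) {
  se <- sqrt(sigma1^2 / length(x) + sigma2^2 / length(y)) # sd of xbar - ybar
  z <- (mean(x) - mean(y)) / se                           # N(0, 1) under H0
  2 * pnorm(-abs(z))                                      # two-sided p-value
}

x <- c(5.1, 4.9, 5.4, 5.0, 5.2) # hypothetical sample 1
y <- c(4.8, 4.7, 5.0, 4.6)      # hypothetical sample 2
two_sample_z(x, y, sigma1 = 0.3, sigma2 = 0.3)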

4.2.3.2 Test for Equality of Two Proportions

Another commonly encountered situation is testing whether two proportions are equal. Consider two coins with probabilities \(p_1\) and \(p_2\) of landing on heads.

The hypothesis can be stated as follows:

\(H_0: p_1 = p_2\) vs \(H_A: p_1 \neq p_2\)

We sample \(X_1, X_2, \dots, X_n\) and \(Y_1, Y_2, \dots, Y_{n}\) by tossing the coins, where \(X_1, X_2, \dots, X_n \iidd \Ber{p_1}\) and \(Y_1, Y_2, \dots, Y_{n} \iidd \Ber{p_2}\).

Let \(\hat{p}_1 = \overline{X}\) and \(\hat{p}_2 = \overline{Y}\) be the sample proportions. The test statistic is given by:

\[ \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{2\hat{p}(1-\hat{p})}{n}}} \]

where \(\hat{p} = \frac{\hat{p}_1 + \hat{p}_2}{2}\) is the pooled estimate of the common proportion (for equal sample sizes). The term \(\frac{2\hat{p}(1-\hat{p})}{n}\) is referred to as the pooled variance. Under \(H_0\), the test statistic is approximately \(N(0,1)\) for large \(n\), so the test proceeds exactly as before.
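A minimal sketch with hypothetical counts; the built-in prop.test(c(s1, s2), c(n, n)) performs the analogous test (with a continuity correction by default):

s1 <- 45; s2 <- 60; n <- 100     # hypothetical head counts from n tosses of each coin
p1_hat <- s1 / n; p2_hat <- s2 / n
p_hat <- (p1_hat + p2_hat) / 2   # pooled estimate under H0: p1 = p2
z <- (p1_hat - p2_hat) / sqrt(2 * p_hat * (1 - p_hat) / n)
2 * pnorm(-abs(z))               # two-sided p-value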

4.2.3.3 Two-Sample \(t-\)Test: Equal Variances

When comparing the means of two independent random samples, \(X_1, X_2, \dots, X_n\) and \(Y_1, Y_2, \dots, Y_{m}\), we can conduct a two-sample \(t-\)test. Let’s assume:

\[\begin{align*} X_1, X_2, \dots, X_n &\iidd N(\mu_X, \sigma_1^2)\\ Y_1, Y_2, \dots, Y_{m} &\iidd N(\mu_Y, \sigma_2^2) \end{align*}\]

If the variances are equal, \(\sigma_1 = \sigma_2\), we can use the following test statistic:

\[ T := \frac{ (\overline{X} - \overline{Y}) - (\mu_X - \mu_Y) }{ S_{\textsf{pooled}} \sqrt{ \frac{1}{n} + \frac{1}{m} } } \sim t_{n+m-2} \] where, \[ S_{\textsf{pooled}}^2 = \frac{ (n-1)S_X^2 + (m-1)S_Y^2 }{n+m-2} \]
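A minimal sketch of the pooled test; the samples are hypothetical, and the built-in equivalent is t.test(x, y, var.equal = TRUE):

pooled_t_test <- function(x, y) {
  n <- length(x); m <- length(y)
  sp2 <- ((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2) # pooled variance
  t_obs <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / m))
  2 * pt(-abs(t_obs), df = n + m - 2)                        # two-sided p-value
}

x <- c(5.1, 4.9, 5.4, 5.0, 5.2) # hypothetical samples
y <- c(4.8, 4.7, 5.0, 4.6)
pooled_t_test(x, y)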

4.2.3.4 Two-Sample \(t-\)Test: Unequal Variances

In the case of independent random samples, \(X_1, X_2, \dots, X_n\) and \(Y_1, Y_2, \dots, Y_{m}\), with possibly unequal variances, we can perform Welch’s two-sample test. Consider the following assumptions:

\[\begin{align*} X_1, X_2, \dots, X_n &\iidd N(\mu_X, \sigma_1^2)\\ Y_1, Y_2, \dots, Y_{m} &\iidd N(\mu_Y, \sigma_2^2) \end{align*}\]

If the variances are unequal, \(\sigma_1 \neq \sigma_2\), we can use Welch’s test statistic:

\[ T := \frac{ (\overline{X} - \overline{Y}) - (\mu_X - \mu_Y) }{ S_{\textsf{W}} } \sim t_{d} \quad \textsf{(approximately)} \] where, \[ S_{\textsf{W}}^2 = \frac{S_X^2}{n} +\frac{S_Y^2 }{m} \] and \(d\) is the Welch–Satterthwaite degrees of freedom, \[ d = \frac{\left( \frac{S_X^2}{n} + \frac{S_Y^2}{m} \right)^2}{\frac{\left( \frac{S_X^2}{n} \right)^2}{n-1} + \frac{\left( \frac{S_Y^2}{m} \right)^2}{m-1}} \]
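In R this is the default behaviour of t.test(x, y). A minimal sketch of the computation with hypothetical samples:

welch_t_test <- function(x, y) {
  a <- var(x) / length(x); b <- var(y) / length(y)
  t_obs <- (mean(x) - mean(y)) / sqrt(a + b)
  # Welch-Satterthwaite degrees of freedom
  d <- (a + b)^2 / (a^2 / (length(x) - 1) + b^2 / (length(y) - 1))
  2 * pt(-abs(t_obs), df = d) # two-sided p-value
}

x <- c(5.1, 4.9, 5.4, 5.0, 5.2) # hypothetical samples
y <- c(4.8, 4.7, 5.0, 4.6)
welch_t_test(x, y) # compare with the built-in t.test(x, y)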

Exercises

Exercise 4.1 Express the null and the alternative hypothesis for an appropriately designed test:

  1. The placement committee at ISI claims that the mean annual starting salary for B.Math(Hons.) graduating students is greater than Rupees 7,00,000.
  2. The standard deviation for measurement of temperature from Siva’s thermometer equals \(2^{\circ} C\).
  3. The proportion of students in India that suffered from COVID-19 is less than 9%.
  4. The standard deviation of duration times (in minutes) of continuous rainfall in the summer monsoon is less than 35 minutes.

Exercise 4.2 For each of the following, do the hypothesis test and give proper reasoning if you “fail to reject” or “reject” the null hypothesis (\(H_0\)):

  1. A test is made of \(H_0: \mu = 45\) versus \(H_{A}: \mu < 45\). A sample of size \(n = 90\) is drawn and the sample mean is found to be 42. The population is known to be Normal with standard deviation \(\sigma = 20\). Decide if there is evidence to reject the null hypothesis at the 5% level of significance.
  2. A test is made of \(H_0: \mu = 42\) versus \(H_A: \mu \not= 42\). The population is known to be Normal with known variance. The test statistic for the \(Z-\)test is found to be -2.71. Decide if there is evidence to reject the null hypothesis at 1% level of significance.
  3. In a simple random sample of size 95, there were 66 individuals in the category of interest. It is desired to test \(H_0: p = 0.78\) versus \(H_A: p < 0.78\). Would you reject the null hypothesis at the 5% level of significance?

Exercise 4.3 If one sees 80 heads in 100 tosses, can one reasonably conclude that the coin is biased? Explain your answer in the context of hypothesis testing.

Exercise 4.4 The test statistic for hypothesis tests involving a single proportion is given by: \(\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}\). Find the value of the test statistic for the claim that the proportion of faculty with black hair equals 0.25, where the sample involved includes 580 faculty with 152 of them having black hair.

Exercise 4.5 For each situation below, decide if you “reject” or “fail to reject” the null hypothesis:

  1. The test statistic in a left-tailed test is \(z = -1.25\)
  2. The test statistic in a two-tailed test is \(z = 1.75\)
  3. With \(H_A: \mu \not= 0.707\), the test statistic is \(z = -2.75\)
  4. With \(H_A: \mu > 0.25\), the test statistic is \(z = 2.30\)

Exercise 4.6 Suppose we wish to test if the coin given to us is fair.

  1. We toss it 100 times and find that there are 45 heads. Using the inbuilt prop.test() in R, describe each output of the command prop.test(45, 100).
  2. Suppose we toss the coin 10,000 times and find that there are 4500 heads. Would you then conclude that the coin is fair?

Exercise 4.7 Suppose the manufacturer Doddapple claims that their batteries last 25 years. Students from B.Nothing (Hons.) sample 10 users and find that the sample mean battery life was 21 years with a sample standard deviation of 1.7. Is Doddapple’s claim believable?

Exercise 4.8 Super-shakti-malt is supposed to help patients recover faster from the common cold. Recovery time is measured in days. A placebo group is also used. The data is as follows:

Super-shakti-malt 9 1 4 1 3 2 15 3 8 2
Placebo 9 8 6 2 8 1 10 4 9 6
  1. Draw box plots to assess whether the assumptions of equal means and equal variances are plausible.
  2. Using the inbuilt t.test() under the assumptions that variances are equal, decide if the means are equal. Describe each output of the command. Please explain all the inferences you can make from the output.
  3. Using the inbuilt t.test() under the assumptions that variances are unequal, decide if the means are equal. Describe each output of the command. Please explain all the inferences you can make from the output.
  4. Explain any differences between the results in part (b) and part (c).

Exercise 4.9 (Two sample test of means for paired data) Let \(n \geq 1\), and let \(X_1, X_2, \dots, X_n\) and \(Y_1, Y_2, \dots, Y_{n}\) be two paired samples. Then the test-statistic is given by: \[\begin{equation*} T := \frac{(\overline{X} - \overline{Y}) - (\mu_{X} - \mu_{Y})}{\frac{S}{\sqrt{n}}} \qquad \textsf{where} \qquad S^2 := \frac{1}{n-1}\sum_{i=1}^{n} \left( Z_{i} - \overline{Z} \right)^2 \qquad \textsf{with} \qquad Z_{i} = X_{i} - Y_{i} \end{equation*}\] with \(T \sim t_{n-1}\)

  1. Write a function paired_t_test() that performs the above test for paired data \(x\) and \(y\).
  2. Consider the shoes dataset from the package MASS. It contains shoe wear data: a list of two vectors giving the wear of shoes of materials \(A\) and \(B\) for one foot each of ten boys.
    1. Using the function paired_t_test() perform the paired \(t-\)test on the above data to see if the two types of shoes have different mean wear amounts.
    2. Verify the same using the inbuilt function t.test(A, B, paired = TRUE).
    3. If you were to use the function t.test() without the paired = TRUE parameter then can you still perform the above test?

Exercise 4.10 Suppose Somadev finds his weight in kg during each month of the year to be

75, 76, 73, 75, 74, 73, 73, 76, 73, 79, 77, 75
  1. Write a function called z.ci() that takes in the weights above as a vector x, assumes a known standard deviation of 1.5, and produces a default 95% confidence interval.
  2. Write a function called t.ci() that takes in the weights above as a vector x, assumes that the variance is unknown, and produces a default 95% confidence interval.
  3. Use the inbuilt t.test() command on the vector x (as above) and describe each output of the command t.test(x). Please explain all the inferences you can make from the output.

Exercise 4.11 Siva suddenly decides that fairness is important when it comes to grading. He mandates that each quiz be graded twice, once by Siva and once by Sarvesh. The data is as follows:

Siva 30 0 50 22 55 50 55 40 44 60
Sarvesh 20 10 40 11 44 30 33 20 33 60
  1. Using the t.test() function, perform the paired t-test to see if the scores are statistically different.
  2. Now suppose the scores come from two independent sets of students; decide if the scores are statistically different.