\[ \newcommand{\nRV}[2]{{#1}_1, {#1}_2, \ldots, {#1}_{#2}} \newcommand{\pnRV}[3]{{#1}_1^{#3}, {#1}_2^{#3}, \ldots, {#1}_{#2}^{#3}} \newcommand{\onRV}[2]{{#1}_{(1)} \le {#1}_{(2)} \le \ldots \le {#1}_{(#2)}} \newcommand{\RR}{\mathbb{R}} \newcommand{\Prob}[1]{\mathbb{P}\left({#1}\right)} \newcommand{\PP}{\mathcal{P}} \newcommand{\iidd}{\overset{\mathsf{iid}}{\sim}} \newcommand{\X}{\times} \newcommand{\EE}[1]{\mathbb{E}\left[{#1}\right]} \newcommand{\Var}[1]{\mathsf{Var}\left({#1}\right)} \newcommand{\Ber}[1]{\mathsf{Ber}\left({#1}\right)} \newcommand{\Geom}[1]{\mathsf{Geom}\left({#1}\right)} \newcommand{\Bin}[1]{\mathsf{Bin}\left({#1}\right)} \newcommand{\Poi}[1]{\mathsf{Pois}\left({#1}\right)} \newcommand{\Exp}[1]{\mathsf{Exp}\left({#1}\right)} \newcommand{\SD}[1]{\mathsf{SD}\left({#1}\right)} \newcommand{\sgn}[1]{\mathsf{sgn}} \newcommand{\dd}[1]{\operatorname{d}\!{#1}} \]
4.1 Introduction
So far, we have discussed models, especially the Linear Model, where the ingredients were the population of interest, sample data from the population, and the assumption that the data came from a certain model, allowing us to estimate the corresponding parameters.
In this chapter, we shift our focus to questions such as the following:
- Are two sub-populations “different” or “the same”?
- Are the measured attributes independent of each other?
For example,
- Are temperatures today higher than they were 100 years ago?
- Does smoking reduce life expectancy?
- Is treatment A genuinely different from treatment B?
In each of these cases, we want to make inferences about how the value of a parameter relates to a specific number: is it less than, equal to, or greater than that number? This type of inference is called hypothesis testing.
4.1.1 Elements of Hypothesis testing
Definition 4.1 (Hypothesis) A statistical hypothesis is a statement about the numerical value of a population parameter.
Hypothesis testing consists of stating a hypothesis about a population parameter and developing a test that determines whether the hypothesis is consistent with the observed data. The key steps are as follows:
- Make a conjecture: Formulate a null hypothesis and an alternative hypothesis.
- Perform statistical computation: Compute a test statistic from the data and the probability, under the null hypothesis, of a result at least as extreme as the one observed.
During the computation step, the probability distributions studied earlier reappear in different tests. This probability, computed under the null hypothesis, is used to assess whether the hypothesis is plausible.
Let’s consider an example:
Suppose we have a coin, and we are interested in the probability of showing heads when the coin is tossed. We toss the coin 100 times and record the outcomes as \(X_1, X_2, \dots, X_{100} \overset{\mathrm{iid}}{\sim} \Ber{p}\). Suppose we observe \(\sum_{i=1}^{100} X_i = 67\). Until now, we would have used these data to estimate \(p\), for example by the method of moments (MOM) or maximum likelihood estimation (MLE); for Bernoulli data both give \(\hat{p} = \frac{67}{100} = 0.67\).
Now, let’s ask a different question: Is the hypothesis \(p=0.5\) valid given the observed findings? In other words, we conjecture that \(p=0.5\) and want to test this hypothesis.
Observe that if the coin were fair, the probability of observing exactly 67 heads in 100 tosses would be only about 0.00023: \[ \binom{100}{67} \left( \frac{1}{2} \right)^{100} \approx 2.3 \times 10^{-4} \]
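The arithmetic above can be checked with Python's standard library. The sketch below relies only on `math.comb`; the helper names `binom_pmf` and `binom_upper_tail` are illustrative. It computes the probability of exactly 67 heads under \(p = 0.5\) and, for comparison, the probability of 67 or more heads, which is the kind of "at least as extreme" probability that the tests in this chapter use.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(S = k) for S ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(S >= k) for S ~ Bin(n, p), summed exactly over k, k+1, ..., n."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

n, heads = 100, 67
point = binom_pmf(heads, n, 0.5)         # probability of exactly 67 heads
tail = binom_upper_tail(heads, n, 0.5)   # probability of 67 or more heads
print(f"P(S = 67)  = {point:.6f}")
print(f"P(S >= 67) = {tail:.6f}")
```

Both probabilities are tiny (about 0.00023 and 0.00044), so 67 heads would be a very surprising result for a fair coin.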
Now, let’s delve into hypothesis testing.
\(Z\)-test:
Suppose \(X_1, X_2, \dots, X_n \overset{\mathrm{iid}}{\sim} N(\mu, \sigma^2)\), where \(\sigma\) is known but \(\mu\) is unknown. We compute the sample mean as follows: \[ \overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
We want to test the null hypothesis \(H_0: \mu = c\) against the alternative hypothesis \(H_A: \mu > c\).
Given the observed \(\overline{X}\), we want to assess how likely it would be to obtain a sample mean at least as large as \(\overline{X}\) if the null hypothesis were true.
To answer this, let \(Y_1, Y_2, \ldots, Y_n \overset{\mathrm{iid}}{\sim} N(c, \sigma^2)\) be a hypothetical sample drawn under \(H_0\). We compute the following probability: \[ \mathbb{P}\Big(\underset{\textsf{Random variable}}{\overline{Y}} \geq \underset{\textsf{Deterministic quantity}}{\overline{X}}\Big) \]
This probability is called the \(p\)-value: the probability, assuming \(H_0: \mu = c\) is true, of observing a sample mean at least as large as the observed value \(\overline{X}\).
To compute it, we standardize \(\overline{Y}\). Under the null hypothesis, \(\overline{Y} \sim N(c, \sigma^2/n)\), and since \(\sigma\) is known, we can use the \(Z\)-score:
\[ Z = \frac{\sqrt{n}(\overline{Y}-c)}{\sigma} \]
Under the null hypothesis, \(Z\) follows a standard normal distribution \(N(0,1)\). Therefore, we can compute the probability as:
\[\begin{align*} & \mathbb{P}({\overline{Y}} \geq \overline{X}) \\ & = \mathbb{P}\left(\frac{\sqrt{n}(\overline{Y}-c)}{\sigma} \geq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}\right) \\ & = \mathbb{P}\left(Z \geq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}\right) \end{align*}\]
To perform the hypothesis test, we choose a significance level \(\alpha\), the threshold for rejecting the null hypothesis; it is the probability of rejecting \(H_0\) when \(H_0\) is in fact true. Common choices are \(\alpha = 0.05\) and \(\alpha = 0.01\). If \(\mathbb{P}(\overline{Y} \geq \overline{X}) \le \alpha\), we reject the null hypothesis; otherwise, we fail to reject it.
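Because \(Z\) is standard normal under \(H_0\), the rule "reject when \(\mathbb{P}(Z \geq z_{\text{obs}}) \le \alpha\)" is equivalent to "reject when \(z_{\text{obs}} \geq z_{1-\alpha}\)", where \(z_{1-\alpha}\) is the \((1-\alpha)\)-quantile of \(N(0,1)\) (about 1.645 for \(\alpha = 0.05\)). The sketch below applies both rules using Python's `statistics.NormalDist`; the function name `one_sided_z_test` is an illustrative choice, not a standard API.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(xbar: float, c: float, sigma: float, n: int, alpha: float = 0.05):
    """Test H0: mu = c against HA: mu > c for a normal sample with known sigma.

    Returns (z_obs, p_value, z_crit, reject_by_p, reject_by_crit).
    """
    std_normal = NormalDist()                # N(0, 1)
    z_obs = sqrt(n) * (xbar - c) / sigma     # standardized sample mean
    p_value = 1 - std_normal.cdf(z_obs)      # P(Z >= z_obs) under H0
    z_crit = std_normal.inv_cdf(1 - alpha)   # (1 - alpha)-quantile of N(0, 1)
    reject_by_p = p_value <= alpha           # p-value rule
    reject_by_crit = z_obs >= z_crit         # critical-value rule (same decision)
    return z_obs, p_value, z_crit, reject_by_p, reject_by_crit
```

Returning both decisions makes the equivalence of the two rules visible; in practice either one suffices.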
Example 4.1 Consider a random variable \(X\) that follows a normal distribution with unknown mean \(\mu\) and known standard deviation \(\sigma = 3\), i.e., \(X \sim N(\mu, 3^2)\). Suppose we draw a sample of size \(16\), \(X_1, X_2, \ldots, X_{16}\), from this distribution and observe the sample mean \(\overline{X} = 10.2\).
Now, let’s formulate the hypotheses for a statistical test:
- Null hypothesis, \(H_0: \mu = 9.5\)
- Alternative hypothesis, \(H_A: \mu > 9.5\)
- Level of significance, \(\alpha = 0.05\)
Since \(\sigma\) is known, we use the \(Z\)-test. With \(c = 9.5\), the observed test statistic is \[ \frac{\sqrt{n}(\overline{X}-c)}{\sigma} = \frac{\sqrt{16}(10.2-9.5)}{3} = \frac{4 \times 0.7}{3} \approx 0.933 \]
Using the standard normal distribution table, we can find the probability \(\Prob{Z \geq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}}\)
Substituting the values, we get: \[ \Prob{Z \geq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}} = \Prob{Z \geq \frac{4 \times 0.7}{3}} \approx 0.175\]
Since the level of significance \(\alpha\) is \(0.05\), we observe that \(\Prob{Z \geq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}} = 0.175 > \alpha\). Therefore, we cannot reject the null hypothesis at the \(\alpha = 0.05\) level of significance. In other words, we do not have sufficient evidence to conclude that the population mean is greater than 9.5.
Example 4.1 (continued) Suppose we have the same distribution \(X \sim N(\mu, 3^2)\) and the same sample of size \(16\) with sample mean \(\overline{X} = 10.2\).
Now, we set up the following hypotheses:
- Null hypothesis, \(H_0: \mu = 8.5\)
- Alternative hypothesis, \(H_A: \mu > 8.5\)
- Level of significance, \(\alpha = 0.05\)
Using the same procedure as before, with \(c = 8.5\), we calculate the test statistic: \[ \frac{\sqrt{n}(\overline{X}-c)}{\sigma} = \frac{\sqrt{16}(10.2-8.5)}{3} = \frac{4 \times 1.7}{3} \approx 2.267 \]
Substituting the values, we find: \[ \Prob{Z \geq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}} = \Prob{Z \geq \frac{4 \times 1.7}{3}} \approx 0.012\]
Since \(\Prob{Z \geq \frac{\sqrt{n}(\overline{X}-c)}{\sigma}} = 0.012 < \alpha\), we can reject the null hypothesis at the \(\alpha = 0.05\) level of significance. This means that we have sufficient evidence to conclude that the population mean is greater than 8.5.
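The two computations in Example 4.1 can be reproduced numerically. This sketch uses Python's `statistics.NormalDist` in place of a printed normal table, with \(n = 16\), \(\sigma = 3\), \(\overline{X} = 10.2\) and \(\alpha = 0.05\) taken from the example; the `results` dictionary is only a convenience for inspecting the output.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, xbar, alpha = 16, 3.0, 10.2, 0.05   # values from Example 4.1
results = {}
for c in (9.5, 8.5):                           # the two null values tested above
    z_obs = sqrt(n) * (xbar - c) / sigma       # observed Z statistic
    p_value = 1 - NormalDist().cdf(z_obs)      # P(Z >= z_obs) under H0
    results[c] = (z_obs, p_value)
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(f"c = {c}: z = {z_obs:.3f}, p-value = {p_value:.3f}, {decision}")
```

The p-values agree with the table-based values 0.175 and 0.012 found above.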
Now, let’s explore a scenario in the context of a medical test:
Suppose a medical research team aims to determine the effectiveness of a newly developed vaccine for a new disease. The experiment involves the following steps:
- Select a sample of \(n\) individuals from the population, where \(n_1\) individuals receive the vaccine, and \(n_2 = n - n_1\) individuals receive a placebo.
- Allow a specific amount of time to elapse and observe the number of individuals affected by the disease.
- Summarize the findings in a \(2 \times 2\) table as follows:
| | Infected | Not Infected | Total |
|---|---|---|---|
| Vaccine | \(X_{11}\) | \(X_{12}\) | \(n_1\) |
| Placebo | \(X_{21}\) | \(X_{22}\) | \(n_2\) |
The research team can analyze these results to determine the effectiveness of the vaccine.
In this context, we start by assuming that the treatment received and the outcome are independent, that is, that the vaccine has no effect, and we abandon this assumption only if the data provide evidence against it.
This experimental setup can be generalized to situations where we have two treatments applied to a group of experimental units, and one of two possible outcomes is recorded. We can define random variables \(X_{ij}\) to represent the number of participants given treatment \(i\) and experiencing outcome \(j\).
Now, let’s consider a specific example to illustrate the concept:
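Suppose that every participant, vaccinated or not, independently gets the disease with the same probability \(p\). What is the distribution of \(X_{11}\), the number of participants given the vaccine and infected, under this assumption? Since each of the \(n_1\) vaccinated participants is infected independently with probability \(p\), \(X_{11} \sim \Bin{n_1, p}\).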
To decide whether the vaccine is effective, we compare the observed value of \(X_{11}\) with the values that are likely under the assumption of independence. If the observed value is among the likely outcomes, we have no reason to doubt the hypothesis that the vaccine is ineffective. If instead it falls among outcomes that are highly unlikely (practically “impossible”) under that assumption, we have evidence to reject the hypothesis that the vaccine is ineffective.
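The comparison above can be made concrete with the binomial distribution of \(X_{11}\). The numbers below are purely hypothetical (\(n_1 = 50\) vaccinated participants, infection probability \(p = 0.2\), observed \(X_{11} = 3\)); they do not come from any real study. Because an effective vaccine should reduce infections, the sketch measures how unusual the observation is by the lower-tail probability \(\Prob{X_{11} \le 3}\).

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n1 = 50        # hypothetical number of vaccinated participants
p = 0.2        # hypothetical infection probability if the vaccine has no effect
x11_obs = 3    # hypothetical observed number of infected vaccinated participants

expected = n1 * p   # E[X11] under independence
# P(X11 <= x11_obs): chance of so few infections if the vaccine does nothing
lower_tail = sum(binom_pmf(k, n1, p) for k in range(0, x11_obs + 1))
print(f"expected infections under H0: {expected}")
print(f"P(X11 <= {x11_obs}) = {lower_tail:.4f}")
```

With these made-up numbers the lower tail is about 0.006, so 3 infections would fall among the outcomes that are very unlikely under the no-effect hypothesis.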
Let’s consider another approach, which involves testing in a parametric setup, providing a more intuitive way to analyze statistical hypotheses.
To illustrate this approach, let’s look at the following example.
Example 4.2 (Coin Toss) Suppose we have a coin, and we are interested in determining the probability \(p\) of showing heads when the coin is tossed. To investigate this, we toss the coin \(100\) times and record the sample data as \(X_1, X_2, \ldots, X_{100}\), where \(X_i\) follows a Bernoulli distribution \(\Ber{p}\).
Now, the question arises: Is \(p=0.5\), or is \(p \neq 0.5\)?
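Suppose again that we observe \(\sum_{i=1}^{100} X_i = 67\). To answer this question, we compute the probability of observing 67 heads in 100 tosses under the assumption that \(p = 0.5\): \[ \Prob{\sum_{i=1}^{100} X_i = 67 \mid p = 0.5} \] More precisely, a test uses the probability of a result at least as extreme as the one observed; for the two-sided alternative \(p \neq 0.5\), this is \(\Prob{\left|\sum_{i=1}^{100} X_i - 50\right| \geq 17 \mid p = 0.5}\).
Based on this computation, we can judge whether the hypothesis is plausible. In addition, we can compute the maximum likelihood estimate of \(p\) from the observed sample \(X_1, X_2, \ldots, X_{100}\), namely \(\hat{p} = \frac{1}{100}\sum_{i=1}^{100} X_i = 0.67\), and construct a confidence interval for \(p\).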
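A minimal sketch of the estimate and interval for the coin example follows. It uses the normal-approximation (Wald) interval \(\hat{p} \pm z_{0.975}\sqrt{\hat{p}(1-\hat{p})/n}\), which is one common choice among several confidence intervals for a proportion; the text does not specify which interval is intended.

```python
from math import sqrt
from statistics import NormalDist

n, heads = 100, 67
p_hat = heads / n                      # MLE (and MOM estimate) of p
se = sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error of p_hat
z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
lower, upper = p_hat - z * se, p_hat + z * se
print(f"p_hat = {p_hat:.2f}, 95% Wald CI = ({lower:.3f}, {upper:.3f})")
```

The interval, roughly \((0.578, 0.762)\), does not contain \(0.5\), which agrees with the very small probability of 67 heads computed under \(p = 0.5\).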
In general, the broad procedure for conducting a statistical test is as follows:
- Consider an independent and identically distributed (i.i.d.) sample \(X_1, X_2, \ldots, X_n\) from a random variable \(X\).
- Assume that \(X\) has a probability mass function (PMF) or probability density function (PDF) \(f(x \mid p)\), where the parameter \(p\) belongs to a parameter space \(\mathcal{P} \subseteq \mathbb{R}^d\) for some dimension \(d\).
Next, we formulate a hypothesis, such as \(p = 0.5\), which restricts \(p\) to a subset \(\mathcal{P}_0 \subset \mathcal{P}\) of the parameter space (here \(\mathcal{P}_0 = \{0.5\}\)).
To test this hypothesis, we choose a test statistic: a function of the sample \(X_1, X_2, \dots, X_n\) whose distribution under the hypothesis is known or can be computed.
By analyzing the test statistic and comparing its value to a critical value or computing the corresponding p-value, we can draw conclusions about the hypothesis and gain insights into the underlying phenomenon we are investigating.
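As an illustration of this general recipe, the sketch below approximates the null distribution of a test statistic by simulation instead of using a table. This Monte Carlo approach is a swapped-in technique, not one used in this chapter, and the helper names (`monte_carlo_p_value`, `statistic`, `simulate_null`) are hypothetical. It uses the coin example with the statistic \(T = \left|\sum_{i=1}^{n} X_i - 50\right|\) and the hypothesis \(p = 0.5\); the observed sample is any sequence containing 67 heads, since \(T\) depends only on the count.

```python
import random

def monte_carlo_p_value(statistic, simulate_null, observed, reps=10_000, seed=0):
    """Estimate P(T >= t_obs) under H0 by simulating data sets from H0."""
    rng = random.Random(seed)
    t_obs = statistic(observed)
    hits = sum(statistic(simulate_null(rng)) >= t_obs for _ in range(reps))
    return hits / reps

n = 100
# A data set with 67 heads (1 = heads); the order is arbitrary because T uses only the count.
observed = [1] * 67 + [0] * 33

def statistic(sample):
    # Distance of the head count from its expected value n * 0.5 under H0.
    return abs(sum(sample) - n / 2)

def simulate_null(rng):
    # One data set of n fair-coin tosses, i.e. generated under H0: p = 0.5.
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]

p_value = monte_carlo_p_value(statistic, simulate_null, observed)
print(f"estimated two-sided p-value: {p_value:.4f}")
```

The estimate should be close to the exact value \(\Prob{\left|\sum_{i=1}^{100} X_i - 50\right| \geq 17 \mid p = 0.5} \approx 0.00087\); with a p-value this small, very few simulated data sets reach the observed statistic, so more repetitions give a more stable estimate.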
In summary, statistical testing involves formulating hypotheses, selecting appropriate test statistics, conducting computations, and making inferences based on the results. This process allows us to assess the validity of hypotheses and gain a deeper understanding of the parameters of interest.