4.3 Likelihood Approach

In the general approach, we make the following assumptions:

  • We assume that a random variable \(X\) has a probability density function (pdf) or probability mass function (pmf) \(f(\cdot | p)\) with \(p \in \PP \subseteq \RR\).
  • We have a sample \(X_1, X_2, \dots, X_n \iidd X\).
  • The likelihood function of the sample \(X_1, X_2, \dots, X_n\) is defined as \(L(p; X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} f(X_i | p)\).

Recall that the maximum likelihood estimator (MLE) is given by \(\hat{p} = \argmax_{p \in \PP}L(p; X_1, X_2, \dots, X_n)\).
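As a quick numerical illustration (a minimal sketch, not from the text: the exponential family, the simulated data, and the optimizer bounds below are assumptions for demonstration), the MLE can be found by minimizing the negative log-likelihood; for the exponential rate the closed form \(1/\overline{X}\) is available to check against.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)  # hypothetical sample, true rate 0.5

def neg_log_likelihood(rate):
    # -log L(rate; x) = -sum_i log f(x_i | rate); the log scale avoids underflow
    return -np.sum(expon.logpdf(x, scale=1.0 / rate))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1.0 / x.mean())  # numerical MLE vs. the closed form 1 / xbar
```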

We can view a hypothesis test as a restriction of \(\PP\) to a smaller subset \(\PP_0\). For example, in the “intuitive approach” discussed earlier, we have \(\PP = \RR\) and \(\PP_0 = \{c\}\), and the hypotheses are \(H_0: \mu \in \PP_0\) and \(H_A: \mu \not\in \PP_0\).

4.3.1 MLE Approach under Null Hypothesis with \(p \in {\PP}_0 \subset \PP\)

To apply the MLE approach under the null hypothesis, we consider the restricted estimator \(\hat{p}_0 = \argmax_{p \in \PP_0} L(p; X_1, X_2, \dots, X_n)\), alongside the unrestricted MLE \(\hat{p} = \argmax_{p \in \PP} L(p; X_1, X_2, \dots, X_n)\).

We define the likelihood ratio as follows: \[ \lambda(X_1, X_2, \dots, X_n) = \frac{L(\hat{p}_0; X_1, X_2, \dots, X_n)}{L(\hat{p}; X_1, X_2, \dots, X_n)} \] and the log-likelihood ratio as: \[ \Lambda(X_1, X_2, \dots, X_n) = -\log{\lambda(X_1, X_2, \dots, X_n)} = -\log{\frac{L(\hat{p}_0; X_1, X_2, \dots, X_n)}{L(\hat{p}; X_1, X_2, \dots, X_n)}} \]

Since \(\PP_0 \subseteq \PP\), maximizing over the smaller set cannot give a larger value, so \(L(\hat{p}_0; X_1, X_2, \dots, X_n) \leq L(\hat{p}; X_1, X_2, \dots, X_n)\). Hence \(\Lambda(X_1, X_2, \dots, X_n) \geq 0\) and \(\Lambda(X_1, X_2, \dots, X_n) = \log{\frac{L(\hat{p}; X_1, X_2, \dots, X_n)}{L(\hat{p}_0; X_1, X_2, \dots, X_n)}}\).

If \(\hat{p}\) is far from \(\PP_0\) in the sense that \(L(\hat{p}; X_1, \dots, X_n)\) is much larger than \(L(\hat{p}_0; X_1, \dots, X_n)\), the null hypothesis \(\PP_0\) is less plausible, and this is reflected in larger values of \(\Lambda\).
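This construction translates directly into a numerical recipe. The sketch below is an illustration under stated assumptions (a scalar parameter, interval-shaped \(\PP_0\) and \(\PP\) with finite bounds standing in for the real line, and hypothetical data): it computes \(\Lambda\) by maximizing the log-likelihood separately over \(\PP_0\) and \(\PP\).

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def log_likelihood_ratio(neg_loglik, null_bounds, full_bounds):
    # Lambda = max over P of log L - max over P0 of log L  (>= 0 since P0 is a subset of P)
    sup_null = -minimize_scalar(neg_loglik, bounds=null_bounds, method="bounded").fun
    sup_full = -minimize_scalar(neg_loglik, bounds=full_bounds, method="bounded").fun
    return sup_full - sup_null

x = np.array([1.2, 0.8, 1.5, 0.3, 1.1])                    # hypothetical data
nll = lambda mu: -np.sum(norm.logpdf(x, loc=mu, scale=1.0))
# one-sided null mu <= 0, with (-10, 10) standing in for the real line
print(log_likelihood_ratio(nll, (-10.0, 0.0), (-10.0, 10.0)))
```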

4.3.2 \(Z\)-Test with Log-Likelihood

Consider \(X \sim N(\mu, \sigma^2)\) with \(\mu \in \PP = \RR\) and \(\sigma\) known. The null and alternative hypotheses are \(H_0: \mu = c\) and \(H_A: \mu \neq c\); that is, \(\PP_0 = \{c\}\).

For a given sample \(X_1, X_2, \dots, X_n\), the likelihood function for \(\mu\) is: \[ L(\mu; X_1, X_2, \dots, X_n) = \prod_{i=1}^{n} \frac{e^{-\frac{(X_i - \mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma} \]

We can compute the log-likelihood ratio as follows, using \(\hat{\mu} = \overline{X}\) and \(\hat{\mu}_0 = c\) (see Exercise 4.12): \[\begin{align*} \Lambda(X_1, X_2, \dots, X_n) &= \log{\frac{L(\hat{\mu}; X_1, X_2, \dots, X_n)}{L(\hat{\mu}_0; X_1, X_2, \dots, X_n)}} \\ &= \log{\frac{L(\overline{X}; X_1, X_2, \dots, X_n)}{L(c; X_1, X_2, \dots, X_n)}} \\ &= \log{\frac{\prod_{i=1}^{n} \frac{e^{-\frac{(X_i - \overline{X})^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma}}{\prod_{i=1}^{n} \frac{e^{-\frac{(X_i - c)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma}}} \\ &= \frac{1}{2} \frac{n}{\sigma^2}(\overline{X} - c)^2 \\ &= \frac{1}{2} \left( \frac{\sqrt{n}(\overline{X} - c)}{\sigma} \right)^2 \end{align*}\]
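The closed form is easy to sanity-check numerically. The snippet below (with hypothetical numbers, not from the text) compares the direct difference of log-likelihoods against \(\frac{n}{2\sigma^2}(\overline{X} - c)^2\).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, sigma, c = 50, 2.0, 0.0
x = rng.normal(loc=0.5, scale=sigma, size=n)     # hypothetical sample

loglik = lambda mu: np.sum(norm.logpdf(x, loc=mu, scale=sigma))
direct = loglik(x.mean()) - loglik(c)            # log L(xbar) - log L(c)
closed = n * (x.mean() - c) ** 2 / (2 * sigma**2)
print(direct, closed)                            # agree up to rounding
```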

Let \(Y_1, Y_2, \dots, Y_n \iidd N(c, \sigma^2)\) be random variables that imitate the sample under \(H_0\). The \(p\)-value is then \[ \Prob{ \Lambda(Y_1, Y_2, \dots, Y_n) \geq \Lambda(X_1, X_2, \dots, X_n)} \]

We know that: \[\begin{align*} \Lambda(Y_1, Y_2, \dots, Y_n) &= \frac{1}{2} \left( \frac{\sqrt{n}(\overline{Y} - c)}{\sigma} \right)^2 \\ &= \frac{Z^2}{2} \end{align*}\] where \(Z = \frac{\sqrt{n}(\overline{Y} - c)}{\sigma} \sim N(0,1)\). Thus, the \(p\)-value can be computed as \(\Prob{Z^2 \geq \left( \frac{\sqrt{n}(\overline{X} - c)}{\sigma} \right)^2}\).
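Since \(Z^2 \sim \chi^2_1\), this \(p\)-value can be evaluated with standard routines. The sketch below (hypothetical sample mean and constants) computes it both as a \(\chi^2_1\) tail probability and as the equivalent two-sided normal tail \(2\,\Prob{Z \geq |z|}\).

```python
import numpy as np
from scipy.stats import chi2, norm

n, sigma, c = 50, 2.0, 0.0
xbar = 0.6                          # hypothetical observed sample mean
z = np.sqrt(n) * (xbar - c) / sigma
p_chi2 = chi2.sf(z**2, df=1)        # P(Z^2 >= z^2), chi-squared with 1 df
p_two_sided = 2 * norm.sf(abs(z))   # same value via the two normal tails
print(p_chi2, p_two_sided)
```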

4.3.3 Testing if the Mean is Larger than \(c\)

Consider \(X \sim N(\mu, \sigma^2)\) with \(\sigma\) known. We have the hypotheses \(H_0: \mu \leq c\) and \(H_A: \mu > c\).

Given a sample \(X_1, X_2, \dots, X_n\) from the population, we can compute the log-likelihood ratio: \[ \Lambda(X_1, X_2, \dots, X_n) = \log{\frac{L(\hat{\mu}; X_1, X_2, \dots, X_n)}{L(\hat{\mu}_0; X_1, X_2, \dots, X_n)}} \] where \(\hat{\mu}_0 = \argmax_{\mu \in \PP_0} L(\mu; X_1, \ldots, X_n)\) with \(\PP_0 = (-\infty, c]\) and \(\hat{\mu} = \argmax_{\mu \in \PP} L(\mu; X_1, \ldots, X_n)\) with \(\PP = \RR\).

By Exercise 4.12 (item 4), \(\Lambda\) is zero for \(\overline{X} \leq c\) and increases with \(\overline{X}\) beyond \(c\), so comparing log-likelihood ratios reduces to comparing standardized sample means. With \(Y_1, Y_2, \dots, Y_n \iidd N(c, \sigma^2)\) imitating the sample at the boundary of \(H_0\), we can therefore compute the \(p\)-value as \(\Prob{\frac{\sqrt{n}(\overline{Y} - c)}{\sigma} \geq \frac{\sqrt{n}(\overline{X} - c)}{\sigma}} = \Prob{Z \geq \frac{\sqrt{n}(\overline{X} - c)}{\sigma}}\).
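A one-sided version of the same computation, again with hypothetical numbers:

```python
import numpy as np
from scipy.stats import norm

n, sigma, c = 50, 2.0, 0.0
xbar = 0.6                              # hypothetical observed sample mean
z_obs = np.sqrt(n) * (xbar - c) / sigma
print(norm.sf(z_obs))                   # p-value: P(Z >= z_obs)
```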

Exercises

Exercise 4.12 (From Examples) Do the following while reading the examples:

  1. Show that, \[ \textsf{if } \PP_0 \subseteq \PP \textsf{ then } 0 \leq \frac{L(\hat{p}_0; X_1, X_2, \dots, X_n)}{L(\hat{p}; X_1, X_2, \dots, X_n)} \leq 1 \]
  2. Show that, \[\begin{align*} \hat{\mu} = \argmax_{\mu \in \PP} \ L(\mu; X_1, X_2, \dots, X_n) = \overline{X} \end{align*}\] and, \[\begin{align*} \hat{\mu}_0 = \argmax_{\mu \in \PP_0} L(\mu; X_1, X_2, \dots, X_n) = c \end{align*}\]
  3. Prove that, \[ \log{\frac{\prod_{i=1}^{n} \frac{e^{-\frac{(X_i - \overline{X})^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma}}{\prod_{i=1}^{n} \frac{e^{-\frac{(X_i - c)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma}}} = \frac{1}{2}\frac{n}{\sigma^2}(\overline{X} - c)^2 \]
  4. Show the following:
    1. \(\hat{\mu} = \overline{X}\)
    2. \(\hat{\mu}_0 = \argmax_{\mu \in \PP_0} \prod_{i=1}^{n} \frac{e^{-\frac{(X_i - \mu)^2}{2\sigma^2}}}{\sqrt{2\pi}\sigma} = \min{\{\overline{X}, c\}}\)
    3. \[\begin{align*} \Lambda(X_1, X_2, \dots, X_n) &= \log{\frac{L(\hat{\mu}; X_1, X_2, \dots, X_n)}{L(\hat{\mu}_0; X_1, X_2, \dots, X_n)}} \\ &= \begin{cases} 0 &\textsf{if } \ \overline{X} \leq c \\ \frac{n(\overline{X} - c)^2}{2\sigma^2} &\textsf{if } \ \overline{X} > c \end{cases} \end{align*}\]