4.5 Non-parametric tests

We have seen a variety of tests for continuous and discrete outcomes in hypothesis testing. Tests for continuous outcomes have focused on comparisons of means, while those for discrete outcomes have focused on proportions. These analyses, known as parametric tests, rely on distributional assumptions such as normality of the outcome in the population, which may not hold for the data at hand. Many parametric tests are nevertheless robust and retain their validity when these assumptions are mildly violated, especially in large samples, as the central limit theorem suggests. For small samples, or when the outcome has an unknown, non-normal distribution, non-parametric tests, which make fewer distributional assumptions, are a suitable alternative.

4.5.1 Sign test

The sign test is a non-parametric method for testing hypotheses about a population median, that is, for testing whether the population median equals a specified value.

Given a random sample \(X_1, X_2, \ldots, X_n\) from a continuous distribution with median \(\theta\), the null hypothesis for the sign test is \(H_0: \theta = \theta_0\) against the alternative \(H_A: \theta \neq \theta_0\). The test statistic, denoted \(S^+\), is defined as

\[ S^+ = \sum_{i=1}^n I(X_i > \theta_0) \]

where \(I(X_i > \theta_0)\) is the indicator function, which takes the value 1 if \(X_i > \theta_0\) and 0 otherwise. Thus \(S^+\) counts the number of observations greater than the hypothesised median. The null hypothesis is rejected if \(S^+\) is significantly different from its expected value under \(H_0\), namely \(n/2\).

Since the test statistic can be viewed as the number of successes in a sequence of independent Bernoulli trials, it follows a binomial distribution with parameters \(n\) and \(p = 0.5\) under the null hypothesis. The \(p-\)value is calculated as \[ p-\textsf{value} = \begin{cases} \Prob{S^+ \geq t} & \text{if } H_A: \theta > \theta_0 \\ \Prob{S^+ \leq t} & \text{if } H_A: \theta < \theta_0 \\ 2 \times \min \left\{ \Prob{S^+ \leq t}, \Prob{S^+ \geq t} \right\} & \text{if } H_A: \theta \neq \theta_0 \end{cases} \] where \(t\) is the observed value of the test statistic: a large \(S^+\) is evidence for \(\theta > \theta_0\), and a small \(S^+\) is evidence for \(\theta < \theta_0\).

Note that each one-sided \(p-\)value is obtained from a single tail of the binomial distribution, and a two-tailed test is obtained by doubling the smaller tail probability. When applied to paired data, the sign test assesses whether the median of the paired differences is significantly different from a given value.
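
For a concrete illustration, the sign test takes only a few lines of R. The sketch below is not from the text: the data, and the names x and theta0, are purely illustrative placeholders for a sample and a hypothesised median.

# Sign test by hand: x holds the sample, theta0 the hypothesised median
x      <- c(2.1, 3.4, 1.8, 5.0, 4.2, 3.9, 2.7)   # illustrative data
theta0 <- 3
n      <- length(x)
s_plus <- sum(x > theta0)                         # test statistic S+
# two-sided p-value: under H0, S+ ~ Binomial(n, 0.5)
2 * min(pbinom(s_plus, n, 0.5),                          # P(S+ <= t)
        pbinom(s_plus - 1, n, 0.5, lower.tail = FALSE))  # P(S+ >= t)
# equivalently, via the built-in binomial test
binom.test(s_plus, n, p = 0.5)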

Example 4.6 Suppose we are interested in the preference between shorts and trousers. We interview 12 people and find that 10 prefer shorts to trousers, one prefers trousers to shorts and one has no preference. It is natural to ask whether there is a strong preference for shorts over trousers.

We set \(X_i = 1\) if the \(i\)th person prefers shorts to trousers, \(X_i = -1\) if the \(i\)th person prefers trousers to shorts, and \(X_i = 0\) if the \(i\)th person has no preference. The null hypothesis \(H_0\) is that there is no preference for shorts over trousers, and the alternative \(H_A\) is that there is a preference for shorts over trousers.

Now \(S^+ = \sum_{i=1}^{12} I(X_i > 0) \sim \Bin{12, 0.5}\) and the observed test statistic is \(t = 10\). The one-sided \(p-\)value is \[\Prob{S^+ \geq 10} = \frac{79}{4096} \approx 0.0193,\] and even the two-sided \(p-\)value \(2 \times \Prob{S^+ \geq 10} \approx 0.03857\) is below \(0.05\). So we reject the null hypothesis at the \(5\%\) significance level.
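
The numbers in Example 4.6 are easy to check in R using only pbinom() and binom.test():

# Example 4.6: 10 of 12 respondents prefer shorts, H0: p = 0.5
pbinom(9, 12, 0.5, lower.tail = FALSE)       # P(S+ >= 10), about 0.0193
2 * pbinom(9, 12, 0.5, lower.tail = FALSE)   # two-sided p-value, about 0.0386
binom.test(10, 12, p = 0.5)                  # built-in two-sided binomial test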

4.5.2 Wilcoxon signed-rank test

The Wilcoxon signed-rank test is another non-parametric method for testing whether the population median is equal to a specified value. Unlike the sign test, it uses not only the signs of the deviations from the hypothesised median but also their relative magnitudes, through their ranks.

Given a random sample \(X_1, X_2, \ldots, X_n\) from a continuous distribution that is symmetric about its median \(\theta\), the test compares the null hypothesis \(H_0: \theta = \theta_0\) with the alternative \(H_A: \theta \neq \theta_0\).

\(W\) is defined as the signed rank sum, \[\begin{equation} W = \sum_{i=1}^n \sgn(X_i - \theta_0) \textsf{rank}(|X_i - \theta_0|) \tag{4.2} \end{equation}\] and the test statistic is defined as the positive-rank sum \[\begin{equation} W^+ = \sum_{i=1}^n I(X_i > \theta_0) \textsf{rank}(|X_i - \theta_0|) \tag{4.3} \end{equation}\] where \(\textsf{rank}(|X_i - \theta_0|)\) denotes the rank of \(|X_i - \theta_0|\) among \(|X_1 - \theta_0|, \ldots, |X_n - \theta_0|\).
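
To make the definitions in (4.2) and (4.3) concrete, the two statistics can be computed directly in R. The sketch below uses made-up data with no tied absolute deviations; x and theta0 are illustrative names only.

# Signed-rank statistics by hand (illustrative data, no ties)
x      <- c(4.7, 6.1, 3.8, 7.2, 5.5, 8.0, 4.1)
theta0 <- 5
d      <- x - theta0           # deviations from the hypothesised median
r      <- rank(abs(d))         # rank of |X_i - theta0| among all |X_j - theta0|
W      <- sum(sign(d) * r)     # signed rank sum, equation (4.2)
W_plus <- sum((d > 0) * r)     # positive-rank sum, equation (4.3)
c(W = W, W_plus = W_plus)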

The null hypothesis is rejected if \(W^+\) is significantly different from its expected value under \(H_0\). The \(p-\)value is given by \[ p-\textsf{value} = \begin{cases} \Prob{W^+ \geq t} & \text{if } H_A: \theta > \theta_0 \\ \Prob{W^+ \leq t} & \text{if } H_A: \theta < \theta_0 \\ 2 \times \min \left\{ \Prob{W^+ \leq t}, \Prob{W^+ \geq t} \right\} & \text{if } H_A: \theta \neq \theta_0 \end{cases} \] where \(t\) is the observed value of the test statistic.

As with the sign test, each one-sided \(p-\)value is obtained from a single tail of the null distribution of \(W^+\), and the two-tailed \(p-\)value is obtained by doubling the smaller tail probability. The Wilcoxon signed-rank test thus provides a robust non-parametric alternative for situations where the normality assumptions of parametric tests are not met.
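
In R the test is available as wilcox.test(); a minimal usage sketch, reusing the illustrative x and theta0 from the code above, is:

# Built-in Wilcoxon signed-rank test of H0: theta = theta0
wilcox.test(x, mu = theta0)                           # two-sided by default
wilcox.test(x, mu = theta0, alternative = "greater")  # one-sided H_A: theta > theta0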

4.5.3 Fisher’s exact test

Fisher’s exact test is a non-parametric method used to test the independence of two categorical variables, that is, whether the distribution of one categorical variable is the same across the categories of the other. The test is based on the hypergeometric distribution. Consider two categorical variables, \(X\) and \(Y\), with \(r\) and \(c\) categories respectively. Let \(n_{ij}\) denote the number of observations in the \(i\)th category of \(X\) and the \(j\)th category of \(Y\), and let \(n = \sum_{i=1}^r \sum_{j=1}^c n_{ij}\) be the total number of observations. The null hypothesis is that \(X\) and \(Y\) are independent.

The test conditions on the row totals \(n_{i*} = \sum_{j=1}^c n_{ij}\) and the column totals \(n_{*j} = \sum_{i=1}^r n_{ij}\). The test statistic, denoted \(T\), is the probability of observing the given table under the assumption of independence, with these margins held fixed: \[ T = \frac{\left( \prod_{i=1}^r n_{i*}! \right) \left( \prod_{j=1}^c n_{*j}! \right)}{n! \, \prod_{i=1}^r \prod_{j=1}^c n_{ij}!}. \] In the \(2 \times 2\) case this amounts to saying that, under the null hypothesis, the cell count \(n_{11}\) follows a hypergeometric distribution with parameters \(n\), \(n_{1*}\) and \(n_{*1}\).

The \(p-\)value is the total probability, under the null hypothesis and with the margins fixed, of all tables that are no more probable than the observed one; that is, \(\Prob{T \leq t}\), where \(t\) is the observed value of \(T\). The null hypothesis is rejected if the \(p-\)value is less than the significance level.
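
As an illustration, the test can be run in R with fisher.test(); the 2 x 2 table below is hypothetical, and dhyper() gives the probability of the observed table given its margins.

# Fisher's exact test on a hypothetical 2 x 2 table
tab <- matrix(c(8, 2,
                1, 5), nrow = 2, byrow = TRUE)
fisher.test(tab)                  # exact test of independence
# probability of the observed table given its margins:
# dhyper(n11, m = n1., n = n2., k = n.1)
dhyper(8, m = 10, n = 6, k = 9)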

Fisher’s exact test provides a powerful non-parametric approach that is particularly suited to scenarios involving small sample sizes or sparse categorical data, where the assumptions of other tests may not hold.

Exercises

Exercise 4.20 Show that \(W\) in (4.2) and \(W^+\) in (4.3) satisfy the relation \[ W^+ = \frac 12 \left( W + \frac{n(n+1)}{2} \right) \]

Exercise 4.21 At the ISI student mess the cook has invented a new hot drink and would like to find out whether it will be as popular as the existing tea. For this purpose, the students recruit 18 participants for a taste test. Each participant tries both drinks in random order before giving his or her opinion. It turns out that 5 of the participants like the new drink better, and the rest prefer the existing tea. Using the built-in sign test function binom.test(), decide whether we can reject the hypothesis that the two drinks are equally popular at the \(0.05\) significance level.

Exercise 4.22 Suppose a study of Mahacomp’s computer usage records the time he spends playing video games, giving the following 12 observations:

11.5, 0.5, 0.9, 0.4, 7.8, 7, 0.2, 2.5, 0.9, 2, 3, 15
  1. Plot a histogram of the data and obtain a rough guess for the median of the dataset.
  2. Use the R code wilcox.test(x, mu = 5, alt = "less"), where x holds the data above, to test whether the distribution is centred at 5 against the alternative that it is centred below 5.