\[ \newcommand{\nRV}[2]{{#1}_1, {#1}_2, \ldots, {#1}_{#2}} \newcommand{\pnRV}[3]{{#1}_1^{#3}, {#1}_2^{#3}, \ldots, {#1}_{#2}^{#3}} \newcommand{\onRV}[2]{{#1}_{(1)} \le {#1}_{(2)} \le \ldots \le {#1}_{(#2)}} \newcommand{\RR}{\mathbb{R}} \newcommand{\Prob}[1]{\mathbb{P}\left({#1}\right)} \newcommand{\PP}{\mathcal{P}} \newcommand{\iidd}{\overset{\mathsf{iid}}{\sim}} \newcommand{\X}{\times} \newcommand{\EE}[1]{\mathbb{E}\left[{#1}\right]} \newcommand{\Var}[1]{\mathsf{Var}\left({#1}\right)} \newcommand{\Ber}[1]{\mathsf{Ber}\left({#1}\right)} \newcommand{\Geom}[1]{\mathsf{Geom}\left({#1}\right)} \newcommand{\Bin}[1]{\mathsf{Bin}\left({#1}\right)} \newcommand{\Poi}[1]{\mathsf{Pois}\left({#1}\right)} \newcommand{\Exp}[1]{\mathsf{Exp}\left({#1}\right)} \newcommand{\SD}[1]{\mathsf{SD}\left({#1}\right)} \newcommand{\sgn}[1]{\mathsf{sgn}} \newcommand{\dd}[1]{\operatorname{d}\!{#1}} \]
2.3 Markov’s and Chebyshev’s Inequalities
Theorem 2.3 (Markov’s Inequality) Let \(X\) be a non-negative random variable. Then for any \(t > 0\), \[ \Prob{ X \geq t } \leq \frac{\EE{X}}{t} \]
Proof (Continuous Case). Since \(X\) is a non-negative random variable, \(f_X(x) = 0\) for \(x < 0\). Hence, \[\EE{X} = \int_{0}^{\infty} x f_X(x) \dd{x}\] Fix \(t > 0\). Then, \[\begin{align*} \EE{X} & = \int_{0}^{\infty} x f_X(x) \dd{x} \\ & \geq \int_{t}^{\infty} x f_X(x) \dd{x} \\ & \geq \int_{t}^{\infty} t f_X(x) \dd{x} \\ & = t \int_{t}^{\infty} f_X(x) \dd{x} \\ & = t \Prob{X \geq t} \end{align*}\] Hence, \[\Prob{X \geq t} \leq \frac{\EE{X}}{t}\] The discrete case is analogous, with the integral replaced by a sum over the range of \(X\).
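As a numerical sanity check (a sketch, not part of the text), we can compare the empirical tail probability of a simulated random variable with Markov’s bound \(\EE{X}/t\). The choice \(X \sim \Exp{1}\), so that \(\EE{X} = 1\), is purely illustrative:

```python
import random

random.seed(0)
n = 100_000
# Simulate X ~ Exp(1); this distribution is an illustrative choice.
xs = [random.expovariate(1.0) for _ in range(n)]
mean = sum(xs) / n  # empirical estimate of E[X] = 1

for t in (1, 2, 5):
    tail = sum(x >= t for x in xs) / n  # empirical P(X >= t)
    bound = mean / t                    # Markov's bound E[X]/t
    print(f"t={t}: P(X >= t) ~ {tail:.4f} <= {bound:.4f}")
```

For \(\Exp{1}\) the exact tail is \(e^{-t}\), which is far below \(1/t\) for large \(t\): Markov’s inequality holds for every non-negative distribution, so for any particular one it is typically quite loose.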
Markov’s inequality is useful for giving upper bounds on the probability of certain events. We will use it to prove the following theorem.
Theorem 2.4 (Chebyshev's Inequality) Let \(X\) be a random variable with mean \(\mu\) and variance \(\sigma^2\). Then for any \(t > 0\), \[\Prob{\left| X-\mu \right| > t} \leq \frac{\Var{X}}{t^2}\]
Proof. Note that the event \(\{\left| X-\mu \right| > t\}\) is the same as the event \(\{(X-\mu)^2 > t^2\}\). Hence, applying Markov’s Inequality to the non-negative random variable \((X-\mu)^2\), \[\begin{align*} \Prob{\left| X-\mu \right| > t} & = \Prob{(X-\mu)^2 > t^2} \\ & \leq \frac{\EE{(X-\mu)^2}}{t^2} \\ & = \frac{\Var{X}}{t^2} \end{align*}\]
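As with Markov’s inequality, a short simulation can check Chebyshev’s bound empirically (a sketch, not part of the text). Here \(X \sim \mathsf{Unif}(0,1)\) is an illustrative choice, with \(\mu = \tfrac{1}{2}\) and \(\sigma^2 = \tfrac{1}{12}\):

```python
import random

random.seed(1)
n = 100_000
# Simulate X ~ Uniform(0, 1); an illustrative choice (mu = 1/2, sigma^2 = 1/12).
xs = [random.random() for _ in range(n)]
mu = sum(xs) / n
var = sum((x - mu) ** 2 for x in xs) / n

for t in (0.2, 0.3, 0.4):
    prob = sum(abs(x - mu) > t for x in xs) / n  # empirical P(|X - mu| > t)
    bound = var / t**2                           # Chebyshev's bound Var(X)/t^2
    print(f"t={t}: P(|X - mu| > t) ~ {prob:.4f} <= {bound:.4f}")
```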
To understand Chebyshev’s Inequality, consider the following. Let \(X\) be a random variable with mean \(\mu\) and variance \(\sigma^2\). Then, for any \(k > 0\), \[\Prob{\left| X-\mu \right| > k\sigma} \leq \frac{\Var{X}}{k^2\sigma^2} = \frac{1}{k^2}\]
For any \(k > 0\), the probability that \(X\) is more than \(k\) standard deviations away from its mean is bounded by \(\frac{1}{k^2}\). In other words, the probability that \(X\) is within \(k\) standard deviations of its mean is at least \(1-\frac{1}{k^2}\). Note that this version is only useful when \(k > 1\).
For example, the probability of \(X\) falling within \(2\) standard deviations of its mean is at least \(1-\frac{1}{4} = \frac{3}{4} = 75\%\). That is, at least \(75\%\) of the probability mass of \(X\) lies within \(2\) standard deviations of its mean, whatever the distribution of \(X\).
Similarly, the probability that \(X\) is within \(3\) standard deviations of its mean is at least \(1-\frac{1}{9} = \frac{8}{9} \approx 89\%\): at least \(89\%\) of the probability mass of \(X\) is concentrated within \(3\) standard deviations of its mean.
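To see how conservative these guarantees are, the following sketch (an illustration, not part of the text) compares them with the exact coverage for \(X \sim \Exp{1}\), for which \(\mu = \sigma = 1\):

```python
import math

# Exact coverage P(|X - mu| <= k*sigma) for X ~ Exp(1), where mu = sigma = 1.
# Exp(1) is an illustrative choice, not from the text.
for k in (2, 3):
    # P(|X - 1| <= k) = P(X <= 1 + k), since X >= 0 and k >= 1
    coverage = 1 - math.exp(-(1 + k))
    guarantee = 1 - 1 / k**2  # Chebyshev's distribution-free lower bound
    print(f"k={k}: exact coverage {coverage:.3f} >= guarantee {guarantee:.3f}")
```

Because Chebyshev’s bound must hold for every distribution with finite variance, any particular distribution typically beats it by a wide margin.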
Exercises
Exercise 2.4 Do the following:
- Find a random variable \(X\) with \(\textsf{Range}(X) = \{-1, 0, 1\}\), mean \(\mu = \EE{X}\), and variance \(\sigma^2 = \Var{X}\) such that \[\Prob{\left| X-\mu \right| > 2\sigma} = \frac{1}{4}\]
- Construct another random variable \(Y\) (different from \(X\)) with \(\textsf{Range}(Y) = \{y_1, y_2, y_3\}\), mean \(\mu\) and with \[\Prob{\left| Y-\mu \right| > 2\sigma} > \Prob{\left| X-\mu \right| > 2\sigma}\] so as to get \[\Prob{\left| Y-\mu \right| >2\sigma} > \frac{1}{4}\] Decide whether Chebyshev’s Inequality is violated.
- Write R code that takes an input \(k\) and constructs a random variable \(X\) with \(\textsf{Range}(X) = \{-1, 0, 1\}\) such that \[\Prob{\left| X-\mu \right| > k\sigma} = \frac{1}{k^2}\] with \(\mu = \EE{X}\) and \(\sigma^2 = \Var{X}\). Further, the R code should construct a random variable \(Y\) (different from \(X\)) with \(\textsf{Range}(Y) = \{y_1, y_2, y_3\}\) and mean \(\mu\) so that \[\Prob{\left| Y-\mu \right| > k\sigma} > \frac{1}{k^2}\] and (using replications) verifies your conclusion about Chebyshev’s inequality from the previous part. It should save the entire output as a (suitably designed) csv file using write.csv.