Cantelli's inequality

In probability theory, Cantelli's inequality (also called the Chebyshev–Cantelli inequality and the one-sided Chebyshev inequality) is an improved version of Chebyshev's inequality for one-sided tail bounds.[1][2][3] The inequality states that, for $\lambda > 0$,

$$\Pr(X - \mathbb{E}[X] \geq \lambda) \leq \frac{\sigma^2}{\sigma^2 + \lambda^2},$$

where

$X$ is a real-valued random variable,
$\Pr$ is the probability measure,
$\mathbb{E}[X]$ is the expected value of $X$,
$\sigma^2$ is the variance of $X$.

Applying the Cantelli inequality to $-X$ gives a bound on the lower tail,

$$\Pr(X - \mathbb{E}[X] \leq -\lambda) \leq \frac{\sigma^2}{\sigma^2 + \lambda^2}.$$
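As an illustration (not part of the original statement), the bound can be checked numerically against an empirical tail probability. The following sketch assumes NumPy and uses a standard exponential distribution as the test case; any distribution with finite variance would do.

```python
# Illustrative Monte Carlo check of Cantelli's inequality (a sketch added for
# exposition, not from the article): compare the empirical upper-tail probability
# of an exponential(1) variable with the bound sigma^2 / (sigma^2 + lambda^2).
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=1_000_000)  # mean 1, variance 1

mean = samples.mean()
var = samples.var()

for lam in [0.5, 1.0, 2.0, 3.0]:
    empirical = np.mean(samples - mean >= lam)   # empirical Pr(X - E[X] >= lambda)
    cantelli = var / (var + lam**2)              # Cantelli upper bound
    print(f"lambda={lam}: empirical tail={empirical:.4f}, Cantelli bound={cantelli:.4f}")
```

The lower-tail version is checked the same way by replacing the event with $X - \mathbb{E}[X] \leq -\lambda$.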

While the inequality is often attributed to Francesco Paolo Cantelli, who published it in 1928,[4] it originates in Chebyshev's work of 1874.[5] When bounding the event that a random variable deviates from its mean in only one direction (positive or negative), Cantelli's inequality gives an improvement over Chebyshev's inequality. Like the Chebyshev inequality, the Cantelli inequality has higher-moment and vector versions.

Comparison to Chebyshev's inequality

For one-sided tail bounds, Cantelli's inequality is sharper, since Chebyshev's inequality can only give

$$\Pr(X - \mathbb{E}[X] \geq \lambda) \leq \Pr(|X - \mathbb{E}[X]| \geq \lambda) \leq \frac{\sigma^2}{\lambda^2}.$$

On the other hand, for two-sided tail bounds, Cantelli's inequality gives

$$\Pr(|X - \mathbb{E}[X]| \geq \lambda) = \Pr(X - \mathbb{E}[X] \geq \lambda) + \Pr(X - \mathbb{E}[X] \leq -\lambda) \leq \frac{2\sigma^2}{\sigma^2 + \lambda^2},$$

which is always worse than Chebyshev's inequality (when $\lambda \geq \sigma$; otherwise, both inequalities bound a probability by a value greater than one, and so are trivial).
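A small numerical sketch (an illustrative addition, assuming NumPy) makes the comparison concrete by evaluating the bounds at a few values of $\lambda$:

```python
# Illustrative comparison (not from the article) of the bounds: Cantelli gives
# sigma^2/(sigma^2 + lambda^2) for one tail, Chebyshev gives sigma^2/lambda^2
# for both tails combined, and doubling Cantelli gives a weaker two-sided bound.
import numpy as np

sigma = 1.0
for lam in np.array([1.0, 1.5, 2.0, 3.0]) * sigma:
    cantelli_one_sided = sigma**2 / (sigma**2 + lam**2)
    chebyshev_two_sided = sigma**2 / lam**2
    cantelli_two_sided = 2 * sigma**2 / (sigma**2 + lam**2)
    print(f"lambda={lam:.1f}: Cantelli one-sided={cantelli_one_sided:.3f}, "
          f"Chebyshev two-sided={chebyshev_two_sided:.3f}, "
          f"doubled Cantelli={cantelli_two_sided:.3f}")
```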

Proof

Let $X$ be a real-valued random variable with finite variance $\sigma^2$ and expectation $\mu$, and define $Y = X - \mathbb{E}[X]$ (so that $\mathbb{E}[Y] = 0$ and $\operatorname{Var}(Y) = \sigma^2$).

Then, for any $u \geq 0$, we have

$$\Pr(X - \mathbb{E}[X] \geq \lambda) = \Pr(Y \geq \lambda) = \Pr(Y + u \geq \lambda + u) \leq \Pr\big((Y + u)^2 \geq (\lambda + u)^2\big) \leq \frac{\mathbb{E}[(Y + u)^2]}{(\lambda + u)^2} = \frac{\sigma^2 + u^2}{(\lambda + u)^2},$$

where the last inequality is a consequence of Markov's inequality. As the above holds for any choice of $u \geq 0$, we can apply it with the value that minimizes the function $u \mapsto \frac{\sigma^2 + u^2}{(\lambda + u)^2}$ over $u \geq 0$. By differentiating, the minimizer can be seen to be $u_\ast = \frac{\sigma^2}{\lambda}$, leading to

$$\Pr(X - \mathbb{E}[X] \geq \lambda) \leq \frac{\sigma^2 + u_\ast^2}{(\lambda + u_\ast)^2} = \frac{\sigma^2}{\sigma^2 + \lambda^2} \quad \text{if } \lambda > 0.$$
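The minimization step can also be verified symbolically. The following sketch is an illustrative addition that assumes SymPy is available; it recovers the critical point $u_\ast = \sigma^2/\lambda$ and the resulting bound.

```python
# Sketch (using SymPy, an assumption not made in the article) verifying the choice
# of u that minimizes (sigma^2 + u^2) / (lambda + u)^2 in the proof above.
import sympy as sp

u, lam, sigma = sp.symbols("u lambda sigma", positive=True)
f = (sigma**2 + u**2) / (lam + u)**2

critical_points = sp.solve(sp.diff(f, u), u)
print(critical_points)                      # [sigma**2/lambda]

u_star = critical_points[0]
print(sp.simplify(f.subs(u, u_star)))       # sigma**2/(lambda**2 + sigma**2)
```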

Generalizations

Various stronger inequalities can be shown. He, Zhang, and Zhang showed[6] (Corollary 2.3) that when $\mathbb{E}[X] = 0$, $\mathbb{E}[X^2] = 1$, and $\lambda \geq 0$:

$$\Pr(X \geq \lambda) \leq 1 - (2\sqrt{3} - 3)\,\frac{(1 + \lambda^2)^2}{\mathbb{E}[X^4] + 6\lambda^2 + \lambda^4}.$$

In the case $\lambda = 0$ this matches a bound in Berger's "The Fourth Moment Method",[7]

$$\Pr(X \geq 0) \geq \frac{2\sqrt{3} - 3}{\mathbb{E}[X^4]}.$$

This improves over Cantelli's inequality in that we can get a non-zero lower bound, even when $\mathbb{E}[X] = 0$.
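As an illustrative check (an addition to the article, assuming NumPy and a particular test variable $X = (Z^2 - 1)/\sqrt{2}$ with $Z$ standard normal, which has $\mathbb{E}[X] = 0$ and $\mathbb{E}[X^2] = 1$), Berger's fourth-moment lower bound can be compared with an empirical probability:

```python
# Illustrative check (not from the article) of Berger's lower bound
# Pr(X >= 0) >= (2*sqrt(3) - 3) / E[X^4] on a normalized chi-square-type variable;
# the specific test distribution is an assumption made for this sketch.
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)
x = (z**2 - 1) / np.sqrt(2)          # E[X] = 0, E[X^2] = 1

fourth_moment = np.mean(x**4)
berger_bound = (2 * np.sqrt(3) - 3) / fourth_moment
empirical = np.mean(x >= 0)
print(f"E[X^4] ~ {fourth_moment:.2f}, Berger lower bound = {berger_bound:.3f}, "
      f"empirical Pr(X >= 0) = {empirical:.3f}")
```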

References

  1. ^ Boucheron, Stéphane (2013). Concentration inequalities : a nonasymptotic theory of independence. Gábor Lugosi, Pascal Massart. Oxford: Oxford University Press. ISBN 978-0-19-953525-5. OCLC 829910957.
  2. ^ "Tail and Concentration Inequalities" by Hung Q. Ngo
  3. ^ "Concentration-of-measure inequalities" by Gábor Lugosi
  4. ^ Cantelli, F. P. (1928). "Sui confini della probabilità". Atti del Congresso Internazionale dei Matematici, Bologna, 6, 47–5
  5. ^ Ghosh, B. K. (2002). "Probability inequalities related to Markov's theorem". The American Statistician. 56 (3): 186–190.
  6. ^ He, S.; Zhang, J.; Zhang, S. (2010). "Bounding probability of small deviation: A fourth moment approach". Mathematics of Operations Research. 35 (1): 208–232. doi:10.1287/moor.1090.0438. S2CID 11298475.
  7. ^ Berger, Bonnie (August 1997). "The Fourth Moment Method". SIAM Journal on Computing. 26 (4): 1188–1207. doi:10.1137/S0097539792240005. ISSN 0097-5397. S2CID 14313557.