mirror of https://github.com/python/cpython
Refine statistics.NormalDist documentation and improve test coverage (GH-12208)
This commit is contained in:
parent 318d537daa
commit 1f58f4fa6a
@@ -479,7 +479,7 @@ measurements as a single entity.

 Normal distributions arise from the `Central Limit Theorem
 <https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
-of applications in statistics, including simulations and hypothesis testing.
+of applications in statistics.

 .. class:: NormalDist(mu=0.0, sigma=1.0)

@@ -492,19 +492,19 @@ of applications in statistics, including simulations and hypothesis testing.

    .. attribute:: mean

-      A read-only property representing the `arithmetic mean
+      A read-only property for the `arithmetic mean
       <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal
       distribution.

    .. attribute:: stdev

-      A read-only property representing the `standard deviation
+      A read-only property for the `standard deviation
       <https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal
       distribution.

    .. attribute:: variance

-      A read-only property representing the `variance
+      A read-only property for the `variance
       <https://en.wikipedia.org/wiki/Variance>`_ of a normal
       distribution. Equal to the square of the standard deviation.

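The three read-only properties documented above can be exercised directly; a minimal sketch (the N(100, 15) parameters are illustrative, not from the patch):

```python
from statistics import NormalDist

# Illustrative parameters: IQ-style scores modeled as N(100, 15).
iq = NormalDist(mu=100, sigma=15)

print(iq.mean)      # arithmetic mean (mu)
print(iq.stdev)     # standard deviation (sigma)
print(iq.variance)  # square of the standard deviation: 15**2 == 225
```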
@@ -584,8 +584,8 @@ of applications in statistics, including simulations and hypothesis testing.

 Dividing a constant by an instance of :class:`NormalDist` is not supported.

 Since normal distributions arise from additive effects of independent
-variables, it is possible to `add and subtract two normally distributed
-random variables
+variables, it is possible to `add and subtract two independent normally
+distributed random variables
 <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
 represented as instances of :class:`NormalDist`. For example:

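The reworded sentence describes the arithmetic the class implements: for independent normals, means add and variances add. A quick sketch with made-up 3-4-5 parameters (not from the patch):

```python
from statistics import NormalDist

X = NormalDist(5, 3)
Y = NormalDist(10, 4)

# Means add; variances add, so the combined sigma is
# sqrt(3**2 + 4**2) == 5 -- a 3-4-5 triple keeps the result exact.
S = X + Y
D = X - Y
print(S.mean, S.stdev)   # mean 15, sigma 5
print(D.mean, D.stdev)   # mean -5, sigma 5 (variances still add under subtraction)
```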
@@ -607,15 +607,15 @@ of applications in statistics, including simulations and hypothesis testing.

 For example, given `historical data for SAT exams
 <https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
-are normally distributed with a mean of 1060 and standard deviation of 192,
+are normally distributed with a mean of 1060 and a standard deviation of 192,
 determine the percentage of students with scores between 1100 and 1200:

 .. doctest::

     >>> sat = NormalDist(1060, 195)
-    >>> fraction = sat.cdf(1200) - sat.cdf(1100)
+    >>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
     >>> f'{fraction * 100 :.1f}% score between 1100 and 1200'
-    '18.2% score between 1100 and 1200'
+    '18.4% score between 1100 and 1200'

 What percentage of men and women will have the same height in `two normally
 distributed populations with known means and standard deviations

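The corrected doctest applies a continuity correction; run outside the doctest, with the same figures, it looks like this:

```python
from statistics import NormalDist

sat = NormalDist(1060, 195)

# cdf(x) gives P(score <= x), so a score band is a difference of cdfs.
# Scores are reported as integers, so the band [1100, 1200] really
# covers the real interval [1099.5, 1200.5] -- hence the +/- 0.5
# continuity correction introduced by this commit.
fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
print(f'{fraction * 100 :.1f}% score between 1100 and 1200')
```

Per the updated doctest, this prints 18.4% rather than the uncorrected 18.2%.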
@@ -644,20 +644,12 @@ model:

 Normal distributions commonly arise in machine learning problems.

-Wikipedia has a `nice example with a Naive Bayesian Classifier
-<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge
-is to guess a person's gender from measurements of normally distributed
-features including height, weight, and foot size.
+Wikipedia has a `nice example of a Naive Bayesian Classifier
+<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge is to
+predict a person's gender from measurements of normally distributed features
+including height, weight, and foot size.

-The `prior probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of
-being male or female is 50%:
-
-.. doctest::
-
-    >>> prior_male = 0.5
-    >>> prior_female = 0.5
-
-We also have a training dataset with measurements for eight people. These
+We're given a training dataset with measurements for eight people. The
 measurements are assumed to be normally distributed, so we summarize the data
 with :class:`NormalDist`:

@@ -670,8 +662,8 @@ with :class:`NormalDist`:

     >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
     >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

-We observe a new person whose feature measurements are known but whose gender
-is unknown:
+Next, we encounter a new person whose feature measurements are known but whose
+gender is unknown:

 .. doctest::

@@ -679,19 +671,23 @@ is unknown:

     >>> wt = 130   # weight
     >>> fs = 8     # foot size

-The posterior is the product of the prior times each likelihood of a
-feature measurement given the gender:
+Starting with a 50% `prior probability
+<https://en.wikipedia.org/wiki/Prior_probability>`_ of being male or female,
+we compute the posterior as the prior times the product of likelihoods for the
+feature measurements given the gender:

 .. doctest::

+    >>> prior_male = 0.5
+    >>> prior_female = 0.5
     >>> posterior_male = (prior_male * height_male.pdf(ht) *
     ...                   weight_male.pdf(wt) * foot_size_male.pdf(fs))

     >>> posterior_female = (prior_female * height_female.pdf(ht) *
     ...                     weight_female.pdf(wt) * foot_size_female.pdf(fs))

-The final prediction is awarded to the largest posterior -- this is known as
-the `maximum a posteriori
+The final prediction goes to the largest posterior. This is known as the
+`maximum a posteriori
 <https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:

 .. doctest::

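Stitching the reorganized classifier hunks together gives a runnable whole. The height and weight training samples are not visible in this diff; the values below are taken from the Wikipedia example the docs cite, and `ht = 6.0` for the unknown person is likewise an assumption:

```python
from statistics import NormalDist

# Summarize the training data per gender. The foot sizes appear in the
# diff; the height and weight samples follow the cited Wikipedia example.
height_male = NormalDist.from_samples([6, 5.92, 5.58, 5.92])
height_female = NormalDist.from_samples([5, 5.5, 5.42, 5.75])
weight_male = NormalDist.from_samples([180, 190, 170, 165])
weight_female = NormalDist.from_samples([100, 150, 130, 150])
foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

ht = 6.0   # height (assumed; this line is outside the visible hunk)
wt = 130   # weight
fs = 8     # foot size

# Posterior = prior times the product of per-feature likelihoods.
prior_male = prior_female = 0.5
posterior_male = (prior_male * height_male.pdf(ht) *
                  weight_male.pdf(wt) * foot_size_male.pdf(fs))
posterior_female = (prior_female * height_female.pdf(ht) *
                    weight_female.pdf(wt) * foot_size_female.pdf(fs))

# Maximum a posteriori (MAP): predict the gender with the larger posterior.
prediction = 'female' if posterior_female > posterior_male else 'male'
print(prediction)
```

With these training figures the female posterior dominates, matching the Wikipedia worked example.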
@@ -2123,6 +2123,7 @@ class TestNormalDist(unittest.TestCase):

                 0.3605, 0.3589, 0.3572, 0.3555, 0.3538,
                 ]):
             self.assertAlmostEqual(Z.pdf(x / 100.0), px, places=4)
+            self.assertAlmostEqual(Z.pdf(-x / 100.0), px, places=4)
         # Error case: variance is zero
         Y = NormalDist(100, 0)
         with self.assertRaises(statistics.StatisticsError):
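The added test line checks the symmetry of the standard normal density. Outside the unittest harness the same properties can be demonstrated directly (a sketch, not the suite's code):

```python
from statistics import NormalDist, StatisticsError

Z = NormalDist()   # standard normal: mu=0.0, sigma=1.0

# The density depends on x only through (x - mu)**2, so with mu == 0
# it is exactly symmetric: pdf(-x) == pdf(x) for every float x.
for x in (0.0, 0.5, 1.0, 1.7, 3.2):
    assert Z.pdf(-x) == Z.pdf(x)

# The zero-variance error case the test also covers: the density is
# undefined when sigma is zero, so pdf() raises StatisticsError.
zero_raised = False
Y = NormalDist(100, 0)
try:
    Y.pdf(100)
except StatisticsError:
    zero_raised = True
print(zero_raised)
```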
@@ -2262,7 +2263,7 @@ class TestNormalDist(unittest.TestCase):

         self.assertEqual(X * y, NormalDist(1000, 150))    # __mul__
         self.assertEqual(y * X, NormalDist(1000, 150))    # __rmul__
         self.assertEqual(X / y, NormalDist(10, 1.5))      # __truediv__
-        with self.assertRaises(TypeError):
+        with self.assertRaises(TypeError):                # __rtruediv__
             y / X

     def test_equality(self):
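The comment added by this hunk labels the deliberately unsupported direction. A sketch of the same operator behavior outside the test harness, using the test's own values:

```python
from statistics import NormalDist

X = NormalDist(100, 15)
y = 10

# Multiplying or dividing by a constant rescales both mu and sigma.
assert X * y == NormalDist(1000, 150)   # __mul__
assert y * X == NormalDist(1000, 150)   # __rmul__
assert X / y == NormalDist(10, 1.5)     # __truediv__

# Dividing a constant by a distribution has no defined meaning here, so
# NormalDist provides no __rtruediv__ and Python raises TypeError.
rdiv_raised = False
try:
    y / X
except TypeError:
    rdiv_raised = True
print(rdiv_raised)
```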