More refinements to the statistics docs (GH-15713) (GH-15715)
(cherry picked from commit d8c93aa5d2)
Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
parent
7eaeddad75
commit
b8c66779c7
|
@ -19,17 +19,21 @@
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
This module provides functions for calculating mathematical statistics of
|
This module provides functions for calculating mathematical statistics of
|
||||||
numeric (:class:`Real`-valued) data.
|
numeric (:class:`~numbers.Real`-valued) data.
|
||||||
|
|
||||||
.. note::
|
The module is not intended to be a competitor to third-party libraries such
|
||||||
|
as `NumPy <https://numpy.org>`_, `SciPy <https://www.scipy.org/>`_, or
|
||||||
|
proprietary full-featured statistics packages aimed at professional
|
||||||
|
statisticians such as Minitab, SAS and Matlab. It is aimed at the level of
|
||||||
|
graphing and scientific calculators.
|
||||||
|
|
||||||
Unless explicitly noted otherwise, these functions support :class:`int`,
|
Unless explicitly noted, these functions support :class:`int`,
|
||||||
:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
|
:class:`float`, :class:`~decimal.Decimal` and :class:`~fractions.Fraction`.
|
||||||
Behaviour with other types (whether in the numeric tower or not) is
|
Behaviour with other types (whether in the numeric tower or not) is
|
||||||
currently unsupported. Collections with a mix of types are also undefined
|
currently unsupported. Collections with a mix of types are also undefined
|
||||||
and implementation-dependent. If your input data consists of mixed types,
|
and implementation-dependent. If your input data consists of mixed types,
|
||||||
you may be able to use :func:`map` to ensure a consistent result, for
|
you may be able to use :func:`map` to ensure a consistent result, for
|
||||||
example: ``map(float, input_data)``.
|
example: ``map(float, input_data)``.
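A minimal sketch of the ``map(float, input_data)`` suggestion above. The sample values and the name ``mixed_data`` are illustrative assumptions, not part of the patch:

```python
from decimal import Decimal
from fractions import Fraction
from statistics import mean

# Hypothetical input mixing int, Fraction and Decimal values.
mixed_data = [1, Fraction(1, 2), Decimal("0.25")]

# Coerce every element to float so the result type is predictable.
uniform_data = list(map(float, mixed_data))
average = mean(uniform_data)

print(uniform_data)  # all elements are now floats
print(average)
```

Converting to ``float`` trades exactness for consistency; converting everything to ``Fraction`` or ``Decimal`` instead would be an equally valid choice when exact arithmetic matters.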
|
||||||
|
|
||||||
Averages and measures of central location
|
Averages and measures of central location
|
||||||
-----------------------------------------
|
-----------------------------------------
|
||||||
|
@ -107,7 +111,7 @@ However, for reading convenience, most of the examples show sorted sequences.
|
||||||
:func:`median` and :func:`mode`.
|
:func:`median` and :func:`mode`.
|
||||||
|
|
||||||
The sample mean gives an unbiased estimate of the true population mean,
|
The sample mean gives an unbiased estimate of the true population mean,
|
||||||
which means that, taken on average over all the possible samples,
|
so that when taken on average over all the possible samples,
|
||||||
``mean(sample)`` converges on the true mean of the entire population. If
|
``mean(sample)`` converges on the true mean of the entire population. If
|
||||||
*data* represents the entire population rather than a sample, then
|
*data* represents the entire population rather than a sample, then
|
||||||
``mean(data)`` is equivalent to calculating the true population mean μ.
|
``mean(data)`` is equivalent to calculating the true population mean μ.
|
||||||
|
@ -163,8 +167,16 @@ However, for reading convenience, most of the examples show sorted sequences.
|
||||||
will be equivalent to ``3/(1/a + 1/b + 1/c)``.
|
will be equivalent to ``3/(1/a + 1/b + 1/c)``.
|
||||||
|
|
||||||
The harmonic mean is a type of average, a measure of the central
|
The harmonic mean is a type of average, a measure of the central
|
||||||
location of the data. It is often appropriate when averaging quantities
|
location of the data. It is often appropriate when averaging
|
||||||
which are rates or ratios, for example speeds. For example:
|
rates or ratios, for example speeds.
|
||||||
|
|
||||||
|
Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr.
|
||||||
|
What is the average speed?
|
||||||
|
|
||||||
|
.. doctest::
|
||||||
|
|
||||||
|
>>> harmonic_mean([40, 60])
|
||||||
|
48.0
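As a rough cross-check of the car example above, the average speed can also be computed directly as total distance over total time. The variable names below are illustrative assumptions:

```python
import math
from statistics import harmonic_mean

# Two legs of equal length (10 km each) at 40 km/hr and 60 km/hr.
leg_km = 10
speeds = [40, 60]

total_distance = leg_km * len(speeds)
total_time = sum(leg_km / speed for speed in speeds)  # hours
direct_average = total_distance / total_time

# The harmonic mean should agree with distance / time for equal-length legs.
print(direct_average, harmonic_mean(speeds))
print(math.isclose(direct_average, harmonic_mean(speeds)))
```

The agreement holds only because both legs cover the same distance; legs of different lengths would call for a weighted calculation.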
|
||||||
|
|
||||||
Suppose an investor purchases an equal value of shares in each of
|
Suppose an investor purchases an equal value of shares in each of
|
||||||
three companies, with P/E (price/earning) ratios of 2.5, 3 and 10.
|
three companies, with P/E (price/earning) ratios of 2.5, 3 and 10.
|
||||||
|
@ -175,9 +187,6 @@ However, for reading convenience, most of the examples show sorted sequences.
|
||||||
>>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio.
|
>>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio.
|
||||||
3.6
|
3.6
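A sketch of why the harmonic mean matches the aggregate P/E ratio here, assuming an arbitrary equal stake (``stake`` is a hypothetical amount) in each company:

```python
import math
from statistics import harmonic_mean, mean

pe_ratios = [2.5, 3, 10]
stake = 1000.0  # hypothetical equal investment per company

# Earnings attributable to each stake are price / (P/E).
total_price = stake * len(pe_ratios)
total_earnings = sum(stake / pe for pe in pe_ratios)
aggregate_pe = total_price / total_earnings

print(aggregate_pe)              # matches the harmonic mean
print(harmonic_mean(pe_ratios))
print(mean(pe_ratios))           # the arithmetic mean is noticeably higher
```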
|
||||||
|
|
||||||
Using the arithmetic mean would give an average of about 5.167, which
|
|
||||||
is well over the aggregate P/E ratio.
|
|
||||||
|
|
||||||
:exc:`StatisticsError` is raised if *data* is empty, or any element
|
:exc:`StatisticsError` is raised if *data* is empty, or any element
|
||||||
is less than zero.
|
is less than zero.
|
||||||
|
|
||||||
|
@ -190,9 +199,9 @@ However, for reading convenience, most of the examples show sorted sequences.
|
||||||
middle two" method. If *data* is empty, :exc:`StatisticsError` is raised.
|
middle two" method. If *data* is empty, :exc:`StatisticsError` is raised.
|
||||||
*data* can be a sequence or iterator.
|
*data* can be a sequence or iterator.
|
||||||
|
|
||||||
The median is a robust measure of central location, and is less affected by
|
The median is a robust measure of central location and is less affected by
|
||||||
the presence of outliers in your data. When the number of data points is
|
the presence of outliers. When the number of data points is odd, the
|
||||||
odd, the middle data point is returned:
|
middle data point is returned:
|
||||||
|
|
||||||
.. doctest::
|
.. doctest::
|
||||||
|
|
||||||
|
@ -210,13 +219,10 @@ However, for reading convenience, most of the examples show sorted sequences.
|
||||||
This is suited for when your data is discrete, and you don't mind that the
|
This is suited for when your data is discrete, and you don't mind that the
|
||||||
median may not be an actual data point.
|
median may not be an actual data point.
|
||||||
|
|
||||||
If your data is ordinal (supports order operations) but not numeric (doesn't
|
If the data is ordinal (supports order operations) but not numeric (doesn't
|
||||||
support addition), you should use :func:`median_low` or :func:`median_high`
|
support addition), consider using :func:`median_low` or :func:`median_high`
|
||||||
instead.
|
instead.
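A minimal sketch of the ordinal case, assuming letter grades whose alphabetical order happens to match their ranking (the ``grades`` list is illustrative):

```python
from statistics import median_high, median_low

# Ordinal but non-numeric data: strings can be ordered but not averaged.
grades = ["D", "B", "A", "C"]

low = median_low(grades)    # lower of the two middle values after sorting
high = median_high(grades)  # higher of the two middle values after sorting

print(low, high)
```

Both functions always return an actual data point, so no arithmetic is performed on the values.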
|
||||||
|
|
||||||
.. seealso:: :func:`median_low`, :func:`median_high`, :func:`median_grouped`
|
|
||||||
|
|
||||||
|
|
||||||
.. function:: median_low(data)
|
.. function:: median_low(data)
|
||||||
|
|
||||||
Return the low median of numeric data. If *data* is empty,
|
Return the low median of numeric data. If *data* is empty,
|
||||||
|
@ -319,7 +325,7 @@ However, for reading convenience, most of the examples show sorted sequences.
|
||||||
desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
|
desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
|
||||||
If the input *data* is empty, :exc:`StatisticsError` is raised.
|
If the input *data* is empty, :exc:`StatisticsError` is raised.
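A short sketch of the ``min(multimode(data))`` and ``max(multimode(data))`` idiom mentioned above, using made-up sample data:

```python
from statistics import multimode

# Two values (1 and 3) are tied for most common.
data = [1, 1, 2, 3, 3]

modes = multimode(data)   # all most-common values, in first-seen order
smallest_mode = min(modes)
largest_mode = max(modes)

print(modes, smallest_mode, largest_mode)
```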
|
||||||
|
|
||||||
``mode`` assumes discrete data, and returns a single value. This is the
|
``mode`` assumes discrete data and returns a single value. This is the
|
||||||
standard treatment of the mode as commonly taught in schools:
|
standard treatment of the mode as commonly taught in schools:
|
||||||
|
|
||||||
.. doctest::
|
.. doctest::
|
||||||
|
@ -522,7 +528,7 @@ However, for reading convenience, most of the examples show sorted sequences.
|
||||||
cut-point will evaluate to ``104``.
|
cut-point will evaluate to ``104``.
|
||||||
|
|
||||||
The *method* for computing quantiles can be varied depending on
|
The *method* for computing quantiles can be varied depending on
|
||||||
whether the data in *data* includes or excludes the lowest and
|
whether the *data* includes or excludes the lowest and
|
||||||
highest possible values from the population.
|
highest possible values from the population.
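A small sketch comparing the two *method* values on the same data; the sample values are an illustrative assumption:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5]

# "exclusive" (the default) treats the data as a sample that may not
# contain the population's extreme values.
exclusive_quartiles = quantiles(data, n=4, method="exclusive")

# "inclusive" treats the minimum and maximum of the data as the
# population's extremes.
inclusive_quartiles = quantiles(data, n=4, method="inclusive")

print(exclusive_quartiles)
print(inclusive_quartiles)
```

On this input the inclusive cut points sit closer to the middle of the data than the exclusive ones.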
|
||||||
|
|
||||||
The default *method* is "exclusive" and is used for data sampled from
|
The default *method* is "exclusive" and is used for data sampled from
|
||||||
|
|