Commit Graph

74 Commits

Author SHA1 Message Date
Raymond Hettinger 6d34eb0e36
gh-115532: Add kernel density estimation to the statistics module (gh-115863) 2024-02-25 17:46:47 -06:00
Raymond Hettinger f3bff4ee9d
gh-112540: Support zero inputs in geometric_mean() (gh-112880) 2023-12-08 12:05:56 -06:00
Raymond Hettinger 62405c7867
gh-110150: Fix base case handling in quantiles() (gh-110151) 2023-09-30 23:35:54 -05:00
Serhiy Storchaka b9831e5c98
Use unittest test runner for doctests in test_statistics (GH-108921) 2023-09-07 23:08:55 +03:00
Serhiy Storchaka f3ba0a74cd
gh-108416: Mark slow test methods with @requires_resource('cpu') (GH-108421)
Only mark tests which spend significant system or user time,
by itself or in subprocesses.
2023-09-02 07:45:34 +03:00
Raymond Hettinger 52e0797f8e
Extend _sqrtprod() to cover the full range of inputs. Add tests. (GH-107855) 2023-08-11 11:19:19 -05:00
Raymond Hettinger 457e4d1a51
GH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (GH-102649) 2023-03-13 20:06:43 -05:00
Nikita Sobolev bef9efabc3
GH-99155: Fix `NormalDist` pickle with `0` and `1` protocols (GH-99156) 2022-11-06 20:56:41 -06:00
Raymond Hettinger 29c8f80760
GH-95861: Add support for Spearman's rank correlation coefficient (GH-95863) 2022-08-18 13:48:27 -05:00
Raymond Hettinger 4395ff1e6a
Statistics inv_cdf sync with corresponding random module normal distributions (#95265) 2022-07-26 02:23:33 -05:00
Raymond Hettinger e01eeb7b4b
Fix inconsistent return type for statistics median_grouped() gh-92531 (#92533) 2022-05-09 02:08:41 -05:00
Raymond Hettinger 5212cbc261
Clean-up and simplify median_grouped(). Vastly improve its docstring. (#92324) 2022-05-05 03:01:07 -05:00
Raymond Hettinger a39f46afde
bpo-45876: Correctly rounded stdev() and pstdev() for the Decimal case (GH-29828) 2021-11-30 18:20:08 -06:00
Raymond Hettinger af9ee57b96
bpo-45876: Improve accuracy for stdev() and pstdev() in statistics (GH-29736)
* Inlined code from variance functions

* Added helper functions for the float square root of a fraction

* Call helper functions

* Add blurb

* Fix over-specified test

* Add a test for the _sqrt_frac() helper function

* Increase the tested range

* Add type hints to the internal function.

* Fix test for correct rounding

* Simplify ⌊√(n/m)⌋ calculation

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>

* Add comment and beef-up tests

* Test for zero denominator

* Add algorithmic references

* Add test for the _isqrt_frac_rto() helper function.

* Compute the 109 instead of hard-wiring it

* Stronger test for _isqrt_frac_rto()

* Bigger range

* Bigger range

* Replace float() call with int/int division to be parallel with the other code path.

* Factor out division. Update proof link. Remove internal type declaration

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
2021-11-26 22:54:50 -07:00
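The "Simplify ⌊√(n/m)⌋ calculation" step in the commit above can be illustrated with a small sketch. It is not the module's own helper: the name `floor_sqrt_fraction` is hypothetical, and the sketch only relies on the identity that, for integers n >= 0 and m > 0, the floor of √(n/m) equals `math.isqrt(n // m)`.

```python
from math import isqrt


def floor_sqrt_fraction(n: int, m: int) -> int:
    """Return floor(sqrt(n / m)) using integer arithmetic only.

    Hypothetical illustration, not the statistics module's helper.
    Assumes n >= 0 and m > 0.
    """
    if n < 0:
        raise ValueError("numerator must be non-negative")
    if m <= 0:
        raise ValueError("denominator must be positive")
    # Because k*k is an integer, k*k <= n/m holds exactly when
    # k*k <= n // m, so flooring the quotient first loses nothing.
    return isqrt(n // m)


# 17/2 = 8.5 and sqrt(8.5) is about 2.915, so the floor is 2.
result = floor_sqrt_fraction(17, 2)
```

Staying in integers avoids the rounding that a float division followed by `math.sqrt` could introduce for very large numerators or denominators.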
Raymond Hettinger d2b55b07d2
bpo-45766: Add direct proportion option to linear_regression(). (#29490)
* bpo-45766: Add direct proportion option to linear_regression().

* Update 2021-11-09-09-18-06.bpo-45766.dvbcMf.rst

* Use ellipsis to avoid round-off issues.

* Update Misc/NEWS.d/next/Library/2021-11-09-09-18-06.bpo-45766.dvbcMf.rst

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>

* Update signature in main docs

* Fix missing comma

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
2021-11-21 08:39:26 -06:00
Raymond Hettinger 48744db70e
bpo-45852: Fix the Counter/iter test for statistics.mode() (GH-29667)
Suggested by Stefan Pochmann.
2021-11-20 11:01:09 -06:00
Raymond Hettinger 4a5cccb02b
bpo-20499: Rounding error in statistics.pvariance (GH-28230) 2021-09-08 22:00:12 -05:00
Raymond Hettinger 793f55bde9
bpo-39218: Improve accuracy of variance calculation (GH-27960) 2021-08-30 20:57:30 -05:00
Irit Katriel f5d7a8d29c
bpo-44960: add regression test for geometric_mean with mixed int/floa… (#27856)
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
2021-08-20 14:08:21 +01:00
Zack Kneupper 2f3a87856c
bpo-44151: linear_regression() minor API improvements (GH-26199) 2021-05-24 17:30:58 -07:00
Raymond Hettinger be4dd7fcd9
bpo-44150: Support optional weights parameter for fmean() (GH-26175) 2021-05-20 20:22:26 -07:00
Tymoteusz Wołodźko 09aa6f914d
bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression (#16813)
Co-authored-by: Tymoteusz Wołodźko <twolodzko+gitkraken@gmail.com>
2021-04-25 14:45:09 +03:00
Raymond Hettinger cc3467a57b
bpo-38308: Add optional weighting to statistics.harmonic_mean() (GH-23914) 2020-12-23 19:52:09 -08:00
Hai Shi 79bb2c93f2
bpo-40275: Use new test.support helper submodules in tests (GH-21743) 2020-08-06 13:51:29 +02:00
Raymond Hettinger d71ab4f738
bpo-40855: Fix ignored mu and xbar parameters (GH-20835) 2020-06-13 15:55:52 -07:00
Tzanetos Balitsaris b809717c1e
bpo-40331: Increase test coverage for the statistics module (GH-19608) 2020-05-13 13:29:31 +03:00
Raymond Hettinger 70f027dd22
bpo-40290: Add zscore() to statistics.NormalDist. (GH-19547) 2020-04-16 10:25:14 -07:00
Tim Gates c18b805ac6 bpo-39002: Fix simple typo: tranlation -> translation (GH-17517) 2019-12-09 09:42:17 -08:00
Raymond Hettinger 5eabec022b
bpo-38521: Fix error in NormalDist.__eq__() (GH-16840) 2019-10-18 14:20:35 -07:00
Raymond Hettinger 4db25d5c39
bpo-36018: Address more reviewer feedback (GH-15733) 2019-09-08 16:57:58 -07:00
Min ho Kim 39d87b5471 Fix typos mostly in comments, docs and test names (GH-15209) 2019-08-30 16:21:19 -04:00
Dong-hee Na 8ad22a4226 bpo-37798: Test both Python and C versions in test_statistics.py (GH-15453) 2019-08-24 10:51:20 -07:00
Neil Schemenauer 52a48e62c6
bpo-37707: Exclude expensive unit tests from PGO task (GH-15009)
Mark some individual tests to skip when --pgo is used.  The tests
marked increase the PGO task time significantly and likely don't
help improve optimization of the final executable.
2019-07-30 11:08:18 -07:00
Raymond Hettinger 02c91f59b6
bpo-36324: Make internal attributes for statistics.NormalDist() private. (GH-14871)
* Make internals private

* Finish making mu and sigma private

* Add missing __hash__() method

* Add blurb
2019-07-21 00:34:47 -07:00
Raymond Hettinger e917f2ed9a
bpo-36546: Add more tests and expand docs (#13406) 2019-05-18 10:18:29 -07:00
Xtreak 874ad1b3b4 Fix typo: quaatile to quantile (GH-13001) 2019-05-02 14:20:58 -04:00
Raymond Hettinger b0a2c0fa83
bpo-36018: Test idempotence. Test two methods against one-another. (GH-13021) 2019-04-29 23:47:33 -07:00
Raymond Hettinger db81ba1393
bpo-36546: More tests: type preservation and equal inputs (#13000) 2019-04-28 21:31:55 -07:00
Raymond Hettinger 9013ccf6d8
bpo-36546: Add statistics.quantiles() (#12710) 2019-04-23 00:06:35 -07:00
Raymond Hettinger 6463ba3061
bpo-27181: Add statistics.geometric_mean() (GH-12638) 2019-04-07 09:20:03 -07:00
Raymond Hettinger d1e768a677
bpo-36326: Let inspect.getdoc() find docstrings for __slots__ (GH-12498) 2019-03-25 13:01:13 -07:00
Raymond Hettinger 2afb598618 bpo-36324: NormalDist() add more tests and update comments (GH-12476)
* Improve coverage.
* Note inherent limitations of the accuracy tests


https://bugs.python.org/issue36324
2019-03-20 13:28:59 -07:00
Raymond Hettinger 714c60d7ac
bpo-36324: Add inv_cdf() to statistics.NormalDist() (GH-12377) 2019-03-18 20:17:14 -07:00
Raymond Hettinger fc06a192fd
bpo-35892: Fix mode() and add multimode() (#12089) 2019-03-12 00:43:27 -07:00
Raymond Hettinger 1f58f4fa6a Refine statistics.NormalDist documentation and improve test coverage (GH-12208) 2019-03-06 23:23:55 -08:00
Raymond Hettinger 318d537daa
bpo-36169 : Add overlap() method to statistics.NormalDist (GH-12149) 2019-03-06 22:59:40 -08:00
Raymond Hettinger 18ee50d5da Add more tests for pdf() and cdf() (GH-12190) 2019-03-06 02:31:14 -08:00
Raymond Hettinger ef17fdbc1c bpo-36018: Add special value tests and make minor tweaks to the docs (GH-12096)
https://bugs.python.org/issue36018
2019-02-28 09:16:25 -08:00
Raymond Hettinger 9e456bc70e bpo-36018: Add properties for mean and stdev (GH-12022)
Responding to suggestions on the tracker and some off-line suggestions.

Davin suggested that English-named accessors instead of Greek letters would result in more intelligible user code. Steven suggested that the parameters still need to be *mu* and *sigma*, which are used elsewhere (and I noted that those parameter names are used in the linked-to resources).

Michael suggested proving out the API by seeing whether it generalized to *Lognormal*. I did so and found that the Lognormal distribution parameters *mu* and *sigma* do not represent the mean and standard deviation of the lognormal distribution (instead, they are for the underlying regular normal distribution).

Putting these ideas together, we have NormalDist parameterized by *mu* and *sigma* but offering English-named properties as accessors. That lets us match other APIs that access mu and sigma, matches the external resources on the topic, and gives us clear English names in user code. The API extends nicely to LogNormal, where the parameters and the summary statistic accessors are not the same.


https://bugs.python.org/issue36018
2019-02-24 11:44:55 -08:00
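A minimal sketch of the design described in the commit above: `NormalDist` takes *mu* and *sigma* as parameters and exposes them through the `mean` and `stdev` properties. The lognormal part is an illustration only; the standard library has no LogNormal class, so the closed-form lognormal mean and standard deviation are computed by hand, and the variable names are assumptions.

```python
from math import exp, sqrt
from statistics import NormalDist

# Parameters use the conventional Greek-letter names...
nd = NormalDist(mu=100, sigma=15)

# ...while the accessors use English names.
mean_value = nd.mean
stdev_value = nd.stdev

# For a lognormal distribution, mu and sigma describe the underlying
# normal distribution, not the lognormal's own summary statistics.
mu, sigma = 0.0, 1.0
lognormal_mean = exp(mu + sigma ** 2 / 2)
lognormal_stdev = sqrt((exp(sigma ** 2) - 1) * exp(2 * mu + sigma ** 2))

# The parameters and the summary statistics differ in the lognormal case.
parameters_differ = (lognormal_mean != mu) and (lognormal_stdev != sigma)
```

For the normal distribution the parameters and the accessors coincide, which is why the distinction only becomes visible once the API is tried against a distribution such as the lognormal.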
Raymond Hettinger 79fbcc597d bpo-36018: Make __pos__ return a distinct instance of NormalDist (GH-12009)
https://bugs.python.org/issue36018
2019-02-23 22:19:01 -08:00