Commit Graph

97 Commits

Author SHA1 Message Date
Raymond Hettinger 042aa88bcc
gh-108322: Optimize statistics.NormalDist.samples() (gh-108324) 2023-08-27 08:59:40 -05:00
Raymond Hettinger 52e0797f8e
Extend _sqrtprod() to cover the full range of inputs. Add tests. (GH-107855) 2023-08-11 11:19:19 -05:00
Raymond Hettinger 2fb484e625
Future-proof helper function with zero handling. (GH-107798) 2023-08-09 08:44:43 +01:00
Raymond Hettinger d4ac094cf9
Minor accuracy improvement for statistics.correlation() (GH-107781) 2023-08-08 17:12:52 +01:00
Raymond Hettinger 457e4d1a51
GH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (GH-102649) 2023-03-13 20:06:43 -05:00
Raymond Hettinger 6cd7572f85
Optimize fmean() weighted average (#102626) 2023-03-12 12:48:25 -05:00
Nikita Sobolev bef9efabc3
GH-99155: Fix `NormalDist` pickle with `0` and `1` protocols (GH-99156) 2022-11-06 20:56:41 -06:00
Raymond Hettinger 3d180e3ab2
Improve accuracy for Spearman's rank correlation coefficient. (#96392) 2022-08-29 12:19:48 -05:00
Raymond Hettinger d8d55d13fc
Prepare private _rank() function to be made public. (#96372) 2022-08-28 23:41:58 -05:00
Raymond Hettinger 29c8f80760
GH-95861: Add support for Spearman's rank correlation coefficient (GH-95863) 2022-08-18 13:48:27 -05:00
Raymond Hettinger 4395ff1e6a
Statistics inv_cdf sync with corresponding random module normal distributions (#95265) 2022-07-26 02:23:33 -05:00
Benjamin Peterson e39ce7d487
Fix typo in _exact_ratio comment. (GH-94789) 2022-07-12 14:34:23 -07:00
Raymond Hettinger c9118afd04
Small speed-up for NormalDist.samples (GH-94730) 2022-07-10 22:34:53 -05:00
Raymond Hettinger e01eeb7b4b
Fix inconsistent return type for statistics median_grouped() gh-92531 (#92533) 2022-05-09 02:08:41 -05:00
Raymond Hettinger 5212cbc261
Clean-up and simplify median_grouped(). Vastly improve its docstring. (#92324) 2022-05-05 03:01:07 -05:00
Raymond Hettinger d20bb33f78
Fix renamed "total" variable (#92287)
* Fix renamed "total" variable
* Keep nan/inf handling consistent between versions
2022-05-03 23:22:04 -05:00
Raymond Hettinger 9badc86fb7
Compute from_sample() in a single pass over the data (#92284) 2022-05-03 21:22:26 -05:00
Raymond Hettinger ec8d3adb99
The stdev calculation is more accurate computing its own mean (#92220) 2022-05-03 03:41:46 -05:00
Raymond Hettinger d5b7bba43b
Statistics internals: Make fewer calls to _coerce() when data types are mixed (GH-31619) 2022-02-28 11:43:52 -06:00
Raymond Hettinger 43aac29cbb
bpo-46257: Convert statistics._ss() to a single pass algorithm (GH-30403) 2022-01-05 09:39:10 -06:00
Ned Batchelder c602c1be43
Fix double-space in exception message (GH-29955) 2021-12-08 12:42:02 +02:00
Raymond Hettinger 0aa0bd0563
bpo-45876: Have stdev() also use decimal specific square root. (GH-29869) 2021-11-30 19:25:57 -06:00
Raymond Hettinger a39f46afde
bpo-45876: Correctly rounded stdev() and pstdev() for the Decimal case (GH-29828) 2021-11-30 18:20:08 -06:00
Raymond Hettinger af9ee57b96
bpo-45876: Improve accuracy for stdev() and pstdev() in statistics (GH-29736)
* Inlined code from variance functions

* Added helper functions for the float square root of a fraction

* Call helper functions

* Add blurb

* Fix over-specified test

* Add a test for the _sqrt_frac() helper function

* Increase the tested range

* Add type hints to the internal function.

* Fix test for correct rounding

* Simplify ⌊√(n/m)⌋ calculation

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>

* Add comment and beef-up tests

* Test for zero denominator

* Add algorithmic references

* Add test for the _isqrt_frac_rto() helper function.

* Compute the 109 instead of hard-wiring it

* Stronger test for _isqrt_frac_rto()

* Bigger range

* Bigger range

* Replace float() call with int/int division to be parallel with the other code path.

* Factor out division. Update proof link. Remove internal type declaration

Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
2021-11-26 22:54:50 -07:00
Raymond Hettinger d2b55b07d2
bpo-45766: Add direct proportion option to linear_regression(). (#29490)
* bpo-45766: Add direct proportion option to linear_regression().

* Update 2021-11-09-09-18-06.bpo-45766.dvbcMf.rst

* Use ellipsis to avoid round-off issues.

* Update Misc/NEWS.d/next/Library/2021-11-09-09-18-06.bpo-45766.dvbcMf.rst

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>

* Update signature in main docs

* Fix missing comma

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
2021-11-21 08:39:26 -06:00
Raymond Hettinger 04e03f496c
bpo-45851: Avoid full sort in statistics.multimode() (#29662)
Suggested by Stefan Pochmann.
2021-11-20 10:04:37 -06:00
Raymond Hettinger c3bc0fe5a6
Factor-out constant calculation. (GH-29491) 2021-11-09 10:30:06 -06:00
Raymond Hettinger 4a5cccb02b
bpo-20499: Rounding error in statistics.pvariance (GH-28230) 2021-09-08 22:00:12 -05:00
Raymond Hettinger 793f55bde9
bpo-39218: Improve accuracy of variance calculation (GH-27960) 2021-08-30 20:57:30 -05:00
Raymond Hettinger 3668e118f7
Update nonstandard variable names (GH-26540) 2021-06-04 16:28:31 -07:00
Raymond Hettinger 2f2e703244
bpo-44151: Various grammar, word order, and markup fixes (GH-26344) 2021-05-24 23:04:04 -07:00
Zack Kneupper 2f3a87856c
bpo-44151: linear_regression() minor API improvements (GH-26199) 2021-05-24 17:30:58 -07:00
Raymond Hettinger be4dd7fcd9
bpo-44150: Support optional weights parameter for fmean() (GH-26175) 2021-05-20 20:22:26 -07:00
Raymond Hettinger b3f65e819f
Apply edits from Allen Downey's review of the linear_regression docs. (GH-26176) 2021-05-16 19:21:14 -07:00
Raymond Hettinger fdfea4ab16
Improve speed and accuracy for correlation() (GH-26135) 2021-05-15 11:00:51 -07:00
Raymond Hettinger 55b78ce3c4
Eliminate duplicated calculations and unnecessary work for linear regression (GH-25922) 2021-05-06 07:43:13 -07:00
Raymond Hettinger 1add719516
Fix inconsistent fsum vs sum and fmean vs mean (GH-25898) 2021-05-04 11:27:28 -07:00
Tymoteusz Wołodźko 09aa6f914d
bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression (#16813)
Co-authored-by: Tymoteusz Wołodźko <twolodzko+gitkraken@gmail.com>
2021-04-25 14:45:09 +03:00
Raymond Hettinger 30a8b28396
bpo-43147: Remove archaic terminology. (GH-24462) 2021-02-07 16:44:42 -08:00
Raymond Hettinger cc3467a57b
bpo-38308: Add optional weighting to statistics.harmonic_mean() (GH-23914) 2020-12-23 19:52:09 -08:00
Raymond Hettinger 5aad027db9
Some reformatting (suggested by Black) and minor factoring. (GH-20865) 2020-06-13 19:17:28 -07:00
Raymond Hettinger d71ab4f738
bpo-40855: Fix ignored mu and xbar parameters (GH-20835) 2020-06-13 15:55:52 -07:00
Raymond Hettinger 0400a7f2f8
Minor code cleanups for statistics (GH-19873)
* Minor cleanups:  Removed unused code.  Move C import near its Python version.

* Clean-up whitespace
2020-05-02 19:30:24 -07:00
Raymond Hettinger 70f027dd22
bpo-40290: Add zscore() to statistics.NormalDist. (GH-19547) 2020-04-16 10:25:14 -07:00
Raymond Hettinger 733b9a308e
bpo-38385: Fix iterator/iterable terminology in statistics docs (GH-17111) 2019-11-11 23:35:06 -08:00
Raymond Hettinger 5eabec022b
bpo-38521: Fix error in NormalDist.__eq__() (GH-16840) 2019-10-18 14:20:35 -07:00
Raymond Hettinger 7ce4bfa8cf
Minor code and comment cleanup (GH-16315) 2019-09-20 21:46:52 -07:00
Raymond Hettinger 272d0d017a
bpo-36546: No longer a need to make "data" positional only (GH-16252) 2019-09-17 20:45:05 -07:00
Raymond Hettinger 4db25d5c39
bpo-36018: Address more reviewer feedback (GH-15733) 2019-09-08 16:57:58 -07:00
Raymond Hettinger e4810b2a6c
bpo-36324: Apply review comments from Allen Downey (GH-15693) 2019-09-05 00:18:47 -07:00