111 Commits

Author SHA1 Message Date
Raymond Hettinger 120729d862
Minor code beautifications in statistics.py (gh-124866) 2024-10-01 15:55:36 -05:00
Raymond Hettinger 4b89c5ebfc
Improve accuracy of kde() invcdf estimates (gh-124637) 2024-09-27 09:56:37 -07:00
Serhiy Storchaka 1a0c7b9ba4
gh-121905: Consistently use "floating-point" instead of "floating point" (GH-121907) 2024-07-19 08:06:02 +00:00
Raymond Hettinger e378dc15b5
Refactor (mostly rearrange) the statistics module (gh-119930) 2024-06-01 22:07:46 -05:00
Raymond Hettinger ce2ea7d629
Minor speed/accuracy improvement for kde() (gh-119910) 2024-06-01 10:49:14 -05:00
Raymond Hettinger cc5cd4d93e
statistics.fmean(): speed-up code path for non-sizeable inputs. (gh-119876) 2024-05-31 17:08:55 -05:00
Raymond Hettinger 5092ea238e
Fix negative bandwidth test and add online code path test. (gh-118600) 2024-05-05 12:29:23 -05:00
Raymond Hettinger 42dc5b4ace
gh-115532 Add kde_random() to the statistic module (#118210) 2024-05-03 23:13:36 -05:00
Raymond Hettinger 0823f43618
gh-115532: Minor tweaks to kde() (gh-117897) 2024-04-15 10:08:21 -05:00
Raymond Hettinger a1e948edba
Add cumulative option for the new statistics.kde() function. (#117033) 2024-03-24 04:35:58 -05:00
Raymond Hettinger 0c7dc494f2
Minor kde() docstring nit: make presentation order match the function signature (#116876) 2024-03-15 14:02:10 -05:00
Raymond Hettinger 6d34eb0e36
gh-115532: Add kernel density estimation to the statistics module (gh-115863) 2024-02-25 17:46:47 -06:00
Raymond Hettinger f3bff4ee9d
gh-112540: Support zero inputs in geometric_mean() (gh-112880) 2023-12-08 12:05:56 -06:00
Raymond Hettinger 62405c7867
gh-110150: Fix base case handling in quantiles() (gh-110151) 2023-09-30 23:35:54 -05:00
Raymond Hettinger 042aa88bcc
gh-108322: Optimize statistics.NormalDist.samples() (gh-108324) 2023-08-27 08:59:40 -05:00
Raymond Hettinger 52e0797f8e
Extend _sqrtprod() to cover the full range of inputs. Add tests. (GH-107855) 2023-08-11 11:19:19 -05:00
Raymond Hettinger 2fb484e625
Future-proof helper function with zero handling. (GH-107798) 2023-08-09 08:44:43 +01:00
Raymond Hettinger d4ac094cf9
Minor accuracy improvement for statistics.correlation() (GH-107781) 2023-08-08 17:12:52 +01:00
Raymond Hettinger 457e4d1a51
GH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (GH-102649) 2023-03-13 20:06:43 -05:00
Raymond Hettinger 6cd7572f85
Optimize fmean() weighted average (#102626) 2023-03-12 12:48:25 -05:00
Nikita Sobolev bef9efabc3
GH-99155: Fix `NormalDist` pickle with `0` and `1` protocols (GH99156) 2022-11-06 20:56:41 -06:00
Raymond Hettinger 3d180e3ab2
Improve accuracy for Spearman's rank correlation coefficient. (#96392) 2022-08-29 12:19:48 -05:00
Raymond Hettinger d8d55d13fc
Prepare private _rank() function to be made public. (#96372) 2022-08-28 23:41:58 -05:00
Raymond Hettinger 29c8f80760
GH-95861: Add support for Spearman's rank correlation coefficient (GH-95863) 2022-08-18 13:48:27 -05:00
Raymond Hettinger 4395ff1e6a
Statistics inv_cdf sync with corresponding random module normal distributions (#95265) 2022-07-26 02:23:33 -05:00
Benjamin Peterson e39ce7d487
Fix typo in _exact_ratio comment. (GH-94789) 2022-07-12 14:34:23 -07:00
Raymond Hettinger c9118afd04
Small speed-up for NormalDist.samples (GH-94730) 2022-07-10 22:34:53 -05:00
Raymond Hettinger e01eeb7b4b
Fix inconsistent return type for statistics median_grouped() gh-92531 (#92533) 2022-05-09 02:08:41 -05:00
Raymond Hettinger 5212cbc261
Clean-up and simplify median_grouped(). Vastly improve its docstring. (#92324) 2022-05-05 03:01:07 -05:00
Raymond Hettinger d20bb33f78
Fix renamed "total" variable (#92287)
* Fix renamed "total" variable
* Keep nan/inf handling consistent between versions
2022-05-03 23:22:04 -05:00
Raymond Hettinger 9badc86fb7
Compute from_sample() in a single pass over the data (#92284) 2022-05-03 21:22:26 -05:00
Raymond Hettinger ec8d3adb99
The stdev calculation is more accurate computing its own mean (#92220) 2022-05-03 03:41:46 -05:00
Raymond Hettinger d5b7bba43b
Statistics internals: Make fewer calls to _coerce() when data types are mixed (GH-31619) 2022-02-28 11:43:52 -06:00
Raymond Hettinger 43aac29cbb
bpo-46257: Convert statistics._ss() to a single pass algorithm (GH-30403) 2022-01-05 09:39:10 -06:00
Ned Batchelder c602c1be43
Fix double-space in exception message (GH-29955) 2021-12-08 12:42:02 +02:00
Raymond Hettinger 0aa0bd0563
bpo-45876: Have stdev() also use decimal specific square root. (GH-29869) 2021-11-30 19:25:57 -06:00
Raymond Hettinger a39f46afde
bpo-45876: Correctly rounded stdev() and pstdev() for the Decimal case (GH-29828) 2021-11-30 18:20:08 -06:00
Raymond Hettinger af9ee57b96
bpo-45876: Improve accuracy for stdev() and pstdev() in statistics (GH-29736)
* Inlined code from variance functions
* Added helper functions for the float square root of a fraction
* Call helper functions
* Add blurb
* Fix over-specified test
* Add a test for the _sqrt_frac() helper function
* Increase the tested range
* Add type hints to the internal function.
* Fix test for correct rounding
* Simplify ⌊√(n/m)⌋ calculation
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* Add comment and beef-up tests
* Test for zero denominator
* Add algorithmic references
* Add test for the _isqrt_frac_rto() helper function.
* Compute the 109 instead of hard-wiring it
* Stronger test for _isqrt_frac_rto()
* Bigger range
* Bigger range
* Replace float() call with int/int division to be parallel with the other code path.
* Factor out division. Update proof link. Remove internal type declaration
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
2021-11-26 22:54:50 -07:00
Raymond Hettinger d2b55b07d2
bpo-45766: Add direct proportion option to linear_regression(). (#29490)
* bpo-45766: Add direct proportion option to linear_regression().
* Update 2021-11-09-09-18-06.bpo-45766.dvbcMf.rst
* Use ellipsis to avoid round-off issues.
* Update Misc/NEWS.d/next/Library/2021-11-09-09-18-06.bpo-45766.dvbcMf.rst
Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
* Update signature in main docs
* Fix missing comma
Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
2021-11-21 08:39:26 -06:00
Raymond Hettinger 04e03f496c
bpo-45851: Avoid full sort in statistics.multimode() (#29662)
Suggested by Stefan Pochmann.
2021-11-20 10:04:37 -06:00
Raymond Hettinger c3bc0fe5a6
Factor-out constant calculation. (GH-29491) 2021-11-09 10:30:06 -06:00
Raymond Hettinger 4a5cccb02b
bpo-20499: Rounding error in statistics.pvariance (GH-28230) 2021-09-08 22:00:12 -05:00
Raymond Hettinger 793f55bde9
bpo-39218: Improve accuracy of variance calculation (GH-27960) 2021-08-30 20:57:30 -05:00
Raymond Hettinger 3668e118f7
Update nonstandard variable names (GH-26540) 2021-06-04 16:28:31 -07:00
Raymond Hettinger 2f2e703244
bpo-44151: Various grammar, word order, and markup fixes (GH-26344) 2021-05-24 23:04:04 -07:00
Zack Kneupper 2f3a87856c
bpo-44151: linear_regression() minor API improvements (GH-26199) 2021-05-24 17:30:58 -07:00
Raymond Hettinger be4dd7fcd9
bpo-44150: Support optional weights parameter for fmean() (GH-26175) 2021-05-20 20:22:26 -07:00
Raymond Hettinger b3f65e819f
Apply edits from Allen Downey's review of the linear_regression docs. (GH-26176) 2021-05-16 19:21:14 -07:00
Raymond Hettinger fdfea4ab16
Improve speed and accuracy for correlation() (GH-26135) 2021-05-15 11:00:51 -07:00
Raymond Hettinger 55b78ce3c4
Eliminate duplicated calculations and unnecessary work for linear regression (GH-25922) 2021-05-06 07:43:13 -07:00