Raymond Hettinger
120729d862
Minor code beautifications in statistics.py (gh-124866)
2024-10-01 15:55:36 -05:00
Raymond Hettinger
4b89c5ebfc
Improve accuracy of kde() invcdf estimates (gh-124637)
2024-09-27 09:56:37 -07:00
Serhiy Storchaka
1a0c7b9ba4
gh-121905: Consistently use "floating-point" instead of "floating point" (GH-121907)
2024-07-19 08:06:02 +00:00
Raymond Hettinger
e378dc15b5
Refactor (mostly rearrange) the statistics module (gh-119930)
2024-06-01 22:07:46 -05:00
Raymond Hettinger
ce2ea7d629
Minor speed/accuracy improvement for kde() (gh-119910)
2024-06-01 10:49:14 -05:00
Raymond Hettinger
cc5cd4d93e
statistics.fmean(): speed-up code path for non-sizeable inputs. (gh-119876)
2024-05-31 17:08:55 -05:00
Raymond Hettinger
5092ea238e
Fix negative bandwidth test and add online code path test. (gh-118600)
2024-05-05 12:29:23 -05:00
Raymond Hettinger
42dc5b4ace
gh-115532 Add kde_random() to the statistic module (#118210)
2024-05-03 23:13:36 -05:00
Raymond Hettinger
0823f43618
gh-115532: Minor tweaks to kde() (gh-117897)
2024-04-15 10:08:21 -05:00
Raymond Hettinger
a1e948edba
Add cumulative option for the new statistics.kde() function. (#117033)
2024-03-24 04:35:58 -05:00
Raymond Hettinger
0c7dc494f2
Minor kde() docstring nit: make presentation order match the function signature (#116876)
2024-03-15 14:02:10 -05:00
Raymond Hettinger
6d34eb0e36
gh-115532: Add kernel density estimation to the statistics module (gh-115863)
2024-02-25 17:46:47 -06:00
Raymond Hettinger
f3bff4ee9d
gh-112540: Support zero inputs in geometric_mean() (gh-112880)
2023-12-08 12:05:56 -06:00
Raymond Hettinger
62405c7867
gh-110150: Fix base case handling in quantiles() (gh-110151)
2023-09-30 23:35:54 -05:00
Raymond Hettinger
042aa88bcc
gh-108322: Optimize statistics.NormalDist.samples() (gh-108324)
2023-08-27 08:59:40 -05:00
Raymond Hettinger
52e0797f8e
Extend _sqrtprod() to cover the full range of inputs. Add tests. (GH-107855)
2023-08-11 11:19:19 -05:00
Raymond Hettinger
2fb484e625
Future-proof helper function with zero handling. (GH-107798)
2023-08-09 08:44:43 +01:00
Raymond Hettinger
d4ac094cf9
Minor accuracy improvement for statistics.correlation() (GH-107781)
2023-08-08 17:12:52 +01:00
Raymond Hettinger
457e4d1a51
GH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (GH-102649)
2023-03-13 20:06:43 -05:00
Raymond Hettinger
6cd7572f85
Optimize fmean() weighted average (#102626)
2023-03-12 12:48:25 -05:00
Nikita Sobolev
bef9efabc3
GH-99155: Fix `NormalDist` pickle with `0` and `1` protocols (GH99156)
2022-11-06 20:56:41 -06:00
Raymond Hettinger
3d180e3ab2
Improve accuracy for Spearman's rank correlation coefficient. (#96392)
2022-08-29 12:19:48 -05:00
Raymond Hettinger
d8d55d13fc
Prepare private _rank() function to be made public. (#96372)
2022-08-28 23:41:58 -05:00
Raymond Hettinger
29c8f80760
GH-95861: Add support for Spearman's rank correlation coefficient (GH-95863)
2022-08-18 13:48:27 -05:00
Raymond Hettinger
4395ff1e6a
Statistics inv_cdf sync with corresponding random module normal distributions (#95265)
2022-07-26 02:23:33 -05:00
Benjamin Peterson
e39ce7d487
Fix typo in _exact_ratio comment. (GH-94789)
2022-07-12 14:34:23 -07:00
Raymond Hettinger
c9118afd04
Small speed-up for NormalDist.samples (GH-94730)
2022-07-10 22:34:53 -05:00
Raymond Hettinger
e01eeb7b4b
Fix inconsistent return type for statistics median_grouped() gh-92531 (#92533)
2022-05-09 02:08:41 -05:00
Raymond Hettinger
5212cbc261
Clean-up and simplify median_grouped(). Vastly improve its docstring. (#92324)
2022-05-05 03:01:07 -05:00
Raymond Hettinger
d20bb33f78
Fix renamed "total" variable (#92287)
* Fix renamed "total" variable
* Keep nan/inf handling consistent between versions
2022-05-03 23:22:04 -05:00
Raymond Hettinger
9badc86fb7
Compute from_sample() in a single pass over the data ( #92284 )
2022-05-03 21:22:26 -05:00
Raymond Hettinger
ec8d3adb99
The stdev calculation is more accurate computing its own mean ( #92220 )
2022-05-03 03:41:46 -05:00
Raymond Hettinger
d5b7bba43b
Statistics internals: Make fewer calls to _coerce() when data types are mixed (GH-31619)
2022-02-28 11:43:52 -06:00
Raymond Hettinger
43aac29cbb
bpo-46257: Convert statistics._ss() to a single pass algorithm (GH-30403)
2022-01-05 09:39:10 -06:00
Ned Batchelder
c602c1be43
Fix double-space in exception message (GH-29955)
2021-12-08 12:42:02 +02:00
Raymond Hettinger
0aa0bd0563
bpo-45876: Have stdev() also use decimal specific square root. (GH-29869)
2021-11-30 19:25:57 -06:00
Raymond Hettinger
a39f46afde
bpo-45876: Correctly rounded stdev() and pstdev() for the Decimal case (GH-29828)
2021-11-30 18:20:08 -06:00
Raymond Hettinger
af9ee57b96
bpo-45876: Improve accuracy for stdev() and pstdev() in statistics (GH-29736)
* Inlined code from variance functions
* Added helper functions for the float square root of a fraction
* Call helper functions
* Add blurb
* Fix over-specified test
* Add a test for the _sqrt_frac() helper function
* Increase the tested range
* Add type hints to the internal function.
* Fix test for correct rounding
* Simplify ⌊√(n/m)⌋ calculation
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
* Add comment and beef-up tests
* Test for zero denominator
* Add algorithmic references
* Add test for the _isqrt_frac_rto() helper function.
* Compute the 109 instead of hard-wiring it
* Stronger test for _isqrt_frac_rto()
* Bigger range
* Bigger range
* Replace float() call with int/int division to be parallel with the other code path.
* Factor out division. Update proof link. Remove internal type declaration
Co-authored-by: Mark Dickinson <dickinsm@gmail.com>
2021-11-26 22:54:50 -07:00
Raymond Hettinger
d2b55b07d2
bpo-45766: Add direct proportion option to linear_regression(). (#29490)
* bpo-45766: Add direct proportion option to linear_regression().
* Update 2021-11-09-09-18-06.bpo-45766.dvbcMf.rst
* Use ellipsis to avoid round-off issues.
* Update Misc/NEWS.d/next/Library/2021-11-09-09-18-06.bpo-45766.dvbcMf.rst
Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
* Update signature in main docs
* Fix missing comma
Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@innova.no>
2021-11-21 08:39:26 -06:00
Raymond Hettinger
04e03f496c
bpo-45851: Avoid full sort in statistics.multimode() (#29662)
Suggested by Stefan Pochmann.
2021-11-20 10:04:37 -06:00
Raymond Hettinger
c3bc0fe5a6
Factor-out constant calculation. (GH-29491)
2021-11-09 10:30:06 -06:00
Raymond Hettinger
4a5cccb02b
bpo-20499: Rounding error in statistics.pvariance (GH-28230)
2021-09-08 22:00:12 -05:00
Raymond Hettinger
793f55bde9
bpo-39218: Improve accuracy of variance calculation (GH-27960)
2021-08-30 20:57:30 -05:00
Raymond Hettinger
3668e118f7
Update nonstandard variable names (GH-26540)
2021-06-04 16:28:31 -07:00
Raymond Hettinger
2f2e703244
bpo-44151: Various grammar, word order, and markup fixes (GH-26344)
2021-05-24 23:04:04 -07:00
Zack Kneupper
2f3a87856c
bpo-44151: linear_regression() minor API improvements (GH-26199)
2021-05-24 17:30:58 -07:00
Raymond Hettinger
be4dd7fcd9
bpo-44150: Support optional weights parameter for fmean() (GH-26175)
2021-05-20 20:22:26 -07:00
Raymond Hettinger
b3f65e819f
Apply edits from Allen Downey's review of the linear_regression docs. (GH-26176)
2021-05-16 19:21:14 -07:00
Raymond Hettinger
fdfea4ab16
Improve speed and accuracy for correlation() (GH-26135)
2021-05-15 11:00:51 -07:00
Raymond Hettinger
55b78ce3c4
Eliminate duplicated calculations and unnecessary work for linear regression (GH-25922)
2021-05-06 07:43:13 -07:00