The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX GH-15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
(cherry picked from commit 2f09413947)
Co-authored-by: Greg Price <gnprice@gmail.com>
bpo-37834: Normalise handling of reparse points on Windows
* ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed)
* nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point)
* nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour)
* nt.readlink() will read destinations for symlinks and junction points only
bpo-1311: os.path.exists('nul') now returns True on Windows
* nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
* bpo-32912: Revert warnings for invalid escape sequences.
DeprecationWarning will continue to be emitted for invalid escape sequences in string and bytes literals in 3.8 just as it did in 3.7.
SyntaxWarning may be emitted in the future. But per mailing list discussion, we don't yet know when because we haven't settled on how to do so in a non-disruptive manner.
* bpo-33821: Update IDLE section of What's New 3.7
* Fix roles.
(cherry picked from commit 5982b7201b)
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
The distutils bdist_wininst command is now deprecated, use
bdist_wheel (wheel packages) instead.
(cherry picked from commit 1da4462765)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
Add PyCode_NewEx to be used internally and set PyCode_New as a compatibility wrapper
(cherry picked from commit 4a2edc34a4)
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.
os.getcwd() and os.getcwdb() now detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
(cherry picked from commit 689830ee62)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
* Mention issue in which ByByteArray_Init() has been removed.
* Fix typo
(cherry picked from commit af41c567af)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
I didn't find any entries in the docs about these functions, so I just mentioned them, in "What's New".
(cherry picked from commit 47c2de7725)
Co-authored-by: Ivan Levkivskyi <levkivskyi@gmail.com>
https://bugs.python.org/issue33416
* Update PyCompilerFlags structure documentation.
* Document the new cf_feature_version field in the Changes in the C
API section of the What's New in Python 3.8 doc.
(cherry picked from commit 2c9b498759)
(A single int is still allowed, but undocumented.)
https://bugs.python.org/issue35766
(cherry picked from commit 10b55c1643)
Co-authored-by: Guido van Rossum <guido@python.org>
It is now allowed to add new fields at the end of the PyTypeObject struct without having to allocate a dedicated compatibility flag in tp_flags.
This will reduce the risk of running out of bits in the 32-bit tp_flags value.
* bpo-26836: Add os.memfd_create()
* Use the glibc wrapper for memfd_create()
Co-Authored-By: Christian Heimes <christian@python.org>
* Fix deletions caused by autoreconf.
* Use MFD_CLOEXEC as the default value for *flags*.
* Add memset_s to configure.ac.
* Revert memset_s changes.
* Apply the requested changes.
* Tweak the docs.