The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalize(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
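The invariant that the faster code has to preserve can be checked directly. The sketch below compares `is_normalized` against the slow normalize-and-compare answer for all four forms; the sample strings are arbitrary choices for illustration, not taken from the patch:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")
# Arbitrary samples: a CJK compatibility ideograph (canonically decomposable),
# a run of combining overlays, a decomposed "é" and a precomposed "é".
SAMPLES = ("\uf900", "\u0338" * 3, "e\u0301", "\u00e9")

def slow_is_normalized(form, s):
    """The reference answer: normalize, then compare."""
    return unicodedata.normalize(form, s) == s

def check_agreement():
    """Return the (form, sample) pairs where the fast and slow answers differ."""
    return [
        (form, s)
        for form in FORMS
        for s in SAMPLES
        if unicodedata.is_normalized(form, s) != slow_is_normalized(form, s)
    ]
```

`check_agreement()` should return an empty list; `unicodedata.is_normalized("NFD", "\uf900")` is False because U+F900 has a canonical decomposition.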
In a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
(cherry picked from commit 2f09413947)
Co-authored-by: Greg Price <gnprice@gmail.com>
* Fix suspicious.py to actually print the unused rules
* Fix the other `self.warn` calls
(cherry picked from commit e1786b5416)
Co-authored-by: Anthony Sottile <asottile@umich.edu>
Adds a link to `dateutil.parser.isoparse` in the documentation.
It would be nice to set up intersphinx for things like this, but I think we can leave that for a separate PR.
CC: @pitrou
[bpo-37979](https://bugs.python.org/issue37979)
Automerge-Triggered-By: @pitrou
(cherry picked from commit 59725f3bad)
Co-authored-by: Paul Ganssle <paul@ganssle.io>
- drop TargetScopeError in favour of raising SyntaxError directly
as per the updated PEP 572
- comprehension iteration variables are explicitly local, but
named expression targets in comprehensions are nonlocal or
global. Raise SyntaxError as specified in PEP 572
- named expression targets in the outermost iterable of a
comprehension have an ambiguous target scope. Avoid resolving
that question now by raising SyntaxError. PEP 572
originally required this only for cases where the bound name
conflicts with the iteration variable in the comprehension,
but CPython can't easily restrict the exception to that case
(as it doesn't know the target variable names when visiting
the outermost iterator expression)
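These rules can be observed with `compile()`, since the checks happen at compile time. A minimal sketch for Python 3.8+; the helper `compiles` and the sample snippets are illustrative:

```python
def compiles(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

CASES = {
    # Binding a new name from inside a comprehension is allowed.
    "[(last := x) for x in range(3)]": True,
    # Rebinding the comprehension's own iteration variable is a SyntaxError.
    "[(i := 0) for i in range(3)]": False,
    # A named expression in the outermost iterable is a SyntaxError.
    "[x for x in (y := [1, 2])]": False,
}
```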
(cherry picked from commit 5dbe0f59b7)
"Arguments may be integers... " could be misunderstood to mean that they
could also be strings.
The new wording makes it clear that arguments have to be integers.
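A quick way to see the requirement in practice; the helper `try_date` is illustrative only:

```python
import datetime

def try_date(*args):
    """Return a date, or the name of the exception raised for bad arguments."""
    try:
        return datetime.date(*args)
    except TypeError as exc:
        return type(exc).__name__
```

`try_date(2019, 8, 30)` returns a date, while `try_date(2019, "8", 30)` reports a `TypeError` because the month is a string.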
modified: Doc/library/datetime.rst
Automerge-Triggered-By: @pganssle
(cherry picked from commit c5218fce02)
Co-authored-by: Jürgen Gmach <juergen.gmach@googlemail.com>
Automerge-Triggered-By: @pganssle
Fix typo in the description of the link to Mozilla's bug report writing guidelines.
Though the URL is misleading, we're indeed trying to write bug _reports_, not to add bugs.
Automerge-Triggered-By: @ned-deily
(cherry picked from commit e17f201cd9)
Co-authored-by: Antoine <43954001+awecx@users.noreply.github.com>
bpo-37834: Normalise handling of reparse points on Windows
* ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed)
* nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point)
* nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour)
* nt.readlink() will read destinations for symlinks and junction points only
bpo-1311: os.path.exists('nul') now returns True on Windows
* nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
Added back the mention that ensure_future actually schedules obj. The documentation only mentioned what ensure_future returns, so I did not realize that ensure_future also schedules obj.
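A minimal sketch of the behaviour the restored sentence describes: `ensure_future()` wraps a coroutine in a Task and schedules it, so it runs as soon as the event loop gets control, even before anyone awaits the Task. The names `work` and `main` are illustrative:

```python
import asyncio

async def work(log):
    log.append("ran")

async def main():
    log = []
    # ensure_future() wraps the coroutine in a Task and schedules it on the loop.
    task = asyncio.ensure_future(work(log))
    # Nothing has run yet; yield to the event loop once so the Task can execute.
    await asyncio.sleep(0)
    return isinstance(task, asyncio.Task), log, task.done()
```

`asyncio.run(main())` should report that the Task ran without ever being awaited.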
(cherry picked from commit 092911d5c0)
Co-authored-by: Roger Iyengar <ri@rogeriyengar.com>
Fixed wrong link to Telnet.open() method in telnetlib documentation.
(cherry picked from commit e0b6117e27)
Co-authored-by: Michael Anckaert <michael.anckaert@sinax.be>
The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.
Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.
Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so. The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.
Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
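A sketch of the kind of exhaustive check the last paragraph describes, assuming the rewritten definition is "general category Zs, or bidirectional class WS, B or S" (the rule used when makeunicodedata.py generates the whitespace table). Scanning every code point takes a second or so:

```python
import sys
import unicodedata

def is_whitespace_by_definition(ch):
    """Assumed documented rule: category Zs, or bidirectional class WS, B or S."""
    return (unicodedata.category(ch) == "Zs"
            or unicodedata.bidirectional(ch) in ("WS", "B", "S"))

def mismatches(limit=sys.maxunicode + 1):
    """Code points below `limit` where str.isspace() disagrees with the rule."""
    return [cp for cp in range(limit)
            if chr(cp).isspace() != is_whitespace_by_definition(chr(cp))]
```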
Because mod, func, class, etc. all share one namespace, :func:`time` creates a link to the time module doc page rather than to the time.time function.
(cherry picked from commit 1b1d0514ad)
Co-authored-by: Éric Araujo <merwok@netwok.org>
Automerge-Triggered-By: @merwok
https://bugs.python.org/issue37814:
> The empty tuple syntax in type annotations, `Tuple[()]`, is not obvious from the examples given in the documentation (I naively expected `Tuple[]` to work); it has been documented in PEP 484 and in mypy, but not in the documentation for the typing module.
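For illustration, a minimal sketch of the spelling the issue asks to document (the function names are hypothetical):

```python
from typing import Tuple

def no_values() -> Tuple[()]:
    """Annotated as returning the empty tuple."""
    return ()

def one_int() -> Tuple[int]:
    """Annotated as returning a 1-tuple containing an int."""
    return (1,)
```

By contrast, `Tuple[]` is not valid Python syntax at all.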
(cherry picked from commit 8a784af750)
Co-authored-by: Josh Holland <anowlcalledjosh@gmail.com>
* bpo-32912: Revert warnings for invalid escape sequences.
DeprecationWarning will continue to be emitted for invalid escape sequences in string and bytes literals in 3.8 just as it did in 3.7.
SyntaxWarning may be emitted in the future. But per mailing list discussion, we don't yet know when because we haven't settled on how to do so in a non-disruptive manner.
* add a missing ``.. availability::`` reST explicit markup;
* more consistent "see man page" sentences.
(cherry picked from commit cfebfef2de)
Co-authored-by: Géry Ogam <gery.ogam@gmail.com>
* Remove suggestion that is less relevant now that global lookups are much faster
* Add link for installing the recipes
(cherry picked from commit adf02b36b3)
Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
There was a discrepancy between the Python and C implementations.
Add singletons ALWAYS_EQ, LARGEST and SMALLEST in test.support
to test mixed type comparison.
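A sketch of what such singletons can look like. The class names and exact bodies are assumptions for illustration, not necessarily the ones in `test.support`; `functools.total_ordering` fills in the remaining comparison methods:

```python
import functools

class _AlwaysEq:
    """Compares equal to everything."""
    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return False

@functools.total_ordering
class _Largest:
    """Compares greater than everything except itself."""
    def __eq__(self, other):
        return isinstance(other, _Largest)

    def __lt__(self, other):
        return False

@functools.total_ordering
class _Smallest:
    """Compares less than everything except itself."""
    def __eq__(self, other):
        return isinstance(other, _Smallest)

    def __gt__(self, other):
        return False

ALWAYS_EQ = _AlwaysEq()
LARGEST = _Largest()
SMALLEST = _Smallest()
```

Because these objects are on the right-hand side of comparisons with foreign types, they only work if the left operand returns `NotImplemented` for unknown types, which is exactly what exposes a Python/C discrepancy.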
(cherry picked from commit 17e52649c0)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Expose the CAN_BCM SocketCAN constants used in the bcm_msg_head struct
flags (provided by <linux/can/bcm.h>) under the socket library.
This adds the following constants with a CAN_BCM prefix:
* SETTIMER
* STARTTIMER
* TX_COUNTEVT
* TX_ANNOUNCE
* TX_CP_CAN_ID
* RX_FILTER_ID
* RX_CHECK_DLC
* RX_NO_AUTOTIMER
* RX_ANNOUNCE_RESUME
* TX_RESET_MULTI_IDX
* RX_RTR_FRAME
* CAN_FD_FRAME
The CAN_FD_FRAME flag was introduced in the 4.8 kernel, while the other
ones have been present since SocketCAN drivers were mainlined in 2.6.25. As
such, it is probably unnecessary to guard against these constants being
missing.
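A sketch of where these flags end up: the `flags` field of `bcm_msg_head`. The struct layout below is an assumption based on `<linux/can/bcm.h>` on a typical Linux build and should be checked against the target kernel headers; the constants are only read when the running Python provides them:

```python
import socket
import struct

# Assumed native layout of struct bcm_msg_head: opcode, flags, count (u32),
# ival1 and ival2 (struct bcm_timeval: two C longs each), can_id (u32),
# nframes (u32). CAN frames that follow would need their own alignment
# padding, which this sketch does not add.
BCM_HEAD_FORMAT = "@3I4l2I"

def pack_bcm_head(opcode, flags, count, ival1, ival2, can_id, nframes):
    """Pack a bcm_msg_head; ival1 and ival2 are (seconds, microseconds)."""
    return struct.pack(BCM_HEAD_FORMAT, opcode, flags, count,
                       *ival1, *ival2, can_id, nframes)

def cyclic_tx_flags():
    """SETTIMER | STARTTIMER, or None where the constants are unavailable."""
    if not hasattr(socket, "CAN_BCM_SETTIMER"):
        return None  # e.g. non-Linux platforms
    return socket.CAN_BCM_SETTIMER | socket.CAN_BCM_STARTTIMER
```

In real use the opcode would be one of the existing constants such as `socket.CAN_BCM_TX_SETUP`.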
(cherry picked from commit 31c4fd2a10)
Co-authored-by: karl ding <karlding@users.noreply.github.com>
* bpo-33821: Update IDLE section of What's New 3.7
* Fix roles.
(cherry picked from commit 5982b7201b)
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Prior to this change the guard on an 'elif' used an assignment expression whose value was used in a later 'else' block, causing some confusion for people.
(Discussion on Twitter: https://twitter.com/brettsky/status/1153861041068994566.)
Automerge-Triggered-By: @brettcannon
(cherry picked from commit 544fa15ea1)
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
* Fix the formatting in the documentation of the tostring() functions.
* bpo-34160: Document that the tostring() and tostringlist() functions also preserve the attribute order now.
* bpo-34160: Add an explanation of how users should deal with the attribute order.
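A minimal sketch of the kind of advice the last bullet refers to: since `tostring()` keeps attributes in insertion order, code that needs a canonical order can sort them before serializing (`sort_attributes` is an illustrative helper, Python 3.8+):

```python
import xml.etree.ElementTree as ET

def sort_attributes(root):
    """Reorder every element's attributes alphabetically, in place."""
    for element in root.iter():
        attrib = element.attrib
        if len(attrib) > 1:
            items = sorted(attrib.items())
            attrib.clear()
            attrib.update(items)

# Attributes are serialized in the order they were added: b, then a.
item = ET.Element("item", {"b": "2", "a": "1"})
```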
(cherry picked from commit a3697db010)
Co-authored-by: Stefan Behnel <stefan_ml@behnel.de>
Move the Editors and IDE section out of the Unix section, to its own section.
https://bugs.python.org/issue37610
(cherry picked from commit 8f040b7a9f)
Co-authored-by: aldwinaldwin <aldwinaldwin@users.noreply.github.com>
Add a brief note to indicate that any new required attributes must go through the PEP process.
https://bugs.python.org/issue37284
(cherry picked from commit 52693c10e8)
Co-authored-by: Giovanni Cappellotto <gcappellotto@fb.com>
The `allow_abbrev` option for ArgumentParser is documented and intended to disable support for unique prefixes of --options, which may sometimes be ambiguous due to deferred parsing.
However, the initial implementation also broke parsing of grouped short flags, such as `-ab` meaning `-a -b` (or `-a=b`). Checking the argument for a leading `--` before rejecting it fixes this.
This was prompted by pytest-dev/pytest#5469, so a backport to at least 3.8 would be great 😄
And this is my first PR to CPython, so please let me know if I've missed anything!
https://bugs.python.org/issue26967
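A minimal reproduction of the behaviour being fixed (parser and option names are illustrative): with `allow_abbrev=False`, grouped short flags still parse, while an abbreviated long option is rejected:

```python
import argparse

def make_parser():
    parser = argparse.ArgumentParser(prog="demo", allow_abbrev=False)
    parser.add_argument("-a", action="store_true")
    parser.add_argument("-b", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser
```

`make_parser().parse_args(["-ab"])` sets both `a` and `b`, whereas `parse_args(["--verb"])` exits with a usage error.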
(cherry picked from commit dffca9e925)
Co-authored-by: Zac Hatfield-Dodds <Zac-HD@users.noreply.github.com>
Hi,
I've faced an issue with `mailbox.Maildir()`. The case is as follows:
1. I create a folder with `tempfile.TemporaryDirectory()`, so it's empty
2. I pass that folder path as an argument when instantiating `mailbox.Maildir()`
3. Then I receive an exception happening because "there's no such file or directory" (namely `cur`, `tmp` or `new`) during interaction with Maildir
**Expected result:** subdirs are created during `Maildir()` instance creation.
**Actual result:** subdirs are assumed to exist, which leads to exceptions during use.
**Workaround:** remove the actual dir before passing the path to `Maildir()`. It will be created automatically with all subdirs needed.
**Fix:** This PR. Basically it adds creation of subdirs regardless of whether the base dir existed before.
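Since what `Maildir()` does with an already existing directory is exactly what this PR is about, the sketch below sticks to the workaround described above, which only relies on `Maildir()` creating a missing directory together with its `cur`, `new` and `tmp` subdirectories. `make_fresh_maildir` and the `inbox` name are illustrative:

```python
import mailbox
import os

def make_fresh_maildir(parent):
    """Create a Maildir in a new subdirectory of `parent` (the workaround)."""
    path = os.path.join(parent, "inbox")  # must not exist yet
    box = mailbox.Maildir(path, create=True)
    return box, path
```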
https://bugs.python.org/issue30088
(cherry picked from commit e44184749c)
Co-authored-by: Sviatoslav Sydorenko <wk@sydorenko.org.ua>
Fix importlib examples to insert any newly created modules via importlib.util.module_from_spec() immediately into sys.modules instead of after calling loader.exec_module().
Thanks to Benjamin Mintz for finding the bug.
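A sketch of the corrected pattern (the helper name `import_from_path` is illustrative): the module is placed in `sys.modules` before `exec_module()` runs, so code executed during import sees the same module object. Removing the entry again on failure is an extra precaution in this sketch:

```python
import importlib.util
import sys

def import_from_path(module_name, file_path):
    """Import a module from a file path, registering it before execution."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    # Register first, so code run by the module (or a circular import of it)
    # finds this same module object in sys.modules.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module behind on failure.
        del sys.modules[module_name]
        raise
    return module
```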
https://bugs.python.org/issue37521
(cherry picked from commit 0827064c95)
Co-authored-by: Brett Cannon <54418+brettcannon@users.noreply.github.com>
This is done to compensate for the extra stack frames added by
IDLE itself, which cause problems when setting the recursion limit
to low values.
This wraps sys.setrecursionlimit() and sys.getrecursionlimit()
as invisibly as possible.
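A rough sketch of the wrapping idea. The delta of 30 frames and all names are assumptions for illustration, not necessarily IDLE's actual values; `uninstall()` exists only so the sketch can be undone:

```python
import functools
import sys

# Hypothetical number of extra stack frames the host application adds
# beneath user code.
RECURSIONLIMIT_DELTA = 30

def install_recursionlimit_wrappers():
    """Hide the host's extra frames from sys.{set,get}recursionlimit().

    Returns a function that undoes the installation.
    """
    original_set = sys.setrecursionlimit
    original_get = sys.getrecursionlimit

    @functools.wraps(original_set)
    def setrecursionlimit(limit):
        if limit <= 0:
            raise ValueError("recursion limit must be greater than 0")
        # Reserve room for the host's frames on top of the user's limit.
        original_set(limit + RECURSIONLIMIT_DELTA)

    @functools.wraps(original_get)
    def getrecursionlimit():
        # Report the limit as the user sees it, without the hidden delta.
        return original_get() - RECURSIONLIMIT_DELTA

    # Raise the real limit so the user-visible value stays unchanged.
    original_set(original_get() + RECURSIONLIMIT_DELTA)
    sys.setrecursionlimit = setrecursionlimit
    sys.getrecursionlimit = getrecursionlimit

    def uninstall():
        visible = getrecursionlimit()
        sys.setrecursionlimit = original_set
        sys.getrecursionlimit = original_get
        original_set(visible)

    return uninstall
```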
(cherry picked from commit fcf1d003bf)
Co-authored-by: Tal Einat <taleinat+github@gmail.com>
The distutils bdist_wininst command is now deprecated; use
bdist_wheel (wheel packages) instead.
(cherry picked from commit 1da4462765)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
bdist_wininst depends on the MBCS codec, which is unavailable on non-Windows
platforms, and bdist_wininst has not worked since at least Python 3.2, possibly
never on Python 3.
Here we document that bdist_wininst is only supported on Windows,
and we mark it unsupported otherwise to skip tests.
Distributors of Python 3 can now safely drop the bdist_wininst .exe files
without the need to skip bdist_wininst related tests.
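One way to express such a skip, sketched with a codec lookup rather than a platform check (the commit describes marking the command unsupported off Windows; `codec_available` and the test class are hypothetical):

```python
import codecs
import unittest

def codec_available(name):
    """Return True if the codec `name` can be looked up on this platform."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

@unittest.skipUnless(codec_available("mbcs"), "requires the MBCS codec (Windows only)")
class MBCSDependentTests(unittest.TestCase):  # hypothetical test case
    def test_ascii_round_trip(self):
        self.assertEqual(b"abc".decode("mbcs"), "abc")
```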
(cherry picked from commit 72cd653c4e)
Co-authored-by: Miro Hrončok <miro@hroncok.cz>
Add PyCode_NewEx to be used internally and set PyCode_New as a compatibility wrapper
(cherry picked from commit 4a2edc34a4)
Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
* Added documentation for textwrap.dedent behavior.
* Remove an obsolete note about pre-2.5 behavior from the docstring.
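A few inputs illustrating the behaviour being documented; the sample strings are arbitrary:

```python
import textwrap

# Common leading whitespace is removed from every line.
INDENTED = "    def f():\n        return 1\n"
# Whitespace-only lines do not affect the margin and become bare newlines.
WITH_BLANK = "  a\n \n  b"
# A tab and four spaces are both whitespace, but they are not equal,
# so there is no common margin to remove.
MIXED = "\thello\n    world"
```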
(cherry picked from commit eb97b9211e)
Co-authored-by: tmblweed <tmblweed@users.noreply.github.com>
Add a versionadded for PS Core and note that `.venv` is a common virtual environment name.
(cherry picked from commit f9f8e3ce70)
Co-authored-by: Brett Cannon <54418+brettcannon@users.noreply.github.com>
Also updates some (unreleased) event names to be consistent with the others.
(cherry picked from commit 44f91c388a)
Co-authored-by: Steve Dower <steve.dower@python.org>
The os.getcwdb() function now uses the UTF-8 encoding on Windows,
rather than the ANSI code page: see PEP 529 for the rationale. The
function is no longer deprecated on Windows.
os.getcwd() and os.getcwdb() now detect integer overflow on memory
allocations. On Unix, these functions properly report MemoryError on
memory allocation failure.
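A quick consistency check, assuming the default PEP 529 behaviour on Windows (no legacy filesystem encoding mode): the bytes form of the current directory should be the str form encoded with the filesystem encoding. `cwd_bytes_matches_str` is an illustrative helper:

```python
import os

def cwd_bytes_matches_str():
    """Check that os.getcwdb() equals os.getcwd() encoded with os.fsencode()."""
    return os.getcwdb() == os.fsencode(os.getcwd())
```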
(cherry picked from commit 689830ee62)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
When the Windows default event loop changed, `asyncio-policy.rst` was updated but `asyncio-eventloop.rst` was missed.
(cherry picked from commit 9ffca670ed)
Co-authored-by: Ben Darnell <ben@bendarnell.com>
… as proposed in PEP 572; key is now evaluated before value.
https://bugs.python.org/issue35224
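A small sketch (Python 3.8+) that records evaluation order; `key` and `value` are illustrative helpers. The last line shows why the order matters for PEP 572: a name bound in the key can be used in the value:

```python
def evaluation_order():
    """Record the order in which a dict comprehension evaluates keys and values."""
    order = []

    def key(i):
        order.append(("key", i))
        return i

    def value(i):
        order.append(("value", i))
        return i * 10

    result = {key(i): value(i) for i in range(2)}
    return result, order

# The key is evaluated first, so `n` is already bound when the value runs.
shifted = {(n := i * i): n + 1 for i in range(3)}
```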
(cherry picked from commit c8a35417db)
Co-authored-by: Jörn Heissler <joernheissler@users.noreply.github.com>
* Mention the issue in which PyByteArray_Init() has been removed.

* Fix typo
(cherry picked from commit af41c567af)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
Add a missing single quote character in the documentation for `io.TextIOWrapper.reconfigure`.
(cherry picked from commit 35068bd059)
Co-authored-by: Harmon <Harmon758@gmail.com>
I didn't find any entries in the docs about these functions, so I just mentioned them in "What's New".
(cherry picked from commit 47c2de7725)
Co-authored-by: Ivan Levkivskyi <levkivskyi@gmail.com>
https://bugs.python.org/issue33416
For datetime.datetime.strptime(), the leading zero for some two-digit formats is optional.
This adds a footnote to the strftime/strptime documentation to reflect this fact, and adds some tests to ensure that it is true.
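For example (arbitrary date and time), padded and unpadded fields parse to the same value:

```python
from datetime import datetime

# Leading zeros are optional when parsing %m, %d, %H and %M.
PADDED = datetime.strptime("05/03/2019 07:04", "%m/%d/%Y %H:%M")
UNPADDED = datetime.strptime("5/3/2019 7:4", "%m/%d/%Y %H:%M")
```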
bpo-34903
(cherry picked from commit 6b9c204ee7)
Co-authored-by: Mike Gleen <mike.gleen@gmail.com>
The initial options are 1) add command-line options, which are appended to sys.argv as if passed on a real command line, and 2) skip the shell restart. The customization dialog is accessed by a new entry on the Run menu.
(cherry picked from commit 201bc2d18b)
Co-authored-by: Cheryl Sabella <cheryl.sabella@gmail.com>
Measure required height by quickly maximizing once per screen.
A search for a better method failed.
(cherry picked from commit 5bff3c86ab)
Co-authored-by: Tal Einat <taleinat+github@gmail.com>
Document reference cycle and resurrected objects issues in
sys.unraisablehook() and threading.excepthook() documentation.
Fix test.support.catch_unraisable_exception(): __exit__() no longer
ignores unraisable exceptions.
Fix test_io test_writer_close_error_on_close(): use a second
catch_unraisable_exception() to catch the BufferedWriter unraisable
exception.
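A sketch of a hook that avoids both documented problems by keeping only plain data. `Noisy` is a hypothetical class, and relying on `del` to finalize immediately is CPython-specific:

```python
import sys

class Noisy:
    """Hypothetical class whose finalizer raises."""
    def __del__(self):
        raise ValueError("error in __del__")

def record_del_errors():
    seen = []

    def hook(unraisable):
        # Keep plain data only: storing unraisable.exc_value can create a
        # reference cycle, and storing unraisable.object can resurrect an
        # object that is being finalized.
        seen.append((unraisable.exc_type.__name__, str(unraisable.exc_value)))

    old_hook = sys.unraisablehook
    sys.unraisablehook = hook
    try:
        obj = Noisy()
        del obj  # on CPython, reference counting runs __del__ right here
    finally:
        sys.unraisablehook = old_hook
    return seen
```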
(cherry picked from commit 212646cae6)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
This PR adds missing details in the [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html) documentation:
* the mention that `Future.cancel` also returns `False` if the call finished running;
* the mention of the states for `Future` that did not complete: pending or running.
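A short sketch of both states (function names are illustrative): a finished future cannot be cancelled, while a pending one can:

```python
from concurrent.futures import Future, ThreadPoolExecutor

def cancel_after_completion():
    """Return (result, cancel() return value, cancelled()) for a finished future."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(pow, 2, 10)
        result = future.result()  # wait until the call has finished
        return result, future.cancel(), future.cancelled()

def cancel_pending():
    """A future that never started running can still be cancelled."""
    future = Future()  # created directly, as only tests should do
    return future.cancel(), future.cancelled()
```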
(cherry picked from commit 431478d5d7)
Co-authored-by: Géry Ogam <gery.ogam@gmail.com>
It would raise ValueError("Paths don't have the same drive") if the paths were on different drives, which is not documented.
os.path.commonpath raises ValueError when the *paths* are on different drives, but this is not documented.
Update the documentation according to @Windsooon's suggestion.
It actually raises ValueError, according to line 355 of [test of path](https://github.com/python/cpython/blob/master/Lib/test/test_ntpath.py)
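A small sketch using `ntpath` and `posixpath` directly, so it runs on any platform; `commonpath_or_error` is an illustrative helper:

```python
import ntpath
import posixpath

def commonpath_or_error(module, paths):
    """Return module.commonpath(paths), or the ValueError message."""
    try:
        return module.commonpath(paths)
    except ValueError as exc:
        return f"ValueError: {exc}"
```

Paths on different drives (`C:` and `D:`) produce a `ValueError`, as does mixing absolute and relative paths.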
https://bugs.python.org/issue6689
(cherry picked from commit 95492032c4)
Co-authored-by: Makdon <makdon@makdon.me>
The __exit__() method of the test.support.catch_unraisable_exception
context manager now ignores unraisable exceptions raised when clearing
the self.unraisable attribute.
(cherry picked from commit 6d22cc8e90)
Co-authored-by: Victor Stinner <vstinner@redhat.com>
* Update PyCompilerFlags structure documentation.
* Document the new cf_feature_version field in the Changes in the C
API section of the What's New in Python 3.8 doc.
(cherry picked from commit 2c9b498759)
Python 3.6 changed the size of bytecode instructions to two bytes, but the documentation for `EXTENDED_ARG` was not updated accordingly.
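A sketch of how arguments are assembled from 3.6-style wordcode, modeled on the documented meaning of `EXTENDED_ARG` rather than on the `dis` module's internals. It ignores the inline cache entries that newer CPython versions interleave with instructions:

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]

def decode_wordcode(code):
    """Yield (offset, opcode, arg) from raw 2-byte-per-instruction bytecode.

    Each EXTENDED_ARG prefix contributes its byte as higher-order bits of
    the next instruction's argument; three prefixes give a 32-bit argument.
    """
    extended = 0
    for offset in range(0, len(code), 2):
        opcode, arg = code[offset], code[offset + 1] | extended
        if opcode == EXTENDED_ARG:
            extended = arg << 8
        else:
            extended = 0
            yield offset, opcode, arg
```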
(cherry picked from commit 405f648db7)
Co-authored-by: Yao Zuo <laike9m@users.noreply.github.com>
(A single int is still allowed, but undocumented.)
https://bugs.python.org/issue35766
(cherry picked from commit 10b55c1643)
Co-authored-by: Guido van Rossum <guido@python.org>
Based on the source code (4a686504eb/Lib/multiprocessing/pool.py#L755), AsyncResult.successful() raises a ValueError, not an AssertionError.
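A minimal sketch using a thread pool to avoid spawning processes; `successful_before_and_after` is illustrative and assumes Python 3.8+:

```python
import threading
from multiprocessing.pool import ThreadPool

def successful_before_and_after():
    """Show AsyncResult.successful() before and after the call completes."""
    gate = threading.Event()
    with ThreadPool(processes=1) as pool:
        result = pool.apply_async(gate.wait)  # blocks until gate is set
        try:
            result.successful()
            early = "no error"
        except ValueError:
            early = "ValueError"  # the result is not ready yet
        gate.set()
        result.wait()
        return early, result.successful()
```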
(cherry picked from commit d4cf099dff)
Co-authored-by: Benjamin Yeh <bentyeh@users.noreply.github.com>
* bpo-35805: Add parser for Message-ID header.
This parser is based on the definition of Identification Fields from RFC 5322
Sec 3.6.4.
This should also prevent folding of Message-ID header using RFC 2047 encoded
words and hence fix bpo-35805.
* Prevent folding of non-ascii message-id headers.
* Add fold method to MsgID token to prevent folding.
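A small sketch of parsing such a header with the modern email policy (Python 3.8+; the message text is made up):

```python
from email import message_from_string
from email.headerregistry import MessageIDHeader
from email.policy import default

RAW = "Message-ID: <20190101.1234@example.com>\nSubject: test\n\nbody\n"

def parsed_message_id(raw=RAW):
    """Parse a message with the modern policy and return its Message-ID header."""
    msg = message_from_string(raw, policy=default)
    return msg["Message-ID"]
```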
* Improve example on tzinfo instances
Move from GMTX to TZX when naming the classes, as GMT1 might be rather
confusing as seen in the reported issue.
In addition, prefer UTC over GMT and improve the tzname implementation.
* Simplify datetime with tzinfo example
Move the example in the documentation to just use timezone.utc and a
user defined Kabul timezone rather than having two user defined
timezones with DST.
Kabul timezone is still interesting as it changes its offset but not
based on DST. This is more accurate as the previous example was missing
information about the fold attribute. Additionally, implementing the fold
attribute was rather complex and probably not relevant enough for the
section "datetime with tzinfo".
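A simplified sketch in the spirit of the new example. Unlike the documentation's custom tzinfo, it uses a fixed UTC+4:30 offset, so it ignores Kabul's historical offset change and is only illustrative:

```python
from datetime import datetime, timedelta, timezone

# Simplification: a fixed offset, valid only for dates after Kabul's
# historical offset change.
KABUL = timezone(timedelta(hours=4, minutes=30), "Kabul")

def to_kabul(utc_dt):
    """Convert an aware UTC datetime to Kabul local time."""
    return utc_dt.astimezone(KABUL)
```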
Add the BaseEventLoop.wait_executor_on_close attribute, true by default.
loop.close() now waits for the default executor to finish by default.
Set the loop.wait_executor_on_close attribute to False to avoid waiting for
the executor.
* bpo-19184: Update the documentation of dis module
* Explain the behavior of the number of arguments of the RAISE_VARARGS
opcode.
* bpo-19184: Update blurb.
* bpo-19184: Fix typo in the dis documentation.
* bpo-19184: Address review comments and improve the doc
* bpo-19184: Remove news file.
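A sketch that shows the three argument counts by compiling small snippets (the helper is illustrative): 0 re-raises the active exception, 1 raises an exception, and 2 raises it with a cause:

```python
import dis

def raise_varargs_args(source):
    """Return the argument of each RAISE_VARARGS in the top-level code of `source`."""
    code = compile(source, "<example>", "exec")
    return [ins.arg for ins in dis.get_instructions(code)
            if ins.opname == "RAISE_VARARGS"]
```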
* bpo-37014: Update docstring and Documentation of fileinput.FileInput()
* Explain the behavior of fileinput.FileInput() when reading stdin.
* Update blurb.
* bpo-37014: Fix typo in the docstring and documentation.
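A sketch of reading standard input through `fileinput`, using `"-"` as the file name; it swaps `sys.stdin` for an in-memory stream only so the example is self-contained:

```python
import fileinput
import io
import sys

def read_lines_from_fake_stdin(text):
    """Read `text` through fileinput's "-" (standard input) handling."""
    saved = sys.stdin
    sys.stdin = io.StringIO(text)  # stand-in for the real standard input
    try:
        with fileinput.input(files=("-",)) as lines:
            return [(fileinput.filename(), line) for line in lines]
    finally:
        sys.stdin = saved
```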
Adds a new option in trace that allows tracing runnable modules. It is
exposed as `--module module_name` as `-m` is already in use for another
argument.
* Add deprecated-removed information to the streams doc
According to the code in streams.py, the functions
``open_connection()``, ``start_server()``, ``open_unix_connection()`` and
``start_unix_server()`` are deprecated. I noted that in the
documentation.
The ssl module can now dump key material to a keylog file and trace TLS
protocol messages with a tracing callback. The default and stdlib
contexts also support the SSLKEYLOGFILE environment variable.
The msg_callback and related enums are private members. The feature
is designed for internal debugging and not for end users.
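A minimal sketch of the public part of this feature, `SSLContext.keylog_filename` (Python 3.8+, built against OpenSSL 1.1.1 or newer); the file path is up to the caller and `context_with_keylog` is illustrative:

```python
import ssl

def context_with_keylog(path):
    """Create a client context that appends TLS key material to `path`."""
    context = ssl.create_default_context()
    context.keylog_filename = path  # requires OpenSSL keylog support
    return context
```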
Signed-off-by: Christian Heimes <christian@python.org>
This is an old feature request that appears from time to time. After a year of experimenting with various introspection capabilities in `typing_inspect` on PyPI, I propose to add these two most commonly used functions: `get_origin()` and `get_args()`. These are essentially thin public wrappers around private APIs: `__origin__` and `__args__`.
As discussed in the issue and on the typing tracker, exposing some public helpers instead of `__origin__` and `__args__` directly will give us more flexibility if we decide to update the internal representation, while still maintaining backwards compatibility.
The implementation is very simple and is essentially a copy from `typing_inspect` with one exception: `ClassVar` was special-cased in `typing_inspect`, but I think this special-casing doesn't really help and only makes things more complicated.
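A few examples of what the two helpers return (Python 3.8+):

```python
from typing import Dict, List, Union, get_args, get_origin

EXAMPLES = {
    "List[int]": (get_origin(List[int]), get_args(List[int])),
    "Dict[str, int]": (get_origin(Dict[str, int]), get_args(Dict[str, int])),
    "Union[int, str]": (get_origin(Union[int, str]), get_args(Union[int, str])),
    # Plain classes have no origin and no arguments.
    "int": (get_origin(int), get_args(int)),
}
```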
Bump the removal to 3.9, indicate that collections.abc is available since 3.3,
and replace the versionchanged directive with deprecated-removed.
https://bugs.python.org/issue36953
It is now allowed to add new fields at the end of the PyTypeObject struct without having to allocate a dedicated compatibility flag in tp_flags.
This will reduce the risk of running out of bits in the 32-bit tp_flags value.
* bpo-26836: Add os.memfd_create()
* Use the glibc wrapper for memfd_create()
Co-Authored-By: Christian Heimes <christian@python.org>
* Fix deletions caused by autoreconf.
* Use MFD_CLOEXEC as the default value for *flags*.
* Add memset_s to configure.ac.
* Revert memset_s changes.
* Apply the requested changes.
* Tweak the docs.
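A minimal sketch of using the new function (Linux only; `memfd_round_trip` is illustrative). Because the default flags include `MFD_CLOEXEC`, the descriptor is not inheritable:

```python
import os

def memfd_round_trip(data):
    """Write `data` to an anonymous memory-backed file and read it back."""
    if not hasattr(os, "memfd_create"):
        return None  # unavailable on this platform
    fd = os.memfd_create("example")  # flags default to MFD_CLOEXEC
    try:
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(data)), os.get_inheritable(fd)
    finally:
        os.close(fd)
```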
* bpo-22385: Support output separators in hex methods.
Also in binascii.hexlify aka b2a_hex.
The underlying implementation behind all hex generation in CPython uses the
same pystrhex.c implementation. This adds support to bytes, bytearray,
and memoryview objects.
The binascii module functions exist rather than being slated for deprecation
because they return bytes rather than requiring an intermediate step through a
str object.
This change was inspired by MicroPython which supports sep in its binascii
implementation (and does not yet support the .hex methods).
https://bugs.python.org/issue22385
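A few examples of the new separator arguments; the sample bytes are arbitrary:

```python
import binascii

VALUE = b"\xf0\xf1\xf2"

HEX_EXAMPLES = [
    VALUE.hex("-"),                # a separator between every byte
    VALUE.hex("_", 2),             # groups of 2 bytes, counted from the right
    bytearray(VALUE).hex(":"),     # bytearray accepts the same arguments
    memoryview(VALUE).hex(" "),    # ...and so does memoryview
    binascii.hexlify(VALUE, "-"),  # binascii returns bytes, not str
]
```

A negative group size counts from the left instead, e.g. `b"UUDDLRLRAB".hex(" ", -4)`.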
* Fix the implicit string concatenation in `assert_has_awaits` error message.
* Use "await" instead of "call" in `assert_awaited_with` error message.
https://bugs.python.org/issue37075
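A small sketch showing where the reworded message appears (`awaited_with_message` is illustrative). The test only checks that the word "await" is present, since the exact text may vary between versions:

```python
import asyncio
from unittest.mock import AsyncMock

def awaited_with_message():
    """Await a mock with 1, then return the assert_awaited_with(2) failure text."""
    mock = AsyncMock()
    asyncio.run(mock(1))
    mock.assert_awaited_with(1)  # passes: the last await used 1
    try:
        mock.assert_awaited_with(2)
    except AssertionError as exc:
        return str(exc)
    return None
```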
_thread.start_new_thread() now logs uncaught exception raised by the
function using sys.unraisablehook(), rather than sys.excepthook(), so
the hook gets access to the function which raised the exception.
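A sketch of observing this with a custom hook; `failing_task` and `capture_thread_error` are illustrative. Only the exception type is recorded here, since how the function itself is reported may differ between Python versions:

```python
import _thread
import sys
import threading

def failing_task():
    raise ValueError("boom")

def capture_thread_error(timeout=5.0):
    """Run failing_task via _thread and return what sys.unraisablehook saw."""
    seen = []
    done = threading.Event()

    def hook(unraisable):
        seen.append(unraisable.exc_type)
        done.set()

    old_hook = sys.unraisablehook
    sys.unraisablehook = hook
    try:
        _thread.start_new_thread(failing_task, ())
        done.wait(timeout)
    finally:
        sys.unraisablehook = old_hook
    return seen
```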
* bpo-36540: Documentation for PEP570 - Python positional only arguments
* fixup! bpo-36540: Documentation for PEP570 - Python positional only arguments
* Update reference for compound statements
* Apply suggestions from Carol
Co-Authored-By: Carol Willing <carolcode@willingconsulting.com>
* Update Doc/tutorial/controlflow.rst
Co-Authored-By: Carol Willing <carolcode@willingconsulting.com>
* Add extra bullet point and minor edits
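A minimal example of the syntax being documented (`clamp` is a hypothetical function): parameters before `/` are positional-only and parameters after `*` are keyword-only:

```python
def clamp(value, /, low=0, high=1, *, strict=False):
    """`value` is positional-only; `strict` is keyword-only."""
    if strict and not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    return max(low, min(high, value))

def call_or_error(func, *args, **kwargs):
    """Call func, returning "TypeError" instead of raising it."""
    try:
        return func(*args, **kwargs)
    except TypeError:
        return "TypeError"
```

`clamp(value=0.5)` fails because `value` cannot be passed by keyword, and `clamp(0.5, 0, 1, True)` fails because `strict` cannot be passed positionally.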