Commit Graph

11254 Commits

Author SHA1 Message Date
Greg Price 2f09413947 closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558)
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.

Implement the standard's algorithm.  This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:

  $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.

With this, that case is actually faster than in master!

$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
    -- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop

$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
    -- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
2019-09-03 19:45:44 -07:00
Steve Dower 993ac92418
bpo-38020: Fixes crash in os.readlink() on Windows (GH-15663) 2019-09-03 12:50:51 -07:00
Dong-hee Na 0cf832a9ef bpo-37798: Fix _statistics module doc (GH-15546) 2019-09-03 12:21:45 +03:00
Serhiy Storchaka 1f21eaa15e
bpo-15999: Clean up of handling boolean arguments. (GH-15610)
* Use the 'p' format unit instead of manually called PyObject_IsTrue().
* Pass boolean value instead 0/1 integers to functions that needs boolean.
* Convert some arguments to boolean only once.
2019-09-01 12:16:51 +03:00
Serhiy Storchaka 5eca7f3f38
bpo-15999: Always pass bool instead of int to socket.setblocking(). (GH-15621) 2019-09-01 12:12:52 +03:00
Serhiy Storchaka 41c57b3353
bpo-37994: Fix silencing all errors if an attribute lookup fails. (GH-15630)
Only AttributeError should be silenced.
2019-09-01 12:03:39 +03:00
Serhiy Storchaka f02ea6225b
bpo-36543: Remove old-deprecated ElementTree features. (GH-12707)
Remove methods Element.getchildren(), Element.getiterator() and
ElementTree.getiterator() and the xml.etree.cElementTree module.
2019-09-01 11:18:35 +03:00
Inada Naoki 013e52fd34
bpo-37990: fix gc stats (GH-15626) 2019-08-31 09:13:42 +09:00
Min ho Kim 39d87b5471 Fix typos mostly in comments, docs and test names (GH-15209) 2019-08-30 16:21:19 -04:00
Victor Stinner 96b4087ce7
bpo-37140: Fix StructUnionType_paramfunc() (GH-15612)
Fix a ctypes regression of Python 3.8. When a ctypes.Structure is
passed by copy to a function, ctypes internals created a temporary
object which had the side effect of calling the structure finalizer
(__del__) twice. The Python semantics requires a finalizer to be
called exactly once. Fix ctypes internals to no longer call the
finalizer twice.

Create a new internal StructParam_Type which is only used by
_ctypes_callproc() to call PyMem_Free(ptr) on Py_DECREF(argument).
StructUnionType_paramfunc() creates such object.
2019-08-30 14:30:33 +02:00
Sergey Fedoseev 6a650aaf77 bpo-37976: Prevent shadowing of TypeError in zip() (GH-15592) 2019-08-29 21:25:48 -07:00
Thomas A Caswell e278335a6e bpo-37933: Fix faulthandler.cancel_dump_traceback_later() (GH-15440)
Fix faulthandler.cancel_dump_traceback_later() call
if cancel_dump_traceback_later() was not called previously.
2019-08-29 18:30:04 +02:00
Rémi Lapeyre 4901fe274b bpo-37034: Display argument name on errors with keyword arguments with Argument Clinic. (GH-13593) 2019-08-29 17:49:08 +03:00
Joannah Nanjekye 2c5fb17118 bpo-36833: Add tests for Datetime C API Macros (GH-14842)
Added tests for PyDateTime_xxx_GET_xxx() macros of the C API of
the datetime module.
2019-08-29 14:54:46 +02:00
Justin Blanchard 122376df55 bpo-37372: Fix error unpickling datetime.time objects from Python 2 with seconds>=24. (GH-14307) 2019-08-29 10:36:15 +03:00
Serhiy Storchaka b235a1b473
bpo-37960: Silence only necessary errors in repr() of buffered and text streams. (GH-15543) 2019-08-29 09:25:22 +03:00
HongWeipeng fa220ec763 Raise a RuntimeError when tee iterator is consumed from different threads (GH-15567) 2019-08-28 20:39:25 -07:00
Vinay Sharma 13f37f2ba8 closes bpo-37964: add F_GETPATH command to fcntl (GH-15550)
https://bugs.python.org/issue37964



Automerge-Triggered-By: @benjaminp
2019-08-28 18:56:17 -07:00
Christian Heimes 98d90f745d
bpo-37951: Lift subprocess's fork() restriction (GH-15544) 2019-08-27 23:36:56 +02:00
vrajivk 8bf5fef873 bpo-36205: Fix the rusage implementation of time.process_time() (GH-15538) 2019-08-27 00:13:12 -04:00
Raymond Hettinger 6fee0f8ea7
bpo-37798: Minor code formatting and comment clean-ups. (GH-15526) 2019-08-26 11:25:58 -07:00
Inada Naoki b27cbec801 bpo-37055: fix warnings in _blake2 module (GH-14646)
https://bugs.python.org/issue37055



Automerge-Triggered-By: @tiran
2019-08-26 10:52:36 -07:00
Raymond Hettinger aef9ad82f7
bpo-37942: Improve argument clinic float converter (GH-15470) 2019-08-24 19:10:39 -07:00
Dong-hee Na 0a18ee4be7 bpo-37798: Add C fastpath for statistics.NormalDist.inv_cdf() (GH-15266) 2019-08-23 15:20:30 -07:00
Pablo Galindo 4be11c009a bpo-37915: Fix comparison between tzinfo objects and timezone objects (GH-15390)
https://bugs.python.org/issue37915



Automerge-Triggered-By: @pablogsal
2019-08-22 12:24:25 -07:00
Steve Dower df2d4a6f3d
bpo-37834: Normalise handling of reparse points on Windows (GH-15231)
bpo-37834: Normalise handling of reparse points on Windows
* ntpath.realpath() and nt.stat() will traverse all supported reparse points (previously was mixed)
* nt.lstat() will let the OS traverse reparse points that are not name surrogates (previously would not traverse any reparse point)
* nt.[l]stat() will only set S_IFLNK for symlinks (previous behaviour)
* nt.readlink() will read destinations for symlinks and junction points only

bpo-1311: os.path.exists('nul') now returns True on Windows
* nt.stat('nul').st_mode is now S_IFCHR (previously was an error)
2019-08-21 15:27:33 -07:00
Stefan Krah bcc446f525
Revert mode change that loses information in directory listings on Linux. (#15366) 2019-08-21 23:00:04 +02:00
Victor Stinner d8c5adf6f8
bpo-37851: faulthandler allocates its stack on demand (GH-15358)
The faulthandler module no longer allocates its alternative stack at
Python startup. Now the stack is only allocated at the first
faulthandler usage.

faulthandler no longer ignores memory allocation failure when
allocating the stack. sigaltstack() failure now raises an OSError
exception, rather than being ignored.

The alternative stack is no longer used if sigaction() is
not available. In practice, sigaltstack() should only be available
when sigaction() is avaialble, so this change should have no effect
in practice.

faulthandler.dump_traceback_later() internal locks are now only
allocated at the first dump_traceback_later() call, rather than
always being allocated at Python startup.
2019-08-21 13:40:42 +01:00
Greg Price 9ece4a5057 Unmark files as executable that can't actually be executed. (GH-15353)
There are plenty of legitimate scripts in the tree that begin with a
`#!`, but also a few that seem to be marked executable by mistake.

Found them with this command -- it gets executable files known to Git,
filters to the ones that don't start with a `#!`, and then unmarks
them as executable:

    $ git ls-files --stage \
      | perl -lane 'print $F[3] if (!/^100644/)' \
      | while read f; do
          head -c2 "$f" | grep -qxF '#!' \
          || chmod a-x "$f"; \
        done

Looking at the list by hand confirms that we didn't sweep up any
files that should have the executable bit after all.  In particular

 * The `.psd` files are images from Photoshop.

 * The `.bat` files sure look like things that can be run.
   But we have lots of other `.bat` files, and they don't have
   this bit set, so it must not be needed for them.



Automerge-Triggered-By: @benjaminp
2019-08-20 21:53:59 -07:00
Brett Cannon 1407038e0b
Remove a dead comment from ossaudiodev.c (#15346) 2019-08-20 12:20:47 -07:00
Joannah Nanjekye 9e66aba999 bpo-15913: Implement PyBuffer_SizeFromFormat() (GH-13873)
Implement PyBuffer_SizeFromFormat() function (previously
documented but not implemented): call struct.calcsize().
2019-08-20 15:46:36 +01:00
Alex Gaynor 40dad9545a Replace usage of the obscure PEM_read_bio_X509_AUX with the more standard PEM_read_bio_X509 (GH-15303)
X509_AUX is an odd, note widely used, OpenSSL extension to the X509 file format. This function doesn't actually use any of the extra metadata that it parses, so just use the standard API.

Automerge-Triggered-By: @tiran
2019-08-15 05:31:28 -07:00
Victor Stinner ac827edc49
bpo-21131: Fix faulthandler.register(chain=True) stack (GH-15276)
faulthandler now allocates a dedicated stack of SIGSTKSZ*2 bytes,
instead of just SIGSTKSZ bytes. Calling the previous signal handler
in faulthandler signal handler uses more than SIGSTKSZ bytes of stack
memory on some platforms.
2019-08-14 23:35:27 +02:00
Artem Khramov 2814620657 bpo-37811: FreeBSD, OSX: fix poll(2) usage in sockets module (GH-15202)
FreeBSD implementation of poll(2) restricts the timeout argument to be
either zero, or positive, or equal to INFTIM (-1).

Unless otherwise overridden, socket timeout defaults to -1. This value
is then converted to milliseconds (-1000) and used as argument to the
poll syscall. poll returns EINVAL (22), and the connection fails.

This bug was discovered during the EINTR handling testing, and the
reproduction code can be found in
https://bugs.python.org/issue23618 (see connect_eintr.py,
attached). On GNU/Linux, the example runs as expected.

This change is trivial:
If the supplied timeout value is negative, truncate it to -1.
2019-08-14 23:21:48 +02:00
Victor Stinner 077af8c2c9
bpo-37738: Fix curses addch(str, color_pair) (GH-15071)
Fix the implementation of curses addch(str, color_pair): pass the
color pair to setcchar(), instead of always passing 0 as the color
pair.
2019-08-14 12:31:43 +02:00
Greg Price 51aac15f6d Delete leftover clinic-generated file for C zipimport. (GH-15174) 2019-08-10 10:20:27 +03:00
Ngalim Siregar 92c7e30adf bpo-37642: Update acceptable offsets in timezone (GH-14878)
This fixes an inconsistency between the Python and C implementations of
the datetime module. The pure python version of the code was not
accepting offsets greater than 23:59 but less than 24:00. This is an
accidental legacy of the original implementation, which was put in place
before tzinfo allowed sub-minute time zone offsets.

GH-14878
2019-08-09 10:22:16 -04:00
Inada Naoki 2a570af12a
bpo-37587: optimize json.loads (GH-15134)
Use a tighter scope temporary variable to help register allocation.
1% speedup for large string.

Use PyDict_SetItemDefault() for memoizing keys.
At most 4% speedup when the cache hit ratio is low.
2019-08-08 17:57:10 +09:00
Sergey Fedoseev 3e41f3cabb bpo-34488: optimize BytesIO.writelines() (GH-8904)
Avoid the creation of unused int object for each line.
2019-08-07 09:38:31 +09:00
Serhiy Storchaka 18b711c5a7
bpo-37648: Fixed minor inconsistency in some __contains__. (GH-14904)
The collection's item is now always at the left and
the needle is on the right of ==.
2019-08-04 14:12:48 +03:00
Serhiy Storchaka 17e52649c0
bpo-37685: Fixed comparisons of datetime.timedelta and datetime.timezone. (GH-14996)
There was a discrepancy between the Python and C implementations.

Add singletons ALWAYS_EQ, LARGEST and SMALLEST in test.support
to test mixed type comparison.
2019-08-04 12:38:46 +03:00
Greg Bowser 8fbece135d bpo-36590: Add Bluetooth RFCOMM and support for Windows. (GH-12767)
Support for RFCOMM, L2CAP, HCI, SCO is based on the BTPROTO_* macros
being defined. Winsock only supports RFCOMM, even though it has a
BTHPROTO_L2CAP macro. L2CAP support would build on windows, but not
necessarily work.

This also adds some basic unittests for constants (all of which existed
prior to this commit, just not on windows) and creating sockets.

pair: Nate Duarte <slacknate@gmail.com>
2019-08-02 13:29:52 -07:00
Inada Naoki bf8162c8c4
bpo-37729: gc: write stats at once (GH-15050)
gc used several PySys_WriteStderr() calls to write stats.
It caused stats mixed up when stderr is shared by multiple
processes like this:

  gc: collecting generation 2...
  gc: objects in each generation: 0 0gc: collecting generation 2...
  gc: objects in each generation: 0 0 126077 126077
  gc: objects in permanent generation: 0

  gc: objects in permanent generation: 0
  gc: done, 112575 unreachable, 0 uncollectablegc: done, 112575 unreachable, 0 uncollectable, 0.2223s elapsed
  , 0.2344s elapsed
2019-08-02 16:25:29 +09:00
Anthony Sottile c9345e382c bpo-37695: Correct unget_wch error message. (GH-14986) 2019-07-31 15:11:24 +03:00
karl ding 31c4fd2a10 bpo-37085: Expose SocketCAN bcm_msg_head flags (#13646)
Expose the CAN_BCM SocketCAN constants used in the bcm_msg_head struct
flags (provided by <linux/can/bcm.h>) under the socket library.

This adds the following constants with a CAN_BCM prefix:

  * SETTIMER
  * STARTTIMER
  * TX_COUNTEVT
  * TX_ANNOUNCE
  * TX_CP_CAN_ID
  * RX_FILTER_ID
  * RX_CHECK_DLC
  * RX_NO_AUTOTIMER
  * RX_ANNOUNCE_RESUME
  * TX_RESET_MULTI_IDX
  * RX_RTR_FRAME
  * CAN_FD_FRAME

The CAN_FD_FRAME flag was introduced in the 4.8 kernel, while the other
ones were present since SocketCAN drivers were mainlined in 2.6.25. As
such, it is probably unnecessary to guard against these constants being
missing.
2019-07-31 10:47:16 +02:00
Min ho Kim c4cacc8c5e Fix typos in comments, docs and test names (#15018)
* Fix typos in comments, docs and test names

* Update test_pyparse.py

account for change in string length

* Apply suggestion: splitable -> splittable

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Apply suggestion: splitable -> splittable

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Apply suggestion: Dealloccte -> Deallocate

Co-Authored-By: Terry Jan Reedy <tjreedy@udel.edu>

* Update posixmodule checksum.

* Reverse idlelib changes.
2019-07-30 18:16:13 -04:00
Marco Paolini 8a758f5b99 bpo-37587: Make json.loads faster for long strings (GH-14752)
When scanning the string, most characters are valid, so
checking for invalid characters first means never needing
to check the value of strict on valid strings, and only
needing to check it on invalid characters when doing
non-strict parsing of invalid strings.

This provides a measurable reduction in per-character
processing time (~11% in the pre-merge patch testing).
2019-07-31 00:16:34 +10:00
Pablo Galindo 9211e2fd81 bpo-37268: Add deprecation notice and a DeprecationWarning for the parser module (GH-15017)
Deprecate the parser module and add a deprecation warning triggered on import and a warning block in the documentation.





https://bugs.python.org/issue37268



Automerge-Triggered-By: @pablogsal
2019-07-30 04:04:01 -07:00
Raymond Hettinger 6b5f1b496f
bpo-37691: Let math.dist() accept sequences and iterables for coordinates (GH-14975) 2019-07-27 14:04:29 -07:00
Markus Mohrhard 898318b53d bpo-37502: handle default parameter for buffers argument of pickle.loads correctly (GH-14593) 2019-07-25 18:00:34 +02:00