cpython/Misc
Greg Price 2f09413947 closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558)
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalized(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.

Implement the standard's algorithm.  This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.

At a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:

  $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.

With this, that case is actually faster than in master!

$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
    -- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop

$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
    -- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
2019-09-03 19:45:44 -07:00
..
NEWS.d closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558) 2019-09-03 19:45:44 -07:00
ACKS bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239) 2019-08-31 08:25:35 -07:00
HISTORY Fix typos mostly in comments, docs and test names (GH-15209) 2019-08-30 16:21:19 -04:00
Porting bpo-30737: Update DevGuide links to new URL (GH-3228) 2017-08-30 09:37:43 -07:00
README bpo-32159: Revert Misc/svnmap.txt (#4639) 2017-11-29 18:58:33 +01:00
README.AIX Replace KB unit with KiB (#4293) 2017-11-08 14:44:44 -08:00
README.coverity
README.valgrind bpo-18859: Document --with-valgrind option in README.valgrind (#10591) 2018-11-20 12:10:49 +01:00
SpecialBuilds.txt bpo-36722: Style and grammar edits for ABI news entries (GH-12979) 2019-04-27 20:14:35 +02:00
coverity_model.c bpo-35808: Retire pgen and use pgen2 to generate the parser (GH-11814) 2019-03-01 15:34:44 -08:00
gdbinit bpo-29673: fix gdb scripts pystack and pystackv (GH-6126) 2018-04-06 17:22:04 -04:00
indent.pro
python-config.in bpo-36721: Add --embed option to python-config (GH-13500) 2019-05-23 03:30:23 +02:00
python-config.sh.in bpo-37925: Mention --embed in python-config usage (GH-15458) 2019-08-26 23:45:36 +02:00
python-embed.pc.in bpo-36721: Add --embed option to python-config (GH-13500) 2019-05-23 03:30:23 +02:00
python-wing3.wpr
python-wing4.wpr
python-wing5.wpr Issue 17457: extend test discovery to support namespace packages 2013-11-23 13:29:23 +00:00
python.man bpo-29535: Remove promize about hash randomization of datetime objects. (GH-15269) 2019-08-24 12:49:27 +03:00
python.pc.in bpo-36721: Add --embed option to python-config (GH-13500) 2019-05-23 03:30:23 +02:00
svnmap.txt bpo-32159: Revert Misc/svnmap.txt (#4639) 2017-11-29 18:58:33 +01:00
valgrind-python.supp closes bpo-34377: Update Valgrind suppressions. (GH-8729) 2018-08-10 23:41:34 -07:00
vgrindefs

README

Python Misc subdirectory
========================

This directory contains files that wouldn't fit in elsewhere.  Some
documents are only of historic importance.

Files found here
----------------

ACKS                    Acknowledgements
gdbinit                 Handy stuff to put in your .gdbinit file, if you use gdb
HISTORY                 News from previous releases -- oldest last
indent.pro              GNU indent profile approximating my C style
NEWS                    News for this release (for some meaning of "this")
Porting                 Mini-FAQ on porting to new platforms
python-config.in        Python script template for python-config
python.man              UNIX man page for the python interpreter
python.pc.in            Package configuration info template for pkg-config
python-wing*.wpr        Wing IDE project file
README                  The file you're reading now
README.AIX              Information about using Python on AIX
README.coverity         Information about running Coverity's Prevent on Python
README.valgrind         Information for Valgrind users, see valgrind-python.supp
SpecialBuilds.txt       Describes extra symbols you can set for debug builds
svnmap.txt              Map of old SVN revs and branches to hg changeset ids,
                        help history-digging
valgrind-python.supp    Valgrind suppression file, see README.valgrind
vgrindefs               Python configuration for vgrind (a generic pretty printer)