mirror of https://github.com/python/cpython
2f09413947
The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalized(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15. However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all. Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO. At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare: $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \ -- 'unicodedata.is_normalized("NFD", s)' 50 loops, best of 5: 4.39 msec per loop With this patch, it gets the answer instantly (58 ns) on the same 1 MB string: $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \ -- 'unicodedata.is_normalized("NFD", s)' 5000000 loops, best of 5: 58.2 nsec per loop This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case. With this, that case is actually faster than in master! $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \ -- 'unicodedata.normalize("NFD", s)' 500 loops, best of 5: 561 usec per loop $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \ -- 'unicodedata.normalize("NFD", s)' 500 loops, best of 5: 512 usec per loop |
||
---|---|---|
.. | ||
NEWS.d | ||
ACKS | ||
HISTORY | ||
Porting | ||
README | ||
README.AIX | ||
README.coverity | ||
README.valgrind | ||
SpecialBuilds.txt | ||
coverity_model.c | ||
gdbinit | ||
indent.pro | ||
python-config.in | ||
python-config.sh.in | ||
python-embed.pc.in | ||
python-wing3.wpr | ||
python-wing4.wpr | ||
python-wing5.wpr | ||
python.man | ||
python.pc.in | ||
svnmap.txt | ||
valgrind-python.supp | ||
vgrindefs |
README
Python Misc subdirectory ======================== This directory contains files that wouldn't fit in elsewhere. Some documents are only of historic importance. Files found here ---------------- ACKS Acknowledgements gdbinit Handy stuff to put in your .gdbinit file, if you use gdb HISTORY News from previous releases -- oldest last indent.pro GNU indent profile approximating my C style NEWS News for this release (for some meaning of "this") Porting Mini-FAQ on porting to new platforms python-config.in Python script template for python-config python.man UNIX man page for the python interpreter python.pc.in Package configuration info template for pkg-config python-wing*.wpr Wing IDE project file README The file you're reading now README.AIX Information about using Python on AIX README.coverity Information about running Coverity's Prevent on Python README.valgrind Information for Valgrind users, see valgrind-python.supp SpecialBuilds.txt Describes extra symbols you can set for debug builds svnmap.txt Map of old SVN revs and branches to hg changeset ids, help history-digging valgrind-python.supp Valgrind suppression file, see README.valgrind vgrindefs Python configuration for vgrind (a generic pretty printer)