cpython/Doc
Miss Islington (bot) 4dd1c9d9c2
closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558)
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalize(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in the Unicode standard in UAX #15.

However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.

Implement the standard's algorithm.  This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
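
As a rough illustration of the idea (not the C code in this patch), the
quick check for NFD can be sketched in pure Python.  The helper names
below are hypothetical, and the sketch assumes that a character is
disallowed in NFD exactly when it has a canonical decomposition; NFD
has no MAYBE values, so the answer here is always YES or NO.

```python
import unicodedata

# Hangul syllables decompose algorithmically, so
# unicodedata.decomposition() returns "" for them; handle the block
# explicitly.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def has_canonical_decomposition(ch):
    """Hypothetical helper: True if ch cannot appear in NFD text."""
    if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
        return True
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<compat>"; only
    # untagged (canonical) mappings matter for NFD.
    return bool(decomp) and not decomp.startswith("<")


def quick_check_nfd(s):
    """Sketch of the UAX #15 quick-check loop, specialized to NFD."""
    last_ccc = 0
    for ch in s:
        ccc = unicodedata.combining(ch)
        # Combining marks out of canonical order: not normalized.
        if ccc != 0 and last_ccc > ccc:
            return "NO"
        if has_canonical_decomposition(ch):
            return "NO"
        last_ccc = ccc
    return "YES"


def is_normalized_slow(form, s):
    """The slow path: normalize the whole string and compare."""
    return s == unicodedata.normalize(form, s)
```

For a string such as `"\uf900" * 500000`, the sketch returns NO at the
first character, whereas the slow path normalizes the entire string
before comparing.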

In a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:

  $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  50 loops, best of 5: 4.39 msec per loop

With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
      -- 'unicodedata.is_normalized("NFD", s)'
  5000000 loops, best of 5: 58.2 nsec per loop

This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.

With this, that case is actually faster than in master!

  $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
      -- 'unicodedata.normalize("NFD", s)'
  500 loops, best of 5: 561 usec per loop

  $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
      -- 'unicodedata.normalize("NFD", s)'
  500 loops, best of 5: 512 usec per loop

(cherry picked from commit 2f09413947)

Co-authored-by: Greg Price <gnprice@gmail.com>
2019-09-03 20:03:37 -07:00
c-api Fix typos mostly in comments, docs and test names (GH-15209) 2019-08-30 13:42:54 -07:00
data Fix typo: Pyssize_t => Py_ssize_t (GH-15411) 2019-08-26 08:54:26 -07:00
distributing bpo-36797: Reduce levels of indirection in outdated distutils docs (#13462) 2019-05-24 00:06:39 +10:00
distutils bpo-37481: Deprecate distutils bdist_wininst command (GH-14553) 2019-07-05 02:03:23 -07:00
extending Doc: Replace the deprecated highlightlang directive by highlight. (#13377) 2019-05-17 15:25:34 +05:30
faq bpo-36167: fix an incorrect capitalization (GH-14482) 2019-08-28 22:47:42 -07:00
howto bpo-25777: Wording describes a lookup, not a call (GH-15573) (GH-15576) 2019-08-28 23:12:13 -07:00
includes bpo-36261: Improve example of the preamble field in email docs (GH-14751) 2019-07-14 00:53:15 -07:00
install Doc: Replace the deprecated highlightlang directive by highlight. (#13377) 2019-05-17 15:25:34 +05:30
installing Doc: Replace the deprecated highlightlang directive by highlight. (#13377) 2019-05-17 15:25:34 +05:30
library bpo-38010 Sync importlib.metadata with importlib_metadata 0.20. (GH-15646) (GH-15648) 2019-09-02 12:11:01 -04:00
reference bpo-36743: __get__ is sometimes called without the owner argument (GH-12992) (GH-15589) 2019-08-29 02:02:51 -07:00
tools bpo-36853: Fix suspicious.py to actually print the unused rules (GH-13579) (GH-15649) 2019-09-02 12:12:19 -04:00
tutorial bpo-14112: Allow beginners to explore shallowness in greater depth ;-) (GH-15465) (GH-15469) 2019-08-24 11:33:18 -07:00
using bpo-29535: Remove promize about hash randomization of datetime objects. (GH-15269) 2019-08-24 03:19:51 -07:00
whatsnew closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558) 2019-09-03 20:03:37 -07:00
Makefile Doc: Update pip and setuptools when creating the virtual environment (GH-13307) 2019-05-14 04:49:49 -07:00
README.rst Doc: Add an optional obsolete header. (GH-13638) 2019-05-29 18:34:04 +02:00
about.rst
bugs.rst Fix funny typo in Doc/bugs. (GH-15412) 2019-08-23 21:16:28 -07:00
conf.py Doc: Keep the venv/* exclude pattern. (GH-15229) 2019-08-25 23:19:45 -07:00
contents.rst
copyright.rst Bump copyright years to 2019. (GH-11404) 2019-01-02 07:46:53 -08:00
glossary.rst Doc: Space breaking whole definition. (GH-13615) 2019-05-28 14:04:42 +02:00
license.rst Doc: Replace the deprecated highlightlang directive by highlight. (#13377) 2019-05-17 15:25:34 +05:30
make.bat Implement Windows release builds in Azure Pipelines (GH-14065) 2019-06-14 14:20:16 -07:00

README.rst

Python Documentation README
~~~~~~~~~~~~~~~~~~~~~~~~~~~

This directory contains the reStructuredText (reST) sources for the Python
documentation.  You don't need to build them yourself; `prebuilt versions are
available <https://docs.python.org/dev/download.html>`_.

Documentation on authoring Python documentation, including information about
both style and markup, is available in the "`Documenting Python
<https://devguide.python.org/documenting/>`_" chapter of the
Developer's Guide.


Building the docs
=================

The documentation is built with several tools which are not included in this
tree but are maintained separately and are available from
`PyPI <https://pypi.org/>`_.

* `Sphinx <https://pypi.org/project/Sphinx/>`_
* `blurb <https://pypi.org/project/blurb/>`_
* `python-docs-theme <https://pypi.org/project/python-docs-theme/>`_

The easiest way to install these tools is to create a virtual environment and
install the tools into it.

Using make
----------

To get started on UNIX, you can create a virtual environment with the command ::

  make venv

That will install all the tools necessary to build the documentation. Assuming
the virtual environment was created in the ``venv`` directory (the default;
configurable with the VENVDIR variable), you can run the following command to
build the HTML output files::

  make html

By default, if the virtual environment has not been created, the Makefile will
look for instances of ``sphinx-build`` and ``blurb`` installed on your PATH
(configurable with the SPHINXBUILD and BLURB variables).

On Windows, we try to emulate the Makefile as closely as possible with a
``make.bat`` file. If you need to specify the Python interpreter to use,
set the PYTHON environment variable.

Available make targets are:

* "clean", which removes all build files.

* "venv", which creates a virtual environment with all necessary tools
  installed.

* "html", which builds standalone HTML files for offline viewing.

* "htmlview", which re-uses the "html" builder, but then opens the main page
  in your default web browser.

* "htmlhelp", which builds HTML files and an HTML Help project file usable to
  convert them into a single Compiled HTML (.chm) file -- these are popular
  under Microsoft Windows, but very handy on every platform.

  To create the CHM file, you need to run the Microsoft HTML Help Workshop
  over the generated project (.hhp) file.  The make.bat script does this for
  you on Windows.

* "latex", which builds LaTeX source files as input to "pdflatex" to produce
  PDF documents.

* "text", which builds a plain text file for each source file.

* "epub", which builds an EPUB document, suitable to be viewed on e-book
  readers.

* "linkcheck", which checks all external references to see whether they are
  broken, redirected or malformed, and outputs this information to stdout as
  well as a plain-text (.txt) file.

* "changes", which builds an overview of all versionadded/versionchanged/
  deprecated items in the current version. This is meant as an aid for the
  writer of the "What's New" document.

* "coverage", which builds a coverage overview for standard library modules and
  C API.

* "pydoc-topics", which builds a Python module containing a dictionary with
  plain text documentation for the labels defined in
  ``tools/pyspecific.py`` -- pydoc needs these to show topic and keyword help.

* "suspicious", which checks the parsed markup for text that looks like
  malformed and thus unconverted reST.

* "check", which checks for frequent markup errors.

* "serve", which serves the build/html directory on port 8000.

* "dist" (Unix only), which creates distributable archives of HTML, text,
  PDF, and EPUB builds.


Without make
------------

First, install the tool dependencies from PyPI.

Then, from the ``Doc`` directory, run ::

   sphinx-build -b<builder> . build/<builder>

where ``<builder>`` is one of html, text, latex, or htmlhelp (for explanations
see the make targets above).

Deprecation header
==================

You can define the ``outdated`` variable in ``html_context`` to show a
red banner on each page redirecting to the "latest" version.

The link points to the same page on ``/3/``. Unfortunately, for the moment the
language is lost in the process.
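
As a minimal sketch, the setting might look like the following fragment
of ``conf.py``.  The value ``True`` is an assumption (the text above
only says the variable must be defined), and the fragment is
illustrative rather than copied from the real configuration file.

```python
# Doc/conf.py (illustrative fragment, not the actual file contents)

# html_context is the standard Sphinx option for passing extra values
# to the HTML templates.
html_context = {
    # Assumption: any truthy value causes the banner to be shown.
    "outdated": True,
}
```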


Contributing
============

Bugs in the content should be reported to the
`Python bug tracker <https://bugs.python.org>`_.

Bugs in the toolset should be reported to the maintainers of the tools
themselves.

You can also send an email to the Python Documentation Team at docs@python.org,
and we will process your request as soon as possible.

If you want to help the Documentation Team, you are always welcome.  Just send
an email to docs@python.org.