diff --git a/Doc/howto/unicode.rst b/Doc/howto/unicode.rst
index 0f829f39af4..3d8bc060164 100644
--- a/Doc/howto/unicode.rst
+++ b/Doc/howto/unicode.rst
@@ -403,7 +403,7 @@ These are grouped into categories such as "Letter", "Number", "Punctuation", or
 from the above output, ``'Ll'`` means 'Letter, lowercase', ``'No'`` means
 "Number, other", ``'Mn'`` is "Mark, nonspacing", and ``'So'`` is "Symbol,
 other". See
- for a
+ for a
 list of category codes.
 
 References
diff --git a/Doc/library/unicodedata.rst b/Doc/library/unicodedata.rst
index e1e6dc130ea..af1ac5f1591 100644
--- a/Doc/library/unicodedata.rst
+++ b/Doc/library/unicodedata.rst
@@ -15,12 +15,12 @@
 
 This module provides access to the Unicode Character Database which defines
 character properties for all Unicode characters. The data in this database is
-based on the :file:`UnicodeData.txt` file version 5.1.0 which is publicly
+based on the :file:`UnicodeData.txt` file version 5.2.0 which is publicly
 available from ftp://ftp.unicode.org/.
 
 The module uses the same names and symbols as defined by the UnicodeData File
-Format 5.1.0 (see http://www.unicode.org/Public/5.1.0/ucd/UCD.html). It defines
-the following functions:
+Format 5.2.0 (see http://www.unicode.org/reports/tr44/). It defines the
+following functions:
 
 
 .. function:: lookup(name)
diff --git a/Doc/whatsnew/2.7.rst b/Doc/whatsnew/2.7.rst
index e063f1b96d2..e6aa3c5c5a1 100644
--- a/Doc/whatsnew/2.7.rst
+++ b/Doc/whatsnew/2.7.rst
@@ -933,11 +933,13 @@ changes, or look through the Subversion logs for all the details.
   a timeout was provided and the operation timed out.
   (Contributed by Tim Lesher; :issue:`1674032`.)
 
-* The Unicode database provided by the :mod:`unicodedata` module
-  remains at version 5.1.0, but Python now uses it internally to
-  determine which characters are numeric, whitespace, or represent
-  line breaks. The database also now includes information from the
-  :file:`Unihan.txt` data file. (Patch by Anders Chrigström
+* The Unicode database has been updated to the version 5.2.0.
+  (Updated by Florent Xicluna; :issue:`8024`.)
+
+* The Unicode database provided by the :mod:`unicodedata` is used
+  internally to determine which characters are numeric, whitespace,
+  or represent line breaks. The database also now includes information
+  from the :file:`Unihan.txt` data file. (Patch by Anders Chrigström
   and Amaury Forgeot d'Arc; :issue:`1571184`.)
 
 * The :class:`UserDict` class is now a new-style class. (Changed by
diff --git a/Modules/unicodedata.c b/Modules/unicodedata.c
index ecd744a4c23..463f2cce3d5 100644
--- a/Modules/unicodedata.c
+++ b/Modules/unicodedata.c
@@ -1,8 +1,8 @@
 /* ------------------------------------------------------------------------
 
-   unicodedata -- Provides access to the Unicode 5.1 data base.
+   unicodedata -- Provides access to the Unicode 5.2 data base.
 
-   Data was extracted from the Unicode 5.1 UnicodeData.txt file.
+   Data was extracted from the Unicode 5.2 UnicodeData.txt file.
 
    Written by Marc-Andre Lemburg (mal@lemburg.com).
    Modified for Python 2.0 by Fredrik Lundh (fredrik@pythonware.com)
@@ -1235,11 +1235,10 @@ PyDoc_STRVAR(unicodedata_docstring,
 "This module provides access to the Unicode Character Database which\n\
 defines character properties for all Unicode characters. The data in\n\
 this database is based on the UnicodeData.txt file version\n\
-5.1.0 which is publically available from ftp://ftp.unicode.org/.\n\
+5.2.0 which is publically available from ftp://ftp.unicode.org/.\n\
 \n\
 The module uses the same names and symbols as defined by the\n\
-UnicodeData File Format 5.1.0 (see\n\
-http://www.unicode.org/Public/5.1.0/ucd/UCD.html).");
+UnicodeData File Format 5.2.0 (see http://www.unicode.org/reports/tr44/).");
 
 
 static struct PyModuleDef unicodedatamodule = {
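
Reviewer note, not part of the patch: a minimal Python sketch of how the two things this diff documents (the database version and the general category codes) can be inspected from an interpreter. `unicodedata.unidata_version` and `unicodedata.category` are existing module attributes; the helper name `categories` and the sample characters are assumptions chosen for illustration. The version string printed depends on the interpreter in use, so it may well differ from 5.2.0.

```python
import unicodedata


def categories(text):
    """Return the general category code of each character in text."""
    # 'categories' is a hypothetical helper, not a unicodedata function.
    return [unicodedata.category(ch) for ch in text]


# Version of the Unicode Character Database compiled into this interpreter.
print(unicodedata.unidata_version)

# Sample characters (illustrative): 'a', VULGAR FRACTION ONE HALF,
# COMBINING ACUTE ACCENT, COPYRIGHT SIGN.
sample = "a\u00bd\u0301\u00a9"
for ch, code in zip(sample, categories(sample)):
    print(repr(ch), code)
```

The four samples correspond to the codes named in the howto text: 'Ll' (Letter, lowercase), 'No' (Number, other), 'Mn' (Mark, nonspacing), and 'So' (Symbol, other).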