Benjamin Peterson
bb904e063d
closes gh-124016: update Unicode to 16.0.0 (#124017)
2024-09-13 07:47:04 -07:00
CF Bolz-Tereick
9573d14215
gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906)
...
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
2023-11-04 15:56:58 +01:00
James Gerity
def828995a
fixes gh-109559: Update `unicodedata` for Unicode 15.1.0 (GH-109560)
...
---------
Co-authored-by: Benjamin Peterson <benjamin@python.org>
2023-09-19 22:07:47 -07:00
LiarPrincess
0c1d7a06ed
bpo-47243: Duplicate entry in 'Objects/unicodetype_db.h' (GH-32376)
...
Fix for duplicate 1st entry in 'Objects/unicodetype_db.h':
```c
/* a list of unique character type descriptors */
const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = {
{0, 0, 0, 0, 0, 0},
{0, 0, 0, 0, 0, 0}, /* <--- HERE */
{0, 0, 0, 0, 0, 32},
{0, 0, 0, 0, 0, 48},
…
```
https://bugs.python.org/issue47243
Automerge-Triggered-By: GH:isidentical
2022-09-28 06:57:14 -07:00
Benjamin Peterson
fd1e477f53
closes gh-96734: Update to Unicode 15.0.0. (GH-96809)
2022-09-13 15:45:12 -07:00
Carl Friedrich Bolz-Tereick
9c197bc8bf
GH-96172 fix unicodedata.east_asian_width being wrong on unassigned code points (#96207)
2022-08-26 19:29:39 +03:00
Carl Friedrich Bolz-Tereick
2d9f252c0c
gh-96019: Fix caching of decompositions in makeunicodedata (GH-96020)
2022-08-19 12:20:44 +03:00
Benjamin Peterson
024fda47d4
closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336)
2021-09-14 11:00:38 -07:00
Benjamin Peterson
51796e5d26
Update some www.unicode.org URLs to use HTTPS. (GH-18912)
2020-03-10 21:10:59 -07:00
Benjamin Peterson
051b9d08d1
closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)
2020-03-10 20:41:34 -07:00
Greg Price
a65678c5c9
bpo-37760: Convert from length-18 lists to a dataclass, in makeunicodedata. (GH-15265)
...
Now the fields have names! Much easier to keep straight as a
reader than the elements of an 18-tuple.
Runs about 10-15% slower: from 10.8s to 12.3s, on my laptop.
Fortunately that's perfectly fine for this maintenance script.
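A rough illustration of the kind of change described above. This is only a sketch: the class name, the helper, and the three fields shown are hypothetical and stand in for the 18 fields; they are not the script's actual definitions.

```python
from dataclasses import dataclass


# Hypothetical, shortened record: only 3 of the 18 fields are modelled here.
@dataclass
class UcdRecordSketch:
    codepoint: str
    name: str
    general_category: str


def from_row(row):
    """Build a record from one semicolon-separated UnicodeData.txt-style line."""
    fields = row.split(";")
    return UcdRecordSketch(fields[0], fields[1], fields[2])


record = from_row("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
# Named access replaces positional indexing such as fields[2].
print(record.general_category)
```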
2019-09-12 10:23:43 +01:00
Greg Price
3e4498d35c
bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128)
2019-08-14 18:18:53 -07:00
Greg Price
c03e698c34
bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248)
...
Much like the lower-level logic in commit ef2af1ad4, we had
4 copies of this logic, written in a couple of different ways.
They're all implementing the same standard, so write it just once.
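For context, UnicodeData.txt marks large blocks with a pair of lines whose names end in ", First>" and ", Last>". A minimal sketch of expanding such pairs, assuming that input shape; the function name, the shortened sample lines, and the choice of what to store in the name field are illustrative assumptions, not the script's code:

```python
def expand_ranges(lines):
    """Yield (codepoint, fields) per line, expanding First/Last pairs."""
    range_start = None
    for line in lines:
        fields = line.strip().split(";")
        code = int(fields[0], 16)
        name = fields[1]
        if name.endswith(", First>"):
            # Remember where the range begins; the Last line closes it.
            range_start = code
            continue
        if name.endswith(", Last>"):
            # Strip the "<" prefix and ", Last>" suffix from the shared name.
            fields[1] = name[1:-len(", Last>")]
            for cp in range(range_start, code + 1):
                yield cp, fields
            range_start = None
            continue
        yield code, fields


# Made-up, shortened sample: the range here is far smaller than any real one.
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu",
    "3400;<CJK Ideograph Extension A, First>;Lo",
    "3403;<CJK Ideograph Extension A, Last>;Lo",
]
expanded = list(expand_ranges(sample))
codes = [cp for cp, _ in expanded]
```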
2019-08-13 19:28:38 -07:00
Greg Price
99d208efed
bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129)
...
The `expand` option was introduced in 2000 in commit fad27aee1.
It appears to have been always set since it was committed, and
what it does is tell the code to do something essential. So,
just always do that, and cut the option.
Also cut the `linebreakprops` option, which isn't consulted anymore.
2019-08-12 22:59:30 -07:00
Greg Price
ef2af1ad44
bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130)
...
There were 10 copies of this, and almost as many distinct versions of
exactly how it was written. They're all implementing the same
standard. Pull them out to the top, so the more interesting logic
that remains becomes easier to read.
2019-08-12 22:20:56 -07:00
Stefan Behnel
faa2948654
Clean up and reduce visual clutter in the makeunicode.py script. (GH-7558)
2019-06-01 21:49:03 +02:00
Benjamin Peterson
3aca40d3cb
closes bpo-36861: Update Unicode database to 12.1.0. (GH-13214)
...
Adds ㋿.
2019-05-08 20:59:35 -07:00
Inada Naoki
6fec905de5
bpo-36642: make unicodedata const (GH-12855)
2019-04-17 08:40:34 +09:00
Benjamin Peterson
738c19f4c5
closes bpo-33376: Update to Unicode 12.0.0. (GH-12256)
2019-03-09 16:25:55 -08:00
Benjamin Peterson
7c69c1c0fb
update to Unicode 11.0.0 (closes bpo-33778) (GH-7439)
...
Also, standardize indentation of generated tables.
2018-06-06 20:14:28 -07:00
Benjamin Peterson
279a96206f
bpo-30736: upgrade to Unicode 10.0 (#2344)
...
Straightforward. While we're at it, though, strip trailing whitespace from generated tables.
2017-06-22 22:31:08 -07:00
Jon Dufresne
3972628de3
bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489)
...
* Replaced list(<generator expression>) with list comprehension
* Replaced dict(<generator expression>) with dict comprehension
* Replaced set(<list literal>) with set literal
* Replaced builtin func(<list comprehension>) with func(<generator
expression>) when supported (e.g. any(), all(), tuple(), min(), &
max())
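The rewrites listed above follow this general pattern. The variable names and data are made up for illustration; each comment shows the older spelling being replaced.

```python
values = [3, 1, 2]

# list(<generator expression>) -> list comprehension
squares = [v * v for v in values]        # was: list(v * v for v in values)

# dict(<generator expression>) -> dict comprehension
by_value = {v: v * v for v in values}    # was: dict((v, v * v) for v in values)

# set(<list literal>) -> set literal
small = {1, 2}                           # was: set([1, 2])

# func(<list comprehension>) -> func(<generator expression>)
has_even = any(v % 2 == 0 for v in values)  # was: any([v % 2 == 0 for v in values])
largest = max(v * v for v in values)        # was: max([v * v for v in values])
```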
2017-05-18 07:35:54 -07:00
Benjamin Peterson
6775231597
Unicode 9.0.0
...
Not completely mechanical since support for East Asian Width changes—emoji
codepoints became Wide—had to be added to unicodedata.
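The effect can be observed from Python through `unicodedata.east_asian_width()`. A small check, assuming an interpreter whose bundled database is Unicode 9.0.0 or newer:

```python
import unicodedata

# U+1F600 GRINNING FACE is an emoji code point.
emoji_width = unicodedata.east_asian_width("\U0001F600")
# A plain ASCII letter, for comparison.
ascii_width = unicodedata.east_asian_width("A")

print(unicodedata.unidata_version, emoji_width, ascii_width)
```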
2016-09-14 23:53:47 -07:00
Benjamin Peterson
4801383c29
upgrade to Unicode 8.0.0
2015-06-27 15:45:56 -05:00
R David Murray
2623a5db6f
Merge: #18176: Change generic UCD PropList link to version specific link.
2014-10-09 20:47:31 -04:00
R David Murray
5f16f90d1b
#18176: Change generic UCD PropList link to version specific link.
2014-10-09 20:45:59 -04:00
R David Murray
532783bd5e
Merge: #18176: fix another reference and add it to the makeunicodedata comment.
2014-10-09 17:41:55 -04:00
R David Murray
5bd62420f4
#18176: fix another reference and add it to the makeunicodedata comment.
2014-10-09 17:39:48 -04:00
R David Murray
5ac125cde3
Merge: #18176: updated stdtypes UCD link, added reminder to makeunicodedata.
2014-10-09 17:33:15 -04:00
R David Murray
7445a383a6
#18176: updated stdtypes UCD link, added reminder to makeunicodedata.
...
Patch by Alexander Belopolsky.
2014-10-09 17:30:33 -04:00
Benjamin Peterson
3032ed7cb1
upgrade to unicode 7.0.0
2014-07-06 13:04:20 -07:00
Benjamin Peterson
94d08d908b
upgrade unicode db to 6.3.0 (closes #19221)
2013-10-10 17:24:45 -04:00
Ezio Melotti
d640fe2af5
#18803: merge with 3.3.
2013-08-26 01:33:30 +03:00
Ezio Melotti
7c4a7e6f3c
#18803: fix more typos. Patch by Févry Thibault.
2013-08-26 01:32:56 +03:00
Antoine Pitrou
9ed5f27266
Issue #18722: Remove uses of the "register" keyword in C code.
2013-08-13 20:18:52 +02:00
Benjamin Peterson
b8350f1c7d
upgrade to UCD 6.2
2012-09-29 13:47:39 -04:00
Florent Xicluna
c20740109d
Some cleanup in the Tools directory.
2012-07-07 17:03:54 +02:00
Benjamin Peterson
71f660e00f
update to Unicode 6.1
2012-02-20 22:24:29 -05:00
Benjamin Peterson
ad9c569825
delta encoding of upper/lower/title makes a glorious return (#12736)
2012-01-15 21:19:20 -05:00
Benjamin Peterson
d5890c8db5
add str.casefold() (closes #13752)
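A brief illustration of how `str.casefold()` differs from `str.lower()`; the sample word is chosen here for illustration and is not taken from the commit:

```python
word = "Straße"

folded = word.casefold()   # full case folding expands "ß" to "ss"
lowered = word.lower()     # plain lowercasing keeps "ß"

# Case folding is intended for caseless comparison of strings.
matches = folded == "STRASSE".casefold()
```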
2012-01-14 13:23:30 -05:00
Benjamin Peterson
b2bf01d824
use full unicode mappings for upper/lower/title case (#12736)
...
Also broaden the category of characters that count as lowercase/uppercase.
2012-01-11 18:17:06 -05:00
Ezio Melotti
931b8aac80
#12753: Add support for Unicode name aliases and named sequences.
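From Python, both features are reachable through `unicodedata.lookup()`. A short sketch, assuming Python 3.3 or newer; the two names used are examples picked for illustration:

```python
import unicodedata

# Name alias: a corrected name for U+01A2 LATIN CAPITAL LETTER OI.
alias = unicodedata.lookup("LATIN CAPITAL LETTER GHA")

# Named sequence: one name that resolves to more than one code point.
sequence = unicodedata.lookup("LATIN SMALL LETTER R WITH TILDE")

print(hex(ord(alias)), [hex(ord(ch)) for ch in sequence])
```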
2011-10-21 21:57:36 +03:00
Ezio Melotti
2a1e926d63
Fix ResourceWarnings in makeunicodedata.py.
2011-09-30 08:46:25 +03:00
Ezio Melotti
3b3499ba69
#11565: Merge with 3.1.
2011-03-16 11:35:38 +02:00
Ezio Melotti
13925008dc
#11565: Fix several typos. Patch by Piotr Kasprzyk.
2011-03-16 11:05:33 +02:00
Martin v. Löwis
5cbc71e50a
Issue #10459: Update CJK character names to Unicode 6.0.
2010-11-22 09:00:02 +00:00
Martin v. Löwis
baecd7243a
Upgrade to Unicode 6.0.0.
...
makeunicodedata.py: download all data files from unicode.org,
switch to extracting Unihan data from zip file.
Read linebreakprops and derivednormalizationprops even for
old versions, even though they are not used in delta records.
test_unicode.py: U+11000 is now assigned, use U+14000 instead.
2010-10-11 22:42:28 +00:00
Amaury Forgeot d'Arc
feb7307db4
#9210: remove --with-wctype-functions configure option.
...
The internal unicode database is now always used.
(after 5 years: see
http://mail.python.org/pipermail/python-dev/2004-December/050193.html)
2010-09-12 22:42:57 +00:00
Amaury Forgeot d'Arc
324ac65ceb
#5127: Even on narrow unicode builds, the C functions that access the Unicode
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).
The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
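The first point above can be checked with a digit from outside the Basic Multilingual Plane. A small example, assuming any current Python 3; the code point is chosen here for illustration:

```python
import unicodedata

# U+1D7D1 MATHEMATICAL BOLD DIGIT THREE lies beyond U+FFFF.
bold_three = "\U0001D7D1"

as_numeric = unicodedata.numeric(bold_three)
as_decimal = unicodedata.decimal(bold_three)
as_digit = unicodedata.digit(bold_three)

print(unicodedata.name(bold_three), as_numeric, as_decimal, as_digit)
```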
2010-08-18 20:44:58 +00:00
Florent Xicluna
806d8cf0e8
Merged revisions 79494,79496 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (Tue, 30 Mar 2010) | 2 lines
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (Tue, 30 Mar 2010) | 2 lines
Highlight the change of behavior related to r79494. Now VT and FF are linebreaks.
........
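The behavior described above is visible through `str.splitlines()`. A minimal check, assuming Python 3; the sample string is made up:

```python
# VT (0x0B) and FF (0x0C) are treated as line boundaries by str.splitlines().
text = "first\x0bsecond\x0cthird"
parts = text.splitlines()

# keepends=True shows which characters ended each line.
parts_with_ends = text.splitlines(keepends=True)
```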
2010-03-30 19:34:18 +00:00