Commit Graph

142 Commits

Author SHA1 Message Date
CF Bolz-Tereick 9573d14215
gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
2023-11-04 15:56:58 +01:00
Hugo van Kemenade f19416534a
Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
James Gerity def828995a
fixes gh-109559: Update `unicodedata` for Unicode 15.1.0 (GH-109560)
---------

Co-authored-by: Benjamin Peterson <benjamin@python.org>
2023-09-19 22:07:47 -07:00
LiarPrincess 0c1d7a06ed
bpo-47243: Duplicate entry in 'Objects/unicodetype_db.h' (GH-32376)
Fix for duplicate 1st entry in 'Objects/unicodetype_db.h':

```c
/* a list of unique character type descriptors */
const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = {
    {0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0, 0}, /* <--- HERE: duplicate of the first entry */
    {0, 0, 0, 0, 0, 32},
    {0, 0, 0, 0, 0, 48},
    …
```

https://bugs.python.org/issue47243

Automerge-Triggered-By: GH:isidentical
2022-09-28 06:57:14 -07:00
Benjamin Peterson fd1e477f53
closes gh-96734: Update to Unicode 15.0.0. (GH-96809) 2022-09-13 15:45:12 -07:00
Carl Friedrich Bolz-Tereick 9c197bc8bf
GH-96172 fix unicodedata.east_asian_width being wrong on unassigned code points (#96207) 2022-08-26 19:29:39 +03:00
Carl Friedrich Bolz-Tereick 2d9f252c0c
gh-96019: Fix caching of decompositions in makeunicodedata (GH-96020) 2022-08-19 12:20:44 +03:00
Davide Rizzo 733e15f170
gh-84508: tool to generate cjk traditional chinese mappings (gh-93272) 2022-06-11 23:19:41 +09:00
Dong-hee Na 749dc4b9c2
Revert "gh-84508: Add mapping files for Korean and Japanese. (gh-93309)" (#93320)
This reverts commit dec1e9346d.
2022-05-29 09:49:19 +09:00
Dong-hee Na dec1e9346d
gh-84508: Add mapping files for Korean and Japanese. (gh-93309) 2022-05-28 12:32:00 +09:00
Benjamin Peterson 024fda47d4
closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336) 2021-09-14 11:00:38 -07:00
Dong-hee Na 113feb3ec2
bpo-40328: Add tool for generating cjk mapping headers (GH-19602) 2020-04-30 02:34:24 +09:00
Benjamin Peterson 51796e5d26
Update some www.unicode.org URLs to use HTTPS. (GH-18912) 2020-03-10 21:10:59 -07:00
Benjamin Peterson 051b9d08d1
closes bpo-39926: Update Unicode to 13.0.0. (GH-18910) 2020-03-10 20:41:34 -07:00
Greg Price a65678c5c9 bpo-37760: Convert from length-18 lists to a dataclass, in makeunicodedata. (GH-15265)
Now the fields have names!  Much easier to keep straight as a
reader than the elements of an 18-tuple.

Runs about 10-15% slower: from 10.8s to 12.3s, on my laptop.
Fortunately that's perfectly fine for this maintenance script.
2019-09-12 10:23:43 +01:00
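
The gain in readability described in this commit can be illustrated with a minimal sketch. This is not the code from makeunicodedata: the class name `UcdRecord` and its three fields are hypothetical, and the real records carry many more fields.

```python
from dataclasses import dataclass


# Hypothetical, abbreviated record; the real script tracks many more fields.
@dataclass
class UcdRecord:
    codepoint: str
    name: str
    general_category: str


# Before: positional access, where the reader must remember what index 2 means.
row = ["0041", "LATIN CAPITAL LETTER A", "Lu"]
category_by_index = row[2]

# After: the same data with named fields.
record = UcdRecord(*row)
category_by_name = record.general_category
```

Both lookups return the same value; only the second says what it is at the point of use.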
Greg Price 3cbc23aa22 bpo-37758: Cut always-constant conditionals on sys.maxunicode. (GH-15302)
Since PEP 393 in Python 3.3, this value is always 0x10ffff, the
maximum codepoint in Unicode; there's no longer such a thing as a
UCS-2 build of Python, which couldn't properly represent some
characters.

There are a couple of spots left where we still condition on the value
of this constant.  Take them out.
2019-09-09 08:20:40 -07:00
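
A small sketch of why such conditionals are dead code on Python 3.3 and later. The function `highest_codepoint` is a hypothetical stand-in for the kind of check this commit removes, not code taken from the repository.

```python
import sys

# Since PEP 393 (Python 3.3) there is no narrow/UCS-2 build, so this is constant.
MAX_CODEPOINT = 0x10FFFF


def highest_codepoint():
    # Hypothetical leftover check: the first branch can never be taken any more.
    if sys.maxunicode == 0xFFFF:
        return 0xFFFF
    return MAX_CODEPOINT


# A character outside the Basic Multilingual Plane is a single code point.
astral_length = len(chr(0x1F600))
```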
Greg Price 3e4498d35c bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128) 2019-08-14 18:18:53 -07:00
Greg Price c03e698c34 bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248)
Much like the lower-level logic in commit ef2af1ad4, we had
4 copies of this logic, written in a couple of different ways.
They're all implementing the same standard, so write it just once.
2019-08-13 19:28:38 -07:00
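
As a rough illustration of what a single shared range expander might look like, the sketch below assumes the range notation in question is the `XXXX..YYYY` form used in Unicode Character Database files. The function name `expand_code_point_range` is hypothetical and the code is not copied from the commit.

```python
def expand_code_point_range(char_range):
    """Return the code points covered by 'XXXX..YYYY' or by a single 'XXXX'."""
    if ".." in char_range:
        first, last = (int(part, 16) for part in char_range.split(".."))
    else:
        # A lone code point is treated as a range of length one.
        first = last = int(char_range, 16)
    return range(first, last + 1)


letters = list(expand_code_point_range("0041..0043"))
single = list(expand_code_point_range("00AD"))
```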
Greg Price 99d208efed bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129)
The `expand` option was introduced in 2000 in commit fad27aee1.
It appears to have been always set since it was committed, and
what it does is tell the code to do something essential.  So,
just always do that, and cut the option.

Also cut the `linebreakprops` option, which isn't consulted anymore.
2019-08-12 22:59:30 -07:00
Greg Price ef2af1ad44 bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130)
There were 10 copies of this, and almost as many distinct versions of
exactly how it was written.  They're all implementing the same
standard.  Pull them out to the top, so the more interesting logic
that remains becomes easier to read.
2019-08-12 22:20:56 -07:00
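
The kind of parsing logic that gets duplicated this way can be sketched as follows, assuming the usual layout of UCD data files: fields separated by semicolons, with `#` starting a comment. The function name `parse_ucd_lines` and the sample lines are illustrative only, not taken from the script or from a real data file.

```python
def parse_ucd_lines(lines):
    """Yield the stripped, semicolon-separated fields of each data line."""
    for line in lines:
        # Drop the comment part, then skip lines that are empty afterwards.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        yield [field.strip() for field in line.split(";")]


# Illustrative sample in the style of a UCD property file.
sample = [
    "# comment-only line",
    "0020;Na          # SPACE",
    "",
    "3000 ; F         # IDEOGRAPHIC SPACE",
]
records = list(parse_ucd_lines(sample))
```

Writing this once at the top means each caller only has to deal with the list of fields.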
Stefan Behnel faa2948654
Clean up and reduce visual clutter in the makeunicode.py script. (GH-7558) 2019-06-01 21:49:03 +02:00
Benjamin Peterson 3aca40d3cb
closes bpo-36861: Update Unicode database to 12.1.0. (GH-13214)
Adds ㋿.
2019-05-08 20:59:35 -07:00
Inada Naoki 6fec905de5
bpo-36642: make unicodedata const (GH-12855) 2019-04-17 08:40:34 +09:00
Serhiy Storchaka 172bb39452
bpo-22831: Use "with" to avoid possible fd leaks in tools (part 2). (GH-10927) 2019-03-30 08:33:02 +02:00
Benjamin Peterson 738c19f4c5
closes bpo-33376: Update to Unicode 12.0.0. (GH-12256) 2019-03-09 16:25:55 -08:00
Benjamin Peterson 7c69c1c0fb
update to Unicode 11.0.0 (closes bpo-33778) (GH-7439)
Also, standardize indentation of generated tables.
2018-06-06 20:14:28 -07:00
Benjamin Peterson 279a96206f bpo-30736: upgrade to Unicode 10.0 (#2344)
Straightforward. While we're at it, though, strip trailing whitespace from generated tables.
2017-06-22 22:31:08 -07:00
Zachary Ware 6b6e687766 bpo-27425: Be more explicit in .gitattributes (GH-840)
Updates checked-in line endings on several files.
2017-06-10 14:58:42 -05:00
Jon Dufresne 3972628de3 bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489)
* Replaced list(<generator expression>) with list comprehension
* Replaced dict(<generator expression>) with dict comprehension
* Replaced set(<list literal>) with set literal
* Replaced builtin func(<list comprehension>) with func(<generator
  expression>) when supported (e.g. any(), all(), tuple(), min(), &
  max())
2017-05-18 07:35:54 -07:00
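
The four replacements listed in this commit can be shown side by side. The data below is made up for illustration; each pair produces the same result, and the second form avoids building a throwaway intermediate object.

```python
words = ["alpha", "beta", "gamma"]

# list(<generator expression>) -> list comprehension
lengths_old = list(len(w) for w in words)
lengths_new = [len(w) for w in words]

# dict(<generator expression>) -> dict comprehension
by_word_old = dict((w, len(w)) for w in words)
by_word_new = {w: len(w) for w in words}

# set(<list literal>) -> set literal
vowels_old = set(["a", "e", "i", "o", "u"])
vowels_new = {"a", "e", "i", "o", "u"}

# func(<list comprehension>) -> func(<generator expression>)
any_long_old = any([len(w) > 4 for w in words])
any_long_new = any(len(w) > 4 for w in words)
```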
Benjamin Peterson 6775231597 Unicode 9.0.0
Not completely mechanical since support for East Asian Width changes—emoji
codepoints became Wide—had to be added to unicodedata.
2016-09-14 23:53:47 -07:00
R David Murray 44b548dda8 #27364: fix "incorrect" uses of escape character in the stdlib.
And most of the tools.

Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.
2016-09-08 13:59:53 -04:00
Benjamin Peterson 4801383c29 upgrade to Unicode 8.0.0 2015-06-27 15:45:56 -05:00
Serhiy Storchaka ba9ac5b5c4 Issue #16261: Converted some bare except statements to except statements
with specified exception type.  Original patch by Ramchandra Apte.
2015-05-20 10:33:40 +03:00
Zachary Ware 774ac377da Closes #17202: Merge with 3.4 2015-04-13 12:11:40 -05:00
Zachary Ware 4c9c848159 Issue #17202: Add .bat to .hgeol to force them to CRLF.
Using LF can cause a script to fail if it tries to use a label that is
split across 512-byte blocks.  Who knows why.
2015-04-13 11:59:54 -05:00
Serhiy Storchaka 82e07b92b3 Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:33:31 +02:00
Serhiy Storchaka d3faf43f9b Issue #23181: More "codepoint" -> "code point". 2015-01-18 11:28:37 +02:00
R David Murray 2623a5db6f Merge: #18176: Change generic UCD PropList link to version specific link. 2014-10-09 20:47:31 -04:00
R David Murray 5f16f90d1b #18176: Change generic UCD PropList link to version specific link. 2014-10-09 20:45:59 -04:00
R David Murray 532783bd5e Merge: #18176: fix another reference and add it to the makeunicodedata comment. 2014-10-09 17:41:55 -04:00
R David Murray 5bd62420f4 #18176: fix another reference and add it to the makeunicodedata comment. 2014-10-09 17:39:48 -04:00
R David Murray 5ac125cde3 Merge: #18176: updated stdtypes UCD link, added reminder to makeunicodedata. 2014-10-09 17:33:15 -04:00
R David Murray 7445a383a6 #18176: updated stdtypes UCD link, added reminder to makeunicodedata.
Patch by Alexander Belopolsky.
2014-10-09 17:30:33 -04:00
Benjamin Peterson 3032ed7cb1 upgrade to unicode 7.0.0 2014-07-06 13:04:20 -07:00
Serhiy Storchaka 8f8ec92de8 Issue #19936: Added executable bits or shebang lines to Python scripts that
require them.  Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface.  Fixed
shebang lines in the unittestgui and checkpip scripts.
2014-01-16 17:33:23 +02:00
Serhiy Storchaka b992a0e102 Issue #19936: Added executable bits or shebang lines to Python scripts that
require them.  Disable executable bits and shebang lines in test and
benchmark files in order to prevent using a random system python, and in
source files of modules which don't provide command line interface.  Fixed
shebang line to use python3 executable in the unittestgui script.
2014-01-16 17:15:49 +02:00
Andrew Kuchling 9d5c071060 #1097797: add the original mapping file 2013-11-10 21:46:02 -05:00
Andrew Kuchling 695f07b27b Fix some PEP8-formatting problems in the generated code 2013-11-10 21:45:24 -05:00
Benjamin Peterson 94d08d908b upgrade unicode db to 6.3.0 (closes #19221) 2013-10-10 17:24:45 -04:00
Ezio Melotti d640fe2af5 #18803: merge with 3.3. 2013-08-26 01:33:30 +03:00