cpython

Commit Graph

Author	SHA1	Message	Date
CF Bolz-Tereick	9573d14215	gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906) Co-authored-by: Łukasz Langa <lukasz@langa.pl> Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com> Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>	2023-11-04 15:56:58 +01:00
James Gerity	def828995a	fixes gh-109559: Update `unicodedata` for Unicode 15.1.0 (GH-109560) --------- Co-authored-by: Benjamin Peterson <benjamin@python.org>	2023-09-19 22:07:47 -07:00
LiarPrincess	0c1d7a06ed	bpo-47243: Duplicate entry in 'Objects/unicodetype_db.h' (GH-32376) Fix for duplicate 1st entry in 'Objects/unicodetype_db.h': ```c /* a list of unique character type descriptors */ const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = { {0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0}, <--- HERE {0, 0, 0, 0, 0, 32}, {0, 0, 0, 0, 0, 48}, … ``` https://bugs.python.org/issue47243 Automerge-Triggered-By: GH:isidentical	2022-09-28 06:57:14 -07:00
Benjamin Peterson	fd1e477f53	closes gh-96734: Update to Unicode 15.0.0. (GH-96809)	2022-09-13 15:45:12 -07:00
Carl Friedrich Bolz-Tereick	9c197bc8bf	GH-96172 fix unicodedata.east_asian_width being wrong on unassigned code points (#96207)	2022-08-26 19:29:39 +03:00
Carl Friedrich Bolz-Tereick	2d9f252c0c	gh-96019: Fix caching of decompositions in makeunicodedata (GH-96020)	2022-08-19 12:20:44 +03:00
Benjamin Peterson	024fda47d4	closes bpo-45190: Update Unicode data to version 14.0.0. (GH-28336)	2021-09-14 11:00:38 -07:00
Benjamin Peterson	51796e5d26	Update some www.unicode.org URLs to use HTTPS. (GH-18912)	2020-03-10 21:10:59 -07:00
Benjamin Peterson	051b9d08d1	closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)	2020-03-10 20:41:34 -07:00
Greg Price	a65678c5c9	bpo-37760: Convert from length-18 lists to a dataclass, in makeunicodedata. (GH-15265) Now the fields have names! Much easier to keep straight as a reader than the elements of an 18-tuple. Runs about 10-15% slower: from 10.8s to 12.3s, on my laptop. Fortunately that's perfectly fine for this maintenance script.	2019-09-12 10:23:43 +01:00
Greg Price	3e4498d35c	bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128)	2019-08-14 18:18:53 -07:00
Greg Price	c03e698c34	bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248) Much like the lower-level logic in commit `ef2af1ad4`, we had 4 copies of this logic, written in a couple of different ways. They're all implementing the same standard, so write it just once.	2019-08-13 19:28:38 -07:00
Greg Price	99d208efed	bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129) The `expand` option was introduced in 2000 in commit `fad27aee1`. It appears to have been always set since it was committed, and what it does is tell the code to do something essential. So, just always do that, and cut the option. Also cut the `linebreakprops` option, which isn't consulted anymore.	2019-08-12 22:59:30 -07:00
Greg Price	ef2af1ad44	bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130) There were 10 copies of this, and almost as many distinct versions of exactly how it was written. They're all implementing the same standard. Pull them out to the top, so the more interesting logic that remains becomes easier to read.	2019-08-12 22:20:56 -07:00
Stefan Behnel	faa2948654	Clean up and reduce visual clutter in the makeunicode.py script. (GH-7558)	2019-06-01 21:49:03 +02:00
Benjamin Peterson	3aca40d3cb	closes bpo-36861: Update Unicode database to 12.1.0. (GH-13214) Adds ㋿.	2019-05-08 20:59:35 -07:00
Inada Naoki	6fec905de5	bpo-36642: make unicodedata const (GH-12855)	2019-04-17 08:40:34 +09:00
Benjamin Peterson	738c19f4c5	closes bpo-33376: Update to Unicode 12.0.0. (GH-12256)	2019-03-09 16:25:55 -08:00
Benjamin Peterson	7c69c1c0fb	update to Unicode 11.0.0 (closes bpo-33778) (GH-7439) Also, standardize indentation of generated tables.	2018-06-06 20:14:28 -07:00
Benjamin Peterson	279a96206f	bpo-30736: upgrade to Unicode 10.0 (#2344) Straightforward. While we're at it, though, strip trailing whitespace from generated tables.	2017-06-22 22:31:08 -07:00
Jon Dufresne	3972628de3	bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489) * Replaced list(<generator expression>) with list comprehension * Replaced dict(<generator expression>) with dict comprehension * Replaced set(<list literal>) with set literal * Replaced builtin func(<list comprehension>) with func(<generator expression>) when supported (e.g. any(), all(), tuple(), min(), & max())	2017-05-18 07:35:54 -07:00
Benjamin Peterson	6775231597	Unicode 9.0.0 Not completely mechanical since support for East Asian Width changes—emoji codepoints became Wide—had to be added to unicodedata.	2016-09-14 23:53:47 -07:00
Benjamin Peterson	4801383c29	upgrade to Unicode 8.0.0	2015-06-27 15:45:56 -05:00
R David Murray	2623a5db6f	Merge: #18176: Change generic UCD PropList link to version specific link.	2014-10-09 20:47:31 -04:00
R David Murray	5f16f90d1b	#18176: Change generic UCD PropList link to version specific link.	2014-10-09 20:45:59 -04:00
R David Murray	532783bd5e	Merge: #18176: fix another reference and add it to the makeunicodedata comment.	2014-10-09 17:41:55 -04:00
R David Murray	5bd62420f4	#18176: fix another reference and add it to the makeunicodedata comment.	2014-10-09 17:39:48 -04:00
R David Murray	5ac125cde3	Merge: #18176: updated stdtypes UCD link, added reminder to makeunicodedata.	2014-10-09 17:33:15 -04:00
R David Murray	7445a383a6	#18176: updated stdtypes UCD link, added reminder to makeunicodedata. Patch by Alexander Belopolsky.	2014-10-09 17:30:33 -04:00
Benjamin Peterson	3032ed7cb1	upgrade to unicode 7.0.0	2014-07-06 13:04:20 -07:00
Benjamin Peterson	94d08d908b	upgrade unicode db to 6.3.0 (closes #19221)	2013-10-10 17:24:45 -04:00
Ezio Melotti	d640fe2af5	#18803: merge with 3.3.	2013-08-26 01:33:30 +03:00
Ezio Melotti	7c4a7e6f3c	#18803: fix more typos. Patch by Févry Thibault.	2013-08-26 01:32:56 +03:00
Antoine Pitrou	9ed5f27266	Issue #18722: Remove uses of the "register" keyword in C code.	2013-08-13 20:18:52 +02:00
Benjamin Peterson	b8350f1c7d	upgrade to UCD 6.2	2012-09-29 13:47:39 -04:00
Florent Xicluna	c20740109d	Some cleanup in the Tools directory.	2012-07-07 17:03:54 +02:00
Benjamin Peterson	71f660e00f	update to Unicode 6.1	2012-02-20 22:24:29 -05:00
Benjamin Peterson	ad9c569825	delta encoding of upper/lower/title makes a glorious return (#12736)	2012-01-15 21:19:20 -05:00
Benjamin Peterson	d5890c8db5	add str.casefold() (closes #13752)	2012-01-14 13:23:30 -05:00
Benjamin Peterson	b2bf01d824	use full unicode mappings for upper/lower/title case (#12736) Also broaden the category of characters that count as lowercase/uppercase.	2012-01-11 18:17:06 -05:00
Ezio Melotti	931b8aac80	#12753: Add support for Unicode name aliases and named sequences.	2011-10-21 21:57:36 +03:00
Ezio Melotti	2a1e926d63	Fix ResourceWarnings in makeunicodedata.py.	2011-09-30 08:46:25 +03:00
Ezio Melotti	3b3499ba69	#11565: Merge with 3.1.	2011-03-16 11:35:38 +02:00
Ezio Melotti	13925008dc	#11565: Fix several typos. Patch by Piotr Kasprzyk.	2011-03-16 11:05:33 +02:00
Martin v. Löwis	5cbc71e50a	Issue #10459: Update CJK character names to Unicode 6.0.	2010-11-22 09:00:02 +00:00
Martin v. Löwis	baecd7243a	Upgrade to Unicode 6.0.0. makeunicodedata.py: download all data files from unicode.org, switch to extracting Unihan data from zip file. Read linebreakprops and derivednormalizationprops even for old versions, even though they are not used in delta records. test:unicode.py: U+11000 is now assigned, use U+14000 instead.	2010-10-11 22:42:28 +00:00
Amaury Forgeot d'Arc	feb7307db4	#9210: remove --with-wctype-functions configure option. The internal unicode database is now always used. (after 5 years: see http://mail.python.org/pipermail/python-dev/2004-December/050193.html)	2010-09-12 22:42:57 +00:00
Amaury Forgeot d'Arc	324ac65ceb	#5127: Even on narrow unicode builds, the C functions that access the Unicode Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept and return characters from the full Unicode range (Py_UCS4). The differences from Python code are few: - unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit() now return the correct value for large code points - repr() may consider more characters as printable.	2010-08-18 20:44:58 +00:00
Florent Xicluna	806d8cf0e8	Merged revisions 79494,79496 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (mar, 30 mar 2010) | 2 lines #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14. ........ r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (mar, 30 mar 2010) | 2 lines Highlight the change of behavior related to r79494. Now VT and FF are linebreaks. ........	2010-03-30 19:34:18 +00:00
Florent Xicluna	f089fd67fc	Merged revisions 78982,78986 via svnmerge from svn+ssh://pythondev@svn.python.org/python/trunk ........ r78982 | florent.xicluna | 2010-03-15 15:00:58 +0100 (lun, 15 mar 2010) | 2 lines Remove py3k deprecation warnings from these Unicode tools. ........ r78986 | florent.xicluna | 2010-03-15 19:08:58 +0100 (lun, 15 mar 2010) | 3 lines Issue #7783 and #7787: open_urlresource invalidates the outdated files from the local cache. Use this feature to fix test_normalization. ........	2010-03-19 14:25:03 +00:00

95 Commits