They were undertested, and since #96954 might involve a
rewrite of this part of the code, we want to ensure that
there won't be any behavioral change.
Co-authored-by: Carl Friedrich Bolz-Tereick <cfbolz@gmx.de>
Unicode has grown since Python first gained support for it,
when Unicode itself was still rather new.
This pair of test cases was added in commit 6a20ee7de back in 2000,
and they haven't needed to change much since then. But do change
them to look beyond the Basic Multilingual Plane (range(0x10000))
and cover all 17 planes of Unicode's final form.
This adds about 5 seconds to the test suite's runtime. Mark the
tests as CPU-using accordingly.
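For illustration, the shape of the change is roughly this (a sketch only, not
the tests' actual code; the particular database calls are just examples):

    import unicodedata

    # Old loop covered only the Basic Multilingual Plane:
    #   for x in range(0x10000): ...
    # New loop walks every code point in all 17 planes:
    for x in range(0x110000):
        char = chr(x)
        unicodedata.category(char)        # exercise the database for each code point
        unicodedata.decomposition(char)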
Having these in a separate file from the one that's named after the
module in the usual way makes it very easy to miss them when looking
for tests for these two functions.
(In fact, when working recently on is_normalized, I'd been surprised to
see no tests for it here and concluded the function had evaded being
tested at all. I'd gone so far as to write up some tests myself
before I spotted this other file.)
Mostly this just means moving all the one file's code into the other,
and moving code from the module toplevel to inside the test class to
keep it tidily separate from the rest of the file's code.
There's one substantive change, which slightly reduces the amount of
code to be moved: we drop the `x > sys.maxunicode` conditional and all
the `RangeError` logic behind it. Now if that condition ever occurs
it will cause an error at `chr(x)`, and a test failure. That's the
right result because, since PEP 393 in Python 3.3, there is no longer
such a thing as an "unsupported character".
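For the record, a small illustration of why that branch is dead weight now
(standard Python 3 behavior, nothing specific to this test file):

    import sys

    assert sys.maxunicode == 0x10FFFF      # fixed since PEP 393
    chr(sys.maxunicode)                    # every value up to here is a real character
    try:
        chr(sys.maxunicode + 1)
    except ValueError:
        pass                               # chr() itself rejects anything beyond the range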
The purpose of the `unicodedata.is_normalized` function is to answer
the question `str == unicodedata.normalize(form, str)` more
efficiently than writing just that, by using the "quick check"
optimization described in UAX #15 of the Unicode standard.
However, it turns out the code doesn't implement the full algorithm
from the standard, and as a result we often miss the optimization and
end up having to compute the whole normalized string after all.
Implement the standard's algorithm. This greatly speeds up
`unicodedata.is_normalized` in many cases where our partial variant
of quick-check had been returning MAYBE and the standard algorithm
returns NO.
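For reference, the standard's quick-check algorithm looks roughly like this in
Python (the real change is in the C module; `quick_check_property` here is a
stand-in for the per-character NFC_QC/NFD_QC/... property from
DerivedNormalizationProps.txt, which `unicodedata` does not expose directly):

    import unicodedata

    YES, MAYBE, NO = "YES", "MAYBE", "NO"

    def quick_check(s, form, quick_check_property):
        last_ccc = 0
        result = YES
        for ch in s:
            ccc = unicodedata.combining(ch)
            if last_ccc > ccc and ccc != 0:
                return NO                        # combining marks out of canonical order
            check = quick_check_property(ch, form)
            if check == NO:
                return NO                        # this character never survives normalization
            if check == MAYBE:
                result = MAYBE                   # only this outcome needs the slow path
            last_ccc = ccc
        return result

(U+F900 in the timing below has NFD_QC=No, which is exactly the kind of case
the old partial check missed.)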
In a quick test on my desktop, the existing code takes about 4.4 ms/MB
(so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
has to do the slow normalize-and-compare:
$ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
50 loops, best of 5: 4.39 msec per loop
With this patch, it gets the answer instantly (58 ns) on the same 1 MB
string:
$ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
-- 'unicodedata.is_normalized("NFD", s)'
5000000 loops, best of 5: 58.2 nsec per loop
This restores a small optimization that the original version of this
code had for the `unicodedata.normalize` use case.
With this, that case is actually faster than in master!
$ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 561 usec per loop
$ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
-- 'unicodedata.normalize("NFD", s)'
500 loops, best of 5: 512 usec per loop
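To spell out the optimization (a sketch reusing the `quick_check` sketch above,
not the actual C code): when the quick check already answers YES, `normalize`
can hand back the input string object unchanged instead of building an equal copy.

    import unicodedata

    def normalize_with_shortcut(form, s, quick_check_property):
        if quick_check(s, form, quick_check_property) == YES:
            return s                              # already normalized: skip the real work
        return unicodedata.normalize(form, s)     # otherwise do the full normalization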
This file started life as a script, before conversion to a
`unittest` test file. Clear out some legacies of that conversion
that make it a bit confusing to see how it works.
Most notably, it's unlikely there's still a good reason to try
to recover from `unicodedata` failing to import -- as there was
when that logic was first added, when the module was very new.
So take that out entirely. Keep `self.db` working, though, to
avoid a noisy diff.
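Concretely, the surviving bit looks something like this (assuming a test class
along these lines; the names are illustrative, not the exact diff):

    import unicodedata
    import unittest

    class UnicodeDatabaseTest(unittest.TestCase):
        # A plain import above replaces the old try/except fallback; the `db`
        # alias stays so the existing `self.db. ...` calls need no edits.
        db = unicodedata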
Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176] instead of [0x1161, 0x1176)) and third character ([0x11A7, 0x11C3]
instead of (0x11A7, 0x11C3)).
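For reference, the corrected checks follow the Hangul constants from the
Unicode standard (a sketch in Python; the actual fix is in the C normalization
code):

    SBase, LBase, VBase, TBase = 0xAC00, 0x1100, 0x1161, 0x11A7
    LCount, VCount, TCount = 19, 21, 28

    def compose_hangul(l, v, t=None):
        """Compose L + V (+ optional T) jamo into a syllable, or return None."""
        if not (LBase <= l < LBase + LCount):
            return None
        if not (VBase <= v < VBase + VCount):      # second character: [0x1161, 0x1176)
            return None
        s = SBase + ((l - LBase) * VCount + (v - VBase)) * TCount
        if t is not None:
            if not (TBase < t < TBase + TCount):   # third character: (0x11A7, 0x11C3)
                return None
            s += t - TBase
        return s

    assert compose_hangul(0x1100, 0x1161) == 0xAC00           # 가
    assert compose_hangul(0x1100, 0x1161, 0x11A8) == 0xAC01   # 각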
I have compared output between pre- and post-patch runs of these tests
to make sure there's nothing missing and nothing broken, on both
Windows and Linux. The only differences I found were actually tests
that were previously *not* run.
makeunicodedata.py: download all data files from unicode.org,
switch to extracting Unihan data from zip file.
Read linebreakprops and derivednormalizationprops even for
old versions, even though they are not used in delta records.
test_unicode.py: U+11000 is now assigned, use U+14000 instead.
The database functions (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now
accept and return characters from the full Unicode range (Py_UCS4).
The differences visible from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
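A couple of hedged examples of the visible differences (exact results depend on
the Unicode version the interpreter ships with):

    import unicodedata

    # Non-BMP digits such as MATHEMATICAL BOLD DIGIT NINE (U+1D7D7) now report
    # their values:
    assert unicodedata.decimal('\U0001D7D7') == 9
    assert unicodedata.digit('\U0001D7D7') == 9
    assert unicodedata.numeric('\U0001D7D7') == 9.0

    # repr() escapes only unprintable or unassigned code points, so:
    print(repr('\U00010000'))   # assigned and printable: shows the character itself
    print(repr('\U00014000'))   # unassigned at the time of this change: escaped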
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (Tue, 30 Mar 2010) | 2 lines
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (Tue, 30 Mar 2010) | 2 lines
Highlight the change of behavior related to r79494. Now VT and FF are linebreaks.
........
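A quick illustration of the new behavior (Python 3 semantics): VT and FF now
count as line boundaries, so for example

    assert "a\x0bb\x0cc".splitlines() == ["a", "b", "c"]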
svn+ssh://pythondev@svn.python.org/python/trunk
........
r78646 | victor.stinner | 2010-03-04 13:09:33 +0100 (Thu, 04 Mar 2010) | 5 lines
Issue #1054943: Fix unicodedata.normalize('NFC', text) for the Public Review
Issue #29.
PRI #29 was released in February 2004!
........
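One of the sequences PRI #29 is about, as a hedged illustration: in
<U+0B47, U+0300, U+0B3E> the trailing U+0B3E is blocked from the starter, so a
correct NFC leaves the text unchanged instead of composing U+0B47 + U+0B3E
into U+0B4B.

    import unicodedata

    s = "\u0b47\u0300\u0b3e"
    assert unicodedata.normalize("NFC", s) == s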