cpython

Commit Graph

Author	SHA1	Message	Date
Greg Price	6bccbe7dfb	bpo-36502: Correct documentation of str.isspace() (GH-15019) The documented definition was much broader than the real one: there are tons of characters with general category "Other", and we don't (and shouldn't) treat most of them as whitespace. Rewrite the definition to agree with the comment on _PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py, which is what generates that function and so ultimately governs. Add suitable breadcrumbs so that a reader who wants to pin down exactly what this definition means (what's a "bidirectional class" of "B"?) can do so. The `unicodedata` module documentation is an appropriate central place for our references to Unicode's own copious documentation, so point there. Also add to the isspace() test a thorough check that the implementation agrees with the intended definition.	2019-08-14 13:05:19 +02:00
Hai Shi	5623ac87bb	bpo-37476: Adding tests for asutf8 and asutf8andsize (GH-14531)	2019-07-20 15:56:23 +08:00
Victor Stinner	22eb689cf3	bpo-37388: Development mode check encoding and errors (GH-14341) In development mode and in debug build, encoding and errors arguments are now checked on string encoding and decoding operations. Examples: open(), str.encode() and bytes.decode(). By default, for best performances, the errors argument is only checked at the first encoding/decoding error, and the encoding argument is sometimes ignored for empty strings.	2019-06-26 00:51:05 +02:00
Kingsley M	b015fc86f7	bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it (GH-12804)	2019-04-12 08:35:39 -07:00
Inada Naoki	6a16b18224	bpo-36297: remove "unicode_internal" codec (GH-12342)	2019-03-18 15:44:11 +09:00
Serhiy Storchaka	44cc4822bb	bpo-33817: Fix _PyBytes_Resize() for empty bytes object. (GH-11516) Add also tests for PyUnicode_FromFormat() and PyBytes_FromFormat() with empty result.	2019-01-12 09:22:29 +02:00
Victor Stinner	998b806366	Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187) This reverts commit `886483e2b9`.	2018-09-12 00:23:25 +02:00
Victor Stinner	886483e2b9	bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080) * Add %T format to PyUnicode_FromFormatV(), and so to PyUnicode_FromFormat() and PyErr_Format(), to format an object type name: equivalent to "%s" with Py_TYPE(obj)->tp_name. * Replace Py_TYPE(obj)->tp_name with %T format in unicodeobject.c. * Add unit test on %T format. * Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8(), to make the intent more explicit.	2018-09-07 18:00:58 +02:00
Zackery Spytz	e349bf2358	bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (GH-8741) The UTF-7 decoder now raises UnicodeDecodeError for ill-formed sequences starting with "+" (as specified in RFC 2152).	2018-08-19 07:43:38 +03:00
INADA Naoki	a49ac99029	bpo-32677: Add .isascii() to str, bytes and bytearray (GH-5342)	2018-01-27 14:06:21 +09:00
Serhiy Storchaka	9b6c60cbce	bpo-31979: Simplify transforming decimals to ASCII (#4336) in int(), float() and complex() parsers. This also speeds up parsing non-ASCII numbers by around 20%.	2017-11-13 21:23:48 +02:00
Serhiy Storchaka	5075416b8f	bpo-30978: str.format_map() now passes key lookup exceptions through. (#2790) Previously any exception was replaced with a KeyError exception.	2017-08-03 11:45:23 +03:00
Victor Stinner	d6debb24e0	bpo-29919: Remove unused imports found by pyflakes (#137) Make also minor PEP8 coding style fixes on modified imports.	2017-03-27 16:05:26 +02:00
Martijn Pieters	d7e64337ef	bpo-28598: Support __rmod__ for RHS subclasses of str in % string formatting operations (#51) When you use `'%s' % SubClassOfStr()`, where `SubClassOfStr.__rmod__` exists, the reverse operation is ignored as normally such string formatting operations use the `PyUnicode_Format()` fast path. This patch tests for subclasses of `str` first and picks the slow path in that case. Patch by Martijn Pieters.	2017-02-23 15:38:04 +02:00
Martin Panter	5644729aa6	Issue #29145: Merge test from 3.6	2017-01-14 06:29:32 +00:00
Martin Panter	758c7d044b	Merge tests from 3.5	2017-01-14 06:26:51 +00:00
Martin Panter	b71c0956d0	Issues #1621, #29145: Test for str.join() overflow	2017-01-12 11:54:59 +00:00
Serhiy Storchaka	8cbd3df3ce	Issue #28992: Use bytes.fromhex().	2016-12-21 12:59:28 +02:00
Xiang Zhang	b211068f5c	Issue #28822: Adjust indices handling of PyUnicode_FindChar().	2016-12-20 22:52:33 +08:00
Martin Panter	fff07e34fa	Merge spelling and grammar from 3.5	2016-12-18 05:37:21 +00:00
Martin Panter	2f9171d900	Fix spelling and grammar in code comments and documentation	2016-12-18 01:23:09 +00:00
Eric V. Smith	5646648678	Issue 28128: Print out better error/warning messages for invalid string escapes. Backport to 3.6.	2016-10-31 14:46:26 -04:00
Serhiy Storchaka	21d9f10c94	Merge from 3.5.	2016-10-08 22:46:01 +03:00
Serhiy Storchaka	9c0e1f83af	Issue #28379: Added sanity checks and tests for PyUnicode_CopyCharacters(). Patch by Xiang Zhang.	2016-10-08 22:45:38 +03:00
Serhiy Storchaka	0a6ef790e4	test_invalid_sequences does not seem to belong in CAPITest. Reported by Xiang Zhang.	2016-10-02 21:59:44 +03:00
Serhiy Storchaka	b3648576cd	Issue #28295: Fixed the documentation and added tests for PyUnicode_AsUCS4(). Original patch by Xiang Zhang.	2016-10-02 21:30:35 +03:00
Serhiy Storchaka	cc164232aa	Issue #28295: Fixed the documentation and added tests for PyUnicode_AsUCS4(). Original patch by Xiang Zhang.	2016-10-02 21:29:26 +03:00
Serhiy Storchaka	1edebef724	Moved Unicode C API related tests to separate test class.	2016-10-02 21:18:14 +03:00
Serhiy Storchaka	63b5b6fd45	Moved Unicode C API related tests to separate test class.	2016-10-02 21:16:38 +03:00
R David Murray	110b6fecbb	#27364: Deprecate invalid escape strings in str/bytes. Patch by Emanuel Barry, reviewed by Serhiy Storchaka and Martin Panter.	2016-09-08 15:34:08 -04:00
R David Murray	44b548dda8	#27364: fix "incorrect" uses of escape character in the stdlib. And most of the tools. Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and Martin Panter.	2016-09-08 13:59:53 -04:00
Guido van Rossum	97c1adf393	Anti-registration of various ABC methods. - Issue #25958: Support "anti-registration" of special methods from various ABCs, like __hash__, __iter__ or __len__. All these (and several more) can be set to None in an implementation class and the behavior will be as if the method is not defined at all. (Previously, this mechanism existed only for __hash__, to make mutable classes unhashable.) Code contributed by Andrew Barnert and Ivan Levkivskyi.	2016-08-18 09:22:23 -07:00
Martin Panter	6245cb3c01	Correct “an” → “a” with “Unicode”, “user”, “UTF”, etc. This affects documentation, code comments, and debugging messages.	2016-04-15 02:14:19 +00:00
Martin Panter	0d0db6cc1e	Issue #26712: Unify (r)split, (l/r)strip tests into string_tests This eliminates a few redundant test cases.	2016-04-10 08:45:26 +00:00
Martin Panter	152a19c6bd	Issue #26257: Eliminate buffer_tests.py and fix ByteArrayAsStringTest ByteArrayAsStringTest.fixtype() was converting test data to bytes, not bytearray, therefore many of the test cases inherited in this class were not actually being run on the bytearray type. The tests in buffer_tests.py were redundant with methods in string_tests.MixinStrUnicodeUserStringTest and string_tests.CommonTest. These methods are now moved into string_tests.BaseTest, where they will also get run for bytes and bytearray. This change also moves test_additional_split(), test_additional_rsplit(), and test_strip() from CommonTest to BaseTest, meaning these tests are now run for bytes and bytearray. I plan to eliminate redundancies with existing tests in test_bytes.py soon.	2016-04-06 06:37:17 +00:00
Serhiy Storchaka	fbb1c5ee06	Issue #26494: Fixed crash on iterating exhausting iterators. Affected classes are generic sequence iterators, iterators of str, bytes, bytearray, list, tuple, set, frozenset, dict, OrderedDict, corresponding views and os.scandir() iterator.	2016-03-30 20:40:02 +03:00
Victor Stinner	337986740f	Issue #26464: Fix unicode_fast_translate() again Initialize i variable if the string is non-ASCII.	2016-03-01 21:59:58 +01:00
Victor Stinner	6c9aa8f2bf	Fix str.translate() Issue #26464: Fix str.translate() when string is ASCII and first replacements removes character, but next replacement uses a non-ASCII character or a string longer than 1 character. Regression introduced in Python 3.5.0.	2016-03-01 21:30:30 +01:00
Serhiy Storchaka	6648bf5661	Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache.	2015-12-03 01:04:37 +02:00
Serhiy Storchaka	7aa690860e	Issue #25709: Fixed problem with in-place string concatenation and utf-8 cache.	2015-12-03 01:02:03 +02:00
Serhiy Storchaka	f9afda57ad	Issue #24731: Fixed crash on converting objects with special methods __bytes__, __trunc__, and __float__ returning instances of subclasses of bytes, int, and float to subclasses of bytes, int, and float correspondingly.	2015-11-25 15:52:04 +02:00
Serhiy Storchaka	15095800a3	Issue #24731: Fixed crash on converting objects with special methods __bytes__, __trunc__, and __float__ returning instances of subclasses of bytes, int, and float to subclasses of bytes, int, and float correspondingly.	2015-11-25 15:47:01 +02:00
Serhiy Storchaka	411dfd871c	Issue #22643: Skip test_case_operation_overflow on computers with low memory.	2015-11-07 16:54:48 +02:00
Serhiy Storchaka	3d717d05de	Issue #22643: Skip test_case_operation_overflow on computers with low memory.	2015-11-07 16:55:16 +02:00
Serhiy Storchaka	58c8f2bb6d	Issue #24848: Fixed bugs in UTF-7 decoding of misformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate. 3. In some circumstances the '\xfd' character was produced instead of the replacement character '\ufffd' (due to a bug in _PyUnicodeWriter).	2015-10-02 13:13:14 +03:00
Serhiy Storchaka	28b21e50c8	Issue #24848: Fixed bugs in UTF-7 decoding of misformed data: 1. Non-ASCII bytes were accepted after shift sequence. 2. A low surrogate could be emitted in case of error in high surrogate.	2015-10-02 13:07:28 +03:00
Serhiy Storchaka	f0eeedf0d8	Issue #22681: Added support for the koi8_t encoding.	2015-05-12 23:24:19 +03:00
Serhiy Storchaka	ad8a1c3fb2	Issue #22682: Added support for the kz1048 encoding.	2015-05-12 23:16:55 +03:00
Serhiy Storchaka	1b74d630da	Added explicit tests for issue #23803.	2015-03-29 19:23:27 +03:00
Serhiy Storchaka	48070c1248	Issue #23803: Fixed str.partition() and str.rpartition() when a separator is wider than partitioned string.	2015-03-29 19:21:02 +03:00

364 Commits
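
Several of the commits listed above change behavior that can be observed from Python code. The following is a minimal sketch, not taken from the repository, that exercises four of them: the str.isspace() definition (bpo-36502), str.format_map() exception pass-through (bpo-30978), str.isascii() (bpo-32677), and str.capitalize() titlecasing (bpo-36549). It assumes an interpreter that includes all of these changes (Python 3.8 or later). The names is_whitespace_by_definition, isspace_mismatches, RaisingMapping, and format_map_error_type are hypothetical helpers introduced only for illustration.

```python
import sys
import unicodedata


def is_whitespace_by_definition(char):
    """Whitespace per the rewritten docs: general category Zs,
    or bidirectional class WS, B, or S."""
    return (unicodedata.category(char) == "Zs"
            or unicodedata.bidirectional(char) in ("WS", "B", "S"))


def isspace_mismatches():
    """Return every code point where str.isspace() disagrees with
    the definition above (expected to be empty)."""
    return [cp for cp in range(sys.maxunicode + 1)
            if chr(cp).isspace() != is_whitespace_by_definition(chr(cp))]


class RaisingMapping:
    """Hypothetical mapping whose lookups fail with a non-KeyError."""

    def __getitem__(self, key):
        raise ValueError(f"lookup of {key!r} failed")


def format_map_error_type():
    """Return the type of exception that escapes str.format_map()."""
    try:
        "{x}".format_map(RaisingMapping())
    except Exception as exc:
        return type(exc)
    return None


if __name__ == "__main__":
    print(isspace_mismatches())               # code points that disagree
    print(format_map_error_type().__name__)   # exception type passed through
    print("abc".isascii(), "caf\u00e9".isascii())
    # U+01C6 is a digraph whose titlecase form (U+01C5) differs
    # from its uppercase form (U+01C4).
    print(ascii("\u01c6abc".capitalize()))
```

The full scan in isspace_mismatches() visits every code point, so it takes a moment to run; restricting the range to the Basic Multilingual Plane would be a reasonable shortcut for a quick check.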