cpython

Commit Graph

Author	SHA1	Message	Date
Ezio Melotti	370d85cee4	Python 2 can encode/decode surrogates to utf-8. Add a test for this.	2011-02-28 01:42:29 +00:00
Antoine Pitrou	b27ddc72ea	Merged revisions 85861 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r85861 | antoine.pitrou | 2010-10-27 20:52:48 +0200 (mer., 27 oct. 2010) | 3 lines Recode modules from latin-1 to utf-8 ........	2010-10-27 18:58:04 +00:00
Florent Xicluna	c0c0b14671	Strengthen test_unicode with explicit type checking for assertEqual tests.	2010-09-13 08:53:00 +00:00
Florent Xicluna	60d512c3b0	Check PendingDeprecationWarning after issue #7994.	2010-09-13 08:21:43 +00:00
Florent Xicluna	9b90cd1f7b	Merged revisions 84470-84471,84566-84567,84759 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r84470 | florent.xicluna | 2010-09-03 22:00:37 +0200 (ven., 03 sept. 2010) | 1 line Strengthen BytesWarning tests. ........ r84471 | florent.xicluna | 2010-09-03 22:23:40 +0200 (ven., 03 sept. 2010) | 1 line Typo ........ r84566 | florent.xicluna | 2010-09-06 22:27:15 +0200 (lun., 06 sept. 2010) | 1 line typo ........ r84567 | florent.xicluna | 2010-09-06 22:27:55 +0200 (lun., 06 sept. 2010) | 1 line typo ........ r84759 | florent.xicluna | 2010-09-13 04:28:18 +0200 (lun., 13 sept. 2010) | 1 line Reenable test_ucs4 and remove some duplicated lines. ........	2010-09-13 07:46:37 +00:00
Stefan Krah	0b9201fa1c	Sub-issue of #9036: Fix incorrect use of Py_CHARMASK.	2010-07-19 18:06:46 +00:00
Benjamin Peterson	eabdeba25e	use unicode literals	2010-06-07 22:33:09 +00:00
Benjamin Peterson	13e934acc0	correctly overflow when indexes are too large	2010-06-07 22:23:23 +00:00
Ezio Melotti	ab2eb0ee84	Add a NEWS entry for r81758 and clarify a comment.	2010-06-05 19:21:32 +00:00
Ezio Melotti	e57e50c8e7	Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. 1) #8271: when a byte sequence is invalid, only the start byte and all the valid continuation bytes are now replaced by U+FFFD, instead of replacing the number of bytes specified by the start byte. See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95); 2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes in behavior); 3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in RFC 3629, but leave it commented out since it's not backward compatible; 4) Change the error messages "unexpected code byte" to "invalid start byte" and "invalid data" to "invalid continuation byte"; 5) Add an extensive set of tests in test_unicode; 6) Fix test_codeccallbacks because it was failing after this change.	2010-06-05 17:51:07 +00:00
Georg Brandl	f0757a2937	#8016: add the CP858 codec (approved by Benjamin). (Also add CP720 to the tests, it was missing there.)	2010-05-24 21:29:07 +00:00
Victor Stinner	c7790ed163	Fix the NEWS about my last commit: an unicode subclass can now override the __unicode__ method (and not the __str__ method). Simplify also the testcase.	2010-03-22 12:36:28 +00:00
Victor Stinner	95affc4449	Issue #1583863: An unicode subclass can now override the __str__ method	2010-03-22 12:24:37 +00:00
Florent Xicluna	6de9e938a5	Issue #7849: Now the utility ``check_warnings`` verifies if the warnings are effectively raised. A new utility ``check_py3k_warnings`` deals with py3k warnings.	2010-03-07 12:18:33 +00:00
Victor Stinner	f20f9c299e	Issue #7649: Fix u'%c' % char for character in range 0x80..0xFF => raise an UnicodeDecodeError. Patch written by Ezio Melotti.	2010-02-23 23:16:07 +00:00
Ezio Melotti	aa98058cc4	use assert[Not]In where appropriate	2010-01-23 23:04:36 +00:00
Antoine Pitrou	5b7139aab4	Issue #7462: Implement the stringlib fast search algorithm for the `rfind`, `rindex`, `rsplit` and `rpartition` methods. Patch by Florent Xicluna.	2010-01-02 21:12:58 +00:00
R. David Murray	0a0a1a842c	Issue #1680159: unicode coercion during an 'in' operation was masking any errors that might occur during coercion of the left operand and turning them into a TypeError with a message text that was confusing in the given context. This patch lets any errors through, as was already done during coercion of the right hand side.	2009-12-14 16:28:26 +00:00
Benjamin Peterson	332d721750	add keyword arguments support to str/unicode encode and decode #6300	2009-09-18 21:14:55 +00:00
Benjamin Peterson	5c8da86f3a	convert usage of fail* to assert*	2009-06-30 22:57:08 +00:00
Eric Smith	4b94b192ff	Issue 6089: str.format raises SystemError.	2009-05-23 13:56:13 +00:00
Antoine Pitrou	653dece278	Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences. Patch by Nick Barnes and Victor Stinner.	2009-05-04 18:32:32 +00:00
Eric Smith	2ace4cf813	Unicode format tests weren't actually testing unicode. This was probably due to the original backport from py3k.	2009-03-14 14:37:38 +00:00
Eric Smith	6f42edb682	Issue 5237, Allow auto-numbered replacement fields in str.format() strings. For simple uses for str.format(), this makes the typing easier. Hopfully this will help in the adoption of str.format(). For example: 'The {} is {}'.format('sky', 'blue') You can mix and matcth auto-numbering and named replacement fields: 'The {} is {color}'.format('sky', color='blue') But you can't mix and match auto-numbering and specified numbering: 'The {0} is {}'.format('sky', 'blue') ValueError: cannot switch from manual field specification to automatic field numbering Will port to 3.1.	2009-03-14 11:57:26 +00:00
Antoine Pitrou	187ac1bda4	#3601: test_unicode.test_raiseMemError fails in UCS4 Reviewed by Benjamin Peterson on IRC.	2008-09-05 22:04:54 +00:00
Antoine Pitrou	fd7c43e7be	#3556: test_raiseMemError consumes an insane amount of memory	2008-08-17 17:01:49 +00:00
Amaury Forgeot d'Arc	06847b13ca	Correct a crash when two successive unicode allocations fail with a MemoryError: the freelist contained half-initialized objects with freed pointers. The comment /* XXX UNREF/NEWREF interface should be more symmetrical */ was copied from tupleobject.c, and appears in some other places. I sign the petition.	2008-07-31 23:39:05 +00:00
Antoine Pitrou	4982d5d04a	#2242: utf7 decoding crashes on bogus input on some Windows/MSVC versions	2008-07-25 17:45:59 +00:00
Amaury Forgeot d'Arc	9a0d3462fc	#1477: ur'\U0010FFFF' raised in narrow unicode builds. Corrected the raw-unicode-escape codec to use UTF-16 surrogates in this case, just like the unicode-escape codec.	2008-03-23 09:55:29 +00:00
Christian Heimes	c5f05e45cf	Patch #2167 from calvin: Remove unused imports	2008-02-23 17:40:11 +00:00
Eric Smith	bc32fee029	Added code to correct combining str and unicode in ''.format(). Added test case.	2008-02-18 18:02:34 +00:00
Eric Smith	a9f7d62480	Backport of PEP 3101, Advanced String Formatting, from py3k. Highlights: - Adding PyObject_Format. - Adding string.Format class. - Adding __format__ for str, unicode, int, long, float, datetime. - Adding builtin format. - Adding ''.format and u''.format. - str/unicode fixups for formatters. The files in Objects/stringlib that implement PEP 3101 (stringdefs.h, unicodedefs.h, formatter.h, string_format.h) are identical in trunk and py3k. Any changes from here on should be made to trunk, and changes will propogate to py3k).	2008-02-17 19:46:49 +00:00
Kurt B. Kaiser	db98f3632a	Fix failing unicode test caused by change to ast.c at r56441	2007-07-18 19:58:42 +00:00
Neal Norwitz	ba965deea8	Prevent these tests from running on Win64 since they don't apply there either	2007-06-11 02:14:39 +00:00
Neal Norwitz	7dbd2a3720	Prevent expandtabs() on string and unicode objects from causing a segfault when a large width is passed on 32-bit platforms. Found by Google. It would be good for people to review this especially carefully and verify I don't have an off by one error and there is no other way to cause overflow.	2007-06-09 03:36:34 +00:00
Collin Winter	c2898c5a67	Standardize on test.test_support.run_unittest() (as opposed to a mix of run_unittest() and run_suite()). Also, add functionality to run_unittest() that admits usage of unittest.TestLoader.loadTestsFromModule().	2007-04-25 17:29:52 +00:00
Neal Norwitz	17753ecbfa	Patch #1541585: fix buffer overrun when performing repr() on a unicode string in a build with wide unicode (UCS-4) support. This code could be improved, so add an XXX comment.	2006-08-21 22:21:19 +00:00
Tim Peters	4511a713d5	Whitespace normalization.	2006-05-03 04:46:14 +00:00
Georg Brandl	de9b624fb9	Bug #1473625: stop cPickle making float dumps locale dependent in protocol 0. On the way, add a decorator to test_support to facilitate running single test functions in different locales with automatic cleanup.	2006-04-30 11:13:56 +00:00
Anthony Baxter	67b6d516ce	Fixed bug #1459029 - unicode reprs were double-escaped.	2006-03-30 10:54:07 +00:00
Georg Brandl	da6b107745	Checkin the test of patch #1400181.	2006-01-20 17:48:54 +00:00
Hye-Shik Chang	835b243c71	Bug #1379994: Fix *unicode_escape codecs to encode r'\' as r'\\' just like string codecs.	2005-12-17 04:38:31 +00:00
Neal Norwitz	430f68b447	Move registration of the codec search function to the module scope so it is only executed once. Otherwise the same search function is repeated added to the codec search path when regrtest is run with -R and leaks are reported.	2005-11-24 22:00:56 +00:00
Neil Schemenauer	cf52c07843	Change the %s format specifier for str objects so that it returns a unicode instance if the argument is not an instance of basestring and calling __str__ on the argument returns a unicode instance.	2005-08-12 17:34:58 +00:00
Brett Cannon	c3647ac93e	Make subclasses of int, long, complex, float, and unicode perform type conversion using the proper magic slot (e.g., __int__()). Also move conversion code out of PyNumber_() functions in the C API into the nb_ function. Applied patch #1109424. Thanks Walter Doewald.	2005-04-26 03:45:26 +00:00
Walter Dörwald	57d88e5abd	Move test_bug1001011() to string_tests.MixinStrUnicodeTest so that it can be used for str and unicode. Drop the test for "".join([s]) is s because this is an implementation detail (and doesn't work for unicode)	2004-08-26 16:53:04 +00:00
Hye-Shik Chang	e9ddfbb412	SF #989185: Drop unicode.iswide() and unicode.width() and add unicodedata.east_asian_width(). You can still implement your own simple width() function using it like this: def width(u): w = 0 for c in unicodedata.normalize('NFC', u): cwidth = unicodedata.east_asian_width(c) if cwidth in ('W', 'F'): w += 2 else: w += 1 return w	2004-08-04 07:38:35 +00:00
Marc-André Lemburg	d25c650461	Let u'%s' % obj try obj.__unicode__() first and fallback to obj.__str__().	2004-07-23 16:13:25 +00:00
Hye-Shik Chang	3c145449da	Reuse width/iswide tests from strings_test. (Suggested by Walter Dörwald)	2004-06-04 04:24:54 +00:00
Hye-Shik Chang	7bd860655f	Fix typo.	2004-06-04 03:19:17 +00:00
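
The `width()` helper suggested in the message of commit e9ddfbb412 (SF #989185) is flattened onto a single line in the listing. A minimal sketch of the same idea, written for Python 3 rather than the Python 2 `unicode` type the commit targeted, might look like the following. The parameter name `text` and the docstring are illustrative and are not part of the original commit.

```python
import unicodedata


def width(text):
    """Approximate the display width of a string in terminal columns.

    Characters whose East Asian Width property is 'W' (wide) or
    'F' (fullwidth) count as 2 columns; every other character counts as 1.
    """
    total = 0
    # Normalize to NFC first so that composable sequences such as
    # 'e' + combining acute accent are counted as one character.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2
        else:
            total += 1
    return total
```

As in the commit message, this is only a simple approximation: combining marks that have no precomposed form still count as one column each, and ambiguous-width characters are treated as narrow.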

138 Commits