cpython

Commit Graph

Author	SHA1	Message	Date
Ezio Melotti	e57e50c8e7	Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629. 1) #8271: when a byte sequence is invalid, only the start byte and all the valid continuation bytes are now replaced by U+FFFD, instead of replacing the number of bytes specified by the start byte. See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95); 2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes in behavior); 3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in RFC 3629, but leave it commented out since it's not backward compatible; 4) Change the error messages "unexpected code byte" to "invalid start byte" and "invalid data" to "invalid continuation byte"; 5) Add an extensive set of tests in test_unicode; 6) Fix test_codeccallbacks because it was failing after this change.	2010-06-05 17:51:07 +00:00
Mark Dickinson	3e4caeb3bf	Issue #5341: Fix a variety of spelling errors.	2009-02-21 20:27:01 +00:00
Benjamin Peterson	910f216260	fix test that wasn't working as expected #4990	2009-01-18 21:11:38 +00:00
Fred Drake	d995e1150c	revert creation of the html.entities and html.parser modules (http://bugs.python.org/issue2882)	2008-05-20 06:08:38 +00:00
Fred Drake	cb51d84214	update references and documentation for modules in the new html package (http://bugs.python.org/issue2882)	2008-05-17 21:14:05 +00:00
Walter Dörwald	6e39080649	Backport r57105 and r57145 from the py3k branch: UTF-32 codecs.	2007-08-17 16:41:28 +00:00
Richard Jones	7b9558d37d	Conversion of exceptions over from faked-up classes to new-style C types.	2006-05-27 12:29:24 +00:00
Walter Dörwald	690402ff17	Add tests to increase code coverage in Python/codecs.c and Python/exceptions.c.	2005-11-17 18:51:34 +00:00
Walter Dörwald	e22d339dc5	Add tests for various error cases and for readbuffer_encode() and charbuffer_encode(). This increases code coverage in Modules/_codecsmodule.c from 83% to 95%.	2005-11-17 08:52:34 +00:00
Walter Dörwald	a47d1c08d0	SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain about illegal code points. The codec now supports PEP 293 style error handlers. (This is a variant of Nik Haldimann's patch that detects truncated data.)	2005-08-30 10:23:14 +00:00
Walter Dörwald	29ddfba3d8	Fix copy & paste error in comments.	2004-12-14 21:28:07 +00:00
Tim Peters	58eb11cf62	Whitespace normalization.	2004-01-18 20:29:55 +00:00
Walter Dörwald	4894c30626	Fix a bug in the memory reallocation code of PyUnicode_TranslateCharmap(). charmaptranslate_makespace() allocated more memory than required for the next replacement but didn't remember that fact, so memory size was growing exponentially every time a replacement string was longer than one character. This fixes SF bug #828737.	2003-10-24 14:25:28 +00:00
Walter Dörwald	a54b92b2eb	Add a unicode prefix to the characters in the UnicodeEncodeError and UnicodeTranslateError message.	2003-08-12 17:34:49 +00:00
Walter Dörwald	fd196bd263	Enhance message for UnicodeEncodeError and UnicodeTranslateError. If there is only one bad character it will now be printed in a form that is a valid Python string.	2003-08-12 17:32:43 +00:00
Walter Dörwald	21d3a32b99	Combine the functionality of test_support.run_unittest() and test_support.run_classtests() into run_unittest() and use it wherever possible. Also don't use "from test.test_support import ...", but "from test import test_support" in a few spots. From SF patch #662807.	2003-05-01 17:45:56 +00:00
Walter Dörwald	1b0be2d4c6	Use the new htmlentitydefs.codepoint2name for test_xmlcharnamereplace()	2003-04-29 20:59:55 +00:00
Tim Peters	f2715e0764	Whitespace normalization.	2003-02-19 02:35:07 +00:00
Walter Dörwald	2e0b18af30	Change the treatment of positions returned by PEP 293 error handlers in the Unicode codecs: Negative positions are treated as being relative to the end of the input and out of bounds positions result in an IndexError. Also update the PEP and include an explanation of this in the documentation for codecs.register_error. Fixes a small bug in iconv_codecs: if the position from the callback is negative add it to the size instead of subtracting it. From SF patch #677429.	2003-01-31 17:19:08 +00:00
Walter Dörwald	ea4250df7d	Add comments and remove duplicate tests.	2003-01-20 02:34:07 +00:00
Walter Dörwald	0cb27dd023	Make the test scripts work again with narrow Python builds.	2003-01-09 11:38:50 +00:00
Walter Dörwald	30537a46ac	Add a few test cases to increase code coverage. From: 69.73% of 294 source lines executed in file ./Modules/_codecsmodule.c; 79.47% of 487 source lines executed in file Python/codecs.c; 78.45% of 3643 source lines executed in file Objects/unicodeobject.c. To: 70.41% of 294 source lines executed in file ./Modules/_codecsmodule.c; 82.75% of 487 source lines executed in file Python/codecs.c; 80.76% of 3638 source lines executed in file Objects/unicodeobject.c. This actually unearthed a bug in the handling of None values in PyUnicode_EncodeCharmap.	2003-01-08 23:22:13 +00:00
Walter Dörwald	00445d2393	Fix typo in comment.	2002-11-25 17:58:02 +00:00
Martin v. Löwis	74a530d42d	Update character names.	2002-11-23 19:41:01 +00:00
Tim Peters	3de75266aa	Whitespace normalization.	2002-11-09 05:26:15 +00:00
Walter Dörwald	9ab7dd4d5b	Add a test case that checks that the proper exception is raised when the replacement from an encoding error callback is itself unencodable.	2002-09-06 17:21:40 +00:00
Walter Dörwald	3aeb632c31	PEP 293 implementation (from SF patch http://www.python.org/sf/432401)	2002-09-02 13:14:32 +00:00

27 Commits
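
Two of the commits listed above describe behavior that is easy to observe from Python code: the UTF-8 decoder changes in e57e50c8e7 and the error-handler position rules in 2e0b18af30. The following is a minimal sketch, assuming a current Python 3 interpreter rather than the 2010 or 2003 source trees those commits touched. The helper functions and the handler names `demo.resume_at_last_char` and `demo.out_of_bounds` are hypothetical, introduced only for illustration; they do not appear in the commits.

```python
import codecs


def decode_with_replacement(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for each invalid sequence."""
    return data.decode("utf-8", errors="replace")


def decode_error_reason(data: bytes) -> str:
    """Return the decoder's error message for invalid input, or '' if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return ""


def _resume_at_last_char(exc):
    # Hypothetical handler: a negative position is taken relative to the
    # end of the input, so -1 resumes encoding at the final character.
    # Caution: if the final character is itself unencodable, this handler
    # would be called again at the same spot and never make progress.
    return ("?", -1)


def _out_of_bounds(exc):
    # Hypothetical handler: a position past the end of the input is
    # rejected by the codec machinery with an IndexError.
    return ("?", len(exc.object) + 1)


codecs.register_error("demo.resume_at_last_char", _resume_at_last_char)
codecs.register_error("demo.out_of_bounds", _out_of_bounds)


def encode_resuming_at_last_char(text: str) -> bytes:
    return text.encode("ascii", errors="demo.resume_at_last_char")


def encode_out_of_bounds(text: str) -> bytes:
    return text.encode("ascii", errors="demo.out_of_bounds")


if __name__ == "__main__":
    # A start byte plus one valid continuation byte, then ASCII 'A':
    # the truncated sequence becomes a single U+FFFD and 'A' survives.
    print(repr(decode_with_replacement(b"\xe2\x82A")))
    print(decode_error_reason(b"\x80"))
    print(decode_error_reason(b"\xe2A"))
    print(encode_resuming_at_last_char("a\xe9b"))
```

The decoding half shows that a bad start byte no longer swallows the bytes that follow it, and the encoding half shows why a handler that returns a position has to guarantee forward progress on its own.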