
26 Commits

Author SHA1 Message Date
Ezio Melotti 86e5e17bda Merged revisions 81758-81759 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81758 | ezio.melotti | 2010-06-05 20:51:07 +0300 (Sat, 05 Jun 2010) | 15 lines

  Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.

  1) #8271: when a byte sequence is invalid, only the start byte and all the
     valid continuation bytes are now replaced by U+FFFD, instead of replacing
     the number of bytes specified by the start byte.
     See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
  2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
     in behavior);
  3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
     RFC 3629, but leave it commented out since it's not backward compatible;
  4) Change the error messages "unexpected code byte" to "invalid start byte"
     and "invalid data" to "invalid continuation byte";
  5) Add an extensive set of tests in test_unicode;
  6) Fix test_codeccallbacks because it was failing after this change.
........
  r81759 | ezio.melotti | 2010-06-05 22:21:32 +0300 (Sat, 05 Jun 2010) | 1 line

  Add a NEWS entry for r81758 and clarify a comment.
........
2010-07-03 05:34:39 +00:00
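
Items 1, 2 and 4 of r81758 can be observed from Python. This is a minimal sketch, assuming a Python 3 interpreter whose UTF-8 decoder follows RFC 3629; the sample byte strings are illustrative and are not taken from the commit's test suite.

```python
# Assumes Python 3; the byte strings below are illustrative examples only.

# 0xE1 starts a 3-byte sequence and 0x80 is a valid continuation byte, but
# "A" is not. Only the start byte and the valid continuation byte are
# replaced by U+FFFD; the "A" that follows is decoded normally.
truncated = b"\xe1\x80A".decode("utf-8", "replace")

# 0xF8 would start a 5-byte sequence, which RFC 3629 does not allow.
try:
    b"\xf8\x88\x80\x80\x80".decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason

print(repr(truncated))
print(reason)
```

The `reason` attribute carries the reworded error message described in item 4.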
Jesus Cea 585ad8ae5e Merged revisions 69846 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r69846 | mark.dickinson | 2009-02-21 21:27:01 +0100 (Sat, 21 Feb 2009) | 2 lines

  Issue #5341: Fix a variety of spelling errors.
........
2009-07-02 15:37:21 +00:00
Fred Drake d995e1150c revert creation of the html.entities and html.parser modules
(http://bugs.python.org/issue2882)
2008-05-20 06:08:38 +00:00
Fred Drake cb51d84214 update references and documentation for modules in the new html package
(http://bugs.python.org/issue2882)
2008-05-17 21:14:05 +00:00
Walter Dörwald 6e39080649 Backport r57105 and r57145 from the py3k branch: UTF-32 codecs. 2007-08-17 16:41:28 +00:00
Richard Jones 7b9558d37d Conversion of exceptions over from faked-up classes to new-style C types. 2006-05-27 12:29:24 +00:00
Walter Dörwald 690402ff17 Add tests to increase code coverage in Python/codecs.c and Python/exceptions.c. 2005-11-17 18:51:34 +00:00
Walter Dörwald e22d339dc5 Add tests for various error cases and for readbuffer_encode() and
charbuffer_encode(). This increases code coverage in Modules/_codecsmodule.c
from 83% to 95%.
2005-11-17 08:52:34 +00:00
Walter Dörwald a47d1c08d0 SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of Nik Haldimann's patch that detects truncated data)
2005-08-30 10:23:14 +00:00
Walter Dörwald 29ddfba3d8 Fix copy & paste error in comments. 2004-12-14 21:28:07 +00:00
Tim Peters 58eb11cf62 Whitespace normalization. 2004-01-18 20:29:55 +00:00
Walter Dörwald 4894c30626 Fix a bug in the memory reallocation code of PyUnicode_TranslateCharmap().
charmaptranslate_makespace() allocated more memory than required for the
next replacement but didn't remember that fact, so memory size was growing
exponentially every time a replacement string is longer than one character.
This fixes SF bug #828737.
2003-10-24 14:25:28 +00:00
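
The case described above, a charmap translation whose replacement string is longer than one character, can be exercised as follows. This is a sketch only, assuming Python 3, where `str.translate` is the Python-level entry point; the table and input text are made up for illustration.

```python
# Assumes Python 3; the mapping and input are hypothetical test data.

# Map a single character to a replacement longer than one character,
# so the output buffer has to grow on every replacement.
table = {ord("a"): "<alpha>"}
text = "banana" * 1000

result = text.translate(table)
print(len(text), len(result))
```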
Walter Dörwald a54b92b2eb Add a unicode prefix to the characters in the UnicodeEncodeError and
UnicodeTranslateError message.
2003-08-12 17:34:49 +00:00
Walter Dörwald fd196bd263 Enhance message for UnicodeEncodeError and UnicodeTranslateError.
If there is only one bad character it will now be printed in a
form that is a valid Python string.
2003-08-12 17:32:43 +00:00
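
The message format for a single bad character can be inspected by catching the exception. This is a minimal sketch, assuming Python 3 and the built-in `ascii` codec; the input string is an arbitrary example.

```python
# Assumes Python 3; "gr\xfcn" is an arbitrary example string.
message = None
details = None

try:
    "gr\xfcn".encode("ascii")
except UnicodeEncodeError as exc:
    # The exception exposes the codec name and the offending slice.
    details = (exc.encoding, exc.start, exc.end, exc.object[exc.start:exc.end])
    message = str(exc)

print(details)
print(message)
```

The single bad character appears in the message as an escaped literal rather than as a raw code point.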
Walter Dörwald 21d3a32b99 Combine the functionality of test_support.run_unittest()
and test_support.run_classtests() into run_unittest()
and use it wherever possible.

Also don't use "from test.test_support import ...", but
"from test import test_support" in a few spots.

From SF patch #662807.
2003-05-01 17:45:56 +00:00
Walter Dörwald 1b0be2d4c6 Use the new htmlentitydefs.codepoint2name for test_xmlcharnamereplace() 2003-04-29 20:59:55 +00:00
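
A handler in the spirit of the one that test_xmlcharnamereplace() exercises might look like the sketch below. It assumes Python 3, where the mapping lives in `html.entities` rather than `htmlentitydefs`; the handler name `example.xmlcharnamereplace` and the function body are illustrative, not the code from the test file.

```python
import codecs
# Python 3 location of the mapping; Python 2 used htmlentitydefs.
from html.entities import codepoint2name


def xmlcharnamereplace(exc):
    """Hypothetical handler: use a named entity where one exists."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %r" % exc)
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        if name:
            parts.append("&%s;" % name)
        else:
            # Fall back to a numeric character reference.
            parts.append("&#%d;" % ord(ch))
    # Resume encoding right after the unencodable run.
    return ("".join(parts), exc.end)


codecs.register_error("example.xmlcharnamereplace", xmlcharnamereplace)

encoded = "\xab\u1234\u20ac\xbb".encode("ascii", "example.xmlcharnamereplace")
print(encoded)
```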
Tim Peters f2715e0764 Whitespace normalization. 2003-02-19 02:35:07 +00:00
Walter Dörwald 2e0b18af30 Change the treatment of positions returned by PEP 293
error handlers in the Unicode codecs: Negative
positions are treated as being relative to the end of
the input and out of bounds positions result in an
IndexError.

Also update the PEP and include an explanation of
this in the documentation for codecs.register_error.

Fixes a small bug in iconv_codecs: if the position
from the callback is negative *add* it to the size
instead of subtracting it.

From SF patch #677429.
2003-01-31 17:19:08 +00:00
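
The position rules described above can be sketched with two handlers registered through codecs.register_error. This assumes Python 3 and the built-in `ascii` codec; the handler names and the input string are hypothetical.

```python
import codecs


def skip_to_last(exc):
    # A negative position is counted from the end of the input,
    # so -1 resumes encoding at the last character.
    return ("?", -1)


def out_of_bounds(exc):
    # One past the end of the input is still allowed; two past is not.
    return ("?", len(exc.object) + 1)


# Hypothetical handler names, chosen for this example only.
codecs.register_error("example.skip_to_last", skip_to_last)
codecs.register_error("example.out_of_bounds", out_of_bounds)

# The error is at index 2; the handler jumps to index 4, skipping "c".
resumed = "ab\xe4cd".encode("ascii", "example.skip_to_last")

try:
    "ab\xe4cd".encode("ascii", "example.out_of_bounds")
    raised_index_error = False
except IndexError:
    raised_index_error = True

print(resumed, raised_index_error)
```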
Walter Dörwald ea4250df7d Add comments and remove duplicate tests. 2003-01-20 02:34:07 +00:00
Walter Dörwald 0cb27dd023 Make the test scripts work again with narrow Python builds. 2003-01-09 11:38:50 +00:00
Walter Dörwald 30537a46ac Add a few test cases to increase code coverage:
From:
 69.73% of 294 source lines executed in file ./Modules/_codecsmodule.c
 79.47% of 487 source lines executed in file Python/codecs.c
 78.45% of 3643 source lines executed in file Objects/unicodeobject.c

To:
 70.41% of 294 source lines executed in file ./Modules/_codecsmodule.c
 82.75% of 487 source lines executed in file Python/codecs.c
 80.76% of 3638 source lines executed in file Objects/unicodeobject.c

This actually unearthed a bug in the handling of None
values in PyUnicode_EncodeCharmap.
2003-01-08 23:22:13 +00:00
Walter Dörwald 00445d2393 Fix typo in comment. 2002-11-25 17:58:02 +00:00
Martin v. Löwis 74a530d42d Update character names. 2002-11-23 19:41:01 +00:00
Tim Peters 3de75266aa Whitespace normalization. 2002-11-09 05:26:15 +00:00
Walter Dörwald 9ab7dd4d5b Add a test case that checks that the proper exception is raised
when the replacement from an encoding error callback is itself
unencodable.
2002-09-06 17:21:40 +00:00
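
The situation this test case covers can be reproduced with a handler whose replacement cannot be encoded either. This is a sketch assuming Python 3 and the `ascii` codec; the handler name is hypothetical.

```python
import codecs


def unencodable_replacement(exc):
    # Return a replacement that the target codec cannot encode.
    if isinstance(exc, UnicodeEncodeError):
        return ("\u4242", exc.end)
    raise TypeError("don't know how to handle %r" % exc)


# Hypothetical handler name, chosen for this example only.
codecs.register_error("example.unencodable", unencodable_replacement)

try:
    "\u4242".encode("ascii", "example.unencodable")
    error_type = None
except UnicodeEncodeError as exc:
    error_type = type(exc).__name__

print(error_type)
```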
Walter Dörwald 3aeb632c31 PEP 293 implementation (from SF patch http://www.python.org/sf/432401) 2002-09-02 13:14:32 +00:00