Commit Graph

82 Commits

Author SHA1 Message Date
Nick Coghlan c72e4e6dcc Issue #19619: Blacklist non-text codecs in method API
str.encode, bytes.decode and bytearray.decode now use an
internal API to throw LookupError for known non-text encodings,
rather than attempting the encoding or decoding operation and
then throwing a TypeError for an unexpected output type.

The latter mechanism remains in place for third party non-text
encodings.
2013-11-22 22:39:36 +10:00
Serhiy Storchaka 58cf607d13 Issue #12892: The utf-16* and utf-32* codecs now reject (lone) surrogates.
The utf-16* and utf-32* encoders no longer allow surrogate code points
(U+D800-U+DFFF) to be encoded.
The utf-32* decoders no longer decode byte sequences that correspond to
surrogate code points.
The surrogatepass error handler now works with the utf-16* and utf-32* codecs.

Based on patches by Victor Stinner and Kang-Hao (Kenny) Lu.
2013-11-19 11:32:41 +02:00
Nick Coghlan c4c2580d43 Close 19609: narrow scope of codec exc chaining 2013-11-15 21:47:37 +10:00
Nick Coghlan 8b097b4ed7 Close #17828: better handling of codec errors
- output type errors now redirect users to the type-neutral
  convenience functions in the codecs module
- stateless errors that occur during encoding and decoding
  will now be automatically wrapped in exceptions that give
  the name of the codec involved
2013-11-13 23:49:21 +10:00
Serhiy Storchaka c679227e31 Issue #1772673: The type of `char*` arguments now changed to `const char*`. 2013-10-19 21:03:34 +03:00
Antoine Pitrou 9ed5f27266 Issue #18722: Remove uses of the "register" keyword in C code. 2013-08-13 20:18:52 +02:00
Victor Stinner cc35159ed8 Issue #18408: normalizestring() now raises MemoryError on memory allocation failure 2013-07-12 00:02:55 +02:00
Andrew Svetlov 3ba3a3ee56 Issue #15422: get rid of PyCFunction_New macro 2012-12-25 13:32:35 +02:00
Ezio Melotti 1e58ae44df #16336: merge with 3.3. 2012-11-03 23:05:18 +02:00
Ezio Melotti a0b5c46fa2 #16336: merge with 3.2. 2012-11-03 23:04:41 +02:00
Ezio Melotti 540da76115 #16336: fix input checking in the surrogatepass error handler. Patch by Serhiy Storchaka. 2012-11-03 23:03:39 +02:00
Victor Stinner 76df43de30 Issue #16330: Use surrogate-related macros
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Philip Jenvey 5f9459fbed merge with 3.2 2012-10-26 17:05:09 -07:00
Philip Jenvey 45c41494bf bounds check for bad data (thanks amaury) 2012-10-26 17:01:53 -07:00
Victor Stinner 8f825060f1 Check newly created consistency using _PyUnicode_CheckConsistency(str, 1)
* In debug mode, fill the string data with invalid characters
 * Simplify also reference counting in PyCodec_BackslashReplaceErrors()
   and PyCodec_XMLCharRefReplaceError()
2012-04-27 13:55:39 +02:00
Antoine Pitrou f0ecdd2ab9 Issue #13722: Avoid silencing ImportErrors when initializing the codecs registry. 2012-01-18 22:31:12 +01:00
Antoine Pitrou 1b468af7be Issue #13722: Avoid silencing ImportErrors when initializing the codecs registry. 2012-01-18 22:30:21 +01:00
Victor Stinner ee450093a9 PyCodec_IgnoreErrors() avoids the deprecated "u#" format 2011-12-01 02:52:11 +01:00
Victor Stinner c06bb7affd Avoid the Py_UNICODE type in codecs.c 2011-11-04 21:36:35 +01:00
Victor Stinner b31f1bcd99 PyCodec_XMLCharRefReplaceError(): Remove unused variable 2011-11-04 21:29:10 +01:00
Martin v. Löwis 8ba79306d1 Fix C89 incompatibility. 2011-11-04 12:26:49 +01:00
Martin v. Löwis b09af03b8a Port error handlers from Py_UNICODE indexing to code point indexing. 2011-11-04 11:16:41 +01:00
Martin v. Löwis bd928fef42 Rename _Py_identifier to _Py_IDENTIFIER. 2011-10-14 10:20:37 +02:00
Victor Stinner f5cff56a1b Issue #13088: Add shared Py_hexdigits constant to format a number into base 16 2011-10-14 02:13:11 +02:00
Martin v. Löwis 1ee1b6fe0d Use identifier API for PyObject_GetAttrString. 2011-10-10 18:11:30 +02:00
Victor Stinner 1a15aba71d PyCodec_ReplaceErrors() uses "C" format instead of "u#" to build result 2011-10-02 19:00:15 +02:00
Victor Stinner 639418812f Use the new Py_ARRAY_LENGTH macro 2011-09-29 00:42:28 +02:00
Martin v. Löwis d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Antoine Pitrou cf9d3c08c8 Issue #1813: Fix codec lookup under Turkish locales. 2011-07-24 02:27:04 +02:00
Antoine Pitrou e4a189274f Issue #9804: ascii() now always represents unicode surrogate pairs as
a single `\UXXXXXXXX`, regardless of whether the character is printable
or not.  Also, the "backslashreplace" error handler now joins surrogate
pairs into a single character on UCS-2 builds.
2010-09-09 20:30:23 +00:00
Antoine Pitrou f95a1b3c53 Recorded merge of revisions 81029 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81029 | antoine.pitrou | 2010-05-09 16:46:46 +0200 (dim., 09 mai 2010) | 3 lines

  Untabify C files. Will watch buildbots.
........
2010-05-09 15:52:27 +00:00
Georg Brandl 495f7b5adb Merged revisions 75365,75394,75402-75403,75418,75459,75484,75592-75596,75600,75602-75607,75610-75613,75616-75617,75623,75627,75640,75647,75696,75795 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r75365 | georg.brandl | 2009-10-11 22:16:16 +0200 (So, 11 Okt 2009) | 1 line

  Fix broken links found by "make linkcheck".  scipy.org seems to be done right now, so I could not verify links going there.
........
  r75394 | georg.brandl | 2009-10-13 20:10:59 +0200 (Di, 13 Okt 2009) | 1 line

  Fix markup.
........
  r75402 | georg.brandl | 2009-10-14 17:51:48 +0200 (Mi, 14 Okt 2009) | 1 line

  #7125: fix typo.
........
  r75403 | georg.brandl | 2009-10-14 17:57:46 +0200 (Mi, 14 Okt 2009) | 1 line

  #7126: os.environ changes *do* take effect in subprocesses started with os.system().
........
  r75418 | georg.brandl | 2009-10-14 20:48:32 +0200 (Mi, 14 Okt 2009) | 1 line

  #7116: str.join() takes an iterable.
........
  r75459 | georg.brandl | 2009-10-17 10:57:43 +0200 (Sa, 17 Okt 2009) | 1 line

  Fix refleaks in _ctypes PyCSimpleType_New, which fixes the refleak seen in test___all__.
........
  r75484 | georg.brandl | 2009-10-18 09:58:12 +0200 (So, 18 Okt 2009) | 1 line

  Fix missing word.
........
  r75592 | georg.brandl | 2009-10-22 09:05:48 +0200 (Do, 22 Okt 2009) | 1 line

  Fix punctuation.
........
  r75593 | georg.brandl | 2009-10-22 09:06:49 +0200 (Do, 22 Okt 2009) | 1 line

  Revert unintended change.
........
  r75594 | georg.brandl | 2009-10-22 09:56:02 +0200 (Do, 22 Okt 2009) | 1 line

  Fix markup.
........
  r75595 | georg.brandl | 2009-10-22 09:56:56 +0200 (Do, 22 Okt 2009) | 1 line

  Fix duplicate target.
........
  r75596 | georg.brandl | 2009-10-22 10:05:04 +0200 (Do, 22 Okt 2009) | 1 line

  Add a new directive marking up implementation details and start using it.
........
  r75600 | georg.brandl | 2009-10-22 13:01:46 +0200 (Do, 22 Okt 2009) | 1 line

  Make it more robust.
........
  r75602 | georg.brandl | 2009-10-22 13:28:06 +0200 (Do, 22 Okt 2009) | 1 line

  Document new directive.
........
  r75603 | georg.brandl | 2009-10-22 13:28:23 +0200 (Do, 22 Okt 2009) | 1 line

  Allow short form with text as argument.
........
  r75604 | georg.brandl | 2009-10-22 13:36:50 +0200 (Do, 22 Okt 2009) | 1 line

  Fix stylesheet for multi-paragraph impl-details.
........
  r75605 | georg.brandl | 2009-10-22 13:48:10 +0200 (Do, 22 Okt 2009) | 1 line

  Use "impl-detail" directive where applicable.
........
  r75606 | georg.brandl | 2009-10-22 17:00:06 +0200 (Do, 22 Okt 2009) | 1 line

  #6324: membership test tries iteration via __iter__.
........
  r75607 | georg.brandl | 2009-10-22 17:04:09 +0200 (Do, 22 Okt 2009) | 1 line

  #7088: document new functions in signal as Unix-only.
........
  r75610 | georg.brandl | 2009-10-22 17:27:24 +0200 (Do, 22 Okt 2009) | 1 line

  Reorder __slots__ fine print and add a clarification.
........
  r75611 | georg.brandl | 2009-10-22 17:42:32 +0200 (Do, 22 Okt 2009) | 1 line

  #7035: improve docs of the various <method>_errors() functions, and give them docstrings.
........
  r75612 | georg.brandl | 2009-10-22 17:52:15 +0200 (Do, 22 Okt 2009) | 1 line

  #7156: document curses as Unix-only.
........
  r75613 | georg.brandl | 2009-10-22 17:54:35 +0200 (Do, 22 Okt 2009) | 1 line

  #6977: getopt does not support optional option arguments.
........
  r75616 | georg.brandl | 2009-10-22 18:17:05 +0200 (Do, 22 Okt 2009) | 1 line

  Add proper references.
........
  r75617 | georg.brandl | 2009-10-22 18:20:55 +0200 (Do, 22 Okt 2009) | 1 line

  Make printout margin important.
........
  r75623 | georg.brandl | 2009-10-23 10:14:44 +0200 (Fr, 23 Okt 2009) | 1 line

  #7188: fix optionxform() docs.
........
  r75627 | fred.drake | 2009-10-23 15:04:51 +0200 (Fr, 23 Okt 2009) | 2 lines

  add further note about what's passed to optionxform
........
  r75640 | neil.schemenauer | 2009-10-23 21:58:17 +0200 (Fr, 23 Okt 2009) | 2 lines

  Improve some docstrings in the 'warnings' module.
........
  r75647 | georg.brandl | 2009-10-24 12:04:19 +0200 (Sa, 24 Okt 2009) | 1 line

  Fix markup.
........
  r75696 | georg.brandl | 2009-10-25 21:25:43 +0100 (So, 25 Okt 2009) | 1 line

  Fix a demo.
........
  r75795 | georg.brandl | 2009-10-27 16:10:22 +0100 (Di, 27 Okt 2009) | 1 line

  Fix a strange mis-edit.
........
2009-10-27 15:28:25 +00:00
Martin v. Löwis 43c57785d3 Rename utf8b error handler to surrogateescape. 2009-05-10 08:15:24 +00:00
Martin v. Löwis e0a2b72e61 Rename the surrogates error handler to surrogatepass. 2009-05-10 08:08:56 +00:00
Martin v. Löwis 011e842033 Issue #5915: Implement PEP 383, Non-decodable Bytes in
System Character Interfaces.
2009-05-05 04:43:17 +00:00
Martin v. Löwis aef3fb082c Make PyCodec_SurrogateErrors static. 2009-05-02 19:27:30 +00:00
Martin v. Löwis db12d454e6 Issue #3672: Reject surrogates in utf-8 codec; add surrogates error
handler.
2009-05-02 18:52:14 +00:00
Christian Heimes 6a27efa2d3 Issue 3723: Fixed initialization of subinterpreters
The patch fixes several issues with Py_NewInterpreter as well as the demo for multiple subinterpreters.
Most of the patch was written by MvL with help from Benjamin, Amaury and me. Graham Dumpleton has verified that this patch fixes an issue with mod_wsgi.
2008-10-30 21:48:26 +00:00
Marc-André Lemburg b2750b5d33 Move the codec decode type checks to bytes/bytearray.decode().
Use faster PyUnicode_FromEncodedObject() for bytes/bytearray.decode().

Add new PyCodec_KnownEncoding() API.

Add new PyUnicode_AsDecodedUnicode() and PyUnicode_AsEncodedUnicode() APIs.

Add missing PyUnicode_AsDecodedObject() to unicodeobject.h

Fix punicode codec to also work on memoryviews.
2008-06-06 12:18:17 +00:00
Christian Heimes 72b710a596 Renamed PyString to PyBytes 2008-05-26 13:28:38 +00:00
Christian Heimes 9c4756ea26 Renamed PyBytes to PyByteArray 2008-05-26 13:22:05 +00:00
Christian Heimes 819b8bf403 More PyImport_ImportModule -> PyImport_ImportModuleNoBlock 2008-01-03 23:05:47 +00:00
Christian Heimes 90aa7646af #1629: Renamed Py_Size, Py_Type and Py_Refcnt to Py_SIZE, Py_TYPE and Py_REFCNT. 2007-12-19 02:45:37 +00:00
Guido van Rossum 98297ee781 Merging the py3k-pep3137 branch back into the py3k branch.
No detailed change log; just check out the change log for the py3k-pep3137
branch.  The most obvious changes:

  - str8 renamed to bytes (PyString at the C level);
  - bytes renamed to buffer (PyBytes at the C level);
  - PyString and PyUnicode are no longer compatible.

I.e. we now have an immutable bytes type and a mutable bytes type.

The behavior of PyString was modified quite a bit, to make it more
bytes-like.  Some changes are still on the to-do list.
2007-11-06 21:34:58 +00:00
Guido van Rossum 21431e85d5 This is the uncontroversial half of patch 1263 by Thomas Lee:
changes to codecs.c and structmember.c to use PyUnicode instead of
PyString.
2007-10-19 21:48:41 +00:00
Neal Norwitz 9edcc2e2fd Handle error 2007-08-11 04:58:26 +00:00
Martin v. Löwis 427dbff8f0 Revert 55876. Use PyUnicode_AsEncodedString instead. 2007-06-12 05:53:00 +00:00
Martin v. Löwis 641d5cc6a6 Short-cut lookup of utf-8 codec, to make import work
on OSX.
2007-06-11 04:19:13 +00:00
Walter Dörwald 573c08c1b7 Change PyErr_Format() to generate a unicode string (by using
PyUnicode_FromFormatV() instead of PyString_FromFormatV()).

Change calls to PyErr_Format() to benefit from the new format
specifiers: Using %S, object instead of %s, PyString_AS_STRING(object)
with will work with unicode objects too.
2007-05-25 15:46:59 +00:00
Guido van Rossum 8d30cc0144 Get rid of all #ifdef Py_USING_UNICODE (it is always present now).
(With the help of unifdef from freshmeat.)
2007-05-03 17:49:24 +00:00