Commit Graph

496 Commits

Author SHA1 Message Date
Victor Stinner 42cb462682 Remove unicode_default_encoding constant
Inline its value in PyUnicode_GetDefaultEncoding(). The comment is now outdated
(we will not change its value anymore).
2010-09-01 19:39:01 +00:00
Antoine Pitrou fce7fd6426 Issue #9549: sys.setdefaultencoding() and PyUnicode_SetDefaultEncoding()
are now removed, since they had no effect in 3.x (the default
encoding is hardcoded to utf-8 and cannot be changed).
2010-09-01 18:54:56 +00:00
Antoine Pitrou b0fa831d1e Issue #7415: PyUnicode_FromEncodedObject() now uses the new buffer API
properly.  Patch by Stefan Behnel.
2010-09-01 15:10:12 +00:00
Daniel Stutzbach 8515eaefda Issue #8781: On systems with a signed 4-byte wchar_t and a 4-byte Py_UNICODE, use memcpy to convert between the two (as already done when wchar_t is unsigned) 2010-08-24 21:57:33 +00:00
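A minimal sketch of why memcpy is safe in the Issue #8781 case, assuming a hypothetical 4-byte ucs4_t in place of Py_UNICODE (this is not CPython's code): every code point from 0 to 0x10FFFF has the same bit pattern whether a 4-byte wchar_t is signed or unsigned, so when the sizes match the conversion can be one memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 4-byte Py_UNICODE. */
typedef uint32_t ucs4_t;

/* Copy n wide characters into a UCS-4 buffer. Code points (0..0x10FFFF)
   have the same bit pattern in a signed or an unsigned 4-byte wchar_t,
   so when the sizes match one memcpy() is enough. Otherwise widen each
   element (this sketch does not join UTF-16 surrogate pairs). */
void wchar_to_ucs4(const wchar_t *w, size_t n, ucs4_t *out)
{
    if (sizeof(wchar_t) == sizeof(ucs4_t)) {
        memcpy(out, w, n * sizeof(ucs4_t));
    } else {
        size_t i;
        for (i = 0; i < n; i++)
            out[i] = (ucs4_t)w[i];
    }
}
```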
Victor Stinner 3119ed73aa Fix PyUnicode_EncodeFSDefault() indentation 2010-08-18 22:26:50 +00:00
Victor Stinner ef8d95c498 Issue #9425: Create Py_UNICODE_strncmp() function
The code is based on the public-domain strncmp() from the libiberty library.
2010-08-16 22:03:11 +00:00
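A sketch of a strncmp()-style comparison over code-unit strings, in the spirit of Py_UNICODE_strncmp(). The ucs_char typedef and the ucs_strncmp() name are hypothetical; the loop follows the standard strncmp() contract rather than reproducing CPython's source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_UNICODE (a 16- or 32-bit unit in CPython). */
typedef uint32_t ucs_char;

/* Compare at most n units of two NUL-terminated strings. Stops at the
   first differing unit or at a terminator shared by both strings. */
int ucs_strncmp(const ucs_char *s1, const ucs_char *s2, size_t n)
{
    for (; n != 0; n--, s1++, s2++) {
        if (*s1 != *s2)
            return (*s1 < *s2) ? -1 : 1;
        if (*s1 == 0)
            return 0;          /* both strings ended together */
    }
    return 0;                  /* the first n units are equal */
}
```

A shorter string compares less than a longer one with the same prefix, because its 0 terminator is smaller than any character.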
Victor Stinner 47fcb5b4c3 Issue #9542: Create PyUnicode_FSDecoder() function
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.

 * In the comments and the documentation, refer to
   PyUnicode_DecodeFSDefaultAndSize() and PyUnicode_EncodeFSDefault() instead of
   naming the surrogateescape error handler, because these functions use the
   strict error handler for the mbcs encoding (on Windows).
 * Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
   inconsistency with unicodeobject.h.
2010-08-13 23:59:58 +00:00
Victor Stinner 4a2b7a1b14 Issue #9425: Create PyErr_WarnFormat() function
Similar to PyErr_WarnEx(), but uses PyUnicode_FromFormatV() to format the warning
message.

Also strip some trailing spaces.
2010-08-13 14:03:48 +00:00
Alexander Belopolsky f0f45142d5 Issue #2443: Added a new macro, Py_VA_COPY, which is equivalent to C99
va_copy, but available on all Python platforms.  Untabified a few
unrelated files.
2010-08-11 17:31:17 +00:00
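Why a va_copy() equivalent matters: a va_list can be traversed only once, so code that formats in two passes (measure, then write), as PyUnicode_FromFormatV() does, has to copy it first. A minimal sketch using C99 va_copy() directly, which is the role Py_VA_COPY fills on compilers without it; format_alloc() is a hypothetical helper.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a newly malloc'ed string. The arguments are walked twice,
   so the measuring pass works on a copy made with va_copy() and the
   writing pass uses the original list. Returns NULL on failure. */
char *format_alloc(const char *fmt, ...)
{
    va_list args, args_copy;
    char *buf = NULL;
    int len;

    va_start(args, fmt);
    va_copy(args_copy, args);
    len = vsnprintf(NULL, 0, fmt, args_copy);          /* pass 1: measure */
    va_end(args_copy);

    if (len >= 0) {
        buf = malloc((size_t)len + 1);
        if (buf != NULL)
            vsnprintf(buf, (size_t)len + 1, fmt, args); /* pass 2: write */
    }
    va_end(args);
    return buf;
}
```

The caller owns the returned buffer and must free() it.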
Victor Stinner 331ea92ade Issue #9425: create Py_UNICODE_strrchr() function 2010-08-10 16:37:20 +00:00
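A sketch of a strrchr()-style search over code-unit strings, mirroring what Py_UNICODE_strrchr() is meant to provide; the ucs_char typedef and ucs_strrchr() name are hypothetical and the semantics are taken from standard strrchr().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs_char;  /* hypothetical stand-in for Py_UNICODE */

/* Return a pointer to the last occurrence of c in the NUL-terminated
   string s, or NULL. As with strrchr(), searching for 0 returns a
   pointer to the terminator. */
const ucs_char *ucs_strrchr(const ucs_char *s, ucs_char c)
{
    const ucs_char *last = NULL;
    do {
        if (*s == c)
            last = s;
    } while (*s++ != 0);
    return last;
}
```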
Georg Brandl 78eef3de88 Revert r83395: it introduces test failures and is not necessary, since we now have to NUL-terminate the string anyway. 2010-08-01 20:51:02 +00:00
Georg Brandl bd534f0349 #8821: do not rely on Unicode strings being terminated with a \u0000; instead, explicitly check the range before looking for a second surrogate character. 2010-08-01 08:49:18 +00:00
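The #8821 rule can be sketched for a UTF-16 buffer: before reading the unit that might be a low surrogate, check that it is inside the buffer instead of trusting a trailing 0. The next_code_point() helper is hypothetical; it assumes *pos < len on entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the code point at s[*pos] from a buffer of len UTF-16 units and
   advance *pos. The buffer need not end with a 0 unit: the second
   surrogate is only read when *pos + 1 < len, so a lone high surrogate
   at the end is returned unchanged. */
uint32_t next_code_point(const uint16_t *s, size_t len, size_t *pos)
{
    uint32_t ch = s[*pos];
    if (ch >= 0xD800 && ch <= 0xDBFF && *pos + 1 < len) {
        uint32_t ch2 = s[*pos + 1];
        if (ch2 >= 0xDC00 && ch2 <= 0xDFFF) {
            *pos += 2;
            return 0x10000 + ((ch - 0xD800) << 10) + (ch2 - 0xDC00);
        }
    }
    *pos += 1;
    return ch;
}
```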
Georg Brandl 8ee604b989 Use Py_CLEAR(). 2010-07-29 14:23:06 +00:00
Stefan Krah 99212f61db Sub-issue of #9036: Fix incorrect use of Py_CHARMASK. 2010-07-19 17:58:26 +00:00
Senthil Kumaran e51ee8a5bc Fix the docstrings of the capitalize method. 2010-07-05 12:00:56 +00:00
Ezio Melotti 9bf2b3ae6a Update comment about surrogates. 2010-07-03 04:52:19 +00:00
Ezio Melotti 57221d02ba Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
   valid continuation bytes are now replaced by U+FFFD, instead of replacing
   the number of bytes specified by the start byte.
   See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
   in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
   and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
2010-07-01 07:32:02 +00:00
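A compact sketch of the decoding rules listed above (RFC 3629 and the well-formed sequences of Unicode 5.2, Table 3-7), not CPython's decoder: invalid start bytes, including the old 5- and 6-byte leads, each become one U+FFFD, and a truncated or broken sequence becomes one U+FFFD covering only the start byte and the continuation bytes that were still valid. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

/* Decode UTF-8 into code points; out needs room for len entries.
   Returns the number of code points written. */
size_t utf8_decode_replace(const uint8_t *s, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        uint8_t b = s[i];
        int need, k;                    /* continuation bytes required */
        uint8_t lo = 0x80, hi = 0xBF;   /* allowed range of the 2nd byte */
        uint32_t cp = 0;

        if (b < 0x80) {                 /* ASCII */
            out[n++] = b;
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1; cp = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2; cp = b & 0x0F;
            if (b == 0xE0) lo = 0xA0;        /* no overlong forms */
            else if (b == 0xED) hi = 0x9F;   /* no surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3; cp = b & 0x07;
            if (b == 0xF0) lo = 0x90;        /* no overlong forms */
            else if (b == 0xF4) hi = 0x8F;   /* nothing above U+10FFFF */
        } else {                        /* 0x80-0xC1, 0xF5-0xFF */
            out[n++] = REPLACEMENT_CHAR;     /* "invalid start byte" */
            i++;
            continue;
        }

        i++;                            /* consume the start byte */
        for (k = 0; k < need && i < len; k++, i++) {
            uint8_t min = (k == 0) ? lo : 0x80;
            uint8_t max = (k == 0) ? hi : 0xBF;
            if (s[i] < min || s[i] > max)
                break;                  /* "invalid continuation byte" */
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        /* On failure, resume at the offending byte (it is not consumed). */
        out[n++] = (k == need) ? cp : REPLACEMENT_CHAR;
    }
    return n;
}
```

For the input E2 82 41, the sketch yields U+FFFD followed by 'A'; under the pre-#8271 rule described in item 1, the start byte 0xE2 would have claimed three bytes and the 'A' would have been lost.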
Georg Brandl 952867aa30 #9078: fix some Unicode C API descriptions, in comments and docs. 2010-06-27 10:17:12 +00:00
Ezio Melotti c1897e716d Merged revisions 82248 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line

  Fix extra space.
........
2010-06-26 18:50:39 +00:00
Victor Stinner 554f3f0081 Issue #850997: the mbcs encoding (Windows only) now handles the errors argument:
strict mode raises Unicode errors. The encoder only supports the "strict" and "replace"
error handlers; the decoder only supports the "strict" and "ignore" error handlers.
2010-06-16 23:33:54 +00:00
Mark Dickinson 7db923cc99 Silence 'unused variable' gcc warning. Patch by Éric Araujo. 2010-06-12 09:10:14 +00:00
Victor Stinner 313a120ab6 Issue #8969: On Windows, use mbcs codec in strict mode to encode and decode
filenames and enable os.fsencode().
2010-06-11 23:56:51 +00:00
Antoine Pitrou cc0cfd3576 Merged revisions 81907 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (Fri, 11 Jun 2010) | 5 lines

  Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
  the interpreter with characters outside the Basic Multilingual Plane
  (U+10000 and above).
........
2010-06-11 21:46:32 +00:00
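A sketch of what a UCS-2 ("narrow") build has to do with big-endian UTF-32 input: each character at or above U+10000 turns into two UTF-16 units (a surrogate pair), so the output may be up to twice as long as the character count. This does not reproduce CPython's decoder or the specific #8941 bug; utf32be_to_utf16() is hypothetical and the caller must supply 2 * (len / 4) output units.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of UTF-16 units written, or -1 if len is not a
   multiple of 4 or a code point is above U+10FFFF. */
long utf32be_to_utf16(const uint8_t *s, size_t len, uint16_t *out)
{
    size_t i, n = 0;
    if (len % 4 != 0)
        return -1;
    for (i = 0; i < len; i += 4) {
        uint32_t ch = ((uint32_t)s[i] << 24) | ((uint32_t)s[i + 1] << 16)
                    | ((uint32_t)s[i + 2] << 8) | s[i + 3];
        if (ch > 0x10FFFF)
            return -1;
        if (ch >= 0x10000) {            /* outside the BMP: two units */
            ch -= 0x10000;
            out[n++] = (uint16_t)(0xD800 + (ch >> 10));
            out[n++] = (uint16_t)(0xDC00 + (ch & 0x3FF));
        } else {
            out[n++] = (uint16_t)ch;
        }
    }
    return (long)n;
}
```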
Victor Stinner 37296e89a5 Fix r81869: ISO-8859-15 was seen as an alias of ISO-8859-1
Don't use the normalize_encoding() result if it is truncated.
2010-06-10 13:36:23 +00:00
Victor Stinner 600d3bed6c Issue #8922: Normalize the encoding name in PyUnicode_AsEncodedString() to
enable shortcuts for upper-case encoding names. Also add a shortcut for
"iso-8859-1" in PyUnicode_AsEncodedString() and PyUnicode_Decode().
2010-06-10 12:00:55 +00:00
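The normalization from Issue #8922, and the truncation pitfall behind the ISO-8859-15 fix listed above it, can be sketched as follows. Assumptions: normalize_name() and is_latin1_shortcut() are hypothetical, normalization is taken to mean lower-casing and mapping '_' to '-', the 11-byte buffer is sized for "iso-8859-1", and the alias list is illustrative rather than CPython's exact table. The key point is that a name which does not fit is rejected instead of being silently cut to "iso-8859-1".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Lower-case name and map '_' to '-' into out[outlen]. Returns 0 if the
   result would be truncated, so the caller skips the shortcut. */
static int normalize_name(const char *name, char *out, size_t outlen)
{
    size_t i = 0;
    for (; *name != '\0'; name++) {
        if (i + 1 >= outlen)
            return 0;                   /* no room left for the NUL */
        char c = *name;
        out[i++] = (c == '_') ? '-' : (char)tolower((unsigned char)c);
    }
    out[i] = '\0';
    return 1;
}

/* Return 1 if the fast Latin-1 path may be used for this name. */
int is_latin1_shortcut(const char *encoding)
{
    char lower[11];                     /* "iso-8859-1" plus the NUL */
    if (!normalize_name(encoding, lower, sizeof lower))
        return 0;
    return strcmp(lower, "latin-1") == 0 || strcmp(lower, "latin1") == 0
        || strcmp(lower, "iso-8859-1") == 0;
}
```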
Victor Stinner ae6265f8d0 Issue #8715: Create PyUnicode_EncodeFSDefault() function: Encode a Unicode
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, and return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
2010-05-15 16:27:27 +00:00
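The "surrogateescape" handler (PEP 383) is what lets undecodable filename bytes survive a decode/encode round trip: each such byte b (always 0x80 or above) becomes the lone surrogate U+DC00 + b, in the range U+DC80..U+DCFF, and encoding maps it back. A sketch with ASCII standing in for the filesystem encoding; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode ASCII; every byte >= 0x80 is escaped to U+DC80..U+DCFF. */
size_t ascii_decode_escape(const uint8_t *in, size_t len, uint32_t *out)
{
    size_t i;
    for (i = 0; i < len; i++)
        out[i] = (in[i] < 0x80) ? in[i] : 0xDC00 + in[i];
    return len;
}

/* Encode to ASCII, turning escaped surrogates back into their bytes.
   Returns the byte count, or -1 for any other non-ASCII character
   (where a strict encoder would fail). */
long ascii_encode_escape(const uint32_t *in, size_t len, uint8_t *out)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80)
            out[i] = (uint8_t)in[i];
        else if (in[i] >= 0xDC80 && in[i] <= 0xDCFF)
            out[i] = (uint8_t)(in[i] - 0xDC00);
        else
            return -1;
    }
    return (long)len;
}
```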
Victor Stinner 59e62db0a3 Enable shortcuts for common encodings in PyUnicode_AsEncodedString() for any
error handler, not only the default error handler (strict)
2010-05-15 13:14:32 +00:00
Victor Stinner b9a20ad036 PyUnicode_DecodeFSDefaultAndSize() uses surrogateescape error handler
This function is only used to decode Python module filenames, but Python
doesn't support surrogates in module filenames yet. So nobody noticed this
minor bug.
2010-04-30 16:37:52 +00:00
Victor Stinner 0ea2a468e3 Simplify PyUnicode_FSConverter(): remove reference to PyByteArray
PyByteArray is no longer supported.
2010-04-30 00:22:08 +00:00
Benjamin Peterson a23831ff44 condense condition 2010-04-25 21:54:00 +00:00
Victor Stinner 445a623226 Fix my previous commit (r80382) for wide build (unicodeobject.c) 2010-04-22 20:01:57 +00:00
Victor Stinner 31be90b0c7 Issue #8092: Fix PyUnicode_EncodeUTF8() to support error handlers producing
a unicode string (e.g. backslashreplace)
2010-04-22 19:38:16 +00:00
Victor Stinner dcb2403022 Issue #8485: PyUnicode_FSConverter() doesn't accept bytearray objects anymore;
you have to convert your bytearray filenames to bytes.
2010-04-22 12:08:36 +00:00
Florent Xicluna 806d8cf0e8 Merged revisions 79494,79496 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (Tue, 30 Mar 2010) | 2 lines

  #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
  r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (Tue, 30 Mar 2010) | 2 lines

  Highlight the change of behavior related to r79494.  Now VT and FF are linebreaks.
........
2010-03-30 19:34:18 +00:00
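A sketch of the line-boundary test after #7643, using the characters the Python documentation lists for str.splitlines(): LF, CR, CR LF, VT (0x0B), FF (0x0C), 0x1C to 0x1E, NEL (0x85), U+2028 and U+2029. The helpers is_linebreak() and count_lines() are hypothetical, not CPython's splitlines implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* 1 if ch ends a line in the splitlines() sense. */
static int is_linebreak(uint32_t ch)
{
    switch (ch) {
    case 0x0A: case 0x0B: case 0x0C: case 0x0D:
    case 0x1C: case 0x1D: case 0x1E:
    case 0x85: case 0x2028: case 0x2029:
        return 1;
    default:
        return 0;
    }
}

/* Count the lines splitlines() would return; CR LF is one break. */
size_t count_lines(const uint32_t *s, size_t len)
{
    size_t i = 0, lines = 0;
    while (i < len) {
        while (i < len && !is_linebreak(s[i]))
            i++;                        /* scan the line body */
        if (i < len) {                  /* consume the break itself */
            if (s[i] == 0x0D && i + 1 < len && s[i + 1] == 0x0A)
                i++;
            i++;
        }
        lines++;
    }
    return lines;
}
```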
Victor Stinner 808fc0a0ee Merged revisions 79278,79280 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r79278 | victor.stinner | 2010-03-22 13:24:37 +0100 (Mon, 22 Mar 2010) | 2 lines

  Issue #1583863: A unicode subclass can now override the __str__ method
........
  r79280 | victor.stinner | 2010-03-22 13:36:28 +0100 (Mon, 22 Mar 2010) | 5 lines

  Fix the NEWS about my last commit: a unicode subclass can now override the
  __unicode__ method (and not the __str__ method).

  Also simplify the testcase.
........
2010-03-22 12:50:40 +00:00
Gregory P. Smith cc47d8c8d4 Update a comment with more details. 2010-02-27 08:33:11 +00:00
Ezio Melotti 5b2b242f07 Merged revisions 77743 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r77743 | ezio.melotti | 2010-01-25 13:24:37 +0200 (Mon, 25 Jan 2010) | 1 line

  #7775: fixed docstring for rpartition
........
2010-01-25 11:58:28 +00:00
Antoine Pitrou f068f94e82 Merged revisions 77469-77470 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r77469 | antoine.pitrou | 2010-01-13 14:43:37 +0100 (Wed, 13 Jan 2010) | 3 lines

  Test commit to try to diagnose failures of the IA-64 buildbot
........
  r77470 | antoine.pitrou | 2010-01-13 15:01:26 +0100 (Wed, 13 Jan 2010) | 3 lines

  Sanitize bloom filter macros
........
2010-01-13 14:19:12 +00:00
Antoine Pitrou cbfdee3e54 Merged revisions 77463 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r77463 | antoine.pitrou | 2010-01-13 09:55:20 +0100 (Wed, 13 Jan 2010) | 3 lines

  Fix Windows build (re r77461)
........
2010-01-13 08:58:08 +00:00
Antoine Pitrou f2c5484f9e Merged revisions 77461 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r77461 | antoine.pitrou | 2010-01-13 08:55:48 +0100 (Wed, 13 Jan 2010) | 5 lines

  Issue #7622: Improve the split(), rsplit(), splitlines() and replace()
  methods of bytes, bytearray and unicode objects by using a common
  implementation based on stringlib's fast search.  Patch by Florent Xicluna.
........
2010-01-13 08:07:53 +00:00
Benjamin Peterson 8667a9b6ea Python strings ending with '\0' should not be equivalent to their C counterparts in PyUnicode_CompareWithASCIIString 2010-01-09 21:45:28 +00:00
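A sketch of the comparison rule behind this fix: when a length-counted string is compared with a NUL-terminated C string, reaching the C string's NUL is not the end of the check. If the counted string still has units left, even a U+0000, it must not compare equal; this sketch reports it as greater. The compare_with_ascii() helper is hypothetical, not PyUnicode_CompareWithASCIIString() itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare u[0..len) with the ASCII C string str. Returns <0, 0 or >0. */
int compare_with_ascii(const uint32_t *u, size_t len, const char *str)
{
    size_t i;
    for (i = 0; i < len && str[i] != '\0'; i++) {
        if (u[i] != (unsigned char)str[i])
            return (u[i] < (unsigned char)str[i]) ? -1 : 1;
    }
    if (i < len)
        return 1;       /* counted string is longer, e.g. ends with '\0' */
    if (str[i] != '\0')
        return -1;      /* C string is longer */
    return 0;
}
```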
Mark Dickinson 6ce4a9a9f2 Merged revisions 76308 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r76308 | mark.dickinson | 2009-11-15 16:18:58 +0000 (Sun, 15 Nov 2009) | 3 lines

  Issue #7228:  Add '%lld' and '%llu' support to PyFormat_FromString,
  PyFormat_FromStringV and PyErr_Format.
........
2009-11-16 17:00:11 +00:00
Benjamin Peterson adf6a6c842 death to compiler warning 2009-11-10 21:23:15 +00:00
Georg Brandl 495f7b5adb Merged revisions 75365,75394,75402-75403,75418,75459,75484,75592-75596,75600,75602-75607,75610-75613,75616-75617,75623,75627,75640,75647,75696,75795 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r75365 | georg.brandl | 2009-10-11 22:16:16 +0200 (Sun, 11 Oct 2009) | 1 line

  Fix broken links found by "make linkcheck".  scipy.org seems to be down right now, so I could not verify links going there.
........
  r75394 | georg.brandl | 2009-10-13 20:10:59 +0200 (Tue, 13 Oct 2009) | 1 line

  Fix markup.
........
  r75402 | georg.brandl | 2009-10-14 17:51:48 +0200 (Wed, 14 Oct 2009) | 1 line

  #7125: fix typo.
........
  r75403 | georg.brandl | 2009-10-14 17:57:46 +0200 (Wed, 14 Oct 2009) | 1 line

  #7126: os.environ changes *do* take effect in subprocesses started with os.system().
........
  r75418 | georg.brandl | 2009-10-14 20:48:32 +0200 (Wed, 14 Oct 2009) | 1 line

  #7116: str.join() takes an iterable.
........
  r75459 | georg.brandl | 2009-10-17 10:57:43 +0200 (Sat, 17 Oct 2009) | 1 line

  Fix refleaks in _ctypes PyCSimpleType_New, which fixes the refleak seen in test___all__.
........
  r75484 | georg.brandl | 2009-10-18 09:58:12 +0200 (Sun, 18 Oct 2009) | 1 line

  Fix missing word.
........
  r75592 | georg.brandl | 2009-10-22 09:05:48 +0200 (Thu, 22 Oct 2009) | 1 line

  Fix punctuation.
........
  r75593 | georg.brandl | 2009-10-22 09:06:49 +0200 (Thu, 22 Oct 2009) | 1 line

  Revert unintended change.
........
  r75594 | georg.brandl | 2009-10-22 09:56:02 +0200 (Thu, 22 Oct 2009) | 1 line

  Fix markup.
........
  r75595 | georg.brandl | 2009-10-22 09:56:56 +0200 (Thu, 22 Oct 2009) | 1 line

  Fix duplicate target.
........
  r75596 | georg.brandl | 2009-10-22 10:05:04 +0200 (Thu, 22 Oct 2009) | 1 line

  Add a new directive marking up implementation details and start using it.
........
  r75600 | georg.brandl | 2009-10-22 13:01:46 +0200 (Thu, 22 Oct 2009) | 1 line

  Make it more robust.
........
  r75602 | georg.brandl | 2009-10-22 13:28:06 +0200 (Thu, 22 Oct 2009) | 1 line

  Document new directive.
........
  r75603 | georg.brandl | 2009-10-22 13:28:23 +0200 (Thu, 22 Oct 2009) | 1 line

  Allow short form with text as argument.
........
  r75604 | georg.brandl | 2009-10-22 13:36:50 +0200 (Thu, 22 Oct 2009) | 1 line

  Fix stylesheet for multi-paragraph impl-details.
........
  r75605 | georg.brandl | 2009-10-22 13:48:10 +0200 (Thu, 22 Oct 2009) | 1 line

  Use "impl-detail" directive where applicable.
........
  r75606 | georg.brandl | 2009-10-22 17:00:06 +0200 (Thu, 22 Oct 2009) | 1 line

  #6324: membership test tries iteration via __iter__.
........
  r75607 | georg.brandl | 2009-10-22 17:04:09 +0200 (Thu, 22 Oct 2009) | 1 line

  #7088: document new functions in signal as Unix-only.
........
  r75610 | georg.brandl | 2009-10-22 17:27:24 +0200 (Thu, 22 Oct 2009) | 1 line

  Reorder __slots__ fine print and add a clarification.
........
  r75611 | georg.brandl | 2009-10-22 17:42:32 +0200 (Thu, 22 Oct 2009) | 1 line

  #7035: improve docs of the various <method>_errors() functions, and give them docstrings.
........
  r75612 | georg.brandl | 2009-10-22 17:52:15 +0200 (Thu, 22 Oct 2009) | 1 line

  #7156: document curses as Unix-only.
........
  r75613 | georg.brandl | 2009-10-22 17:54:35 +0200 (Thu, 22 Oct 2009) | 1 line

  #6977: getopt does not support optional option arguments.
........
  r75616 | georg.brandl | 2009-10-22 18:17:05 +0200 (Thu, 22 Oct 2009) | 1 line

  Add proper references.
........
  r75617 | georg.brandl | 2009-10-22 18:20:55 +0200 (Thu, 22 Oct 2009) | 1 line

  Make printout margin important.
........
  r75623 | georg.brandl | 2009-10-23 10:14:44 +0200 (Fri, 23 Oct 2009) | 1 line

  #7188: fix optionxform() docs.
........
  r75627 | fred.drake | 2009-10-23 15:04:51 +0200 (Fri, 23 Oct 2009) | 2 lines

  add further note about what's passed to optionxform
........
  r75640 | neil.schemenauer | 2009-10-23 21:58:17 +0200 (Fri, 23 Oct 2009) | 2 lines

  Improve some docstrings in the 'warnings' module.
........
  r75647 | georg.brandl | 2009-10-24 12:04:19 +0200 (Sat, 24 Oct 2009) | 1 line

  Fix markup.
........
  r75696 | georg.brandl | 2009-10-25 21:25:43 +0100 (Sun, 25 Oct 2009) | 1 line

  Fix a demo.
........
  r75795 | georg.brandl | 2009-10-27 16:10:22 +0100 (Tue, 27 Oct 2009) | 1 line

  Fix a strange mis-edit.
........
2009-10-27 15:28:25 +00:00
Benjamin Peterson f38a69f979 kill merged line 2009-09-18 21:49:06 +00:00
Benjamin Peterson 308d637c94 Merged revisions 74929 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r74929 | benjamin.peterson | 2009-09-18 16:14:55 -0500 (Fri, 18 Sep 2009) | 1 line

  add keyword arguments support to str/unicode encode and decode #6300
........
2009-09-18 21:42:35 +00:00
Alexandre Vassalotti e85bd987c4 Merged revisions 73871 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r73871 | alexandre.vassalotti | 2009-07-06 22:17:30 -0400 (Mon, 06 Jul 2009) | 7 lines

  Grow the allocated buffer in PyUnicode_EncodeUTF7 to avoid buffer overrun.

  Without this change, test_unicode.UnicodeTest.test_codecs_utf7 crashes in
  debug mode. What happens is the unicode string u'\U000abcde' with a length
  of 1 encodes to the string '+2m/c3g-' of length 8. Since only 5 bytes are
  reserved in the buffer, a buffer overrun occurs.
........
2009-07-21 00:39:03 +00:00
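The arithmetic in the r73871 message can be checked with a small sketch that encodes a single character outside the BMP (this is not PyUnicode_EncodeUTF7(), which handles whole strings and shift state). U+ABCDE becomes the surrogate pair 0xDA6F 0xDCDE; its 32 bits need six base64 digits, so '+', six digits and the closing '-' take 8 bytes. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode ONE code point in U+10000..U+10FFFF as a UTF-7 shifted run.
   out must hold at least 9 bytes (8 plus the NUL). Returns the length. */
size_t utf7_encode_astral(uint32_t ch, char *out)
{
    uint32_t v = ch - 0x10000;
    uint32_t hi = 0xD800 + (v >> 10), lo = 0xDC00 + (v & 0x3FF);
    uint64_t bits = ((uint64_t)hi << 16) | lo;   /* the 32-bit pair */
    int nbits = 32;
    size_t n = 0;

    out[n++] = '+';
    bits <<= 4;                 /* zero-pad to 36 bits, a multiple of 6 */
    nbits += 4;
    while (nbits > 0) {         /* emit 6 bits per base64 digit */
        nbits -= 6;
        out[n++] = B64[(bits >> nbits) & 0x3F];
    }
    out[n++] = '-';
    out[n] = '\0';
    return n;
}
```

A buffer sized at 5 bytes per character, as the message describes, is therefore 3 bytes short for this input.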
Amaury Forgeot d'Arc 84ec8d9314 #6373: SystemError in str.encode('latin1', 'surrogateescape')
if the string contains unpaired surrogates.
(In a debug build, it crashes in assert().)

This can happen during normal processing, if Python starts with utf-8
and then calls sys.setfilesystemencoding('latin-1').
2009-06-29 22:36:49 +00:00
Georg Brandl c6c3178942 Merged revisions 73190,73213,73257-73258,73260,73275,73294 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r73190 | georg.brandl | 2009-06-04 01:23:45 +0200 (Thu, 04 Jun 2009) | 2 lines

  Avoid PendingDeprecationWarnings emitted by deprecated unittest methods.
........
  r73213 | georg.brandl | 2009-06-04 12:15:57 +0200 (Thu, 04 Jun 2009) | 1 line

  #5967: note that the C slicing APIs do not support negative indices.
........
  r73257 | georg.brandl | 2009-06-06 19:50:05 +0200 (Sat, 06 Jun 2009) | 1 line

  #6211: elaborate a bit on ways to call the function.
........
  r73258 | georg.brandl | 2009-06-06 19:51:31 +0200 (Sat, 06 Jun 2009) | 1 line

  #6204: use a real reference instead of "see later".
........
  r73260 | georg.brandl | 2009-06-06 20:21:58 +0200 (Sat, 06 Jun 2009) | 1 line

  #6224: s/JPython/Jython/, and remove one link to a module nine years old.
........
  r73275 | georg.brandl | 2009-06-07 22:37:52 +0200 (Sun, 07 Jun 2009) | 1 line

  Add Ezio.
........
  r73294 | georg.brandl | 2009-06-08 15:34:52 +0200 (Mon, 08 Jun 2009) | 1 line

  #6194: O_SHLOCK/O_EXLOCK are not really more platform independent than lockf().
........
2009-06-08 13:41:29 +00:00
Raymond Hettinger 3ad05763a6 Strengthen the guard. The code doesn't work well with subclasses. 2009-05-29 22:11:22 +00:00