Commit Graph

521 Commits

Author SHA1 Message Date
Victor Stinner c911bbfd5d str, bytes, bytearray docstring: remove unnecessary [...] 2010-11-07 19:04:46 +00:00
Victor Stinner e14e212221 Fix encode/decode method doc of str, bytes, bytearray types
* Specify the default encoding: write 'utf-8' instead of
   sys.getdefaultencoding(), because the default encoding is now constant
 * Specify the default errors value
2010-11-07 18:41:46 +00:00
Eric Smith 51d2fd983b Added more to docstrings for str.format, format_map, and __format__. 2010-11-06 19:27:37 +00:00
David Malcolm 9696088b6d Issue #10288: The deprecated family of "char"-handling macros
(ISLOWER()/ISUPPER()/etc) have now been removed: use Py_ISLOWER() etc
instead.
2010-11-05 17:23:41 +00:00
Eric Smith 27bbca6f79 Issue #6081: Add str.format_map. str.format_map(mapping) is similar to str.format(**mapping), except mapping does not get converted to a dict. 2010-11-04 17:06:58 +00:00
Victor Stinner ad15872854 Simplify PyUnicode_Encode/DecodeFSDefault on Windows/Mac OS X
* Windows always uses mbcs
 * Mac OS X always uses utf-8
2010-10-27 00:25:46 +00:00
Victor Stinner f933e1ab6f Issue #4388: On Mac OS X, decode command line arguments from UTF-8, instead of
the locale encoding. If the LANG (and LC_ALL and LC_CTYPE) environment variable
is not set, the locale encoding is ISO-8859-1, whereas most programs (including
Python) expect UTF-8. Python already uses UTF-8 for the filesystem encoding and
to encode command line arguments on this OS.
2010-10-20 22:58:25 +00:00
Victor Stinner 9a90900da5 PyUnicode_FromFormatV(): Fix %A format
It was not completly implemented. Add a test.
2010-10-18 20:59:24 +00:00
Benjamin Peterson 8f67d0893f make hashes always the size of pointers; introduce Py_hash_t #9778 2010-10-17 20:54:53 +00:00
Victor Stinner 168e117e0a Add an optional size argument to _Py_char2wchar()
_Py_char2wchar() callers usually need the result size in characters. Since it's
trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add
an option to get it.
2010-10-16 23:16:16 +00:00
Victor Stinner f3170ccef8 Use locale encoding if Py_FileSystemDefaultEncoding is not set
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
   PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
   Py_FileSystemDefaultEncoding is NULL
 * redecode_filenames() functions and _Py_code_object_list (issue #9630)
   are no more needed: remove them
2010-10-15 12:04:23 +00:00
Georg Brandl 66c221e993 #9418: first step of moving private string methods to _string module. 2010-10-14 07:04:07 +00:00
Victor Stinner beb4135b8c PyUnicode_AsWideCharString() takes a PyObject*, not a PyUnicodeObject*
All unicode functions uses PyObject* except PyUnicode_AsWideChar(). Fix the
prototype for the new function PyUnicode_AsWideCharString().
2010-10-07 01:02:42 +00:00
Victor Stinner 5593d8aeb4 Issue #8670: PyUnicode_AsWideChar() and PyUnicode_AsWideCharString() replace
UTF-16 surrogate pairs by single non-BMP characters for 16 bits Py_UNICODE
and 32 bits wchar_t (eg. Linux in narrow build).
2010-10-02 11:11:27 +00:00
Victor Stinner 1c24bd0252 Issue #8870: PyUnicode_AsWideCharString() doesn't count the trailing nul character
And write unit tests for PyUnicode_AsWideChar() and PyUnicode_AsWideCharString().
2010-10-02 11:03:13 +00:00
Victor Stinner 71e91a358b Fix PyUnicode_AsWideCharString(): set *size if size is not NULL 2010-09-29 17:55:12 +00:00
Victor Stinner c39211f51e Issue #9630: Redecode filenames when setting the filesystem encoding
Redecode the filenames of:

 - all modules: __file__ and __path__ attributes
 - all code objects: co_filename attribute
 - sys.path
 - sys.meta_path
 - sys.executable
 - sys.path_importer_cache (keys)

Keep weak references to all code objects until initfsencoding() is called, to
be able to redecode co_filename attribute of all code objects.
2010-09-29 16:35:47 +00:00
Victor Stinner 137c34c027 Issue #9979: Create function PyUnicode_AsWideCharString(). 2010-09-29 10:25:54 +00:00
Benjamin Peterson d4ac96a336 use return NULL; it's just as correct 2010-09-12 16:40:53 +00:00
Victor Stinner 4c7db315df Issue #9738, #9836: Fix refleak introduced by r84704 2010-09-12 07:51:18 +00:00
Benjamin Peterson 9be0b2e312 detect non-ascii characters much earlier (plugs ref leak) 2010-09-12 03:40:54 +00:00
Victor Stinner 1205f2774e Issue #9738: PyUnicode_FromFormat() and PyErr_Format() raise an error on
a non-ASCII byte in the format string.

Document also the encoding.
2010-09-11 00:54:47 +00:00
Victor Stinner 46408606d8 Rename PyUnicode_strdup() to PyUnicode_AsUnicodeCopy() 2010-09-03 16:18:00 +00:00
Victor Stinner 71133ff368 Create PyUnicode_strdup() function 2010-09-01 23:43:53 +00:00
Victor Stinner c4eb765fc1 Create Py_UNICODE_strcat() function 2010-09-01 23:43:50 +00:00
Victor Stinner 42cb462682 Remove unicode_default_encoding constant
Inline its value in PyUnicode_GetDefaultEncoding(). The comment is now outdated
(we will not change its value anymore).
2010-09-01 19:39:01 +00:00
Antoine Pitrou fce7fd6426 Issue #9549: sys.setdefaultencoding() and PyUnicode_SetDefaultEncoding()
are now removed, since their effect was inexistent in 3.x (the default
encoding is hardcoded to utf-8 and cannot be changed).
2010-09-01 18:54:56 +00:00
Antoine Pitrou b0fa831d1e Issue #7415: PyUnicode_FromEncodedObject() now uses the new buffer API
properly.  Patch by Stefan Behnel.
2010-09-01 15:10:12 +00:00
Daniel Stutzbach 8515eaefda Issue 8781: On systems a signed 4-byte wchar_t and a 4-byte Py_UNICODE, use memcpy to convert between the two (as already done when wchar_t is unsigned) 2010-08-24 21:57:33 +00:00
Victor Stinner 3119ed73aa Fix PyUnicode_EncodeFSDefault() indentation 2010-08-18 22:26:50 +00:00
Victor Stinner ef8d95c498 Issue #9425: Create Py_UNICODE_strncmp() function
The code is based on strncmp() of the libiberty library,
function in the public domain.
2010-08-16 22:03:11 +00:00
Victor Stinner 47fcb5b4c3 Issue #9542: Create PyUnicode_FSDecoder() function
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.

 * Don't specify surrogateescape error handler in the comments nor the
   documentation, but PyUnicode_DecodeFSDefaultAndSize() and
   PyUnicode_EncodeFSDefault() because these functions use strict error handler
   for the mbcs encoding (on Windows).
 * Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
   inconsistency with unicodeobject.h.
2010-08-13 23:59:58 +00:00
Victor Stinner 4a2b7a1b14 Issue #9425: Create PyErr_WarnFormat() function
Similar to PyErr_WarnEx() but use PyUnicode_FromFormatV() to format the warning
message.

Strip also some trailing spaces.
2010-08-13 14:03:48 +00:00
Alexander Belopolsky f0f45142d5 Issue #2443: Added a new macro, Py_VA_COPY, which is equivalent to C99
va_copy, but available on all python platforms.  Untabified a few
unrelated files.
2010-08-11 17:31:17 +00:00
Victor Stinner 331ea92ade Issue #9425: create Py_UNICODE_strrchr() function 2010-08-10 16:37:20 +00:00
Georg Brandl 78eef3de88 Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway. 2010-08-01 20:51:02 +00:00
Georg Brandl bd534f0349 #8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character. 2010-08-01 08:49:18 +00:00
Georg Brandl 8ee604b989 Use Py_CLEAR(). 2010-07-29 14:23:06 +00:00
Stefan Krah 99212f61db Sub-issue of #9036: Fix incorrect use of Py_CHARMASK. 2010-07-19 17:58:26 +00:00
Senthil Kumaran e51ee8a5bc Fix the docstrings of the capitalize method. 2010-07-05 12:00:56 +00:00
Ezio Melotti 9bf2b3ae6a Update comment about surrogates. 2010-07-03 04:52:19 +00:00
Ezio Melotti 57221d02ba Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
   valid continuation bytes are now replaced by U+FFFD, instead of replacing
   the number of bytes specified by the start byte.
   See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
   in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
   and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
2010-07-01 07:32:02 +00:00
Georg Brandl 952867aa30 #9078: fix some Unicode C API descriptions, in comments and docs. 2010-06-27 10:17:12 +00:00
Ezio Melotti c1897e716d Merged revisions 82248 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r82248 | ezio.melotti | 2010-06-26 21:44:42 +0300 (Sat, 26 Jun 2010) | 1 line

  Fix extra space.
........
2010-06-26 18:50:39 +00:00
Victor Stinner 554f3f0081 Issue #850997: mbcs encoding (Windows only) handles errors argument: strict
mode raises unicode errors. The encoder only supports "strict" and "replace"
error handlers, the decoder only supports "strict" and "ignore" error handlers.
2010-06-16 23:33:54 +00:00
Mark Dickinson 7db923cc99 Silence 'unused variable' gcc warning. Patch by Éric Araujo. 2010-06-12 09:10:14 +00:00
Victor Stinner 313a120ab6 Issue #8969: On Windows, use mbcs codec in strict mode to encode and decode
filenames and enable os.fsencode().
2010-06-11 23:56:51 +00:00
Antoine Pitrou cc0cfd3576 Merged revisions 81907 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk

........
  r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (ven., 11 juin 2010) | 5 lines

  Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
  the interpreter with characters outside the Basic Multilingual Plane
  (higher than 0x10000).
........
2010-06-11 21:46:32 +00:00
Victor Stinner 37296e89a5 Fix r81869: ISO-8859-15 was seen as an alias to ISO-8859-1
Don't use normalize_encoding() result if it is truncated.
2010-06-10 13:36:23 +00:00
Victor Stinner 600d3bed6c Issue #8922: Normalize the encoding name in PyUnicode_AsEncodedString() to
enable shortcuts for upper case encoding name. Add also a shortcut for
"iso-8859-1" in PyUnicode_AsEncodedString() and PyUnicode_Decode().
2010-06-10 12:00:55 +00:00