Commit Graph

158 Commits

Author SHA1 Message Date
Georg Brandl c6bc4c6897 Fix a few typos in the unicode header. 2011-10-05 16:23:09 +02:00
Georg Brandl 4975a9b44d Fix grammar. 2011-10-05 16:12:21 +02:00
Victor Stinner b9275c104e Speedup str[a:b] and PyUnicode_FromKindAndData
* str[a:b] doesn't scan the string for the maximum character if the string
   is ascii only
 * PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
   shorter character type. For example, _PyUnicode_FromUCS1() stops if we
   have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
Victor Stinner 30134f53fc Complete documentation of compact ASCII strings 2011-10-04 01:32:45 +02:00
Victor Stinner a41463c203 Document utf8_length and wstr_length states
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner 7f11ad4594 Unicode: document when the wstr pointer is shared with data
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner 8cfcbed4e3 Improve string forms and PyUnicode_Resize() documentation
Remove also the FIXME for resize_copy(): as discussed with Martin, copy the
string on resize if the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner 85041a54bd _PyUnicode_CheckConsistency() checks utf8 field consistency 2011-10-03 14:42:39 +02:00
Victor Stinner a3b334da6d PyUnicode_Ready() now sets ascii=1 if maxchar < 128
ascii=1 is no more reserved to PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner 910337b42e Add _PyUnicode_CheckConsistency() macro to help debugging
* Document Unicode string states
 * Use _PyUnicode_CheckConsistency() to ensure that objects are always
   consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner 37943769ef PyUnicode_READ_CHAR() ensures that the string is ready 2011-10-02 20:33:18 +02:00
Victor Stinner 7a48ff7e06 Use Py_UCS1 instead of unsigned char in unicodeobject.h 2011-10-02 00:55:25 +02:00
Victor Stinner cd9950fd09 PyUnicode_WriteChar() raises IndexError on invalid index
PyUnicode_WriteChar() raises also a ValueError if the string has more than 1
reference.
2011-10-02 00:34:53 +02:00
Victor Stinner 9f789e7f63 _PyUnicode_AsKind() is *not* part of the stable ABI 2011-10-01 03:57:28 +02:00
Victor Stinner 4584a5ba1a PyUnicode_CHARACTER_SIZE(): add a reference to PyUnicode_KIND_SIZE() 2011-10-01 02:39:37 +02:00
Victor Stinner 034f6cf10c Add PyUnicode_Copy() function, include it to the public API 2011-09-30 02:26:44 +02:00
Victor Stinner d8f6510acc _PyUnicode_Ready() cannot be used on ready strings anymore
* Change its prototype: PyObject* instead of PyUnicodeoObject*.
 * Remove an old assertion, the result of PyUnicode_READY (_PyUnicode_Ready)
   must be checked instead
2011-09-29 19:43:17 +02:00
Victor Stinner bc8b81bc4e Move _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() outside unicodeobject.h
Move these macros to unicodeobject.c
2011-09-29 19:31:34 +02:00
Victor Stinner a0702ab1fe Add a note in PyUnicode_CopyCharacters() doc: it doesn't write null character
Cleanup also the code (avoid the goto).
2011-09-29 14:14:38 +02:00
Victor Stinner f5ca1a21a5 PyUnicode_CopyCharacters() fails if 'to' has more than 1 reference 2011-09-28 23:54:59 +02:00
Victor Stinner 17222160e7 Mark _PyUnicode_FindMaxCharAndNumSurrogatePairs() as private 2011-09-28 22:15:37 +02:00
Victor Stinner 157f83fcfc Strip trailing spaces in unicodeobject.[ch] 2011-09-28 21:41:31 +02:00
Victor Stinner be78eaf2de PyUnicode_CopyCharacters() checks for buffer and character overflow
It now returns the number of written characters on success.
2011-09-28 21:37:03 +02:00
Victor Stinner fb5f5f2420 Mark PyUnicode_CONVERT_BYTES as private 2011-09-28 21:39:49 +02:00
Victor Stinner 5ce1b0dbc0 Set Py_UNICODE_REPLACEMENT_CHARACTER type to Py_UCS4, instead of Py_UNICODE 2011-09-28 20:29:27 +02:00
Martin v. Löwis d63a3b8beb Implement PEP 393. 2011-09-28 07:41:54 +02:00
Victor Stinner f955eb210f Merge 3.2: Fix PyUnicode_AsWideCharString() doc
- Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null
   character
 - Fix spelling of the null character
2011-09-06 02:01:29 +02:00
Victor Stinner d88d9836c5 Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character
Fix also spelling of the null character.
2011-09-06 02:00:05 +02:00
Ezio Melotti 8c9375bb59 #10542: Add 4 macros to work with surrogates: Py_UNICODE_IS_SURROGATE, Py_UNICODE_IS_HIGH_SURROGATE, Py_UNICODE_IS_LOW_SURROGATE, Py_UNICODE_JOIN_SURROGATES. 2011-08-22 20:03:25 +03:00
Victor Stinner 99b9538636 Issue #9642: Uniformize the tests on the availability of the mbcs codec
Add a new HAVE_MBCS define.
2011-07-04 14:23:54 +02:00
Victor Stinner f3fd733f92 Remove useless argument of _PyUnicode_AsDefaultEncodedString() 2011-03-02 01:03:11 +00:00
Victor Stinner 0d711169fa Issue #9738: Ooops, fix typos in my previous commit (r87506) 2010-12-27 02:39:20 +00:00
Victor Stinner dc2081f72b Issue #9738: document encodings of unicode functions 2010-12-27 01:49:29 +00:00
Georg Brandl b550308597 Take PyUnicode_TransformDecimalToASCII out of the limited API. 2010-12-05 11:40:48 +00:00
Alexander Belopolsky 942af5a9a4 Issue #10557: Fixed error messages from float() and other numeric
types.  Added a new API function, PyUnicode_TransformDecimalToASCII(),
which transforms non-ASCII decimal digits in a Unicode string to their
ASCII equivalents.
2010-12-04 03:38:46 +00:00
Martin v. Löwis 4d0d471a80 Merge branches/pep-0384. 2010-12-03 20:14:31 +00:00
Alexander Belopolsky 83283c270a Issue #10413: Updated comments to reflect code changes 2010-11-16 14:29:01 +00:00
Victor Stinner 09f24bb408 Issue #8761: Mangle PyUnicode_CompareWithASCIIString function name for
narrow/wide unicode build.
2010-10-24 20:38:25 +00:00
Benjamin Peterson 8f67d0893f make hashes always the size of pointers; introduce Py_hash_t #9778 2010-10-17 20:54:53 +00:00
Victor Stinner f3170ccef8 Use locale encoding if Py_FileSystemDefaultEncoding is not set
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
   PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
   Py_FileSystemDefaultEncoding is NULL
 * redecode_filenames() functions and _Py_code_object_list (issue #9630)
   are no more needed: remove them
2010-10-15 12:04:23 +00:00
Victor Stinner beb4135b8c PyUnicode_AsWideCharString() takes a PyObject*, not a PyUnicodeObject*
All unicode functions uses PyObject* except PyUnicode_AsWideChar(). Fix the
prototype for the new function PyUnicode_AsWideCharString().
2010-10-07 01:02:42 +00:00
Victor Stinner 137c34c027 Issue #9979: Create function PyUnicode_AsWideCharString(). 2010-09-29 10:25:54 +00:00
Amaury Forgeot d'Arc feb7307db4 #9210: remove --with-wctype-functions configure option.
The internal unicode database is now always used.

(after 5 years: see
  http://mail.python.org/pipermail/python-dev/2004-December/050193.html
)
2010-09-12 22:42:57 +00:00
Victor Stinner 1205f2774e Issue #9738: PyUnicode_FromFormat() and PyErr_Format() raise an error on
a non-ASCII byte in the format string.

Document also the encoding.
2010-09-11 00:54:47 +00:00
Victor Stinner 46408606d8 Rename PyUnicode_strdup() to PyUnicode_AsUnicodeCopy() 2010-09-03 16:18:00 +00:00
Victor Stinner 71133ff368 Create PyUnicode_strdup() function 2010-09-01 23:43:53 +00:00
Victor Stinner c4eb765fc1 Create Py_UNICODE_strcat() function 2010-09-01 23:43:50 +00:00
Antoine Pitrou fce7fd6426 Issue #9549: sys.setdefaultencoding() and PyUnicode_SetDefaultEncoding()
are now removed, since their effect was inexistent in 3.x (the default
encoding is hardcoded to utf-8 and cannot be changed).
2010-09-01 18:54:56 +00:00
Amaury Forgeot d'Arc 324ac65ceb #5127: Even on narrow unicode builds, the C functions that access the Unicode
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).

The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
  now return the correct value for large code points
- repr() may consider more characters as printable.
2010-08-18 20:44:58 +00:00
Victor Stinner ef8d95c498 Issue #9425: Create Py_UNICODE_strncmp() function
The code is based on strncmp() of the libiberty library,
function in the public domain.
2010-08-16 22:03:11 +00:00