Victor Stinner
52e2cc8604
backout 7876cd49300d: Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
2011-12-19 22:14:45 +01:00
Victor Stinner
0ba5af20c0
Move PyUnicode_WCHAR_KIND outside PyUnicode_Kind enum
2011-12-17 22:18:27 +01:00
Victor Stinner
1b57967b96
Issue #13560 : Locale codec functions use the classic "errors" parameter,
...
instead of surrogateescape
So it would be possible to support more error handlers later.
2011-12-17 05:47:23 +01:00
Victor Stinner
f2ea71fcc8
Issue #13560 : Add PyUnicode_EncodeLocale()
...
* Use PyUnicode_EncodeLocale() in time.strftime() if wcsftime() is not
available
* Document my last changes in Misc/NEWS
2011-12-17 04:13:41 +01:00
Victor Stinner
af02e1c85a
Add PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale()
...
* PyUnicode_DecodeLocaleAndSize() and PyUnicode_DecodeLocale() decode a string
from the current locale encoding
* _Py_char2wchar() writes an "error code" in the size argument to indicate
if the function failed because of memory allocation failure or because of a
decoding error. The function doesn't write the error message directly to
stderr.
* Fix time.strftime() (if wcsftime() is missing): decode strftime() result
from the current locale encoding, not from the filesystem encoding.
2011-12-16 23:56:01 +01:00
Victor Stinner
16e6a80923
PyUnicode_Resize(): warn about canonical representation
...
Call also directly unicode_resize() in unicodeobject.c
2011-12-12 13:24:15 +01:00
Victor Stinner
b0a82a6a7f
Fix PyUnicode_Resize() for compact string: leave the string unchanged on error
...
Fix also PyUnicode_Resize() doc
2011-12-12 13:08:33 +01:00
Victor Stinner
bf6e560d0c
Make PyUnicode_Copy() private => _PyUnicode_Copy()
...
Undocument the function.
Make also decode_utf8_errors() as private (static).
2011-12-12 01:53:47 +01:00
Victor Stinner
7a9105a380
resize_copy() now supports legacy ready strings
2011-12-12 00:13:42 +01:00
Victor Stinner
24c74be9a3
PyUnicode_IS_ASCII() macro ensures that the string is ready
...
It has no sense to check if a not ready string is ASCII or not.
2011-12-12 01:24:20 +01:00
Victor Stinner
551ac95733
Py_UNICODE_HIGH_SURROGATE() and Py_UNICODE_LOW_SURROGATE() macros
...
And use surrogates macros everywhere in unicodeobject.c
2011-11-29 22:58:13 +01:00
Victor Stinner
f3ae6208c7
PyUnicode_GET_SIZE() checks that PyUnicode_AsUnicode() succeed
...
using an assertion
2011-11-21 02:24:49 +01:00
Victor Stinner
77faf69ca1
_PyUnicode_CheckConsistency() also checks maxchar maximum value,
...
not only its minimum value
2011-11-20 18:56:05 +01:00
Victor Stinner
9343999597
Fix PyUnicode_CopyCharacters() doc
2011-11-20 18:29:14 +01:00
Victor Stinner
7c8bbbbb0c
Ensure that Py_UCS4 is 32 bits and Py_UCS2 is 16 bits
2011-11-20 18:28:29 +01:00
Victor Stinner
6f9568bb1f
Fix misused of "PyUnicodeObject" structure name in unicodeobject.h
2011-11-17 00:12:44 +01:00
Martin v. Löwis
1db7c13be1
Port encoders from Py_UNICODE API to unicode object API.
2011-11-10 18:24:32 +01:00
Martin v. Löwis
d10759f6ed
Make _PyUnicode_FromId return borrowed references.
...
http://mail.python.org/pipermail/python-dev/2011-November/114347.html
2011-11-07 13:00:05 +01:00
Victor Stinner
e30c0a1014
Fix gdb/libpython.py for not ready Unicode strings
...
_PyUnicode_CheckConsistency() checks also hash and length value for not ready
Unicode strings.
2011-11-04 20:54:05 +01:00
Victor Stinner
7931d9a951
Replace PyUnicodeObject type by PyObject
...
* _PyUnicode_CheckConsistency() now takes a PyObject* instead of void*
* Remove now useless casts to PyObject*
2011-11-04 00:22:48 +01:00
Martin v. Löwis
23e275b3ad
Port UCS1 and charmap codecs to new API.
2011-11-02 18:02:51 +01:00
Martin v. Löwis
0d3072e98d
Drop Py_UCS4_ functions. Closes #13246 .
2011-10-31 08:40:56 +01:00
Victor Stinner
9db1a8b69f
Replace PyUnicodeObject* by PyObject* where it was irrevelant
...
A Unicode string can now be a PyASCIIObject, PyCompactUnicodeObject or
PyUnicodeObject. Aliasing a PyASCIIObject* or PyCompactUnicodeObject* to
PyUnicodeObject* is wrong
2011-10-23 20:04:37 +02:00
Victor Stinner
55c7e00fc0
Simplify _PyUnicode_COMPACT_DATA() macro
2011-10-18 23:32:53 +02:00
Victor Stinner
3a50e7056e
Issue #12281 : Rewrite the MBCS codec to handle correctly replace and ignore
...
error handlers on all Windows versions. The MBCS codec is now supporting all
error handlers, instead of only replace to encode and ignore to decode.
2011-10-18 21:21:00 +02:00
Martin v. Löwis
bd928fef42
Rename _Py_identifier to _Py_IDENTIFIER.
2011-10-14 10:20:37 +02:00
Victor Stinner
8813104e53
Simplify PyUnicode_MAX_CHAR_VALUE
...
Use PyUnicode_IS_ASCII instead of PyUnicode_IS_COMPACT_ASCII, so the following
test can be removed:
PyUnicode_DATA(op) == (((PyCompactUnicodeObject *)(op))->utf8)
2011-10-13 01:12:01 +02:00
Martin v. Löwis
87da872c69
Drop extra semicolon.
2011-10-09 11:54:42 +02:00
Martin v. Löwis
afe55bba33
Add API for static strings, primarily good for identifiers.
...
Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.
2011-10-09 10:38:36 +02:00
Martin v. Löwis
c47adb04b3
Change PyUnicode_KIND to 1,2,4. Drop _KIND_SIZE and _CHARACTER_SIZE.
2011-10-07 20:55:35 +02:00
Georg Brandl
db6c7f5c33
Update C API docs for PEP 393.
2011-10-07 11:19:11 +02:00
Victor Stinner
b066cc6aba
Fix PyUnicode_CHARACTER_SIZE and PyUnicode_KIND_SIZE
2011-10-06 15:54:53 +02:00
Antoine Pitrou
dbf697ae5c
Fix compilation warnings under 64-bit Windows
2011-10-06 15:34:41 +02:00
Éric Araujo
0f4ee93b06
Branch merge
2011-10-06 13:22:21 +02:00
Victor Stinner
1d4b35f4e5
rephrase PyUnicode_1BYTE_KIND documentation
2011-10-06 01:51:19 +02:00
Victor Stinner
fb9ea8c57e
Don't check for the maximum character when copying from unicodeobject.c
...
* Create copy_characters() function which doesn't check for the maximum
character in release mode
* _PyUnicode_CheckConsistency() is no more static to be able to use it
in _PyUnicode_FormatAdvanced() (in formatter_unicode.c)
* _PyUnicode_CheckConsistency() checks the string hash
2011-10-06 01:45:57 +02:00
Éric Araujo
80a348c0a0
Fix typo
2011-10-05 01:11:12 +02:00
Victor Stinner
30134f53fc
Complete documentation of compact ASCII strings
2011-10-04 01:32:45 +02:00
Victor Stinner
a41463c203
Document utf8_length and wstr_length states
...
Ensure these states with assertions in _PyUnicode_CheckConsistency().
2011-10-04 01:05:08 +02:00
Victor Stinner
7f11ad4594
Unicode: document when the wstr pointer is shared with data
...
Add also related assertions to _PyUnicode_CheckConsistency().
2011-10-04 00:00:20 +02:00
Victor Stinner
8cfcbed4e3
Improve string forms and PyUnicode_Resize() documentation
...
Remove also the FIXME for resize_copy(): as discussed with Martin, copy the
string on resize if the string is not resizable is just fine.
2011-10-03 23:19:21 +02:00
Victor Stinner
c3cec7868b
Add asciilib: similar to ucs1, ucs2 and ucs4 library, but specialized to ASCII
...
ucs1, ucs2 and ucs4 libraries have to scan created substring to find the
maximum character, whereas it is not need to ASCII strings. Because ASCII
strings are common, it is useful to optimize ASCII.
2011-10-05 21:24:08 +02:00
Victor Stinner
4d0d54bcba
Document requierements of Unicode kinds
2011-10-05 01:31:05 +02:00
Georg Brandl
07de325672
More fixes.
2011-10-05 16:47:38 +02:00
Georg Brandl
c6bc4c6897
Fix a few typos in the unicode header.
2011-10-05 16:23:09 +02:00
Georg Brandl
4975a9b44d
Fix grammar.
2011-10-05 16:12:21 +02:00
Victor Stinner
b9275c104e
Speedup str[a:b] and PyUnicode_FromKindAndData
...
* str[a:b] doesn't scan the string for the maximum character if the string
is ascii only
* PyUnicode_FromKindAndData() stops if we are sure that we cannot use a
shorter character type. For example, _PyUnicode_FromUCS1() stops if we
have at least one character in range U+0080-U+00FF
2011-10-05 14:01:42 +02:00
Victor Stinner
85041a54bd
_PyUnicode_CheckConsistency() checks utf8 field consistency
2011-10-03 14:42:39 +02:00
Victor Stinner
a3b334da6d
PyUnicode_Ready() now sets ascii=1 if maxchar < 128
...
ascii=1 is no more reserved to PyASCIIObject. Use
PyUnicode_IS_COMPACT_ASCII(obj) to check if obj is a PyASCIIObject (as before).
2011-10-03 13:53:37 +02:00
Victor Stinner
910337b42e
Add _PyUnicode_CheckConsistency() macro to help debugging
...
* Document Unicode string states
* Use _PyUnicode_CheckConsistency() to ensure that objects are always
consistent.
2011-10-03 03:20:16 +02:00
Victor Stinner
37943769ef
PyUnicode_READ_CHAR() ensures that the string is ready
2011-10-02 20:33:18 +02:00
Victor Stinner
7a48ff7e06
Use Py_UCS1 instead of unsigned char in unicodeobject.h
2011-10-02 00:55:25 +02:00
Victor Stinner
cd9950fd09
PyUnicode_WriteChar() raises IndexError on invalid index
...
PyUnicode_WriteChar() raises also a ValueError if the string has more than 1
reference.
2011-10-02 00:34:53 +02:00
Victor Stinner
9f789e7f63
_PyUnicode_AsKind() is *not* part of the stable ABI
2011-10-01 03:57:28 +02:00
Victor Stinner
4584a5ba1a
PyUnicode_CHARACTER_SIZE(): add a reference to PyUnicode_KIND_SIZE()
2011-10-01 02:39:37 +02:00
Victor Stinner
034f6cf10c
Add PyUnicode_Copy() function, include it to the public API
2011-09-30 02:26:44 +02:00
Victor Stinner
d8f6510acc
_PyUnicode_Ready() cannot be used on ready strings anymore
...
* Change its prototype: PyObject* instead of PyUnicodeoObject*.
* Remove an old assertion, the result of PyUnicode_READY (_PyUnicode_Ready)
must be checked instead
2011-09-29 19:43:17 +02:00
Victor Stinner
bc8b81bc4e
Move _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() outside unicodeobject.h
...
Move these macros to unicodeobject.c
2011-09-29 19:31:34 +02:00
Victor Stinner
a0702ab1fe
Add a note in PyUnicode_CopyCharacters() doc: it doesn't write null character
...
Cleanup also the code (avoid the goto).
2011-09-29 14:14:38 +02:00
Victor Stinner
f5ca1a21a5
PyUnicode_CopyCharacters() fails if 'to' has more than 1 reference
2011-09-28 23:54:59 +02:00
Victor Stinner
17222160e7
Mark _PyUnicode_FindMaxCharAndNumSurrogatePairs() as private
2011-09-28 22:15:37 +02:00
Victor Stinner
157f83fcfc
Strip trailing spaces in unicodeobject.[ch]
2011-09-28 21:41:31 +02:00
Victor Stinner
be78eaf2de
PyUnicode_CopyCharacters() checks for buffer and character overflow
...
It now returns the number of written characters on success.
2011-09-28 21:37:03 +02:00
Victor Stinner
fb5f5f2420
Mark PyUnicode_CONVERT_BYTES as private
2011-09-28 21:39:49 +02:00
Victor Stinner
5ce1b0dbc0
Set Py_UNICODE_REPLACEMENT_CHARACTER type to Py_UCS4, instead of Py_UNICODE
2011-09-28 20:29:27 +02:00
Martin v. Löwis
d63a3b8beb
Implement PEP 393.
2011-09-28 07:41:54 +02:00
Victor Stinner
f955eb210f
Merge 3.2: Fix PyUnicode_AsWideCharString() doc
...
- Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null
character
- Fix spelling of the null character
2011-09-06 02:01:29 +02:00
Victor Stinner
d88d9836c5
Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character
...
Fix also spelling of the null character.
2011-09-06 02:00:05 +02:00
Ezio Melotti
8c9375bb59
#10542 : Add 4 macros to work with surrogates: Py_UNICODE_IS_SURROGATE, Py_UNICODE_IS_HIGH_SURROGATE, Py_UNICODE_IS_LOW_SURROGATE, Py_UNICODE_JOIN_SURROGATES.
2011-08-22 20:03:25 +03:00
Victor Stinner
99b9538636
Issue #9642 : Uniformize the tests on the availability of the mbcs codec
...
Add a new HAVE_MBCS define.
2011-07-04 14:23:54 +02:00
Victor Stinner
f3fd733f92
Remove useless argument of _PyUnicode_AsDefaultEncodedString()
2011-03-02 01:03:11 +00:00
Victor Stinner
0d711169fa
Issue #9738 : Ooops, fix typos in my previous commit (r87506)
2010-12-27 02:39:20 +00:00
Victor Stinner
dc2081f72b
Issue #9738 : document encodings of unicode functions
2010-12-27 01:49:29 +00:00
Georg Brandl
b550308597
Take PyUnicode_TransformDecimalToASCII out of the limited API.
2010-12-05 11:40:48 +00:00
Alexander Belopolsky
942af5a9a4
Issue #10557 : Fixed error messages from float() and other numeric
...
types. Added a new API function, PyUnicode_TransformDecimalToASCII(),
which transforms non-ASCII decimal digits in a Unicode string to their
ASCII equivalents.
2010-12-04 03:38:46 +00:00
Martin v. Löwis
4d0d471a80
Merge branches/pep-0384.
2010-12-03 20:14:31 +00:00
Alexander Belopolsky
83283c270a
Issue #10413 : Updated comments to reflect code changes
2010-11-16 14:29:01 +00:00
Victor Stinner
09f24bb408
Issue #8761 : Mangle PyUnicode_CompareWithASCIIString function name for
...
narrow/wide unicode build.
2010-10-24 20:38:25 +00:00
Benjamin Peterson
8f67d0893f
make hashes always the size of pointers; introduce Py_hash_t #9778
2010-10-17 20:54:53 +00:00
Victor Stinner
f3170ccef8
Use locale encoding if Py_FileSystemDefaultEncoding is not set
...
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
Py_FileSystemDefaultEncoding is NULL
* redecode_filenames() functions and _Py_code_object_list (issue #9630 )
are no more needed: remove them
2010-10-15 12:04:23 +00:00
Victor Stinner
beb4135b8c
PyUnicode_AsWideCharString() takes a PyObject*, not a PyUnicodeObject*
...
All unicode functions uses PyObject* except PyUnicode_AsWideChar(). Fix the
prototype for the new function PyUnicode_AsWideCharString().
2010-10-07 01:02:42 +00:00
Victor Stinner
137c34c027
Issue #9979 : Create function PyUnicode_AsWideCharString().
2010-09-29 10:25:54 +00:00
Amaury Forgeot d'Arc
feb7307db4
#9210 : remove --with-wctype-functions configure option.
...
The internal unicode database is now always used.
(after 5 years: see
http://mail.python.org/pipermail/python-dev/2004-December/050193.html
)
2010-09-12 22:42:57 +00:00
Victor Stinner
1205f2774e
Issue #9738 : PyUnicode_FromFormat() and PyErr_Format() raise an error on
...
a non-ASCII byte in the format string.
Document also the encoding.
2010-09-11 00:54:47 +00:00
Victor Stinner
46408606d8
Rename PyUnicode_strdup() to PyUnicode_AsUnicodeCopy()
2010-09-03 16:18:00 +00:00
Victor Stinner
71133ff368
Create PyUnicode_strdup() function
2010-09-01 23:43:53 +00:00
Victor Stinner
c4eb765fc1
Create Py_UNICODE_strcat() function
2010-09-01 23:43:50 +00:00
Antoine Pitrou
fce7fd6426
Issue #9549 : sys.setdefaultencoding() and PyUnicode_SetDefaultEncoding()
...
are now removed, since their effect was inexistent in 3.x (the default
encoding is hardcoded to utf-8 and cannot be changed).
2010-09-01 18:54:56 +00:00
Amaury Forgeot d'Arc
324ac65ceb
#5127 : Even on narrow unicode builds, the C functions that access the Unicode
...
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).
The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
2010-08-18 20:44:58 +00:00
Victor Stinner
ef8d95c498
Issue #9425 : Create Py_UNICODE_strncmp() function
...
The code is based on strncmp() of the libiberty library,
function in the public domain.
2010-08-16 22:03:11 +00:00
Victor Stinner
47fcb5b4c3
Issue #9542 : Create PyUnicode_FSDecoder() function
...
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
* Don't specify surrogateescape error handler in the comments nor the
documentation, but PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_EncodeFSDefault() because these functions use strict error handler
for the mbcs encoding (on Windows).
* Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
inconsistency with unicodeobject.h.
2010-08-13 23:59:58 +00:00
Victor Stinner
331ea92ade
Issue #9425 : create Py_UNICODE_strrchr() function
2010-08-10 16:37:20 +00:00
Georg Brandl
952867aa30
#9078 : fix some Unicode C API descriptions, in comments and docs.
2010-06-27 10:17:12 +00:00
Benjamin Peterson
ccbd69437a
rephrase
2010-05-15 17:43:18 +00:00
Victor Stinner
ae6265f8d0
Issue #8715 : Create PyUnicode_EncodeFSDefault() function: Encode a Unicode
...
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
2010-05-15 16:27:27 +00:00
Victor Stinner
77c3862417
Issue #8711 : Document PyUnicode_DecodeFSDefault*() functions
...
* Add paragraph titles to c-api/unicode.rst.
* Fix PyUnicode_DecodeFSDefault*() comment: it now uses the "surrogateescape"
error handler (and not "replace")
* Remove "The function is intended to be used for paths and file names only
during bootstrapping process where the codecs are not set up." from
PyUnicode_FSConverter() comment: it is used after the bootstrapping and for
other purposes than file names
2010-05-14 15:58:55 +00:00
Antoine Pitrou
f95a1b3c53
Recorded merge of revisions 81029 via svnmerge from
...
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81029 | antoine.pitrou | 2010-05-09 16:46:46 +0200 (dim., 09 mai 2010) | 3 lines
Untabify C files. Will watch buildbots.
........
2010-05-09 15:52:27 +00:00
Benjamin Peterson
ad465f904b
alias PyUnicode_CompareWithASCII
2010-05-07 20:21:26 +00:00
Victor Stinner
dcb2403022
Issue #8485 : PyUnicode_FSConverter() doesn't accept bytearray object anymore,
...
you have to convert your bytearray filenames to bytes
2010-04-22 12:08:36 +00:00
Martin v. Löwis
011e842033
Issue #5915 : Implement PEP 383, Non-decodable Bytes in
...
System Character Interfaces.
2009-05-05 04:43:17 +00:00