Commit Graph

75 Commits

Author SHA1 Message Date
Antoine Pitrou 653dece278 Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences.
Patch by Nick Barnes and Victor Stinner.
2009-05-04 18:32:32 +00:00
Marc-André Lemburg 3741bfdd3c Fix #4846 (Py_UNICODE_ISSPACE causes linker error) by moving the declaration
into the extern "C" section.

Add a few more comments and apply some minor edits to make the file contents
fit the original structure again.
2009-01-05 19:43:35 +00:00
Alexandre Vassalotti a925bed208 Sort UCS-2/UCS-4 name mangling list. 2008-12-28 02:10:35 +00:00
Alexandre Vassalotti d8f8ee4af8 Fix name mangling of PyUnicode_ClearFreeList. 2008-12-28 01:52:58 +00:00
Amaury Forgeot d'Arc 07d539d08b #4122: On Windows, Py_UNICODE_ISSPACE cannot be used in an extension module:
compilation fails with "undefined reference to _Py_ascii_whitespace"

Will backport to 2.6.
2008-10-14 21:47:22 +00:00
Eric Smith dc13b79a38 Refactor and clean up str.format() code (and helpers) in advance of optimizations. 2008-05-30 18:10:04 +00:00
Christian Heimes 3b718a79af Implemented Martin's suggestion to clear the free lists during the garbage collection of the highest generation. 2008-02-14 12:47:33 +00:00
Christian Heimes 4d4f270941 Patch #1970 by Antoine Pitrou: Speedup unicode whitespace and linebreak detection. The speedup is about 25% for split() (571 / 457 usec) and 35% (175 / 127 usec )for splitlines() 2008-01-30 11:32:37 +00:00
Neal Norwitz cfb41c4985 Add stdarg include for va_list to get this to compile on cygwin 2008-01-27 07:41:33 +00:00
Christian Heimes 7f39c9fcbb Backport of several functions from Python 3.0 to 2.6 including PyUnicode_FromString, PyUnicode_Format and PyLong_From/AsSsize_t. The functions are partly required for the backport of the bytearray type and _fileio module. They should also make it easier to port C to 3.0.
First chapter of the Python 3.0 io framework back port: _fileio
The next step depends on a working bytearray type which itself depends on a backport of the nwe buffer API.
2008-01-25 12:18:43 +00:00
Christian Heimes e93237dfcc #1629: Renamed Py_Size, Py_Type and Py_Refcnt to Py_SIZE, Py_TYPE and Py_REFCNT. Macros for b/w compatibility are available. 2007-12-19 02:37:44 +00:00
Amaury Forgeot d'Arc 5087980c1e The incremental decoder for utf-7 must preserve its state between calls.
Solves issue1460.

Might not be a backport candidate: a new API function was added,
and some code may rely on details in utf-7.py.
2007-11-20 23:31:27 +00:00
Walter Dörwald 6e39080649 Backport r57105 and r57145 from the py3k branch: UTF-32 codecs. 2007-08-17 16:41:28 +00:00
Martin v. Löwis 6819210b9e PEP 3123: Provide forward compatibility with Python 3.0, while keeping
backwards compatibility. Add Py_Refcnt, Py_Type, Py_Size, and
PyVarObject_HEAD_INIT.
2007-07-21 06:55:02 +00:00
Neal Norwitz ee3a1b5244 Variation of patch # 1624059 to speed up checking if an object is a subclass
of some of the common builtin types.

Use a bit in tp_flags for each common builtin type.  Check the bit
to determine if any instance is a subclass of these common types.
The check avoids a function call and O(n) search of the base classes.
The check is done in the various Py*_Check macros rather than calling
PyType_IsSubtype().

All the bits are set in tp_flags when the type is declared
in the Objects/*object.c files because PyType_Ready() is not called
for all the types.  Should PyType_Ready() be called for all types?
If so and the change is made, the changes to the Objects/*object.c files
can be reverted (remove setting the tp_flags).  Objects/typeobject.c
would also have to be modified to add conditions
for Py*_CheckExact() in addition to each the PyType_IsSubtype check.
2007-02-25 19:44:48 +00:00
Marc-André Lemburg 040f76b79c Slightly revised version of patch #1538956:
Replace UnicodeDecodeErrors raised during == and !=
compares of Unicode and other objects with a new
UnicodeWarning.

All other comparisons continue to raise exceptions.
Exceptions other than UnicodeDecodeErrors are also left
untouched.
2006-08-14 10:55:19 +00:00
Martin v. Löwis d825143be1 Patch #1455898: Incremental mode for "mbcs" codec. 2006-06-14 05:21:04 +00:00
Martin v. Löwis 3f767795f6 Patch #1359618: Speed-up charmap encoder. 2006-06-04 19:36:28 +00:00
Fredrik Lundh 80f8e80c15 needforspeed: added Py_MEMCPY macro (currently tuned for Visual C only),
and use it for string copy operations.  this gives a 20% speedup on some
string benchmarks.
2006-05-28 12:06:46 +00:00
Fredrik Lundh b3167cbcd7 needforspeed: added rpartition implementation 2006-05-26 18:15:38 +00:00
Fredrik Lundh 06a69dd8ff needforspeed: partition implementation, part two.
feel free to improve the documentation and the docstrings.
2006-05-26 08:54:28 +00:00
Fredrik Lundh 3d885e0195 needforspeed: check first *and* last character before doing a full memcmp 2006-05-23 10:10:57 +00:00
Fredrik Lundh 8a8e05a2b9 needforspeed: use memcpy for "long" strings; use a better algorithm
for long repeats.
2006-05-22 17:12:58 +00:00
Fredrik Lundh f1d60a5384 needforspeed: speed up unicode repeat, unicode string copy 2006-05-22 16:29:30 +00:00
Martin v. Löwis 18e165558b Merge ssize_t branch. 2006-02-15 17:27:45 +00:00
Tim Peters 2576c97f52 _PyUnicode_IsWhitespace(),
_PyUnicode_IsLinebreak():
Changed the declarations to match the definitions.

Don't know why they differed; MSVC warned about it;
don't know why only these two functions use "const".
Someone who does may want to do something saner ;-).
2005-10-29 02:33:18 +00:00
Walter Dörwald a47d1c08d0 SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of the Nik Haldimann's patch that detects truncated data)
2005-08-30 10:23:14 +00:00
Marc-André Lemburg a9cadcd41b Correct the handling of 0-termination of PyUnicode_AsWideChar()
and its usage in PyLocale_strcoll().

Clarify the documentation on this.

Thanks to Andreas Degert for pointing this out.
2004-11-22 13:02:31 +00:00
Raymond Hettinger 57341c37c9 SF patch #1056231: typo in comment (unicodeobject.h) 2004-10-31 05:46:59 +00:00
Walter Dörwald 69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
Hye-Shik Chang e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Marc-André Lemburg d2d4598ec2 Allow string and unicode return types from .encode()/.decode()
methods on string and unicode objects. Added unicode.decode()
which was missing for no apparent reason.
2004-07-08 17:57:32 +00:00
Hye-Shik Chang 974ed7cfa5 - SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Hye-Shik Chang 3ae811b57d Add rsplit method for str and unicode builtin types.
SF feature request #801847.
Original patch is written by Sean Reifschneider.
2003-12-15 18:49:53 +00:00
Marc-André Lemburg 9c329de47e Add name mangling for new PyUnicode_FromOrdinal() and fix declaration
to use new extern macro.
2002-08-12 08:19:10 +00:00
Mark Hammond 91a681debf Excise DL_EXPORT from Include.
Thanks to Skip Montanaro and Kalle Svensson for the patches.
2002-08-12 07:21:58 +00:00
Marc-André Lemburg cc8764ca9d Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level.
u'%c' will now raise a ValueError in case the argument is an
integer outside the valid range of Unicode code point ordinals.

Closes SF bug #593581.
2002-08-11 12:23:04 +00:00
Marc-André Lemburg 4da6fd63bc Fix for bug [ 561796 ] string.find causes lazy error 2002-05-29 11:33:13 +00:00
Walter Dörwald de02bcb265 Apply patch diff.txt from SF feature request
http://www.python.org/sf/444708

This adds the optional argument for str.strip
to unicode.strip too and makes it possible
to call str.strip with a unicode argument
and unicode.strip with a str argument.
2002-04-22 17:42:37 +00:00
Guido van Rossum b8c65bc27f SF patch #470578: Fixes to synchronize unicode() and str()
This patch implements what we have discussed on python-dev late in
    September: str(obj) and unicode(obj) should behave similar, while
    the old behaviour is retained for unicode(obj, encoding, errors).

    The patch also adds a new feature with which objects can provide
    unicode(obj) with input data: the __unicode__ method. Currently no
    new tp_unicode slot is implemented; this is left as option for the
    future.

    Note that PyUnicode_FromEncodedObject() no longer accepts Unicode
    objects as input. The API name already suggests that Unicode
    objects do not belong in the list of acceptable objects and the
    functionality was only needed because
    PyUnicode_FromEncodedObject() was being used directly by
    unicode(). The latter was changed in the discussed way:

    * unicode(obj) calls PyObject_Unicode()
    * unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject()

    One thing left open to discussion is whether to leave the
    PyUnicode_FromObject() API as a thin API extension on top of
    PyUnicode_FromEncodedObject() or to turn it into a (macro) alias
    for PyObject_Unicode() and deprecate it. Doing so would have some
    surprising consequences though, e.g.  u"abc" + 123 would turn out
    as u"abc123"...

[Marc-Andre didn't have time to check this in before the deadline.  I
hope this is OK, Marc-Andre!  You can still make changes and commit
them on the trunk after the branch has been made, but then please mail
Barry a context diff if you want the change to be merged into the
2.2b1 release branch.  GvR]
2001-10-19 02:01:31 +00:00
Marc-André Lemburg c60e6f7771 Patch #435971: UTF-7 codec by Brian Quinlan. 2001-09-20 10:35:46 +00:00
Marc-André Lemburg 5e6007c5db Fix for bug #462737. 2001-09-19 11:21:03 +00:00
Tim Peters 78e0fc74bc Possibly the end of SF [#460020] bug or feature: unicode() and subclasses.
Changed unicode(i) to return a true Unicode object when i is an instance of
a unicode subclass.  Added PyUnicode_CheckExact macro.
2001-09-11 03:07:38 +00:00
Guido van Rossum 5eef77a21b Make the Py<type>_Check() macro use PyObject_TypeCheck(). 2001-08-30 03:08:07 +00:00
Martin v. Löwis 339d0f720e Patch #445762: Support --disable-unicode
- Do not compile unicodeobject, unicodectype, and unicodedata if Unicode is disabled
- check for Py_USING_UNICODE in all places that use Unicode functions
- disables unicode literals, and the builtin functions
- add the types.StringTypes list
- remove Unicode literals from most tests.
2001-08-17 18:39:25 +00:00
Tim Peters 772747b3f1 SF patch #438013 Remove 2-byte Py_UCS2 assumptions
Removed all instances of Py_UCS2 from the codebase, and so also (I hope)
the last remaining reliance on the platform having an integral type
with exactly 16 bits.
PyUnicode_DecodeUTF16() and PyUnicode_EncodeUTF16() now read and write
one byte at a time.
2001-08-09 22:21:55 +00:00
Marc-André Lemburg b5ac6f62c7 As discussed on python-dev: this patch adds name mangling to
assure that extensions and interpreters using the Unicode APIs
were compiled using the same Unicode width.
2001-07-31 14:30:16 +00:00
Jeremy Hylton 3ce45389bd Add _PyUnicode_AsDefaultEncodedString to unicodeobject.h.
And remove all the extern decls in the middle of .c files.
Apparently, it was excluded from the header file because it is
intended for internal use by the interpreter.  It's still intended for
internal use and documented as such in the header file.
2001-07-30 22:34:24 +00:00
Fredrik Lundh 72b068566a removed "register const" from scalar arguments to the unicode
predicates
2001-06-27 22:08:26 +00:00
Fredrik Lundh 8f4558583f use Py_UNICODE_WIDE instead of USE_UCS4_STORAGE and Py_UNICODE_SIZE
tests.
2001-06-27 18:59:43 +00:00