Commit Graph

403 Commits

Author SHA1 Message Date
Ezio Melotti e57e50c8e7 Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
   valid continuation bytes are now replaced by U+FFFD, instead of replacing
   the number of bytes specified by the start byte.
   See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
   in behavior);
3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
   RFC 3629, but leave it commented out since it's not backward compatible;
4) Change the error messages "unexpected code byte" to "invalid start byte"
   and "invalid data" to "invalid continuation byte";
5) Add an extensive set of tests in test_unicode;
6) Fix test_codeccallbacks because it was failing after this change.
2010-06-05 17:51:07 +00:00
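
Points 1 and 4 of the commit above can be sketched as follows. This is a minimal illustration, assuming a Python 3 interpreter, whose UTF-8 decoder follows the same RFC 3629 rules (the commit itself targets the 2.x trunk). The byte strings and the helper name `decode_reason` are illustrative and are not taken from test_unicode.

```python
def decode_reason(data):
    """Return the UnicodeDecodeError reason for data, or None if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# 0xFF can never start a UTF-8 sequence.
bad_start = decode_reason(b"\xff")

# 0xC3 starts a 2-byte sequence, but 0x28 is not in the range 0x80..0xBF.
bad_continuation = decode_reason(b"\xc3\x28")

# 0xF0 announces a 4-byte sequence, yet only the start byte is replaced;
# the ASCII byte that follows it is preserved.
replaced = b"\xf0A".decode("utf-8", "replace")

print(bad_start)
print(bad_continuation)
print(repr(replaced))
```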
Brett Cannon a7f13ee3f5 Remove an unneeded variable and assignment.
Found using Clang's static analyzer.
2010-05-04 01:16:51 +00:00
Benjamin Peterson bea424af98 more _PyString_Resize error checking 2010-04-03 00:57:33 +00:00
Florent Xicluna 22b243809e #7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14. 2010-03-30 08:24:06 +00:00
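
The effect of the commit above can be observed with `splitlines()`. This is a small sketch, assuming Python 3, where text strings play the role of the 2.x unicode type; the sample string is made up for illustration.

```python
# VT (0x0B) and FF (0x0C) both act as line boundaries for text strings.
text = "one\x0btwo\x0cthree"
lines = text.splitlines()

print(lines)
```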
Larry Hastings 402b73fb8d Backported PyCapsule from 3.1, and converted most uses of
CObject to PyCapsule.
2010-03-25 00:54:54 +00:00
Victor Stinner 95affc4449 Issue #1583863: A unicode subclass can now override the __str__ method 2010-03-22 12:24:37 +00:00
Ezio Melotti 321251567e #7649: "u'%c' % char" now behaves like "u'%s' % char" and raises a UnicodeDecodeError if 'char' is a byte string that can't be decoded using the default encoding. 2010-02-25 17:36:04 +00:00
Victor Stinner f20f9c299e Issue #7649: Fix u'%c' % char for character in range 0x80..0xFF
=> raise a UnicodeDecodeError. Patch written by Ezio Melotti.
2010-02-23 23:16:07 +00:00
Ezio Melotti 1fafaab5e5 #7775: fixed docstring for rpartition 2010-01-25 11:24:37 +00:00
Antoine Pitrou 10042922d9 Sanitize bloom filter macros 2010-01-13 14:01:26 +00:00
Antoine Pitrou 5c767c2f87 Fix Windows build (re r77461) 2010-01-13 08:55:20 +00:00
Antoine Pitrou 6467213bfd Issue #7622: Improve the split(), rsplit(), splitlines() and replace()
methods of bytes, bytearray and unicode objects by using a common
implementation based on stringlib's fast search.  Patch by Florent Xicluna.
2010-01-13 07:55:48 +00:00
R. David Murray 0a0a1a842c Issue #1680159: unicode coercion during an 'in' operation was masking
any errors that might occur during coercion of the left operand and
turning them into a TypeError with a message text that was confusing in
the given context.  This patch lets any errors through, as was already
done during coercion of the right hand side.
2009-12-14 16:28:26 +00:00
Eric Smith c4ab8339e9 Issue #3382: Make '%F' and float.__format__('F') convert results to upper case. Much of the patch came from Mark Dickinson. 2009-11-29 17:40:57 +00:00
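
The upper-case conversion is easiest to see with infinities and NaNs, since digits have no case. This is a brief sketch, assuming Python 3, where the same formatting rules apply; the variable names are illustrative.

```python
inf = float("inf")
nan = float("nan")

percent_upper = "%F" % inf        # %-style formatting with 'F'
format_upper = format(nan, "F")   # float.__format__ with 'F'
percent_lower = "%f" % inf        # lower-case 'f' for comparison

print(percent_upper, format_upper, percent_lower)
```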
Mark Dickinson 9dd5e16c5d Issue #7117, continued: Remove substitution of %g-style formatting for
%f-style formatting, which used to occur at high precision.  Float formatting
should now be consistent between 2.7 and 3.1.
2009-11-23 20:54:09 +00:00
Mark Dickinson 18cfada1ea Remove restriction on precision when formatting floats. This is the
first step towards removing the %f -> %g switch (see issues 7117,
5859).
2009-11-23 18:46:41 +00:00
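
The two float-formatting commits above can be checked as follows. This is a hedged sketch, assuming Python 3, which the later commit says should format floats consistently with 2.7; the chosen precision and magnitude are arbitrary.

```python
# A precision well beyond what a double can represent is accepted as-is.
high_precision = "%.60f" % 1.0

# A large magnitude stays in fixed-point notation instead of switching
# to %g-style exponent notation.
large_value = "%f" % 1e50

print(high_precision)
print(large_value)
```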
Eric Smith c1bdf89145 Finished removing _PyOS_double_to_string, as mentioned in issue 7117. 2009-10-26 17:46:17 +00:00
Georg Brandl 9b4e5820cb #7116: str.join() takes an iterable. 2009-10-14 18:48:32 +00:00
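
As a quick illustration of the corrected documentation, `join()` accepts any iterable of strings, not only a list or tuple. This is a minimal sketch, assuming Python 3; the inputs are made up.

```python
# A generator expression is consumed directly by join().
from_generator = ", ".join(str(n) for n in range(3))

# Any other iterable of strings works too, for example an iterator.
from_iterator = "-".join(iter(("a", "b", "c")))

print(from_generator)
print(from_iterator)
```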
Benjamin Peterson 332d721750 add keyword arguments support to str/unicode encode and decode #6300 2009-09-18 21:14:55 +00:00
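
The keyword form added by the commit above looks like this. The sketch assumes Python 3, where `str.encode` and `bytes.decode` accept the same `encoding` and `errors` keywords; the sample text is illustrative.

```python
# Encode with keyword arguments; the non-ASCII character becomes '?'.
encoded = "caf\u00e9".encode(encoding="ascii", errors="replace")

# Decode with keyword arguments.
decoded = b"caf\xc3\xa9".decode(encoding="utf-8", errors="strict")

print(encoded)
print(decoded)
```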
Georg Brandl e9741f3ed8 Issue #6922: Fix an infinite loop when trying to decode an invalid
UTF-32 stream with a non-raising error handler like "replace" or "ignore".
2009-09-17 11:28:09 +00:00
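
The situation fixed by the commit above can be exercised with a code unit outside the Unicode range. This is a minimal sketch, assuming Python 3 and an illustrative input that is not taken from the issue; with the fix in place both calls return instead of looping.

```python
# 0xFFFFFFFF is larger than U+10FFFF, so it is not a valid UTF-32 code unit.
invalid = b"\xff\xff\xff\xff"

# Non-raising error handlers must still advance past the bad data.
replaced = invalid.decode("utf-32-le", "replace")
ignored = invalid.decode("utf-32-le", "ignore")

print(repr(replaced))
print(repr(ignored))
```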
Mark Dickinson 2fdd58ad18 Silence gcc 'comparison always false' warning 2009-08-28 20:46:24 +00:00
Alexandre Vassalotti fd00916c2e Grow the allocated buffer in PyUnicode_EncodeUTF7 to avoid buffer overrun.
Without this change, test_unicode.UnicodeTest.test_codecs_utf7 crashes in
debug mode. What happens is the unicode string u'\U000abcde' with a length
of 1 encodes to the string '+2m/c3g-' of length 8. Since only 5 bytes are
reserved in the buffer, a buffer overrun occurs.
2009-07-07 02:17:30 +00:00
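
The sizes quoted in the commit above can be reproduced directly. This is a small sketch, assuming Python 3, where a string's length counts code points, as on the wide 2.x builds the message describes.

```python
astral = "\U000abcde"             # one code point outside the BMP
encoded = astral.encode("utf-7")  # base64 of the UTF-16 surrogate pair

print(len(astral), encoded, len(encoded))
```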
Georg Brandl 18187e2167 #6224: s/JPython/Jython/, and remove one link to a module nine years old. 2009-06-06 18:21:58 +00:00
Georg Brandl ba68a99656 #5929: fix signedness warning. 2009-05-05 09:19:43 +00:00
Antoine Pitrou 653dece278 Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences.
Patch by Nick Barnes and Victor Stinner.
2009-05-04 18:32:32 +00:00
Walter Dörwald 342c8db859 There's no %A in Python 2.x! 2009-05-03 22:46:07 +00:00
Walter Dörwald ed960ac404 Issue #5108: Handle %s like %S and %R in PyUnicode_FromFormatV(): Call
PyUnicode_DecodeUTF8() once, remember the result and output it in a second
step. This avoids problems with counting UTF-8 bytes that ignores the effect
of using the replace error handler in PyUnicode_DecodeUTF8().
2009-05-03 22:36:33 +00:00
Eric Smith 068f06568b Issue #5835, deprecate PyOS_ascii_formatd.
If anyone wants to clean up the documentation, feel free. It's my first documentation foray, and it's not that great.

Will port to py3k with a different strategy.
2009-04-25 21:40:15 +00:00
Mark Dickinson d4814bfa23 Issue #532631: Apply floatformat changes to unicodeobject.c
as well as stringobject.c.
2009-03-29 16:24:29 +00:00
Mark Dickinson 2e648ecc7d Issue #532631: Replace confusing fabs(x)/1e25 >= 1e25 test
with fabs(x) >= 1e50, and fix documentation.
2009-03-29 14:37:51 +00:00
Hirokazu Yamamoto 52a3492efb There is no macro named SIZEOF_SSIZE_T. Should use SIZEOF_SIZE_T instead. 2009-03-21 10:32:52 +00:00
Mark Dickinson 6b265f1bf8 Issue 4474: On platforms with sizeof(wchar_t) == 4 and
sizeof(Py_UNICODE) == 2, PyUnicode_FromWideChar now converts
each character outside the BMP to the appropriate surrogate pair.

Thanks Victor Stinner for the patch.

(backport of r70452 from py3k to trunk)
2009-03-18 16:07:26 +00:00
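
The surrogate-pair conversion mentioned above follows the standard UTF-16 arithmetic. The sketch below shows that arithmetic in Python rather than the C code of `PyUnicode_FromWideChar`; the helper name `to_surrogate_pair` is hypothetical.

```python
def to_surrogate_pair(code_point):
    """Split a code point outside the BMP into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000         # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


pair = to_surrogate_pair(0x1F600)
print([hex(unit) for unit in pair])
```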
Mark Dickinson 3e4caeb3bf Issue #5341: Fix a variety of spelling errors. 2009-02-21 20:27:01 +00:00
Georg Brandl cbb4958cd8 Fix warnings GCC emits where the argument of PyErr_Format is a single variable. 2009-02-13 11:06:59 +00:00
Benjamin Peterson 1c5d21d644 fix indentation in comment 2009-01-31 22:33:02 +00:00
Benjamin Peterson be1399e39a fix indentation; looks like all I managed to do the first time is make things uglier 2009-01-31 22:03:19 +00:00
Benjamin Peterson d17fec74e5 fix indentation 2009-01-31 21:47:42 +00:00
Benjamin Peterson 857ce15791 completely detabify unicodeobject.c 2009-01-31 16:29:18 +00:00
Alexandre Vassalotti 034e08ce8d Remove unnecessary casts related to unicode_decode_call_errorhandler.
Make the _PyUnicode_Resize macro a static function.

These changes are needed to avoid breaking strict aliasing rules.
2008-12-27 06:36:10 +00:00
Amaury Forgeot d'Arc 2a1fd05971 Fix a small typo in docstring 2008-11-29 02:03:32 +00:00
Andrew M. Kuchling efeb43eb31 Docstring change for *partition: use same tense as other docstrings.
Hyphenate left- and right-justified.
Fix 'registerd' typo
2008-10-04 01:05:56 +00:00
Christian Heimes 32a66a0410 Fixed a couple more C99 comments and one occurrence of inline. 2008-10-02 19:47:50 +00:00
Georg Brandl 98064078f7 Fix varname in docstring. #3822. 2008-09-09 19:26:00 +00:00
Amaury Forgeot d'Arc 06847b13ca Correct a crash when two successive unicode allocations fail with a MemoryError:
the freelist contained half-initialized objects with freed pointers.

The comment
/* XXX UNREF/NEWREF interface should be more symmetrical */
was copied from tupleobject.c, and appears in some other places.
I sign the petition.
2008-07-31 23:39:05 +00:00
Neal Norwitz e7d8be80ba Security patches from Apple: prevent int overflow when allocating memory 2008-07-31 17:17:14 +00:00
Antoine Pitrou 4982d5d04a #2242: utf7 decoding crashes on bogus input on some Windows/MSVC versions 2008-07-25 17:45:59 +00:00
Eric Smith d6c393ab2b Backed out r65069, pending fixing it in Windows. 2008-07-17 19:49:47 +00:00
Eric Smith 454816d8bd Issue 3382: Make '%F' and float.__format__('F') convert results to upper case. 2008-07-17 17:48:39 +00:00
Robert Schuppenies 9be2ec109b Added additional __sizeof__ implementations and addressed comments made in
Issue3122.
2008-07-10 15:24:04 +00:00
Robert Schuppenies 901c997de0 Issue 3048: Fixed sys.getsizeof for unicode objects. 2008-06-10 10:10:31 +00:00
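
A rough way to see what `sys.getsizeof` reports for text objects is sketched below, assuming Python 3. The exact byte counts depend on the interpreter build, so only the relative sizes are compared.

```python
import sys

empty_size = sys.getsizeof("")
longer_size = sys.getsizeof("a" * 100)

# The reported size includes the character buffer, so it grows with length.
print(empty_size, longer_size)
```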