cpython

Commit Graph

Author	SHA1	Message	Date
Raymond Hettinger	0ebac97058	Patch 549187. Improve string formatting error message.	2002-05-21 15:14:57 +00:00
Tim Peters	5de9842b34	Repair widespread misuse of _PyString_Resize. Since it's clear people don't understand how this function works, also beefed up the docs. The most common usage error is of this form (often spread out across gotos): if (_PyString_Resize(&s, n) < 0) { Py_DECREF(s); s = NULL; goto outtahere; } The error is that if _PyString_Resize runs out of memory, it automatically decrefs the input string object s (which also deallocates it, since its refcount must be 1 upon entry), and sets s to NULL. So if the "if" branch ever triggers, it's an error to call Py_DECREF(s): s is already NULL! A correct way to write the above is the simpler (and intended) if (_PyString_Resize(&s, n) < 0) goto outtahere; Bugfix candidate.	2002-04-27 18:44:32 +00:00
Tim Peters	602f740bc2	SF patch 549375: Compromise PyUnicode_EncodeUTF8. This implements ideas from Marc-Andre, Martin, Guido and me on Python-Dev. "Short" Unicode strings are encoded into a "big enough" stack buffer, then exactly as much string space as they turn out to need is allocated at the end. This should have speed benefits akin to Martin's "measure once, allocate once" strategy, but without needing a distinct measuring pass. "Long" Unicode strings allocate as much heap space as they could possibly need (4 x # Unicode chars), and do a realloc at the end to return the untouched excess. Since the overallocation is likely to be substantial, this shouldn't burden the platform realloc with unusably small excess blocks. Also simplified uses of the PyString_xyz functions. Also added a release-build check that 4*size doesn't overflow a C int. Sooner or later, that's going to happen.	2002-04-27 18:03:26 +00:00
Tim Peters	030a5cebf4	unicode_memchr(): Squashed gratuitous int-vs-size_t mismatch (which gives a compiler warning under MSVC because of the resulting signed-vs-unsigned comparison).	2002-04-22 19:00:10 +00:00
Walter Dörwald	de02bcb265	Apply patch diff.txt from SF feature request http://www.python.org/sf/444708 This adds the optional argument for str.strip to unicode.strip too and makes it possible to call str.strip with a unicode argument and unicode.strip with a str argument.	2002-04-22 17:42:37 +00:00
Tim Peters	0eca65c4c5	PyUnicode_EncodeUTF8(): tightened the memory asserts a bit, and at least tried to catch some possible arithmetic overflows in the debug build.	2002-04-21 17:28:06 +00:00
Martin v. Löwis	2a7ff35a07	Back out 2.140.	2002-04-21 09:59:45 +00:00
Tim Peters	7e3d961fc1	PyUnicode_EncodeUTF8: squash compiler warning. The difference of two pointers is a signed type. Changing "allocated" to a signed int makes undetected overflow more likely, but there was no overflow detection before either.	2002-04-21 03:26:37 +00:00
Martin v. Löwis	a4eb14b7a4	Patch #495401: Count number of required bytes for encoding UTF-8 before allocating the target buffer.	2002-04-20 13:44:01 +00:00
Walter Dörwald	0fe940c862	Return the original string only if it's a real str or unicode instance; otherwise, make a copy.	2002-04-15 18:42:15 +00:00
Walter Dörwald	068325ef92	Apply the second version of SF patch http://www.python.org/sf/536241 Add a method zfill to str, unicode and UserString and change Lib/string.py accordingly. This activates the zfill version in unicodeobject.c that was commented out and implements the same in stringobject.c. It also adds the test for unicode support in Lib/string.py back in and uses repr() instead of str() (as it was before Lib/string.py 1.62)	2002-04-15 13:36:47 +00:00
Neil Schemenauer	58aa861fa2	Remove PyMalloc_*.	2002-04-12 03:07:20 +00:00
Marc-André Lemburg	68e69338ae	Bug fix for UTF-8 encoding bug (buffer overrun) #541828.	2002-04-10 20:36:13 +00:00
Marc-André Lemburg	ce0b664af2	Added test case for UTF-8 encoding bug #541828.	2002-04-10 17:18:02 +00:00
Guido van Rossum	77f6a65eb0	Add the 'bool' type and its values 'False' and 'True', as described in PEP 285. Everything described in the PEP is here, and there is even some documentation. I had to fix 12 unit tests; all but one of these were printing Boolean outcomes that changed from 0/1 to False/True. (The exception is test_unicode.py, which did a type(x) == type(y) style comparison. I could've fixed that with a single line using issubtype(x, type(y)), but instead chose to be explicit about those places where a bool is expected.) Still to do: perhaps more documentation; change standard library modules to return False/True from predicates.	2002-04-03 22:41:51 +00:00
Walter Dörwald	8c077227f2	Fix whitespace.	2002-03-25 11:16:18 +00:00
Neil Schemenauer	dcc819a5c9	Use pymalloc if it's enabled.	2002-03-22 15:33:15 +00:00
Martin v. Löwis	047c05ebc4	Do not insert characters for unicode-escape decoders if the error mode is "ignore". Fixes #529104.	2002-03-21 08:55:28 +00:00
Andrew MacIntyre	5e9c80d906	%#x/%#X format conversion cleanup (see patch #450267): Objects/stringobject.c, Objects/unicodeobject.c	2002-02-28 11:38:24 +00:00
Andrew MacIntyre	c487439aa7	OS/2 EMX port changes (Objects part of patch #450267): Objects/fileobject.c, Objects/stringobject.c, Objects/unicodeobject.c. This commit doesn't include the cleanup patches for stringobject.c and unicodeobject.c which are shown separately in the patch manager. Those patches will be regenerated and applied in a subsequent commit, so as to preserve a fallback position (this commit to those files).	2002-02-26 11:36:35 +00:00
Marc-André Lemburg	bd3be8f0ca	Fix to the UTF-8 encoder: it failed on 0-length input strings. Fix for the UTF-8 decoder: it will now accept isolated surrogates (previously it raised an exception which causes round-trips to fail). Added new tests for UTF-8 round-trip safety (we rely on UTF-8 for marshalling Unicode objects, so we better make sure it works for all Unicode code points, including isolated surrogates). Bumped the PYC magic in a non-standard way -- please review. This was needed because the old PYC format used illegal UTF-8 sequences for isolated high surrogates which now raise an exception.	2002-02-07 11:33:49 +00:00
Marc-André Lemburg	dc724d6e35	Cosmetics.	2002-02-06 18:20:19 +00:00
Marc-André Lemburg	e7c6ee4b8a	Whitespace fixes.	2002-02-06 18:18:03 +00:00
Marc-André Lemburg	3688a882d3	Fix for the UTF-8 memory allocation bug and the UTF-8 encoding bug related to lone high surrogates.	2002-02-06 18:09:02 +00:00
Guido van Rossum	604ddf80d8	Fix for #489669 (Neil Norwitz): memory leak in test_descr (unicode). This is best reproduced by while 1: class U(unicode): pass U(u"xxxxxx") The unicode_dealloc() code wasn't properly freeing the str and defenc fields of the Unicode object when freeing a subtype instance. Fixed this by a subtle refactoring that actually reduces the amount of code slightly.	2001-12-06 20:03:56 +00:00
Barry Warsaw	e5c492d72a	formatfloat(), formatint(): Conversion of sprintf() to PyOS_snprintf() for buffer overrun avoidance.	2001-11-28 21:00:41 +00:00
Marc-André Lemburg	11326de657	Fix for bug #485951: repr diff between string and unicode.	2001-11-28 12:56:20 +00:00
Marc-André Lemburg	72f8213ba4	Fix for bug #438164: %-formatting using Unicode objects. This patch also does away with an incompatibility between Jython and CPython.	2001-11-20 15:18:49 +00:00
Marc-André Lemburg	b5507ecd3c	Additional test and documentation for the unicode() changes. This patch should also be applied to the 2.2b1 trunk.	2001-10-19 12:02:29 +00:00
Guido van Rossum	b8c65bc27f	SF patch #470578: Fixes to synchronize unicode() and str() This patch implements what we have discussed on python-dev late in September: str(obj) and unicode(obj) should behave similarly, while the old behaviour is retained for unicode(obj, encoding, errors). The patch also adds a new feature with which objects can provide unicode(obj) with input data: the __unicode__ method. Currently no new tp_unicode slot is implemented; this is left as an option for the future. Note that PyUnicode_FromEncodedObject() no longer accepts Unicode objects as input. The API name already suggests that Unicode objects do not belong in the list of acceptable objects and the functionality was only needed because PyUnicode_FromEncodedObject() was being used directly by unicode(). The latter was changed in the discussed way: * unicode(obj) calls PyObject_Unicode() * unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject() One thing left open to discussion is whether to leave the PyUnicode_FromObject() API as a thin API extension on top of PyUnicode_FromEncodedObject() or to turn it into a (macro) alias for PyObject_Unicode() and deprecate it. Doing so would have some surprising consequences though, e.g. u"abc" + 123 would turn out as u"abc123"... [Marc-Andre didn't have time to check this in before the deadline. I hope this is OK, Marc-Andre! You can still make changes and commit them on the trunk after the branch has been made, but then please mail Barry a context diff if you want the change to be merged into the 2.2b1 release branch. GvR]	2001-10-19 02:01:31 +00:00
Guido van Rossum	9475a2310d	Enable GC for new-style instances. This touches lots of files, since many types were subclassable but had a xxx_dealloc function that called PyObject_DEL(self) directly instead of deferring to self->ob_type->tp_free(self). It is permissible to set tp_free in the type object directly to _PyObject_Del, for non-GC types, or to _PyObject_GC_Del, for GC types. Still, PyObject_DEL was a tad faster, so I'm fearing that our pystone rating is going down again. I'm not sure if doing something like void xxx_dealloc(PyObject *self) { if (PyXxxCheckExact(self)) PyObject_DEL(self); else self->ob_type->tp_free(self); } is any faster than always calling the else branch, so I haven't attempted that -- however those types whose own dealloc is fancier (int, float, unicode) do use this pattern.	2001-10-05 20:51:39 +00:00
Guido van Rossum	ad9744a67a	Fix a bug in rendering of \\ by repr() -- it rendered as \\\ instead of \\.	2001-09-21 15:38:17 +00:00
Marc-André Lemburg	3508e30861	Fix Unicode .join() method to raise a TypeError for sequence elements which are not Unicode objects or strings. (This matches the string.join() behaviour.) Fix a memory leak in the .join() method which occurs in case the Unicode resize fails. Restore the test_unicode output.	2001-09-20 17:22:58 +00:00
Marc-André Lemburg	6871f6ac57	Implement the changes proposed in patch #413333. unicode(obj) now works just like str(obj) in that it tries __str__/tp_str on the object in case it finds that the object is not a string or buffer.	2001-09-20 12:53:16 +00:00
Marc-André Lemburg	c60e6f7771	Patch #435971: UTF-7 codec by Brian Quinlan.	2001-09-20 10:35:46 +00:00
Tim Peters	af90b3e610	str_subtype_new, unicode_subtype_new: + These were leaving the hash fields at 0, which all string and unicode routines believe is a legitimate hash code. As a result, hash() applied to str and unicode subclass instances always returned 0, which in turn confused dict operations, etc. + Changed local names "new"; no point to antagonizing C++ compilers.	2001-09-12 05:18:58 +00:00
Tim Peters	7a29bd5861	More on bug 460020: disable many optimizations of unicode subclasses.	2001-09-12 03:03:31 +00:00
Tim Peters	78e0fc74bc	Possibly the end of SF [#460020] bug or feature: unicode() and subclasses. Changed unicode(i) to return a true Unicode object when i is an instance of a unicode subclass. Added PyUnicode_CheckExact macro.	2001-09-11 03:07:38 +00:00
Tim Peters	0ebeb584a4	PyUnicode_FromEncodedObject(): Repair memory leak in an error case.	2001-09-11 02:00:50 +00:00
Guido van Rossum	e023fe0eef	Make unicode subclassable.	2001-08-30 03:12:59 +00:00
Martin v. Löwis	e3eb1f2b23	Patch #427190: Implement and use METH_NOARGS and METH_O.	2001-08-16 13:15:00 +00:00
Tim Peters	772747b3f1	SF patch #438013: Remove 2-byte Py_UCS2 assumptions. Removed all instances of Py_UCS2 from the codebase, and so also (I hope) the last remaining reliance on the platform having an integral type with exactly 16 bits. PyUnicode_DecodeUTF16() and PyUnicode_EncodeUTF16() now read and write one byte at a time.	2001-08-09 22:21:55 +00:00
Tim Peters	6d6c1a35e0	Merge of descr-branch back into trunk.	2001-08-02 04:15:00 +00:00
Jeremy Hylton	3ce45389bd	Add _PyUnicode_AsDefaultEncodedString to unicodeobject.h. And remove all the extern decls in the middle of .c files. Apparently, it was excluded from the header file because it is intended for internal use by the interpreter. It's still intended for internal use and documented as such in the header file.	2001-07-30 22:34:24 +00:00
Marc-André Lemburg	80d1dd5f3b	Fix for bug #444493: u'\U00010001' segfaults with current CVS on wide builds.	2001-07-25 16:05:59 +00:00
Marc-André Lemburg	6c6bfb7c70	Make the unicode-escape and the UTF-16 codecs handle surrogates correctly and thus roundtrip-safe. Some minor cleanups of the code. Added tests for the roundtrip-safety.	2001-07-20 17:39:11 +00:00
Guido van Rossum	0d42e0c54a	#ifdef out generation of \U escapes unless Py_UNICODE_WIDE. This caused warnings with the VMS C compiler. (SF bug #442998, in part.) On a narrow system the current code should never be executed since ch will always be < 0x10000. Marc-Andre: you may end up fixing this a different way, since I believe you have plans to generate \U for surrogate pairs. I'll leave that to you.	2001-07-20 16:36:21 +00:00
Fredrik Lundh	8f4558583f	use Py_UNICODE_WIDE instead of USE_UCS4_STORAGE and Py_UNICODE_SIZE tests.	2001-06-27 18:59:43 +00:00
Martin v. Löwis	ce9b5a55e1	Encode surrogates in UTF-8 even for a wide Py_UNICODE. Implement sys.maxunicode. Explicitly wrap around upper/lower computations for wide Py_UNICODE. When decoding large characters with UTF-8, represent expected test results using the \U notation.	2001-06-27 06:28:56 +00:00
Martin v. Löwis	ac93bc2501	When decoding UTF-16, don't assume that the buffer is in native endianness when checking surrogates.	2001-06-26 22:43:40 +00:00


148 Commits