cpython

Author	SHA1	Message	Date
Walter Dörwald	5c1ee17742	Change the unicode.translate docstring to document that Unicode strings (with arbitrary length) are allowed as entries in the unicode.translate mapping. Add a test case for multicharacter replacements. (Multicharacter replacements were enabled by the PEP 293 patch)	2002-09-04 20:31:32 +00:00
Guido van Rossum	2023c9b84a	Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do the wrong thing for a unicode subclass when there were zero string replacements. The example given in the SF bug report was only one way to trigger this; replacing a string of length >= 2 that's not found is another. The code would actually write outside allocated memory if replacement string was longer than the search string. (I wonder how many more of these are lurking? The unicode code base is full of wonders.) Bugfix candidate; this same bug is present in 2.2.1.	2002-08-23 18:50:21 +00:00
Guido van Rossum	8b1a6d694f	Code by Inyeol Lee, submitted to SF bug 595350, to implement the string/unicode method .replace() with a zero-length first argument. Inyeol contributed tests for this too.	2002-08-23 18:21:28 +00:00
Guido van Rossum	76afbd9aa4	Fix some endcase bugs in unicode rfind()/rindex() and endswith(). These were reported and fixed by Inyeol Lee in SF bug 595350. The endswith() bug was already fixed in 2.3, but this adds some more test cases.	2002-08-20 17:29:29 +00:00
Marc-André Lemburg	cc8764ca9d	Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level. u'%c' will now raise a ValueError in case the argument is an integer outside the valid range of Unicode code point ordinals. Closes SF bug #593581.	2002-08-11 12:23:04 +00:00
Guido van Rossum	f36921c4b0	Unicode replace() method with empty pattern argument should fail, like it does for 8-bit strings.	2002-08-09 15:36:48 +00:00
Raymond Hettinger	ca84d65ca7	Expanded the unittests for the new width sensitive PyUnicode_Contains().	2002-08-06 23:08:51 +00:00
Barry Warsaw	e06741704e	Added a test for PyUnicode_Contains() taking into account the width of Py_UNICODE.	2002-08-06 19:03:56 +00:00
Barry Warsaw	817918cc3c	Committing patch #591250 which provides "str1 in str2" when str1 is a string of longer than 1 character.	2002-08-06 16:58:21 +00:00
Martin v. Löwis	a729daf2e4	Add encoding declaration.	2002-08-04 17:28:33 +00:00
Barry Warsaw	04f357cffe	Get rid of relative imports in all unittests. Now anything that imports e.g. test_support must do so using an absolute package name such as "import test.test_support" or "from test import test_support". This also updates the README in Lib/test, and gets rid of the duplicate data directory in Lib/test/data (replaced by Lib/email/test/data). Now Tim and Jack can have at it. :)	2002-07-23 19:04:11 +00:00
Tim Peters	8ac1495a6a	Whitespace normalization.	2002-05-23 15:15:30 +00:00
Walter Dörwald	de02bcb265	Apply patch diff.txt from SF feature request http://www.python.org/sf/444708 This adds the optional argument for str.strip to unicode.strip too and makes it possible to call str.strip with a unicode argument and unicode.strip with a str argument.	2002-04-22 17:42:37 +00:00
Walter Dörwald	2ee4be0775	Apply diff3.txt from SF patch http://www.python.org/sf/536241 If a str or unicode method returns the original object, make sure that for str and unicode subclasses the original will not be returned. This should prevent SF bug http://www.python.org/sf/460020 from reappearing.	2002-04-17 21:34:05 +00:00
Tim Peters	863ac44b74	Whitespace normalization.	2002-04-16 01:38:40 +00:00
Walter Dörwald	068325ef92	Apply the second version of SF patch http://www.python.org/sf/536241 Add a method zfill to str, unicode and UserString and change Lib/string.py accordingly. This activates the zfill version in unicodeobject.c that was commented out and implements the same in stringobject.c. It also adds the test for unicode support in Lib/string.py back in and uses repr() instead of str() (as it was before Lib/string.py 1.62)	2002-04-15 13:36:47 +00:00
Marc-André Lemburg	ce0b664af2	Added test case for UTF-8 encoding bug #541828 .	2002-04-10 17:18:02 +00:00
Guido van Rossum	77f6a65eb0	Add the 'bool' type and its values 'False' and 'True', as described in PEP 285. Everything described in the PEP is here, and there is even some documentation. I had to fix 12 unit tests; all but one of these were printing Boolean outcomes that changed from 0/1 to False/True. (The exception is test_unicode.py, which did a type(x) == type(y) style comparison. I could've fixed that with a single line using issubtype(x, type(y)), but instead chose to be explicit about those places where a bool is expected.) Still to do: perhaps more documentation; change standard library modules to return False/True from predicates.	2002-04-03 22:41:51 +00:00
Andrew M. Kuchling	eddd68d56c	As part of fixing bug #536241 , add a test case for string.zfill() with Unicode	2002-03-29 16:21:44 +00:00
Martin v. Löwis	047c05ebc4	Do not insert characters for unicode-escape decoders if the error mode is "ignore". Fixes #529104.	2002-03-21 08:55:28 +00:00
Marc-André Lemburg	bd3be8f0ca	Fix to the UTF-8 encoder: it failed on 0-length input strings. Fix for the UTF-8 decoder: it will now accept isolated surrogates (previously it raised an exception which causes round-trips to fail). Added new tests for UTF-8 round-trip safety (we rely on UTF-8 for marshalling Unicode objects, so we better make sure it works for all Unicode code points, including isolated surrogates). Bumped the PYC magic in a non-standard way -- please review. This was needed because the old PYC format used illegal UTF-8 sequences for isolated high surrogates which now raise an exception.	2002-02-07 11:33:49 +00:00
Marc-André Lemburg	3688a882d3	Fix for the UTF-8 memory allocation bug and the UTF-8 encoding bug related to lone high surrogates.	2002-02-06 18:09:02 +00:00
Finn Bock	2b29cb2593	Skipping some tests by adding the usual jython conditional test around: - the repr of unicode. Jython only adds the u'' if the string contains char values > 255. - A unicode arg to unicode() is perfectly valid in jython. - A buffer() test. No buffer() on Jython. This closes patch "[ #490920 ] Jython and test_unicode".	2001-12-10 20:57:34 +00:00
Tim Peters	82285dad8e	Whitespace normalization.	2001-12-01 04:11:16 +00:00
Marc-André Lemburg	41f01994c4	Adding test for Unicode repr()-output.	2001-11-28 14:03:14 +00:00
Marc-André Lemburg	72f8213ba4	Fix for bug #438164 : %-formatting using Unicode objects. This patch also does away with an incompatibility between Jython and CPython.	2001-11-20 15:18:49 +00:00
Marc-André Lemburg	0c4d8d05a8	Fix for bug #480188 : printing unicode objects	2001-11-20 15:17:25 +00:00
Marc-André Lemburg	b5507ecd3c	Additional test and documentation for the unicode() changes. This patch should also be applied to the 2.2b1 trunk.	2001-10-19 12:02:29 +00:00
Tim Peters	527e64fd68	Whitespace normalization.	2001-10-04 05:36:56 +00:00
Guido van Rossum	11310bf867	Add tests for repr() of strings containing string quotes as well.	2001-09-21 15:46:41 +00:00
Guido van Rossum	e4874aeab0	Test basic functioning of unicode repr(). (If this breaks Jython, please let me know and we'll figure out how to fix the test.)	2001-09-21 15:36:41 +00:00
Marc-André Lemburg	6871f6ac57	Implement the changes proposed in patch #413333 . unicode(obj) now works just like str(obj) in that it tries __str__/tp_str on the object in case it finds that the object is not a string or buffer.	2001-09-20 12:53:16 +00:00
Marc-André Lemburg	c60e6f7771	Patch #435971 : UTF-7 codec by Brian Quinlan.	2001-09-20 10:35:46 +00:00
Marc-André Lemburg	80d1dd5f3b	Fix for bug #444493 : u'\U00010001' segfaults with current CVS on wide builds.	2001-07-25 16:05:59 +00:00
Marc-André Lemburg	6c6bfb7c70	Make the unicode-escape and the UTF-16 codecs handle surrogates correctly and thus roundtrip-safe. Some minor cleanups of the code. Added tests for the roundtrip-safety.	2001-07-20 17:39:11 +00:00
Martin v. Löwis	ce9b5a55e1	Encode surrogates in UTF-8 even for a wide Py_UNICODE. Implement sys.maxunicode. Explicitly wrap around upper/lower computations for wide Py_UNICODE. When decoding large characters with UTF-8, represent expected test results using the \U notation.	2001-06-27 06:28:56 +00:00
Tim Peters	2f228e75e4	Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". The comment following used to say: /* We use ~hash instead of hash, as degenerate hash functions, such as for ints <sigh>, can have lots of leading zeros. It's not really a performance risk, but better safe than sorry. 12-Dec-00 tim: so ~hash produces lots of leading ones instead -- what's the gain? */ That is, there was never a good reason for doing it. And to the contrary, as explained on Python-Dev last December, it tended to make the *sum* (i + incr) & mask (which is the first table index examined in case of collision) the same "too often" across distinct hashes. Changing to the simpler "i = hash & mask" reduced the number of string-dict collisions (== # number of times we go around the lookup for-loop) from about 6 million to 5 million during a full run of the test suite (these are approximate because the test suite does some random stuff from run to run). The number of collisions in non-string dicts also decreased, but not as dramatically. Note that this may, for a given dict, change the order (wrt previous releases) of entries exposed by .keys(), .values() and .items(). A number of std tests suffered bogus failures as a result. For dicts keyed by small ints, or (less so) by characters, the order is much more likely to be in increasing order of key now; e.g., >>> d = {} >>> for i in range(10): ... d[i] = i ... >>> d {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} >>> Unfortunately, people may latch on to that in small examples and draw a bogus conclusion. test_support.py Moved test_extcall's sortdict() into test_support, made it stronger, and imported sortdict into other std tests that needed it. test_unicode.py Excluded cp875 from the "roundtrip over range(128)" test, because cp875 doesn't have a well-defined inverse for unicode("?", "cp875"). See Python-Dev for excruciating details. Cookie.py Changed various output functions to sort dicts before building strings from them. test_extcall Fiddled the expected-result file. This remains sensitive to native dict ordering, because, e.g., if there are multiple errors in a keyword-arg dict (and test_extcall sets up many cases like that), the specific error Python complains about first depends on native dict ordering.	2001-05-13 00:19:31 +00:00
Marc-André Lemburg	542fe56cb9	Fix for bug #417030 : "print '%*s' fails for unicode string"	2001-05-02 14:21:53 +00:00
Marc-André Lemburg	ef0a032883	Patch by Finn Bock to make test_unicode.py work for Jython.	2001-02-10 14:09:31 +00:00
Marc-André Lemburg	fde66e1bcc	Fixed .capitalize() method of Unicode objects to work like the corresponding string method. Added tests for this too. Patch written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.	2001-01-29 11:14:16 +00:00
Guido van Rossum	a1374e429b	Change verify() function to raise TestFailed, not AssertionError. (I realize that I didn't really test this, because all the tests succeed, so verify() never raised an AssertionError -- but the test suite still succeeds, so I'm not too worried.)	2001-01-19 19:01:56 +00:00
Tim Peters	d2bf3b7ca6	Whitespace normalization. Leaving tokenize_tests.py alone for now.	2001-01-18 02:22:22 +00:00
Marc-André Lemburg	3661908a6a	This patch removes all uses of "assert" in the regression test suite and replaces them with a new API verify(). As a result the regression suite will also perform its tests in optimization mode. Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.	2001-01-17 19:11:13 +00:00
Marc-André Lemburg	3a645e4dd4	Added checks to prevent PyUnicode_Count() from dumping core in case the parameters are out of bounds and fixes error handling for .count(), .startswith() and .endswith() for the case of mixed string/Unicode objects. This patch adds Python style index semantics to PyUnicode_Count() indices (including the special handling of negative indices). The patch is an extended version of patch #103249 submitted by Michael Hudson (mwh) on SF. It also includes new test cases.	2001-01-16 11:54:12 +00:00
Marc-André Lemburg	a866df806d	This patch changes the default behaviour of the builtin charmap codec to not apply Latin-1 mappings for keys which are not found in the mapping dictionaries, but instead treat them as undefined mappings. The patch was originally written by Martin v. Loewis with some additional (cosmetic) changes and an updated test script by Marc-Andre Lemburg. The standard codecs were recreated from the most current files available at the Unicode.org site using the Tools/scripts/gencodec.py tool. This patch closes the bugs #116285 and #119960.	2001-01-03 21:29:14 +00:00
Guido van Rossum	8b26454273	Test more split argument combinations: 1) multi-char separator 2) multi-char separator that only occurs at last position 3) all of the above with mixed Unicode and 8-bit-string arguments	2000-12-19 02:22:31 +00:00
Guido van Rossum	15ffc71c0f	Slight improvement to Unicode test suite, inspired by patch #102563 : also test join method of 8-bit strings. Also changed the test() function to (1) compare the types of the expected and actual result, and (2) in verbose mode, print the repr() of the output.	2000-11-29 12:13:59 +00:00
Fred Drake	004d5e6880	Make reindent.py happy (convert everything to 4-space indents!).	2000-10-23 17:22:08 +00:00
Marc-André Lemburg	b96d80201c	Updated test with a case which checks for the bug reported in	2000-10-07 08:52:45 +00:00
Marc-André Lemburg	e5034378cc	Removing UTF-16 aware Unicode comparison code. This kind of compare function (together with other locale aware ones) should go into a new collation support module. See python-dev for a discussion of this removal. Note: This patch should also be applied to the 1.6 branch.	2000-08-08 08:04:29 +00:00
Marc-André Lemburg	d6d06ade26	Tests for new surrogate support in the UTF-8 codec. By Bill Tutt.	2000-07-07 17:48:52 +00:00
Marc-André Lemburg	b6d78fcd9c	Tests for new instance support in unicode().	2000-07-07 13:46:19 +00:00
Marc-André Lemburg	9d4674168f	Added tests for the new .isalpha() and .isalnum() methods.	2000-07-05 09:46:40 +00:00
Marc-André Lemburg	af69f15d21	Marc-Andre Lemburg <mal@lemburg.com>: Moved tests of new Unicode Char Name support to a separate test.	2000-06-30 09:13:35 +00:00
Marc-André Lemburg	a6f73d64c5	Marc-Andre Lemburg <mal@lemburg.com>: Added tests for the new Unicode character name support in the standard unicode-escape codec.	2000-06-28 16:41:23 +00:00
Marc-André Lemburg	bddf502a1f	Marc-Andre Lemburg <mal@lemburg.com>: Removed a test which can fail when the default locale setting uses a Latin-1 encoding. The test case is not applicable anymore.	2000-06-14 09:17:25 +00:00
Marc-André Lemburg	8462573826	Marc-Andre Lemburg <mal@lemburg.com>: Fixed some tests to not cause the script to fail, but rather output a warning (which then is caught by regrtest.py as wrong output). This is needed to make test_unicode.py run through on JPython. Thanks to Finn Bock.	2000-06-13 12:05:36 +00:00
Marc-André Lemburg	59a044b7d2	Marc-Andre Lemburg <mal@lemburg.com>: Updated to the fix in %c formatting: it now always checks for a one character argument.	2000-06-08 17:50:55 +00:00
Fred Drake	774c931c12	M.-A. Lemburg <mal@lemburg.com>: Added another test for string formatting (the one that produced the core dump now fixed in unicodeobject.c).	2000-05-09 19:57:46 +00:00
Guido van Rossum	6650320349	Get rid of memory leak caused by assigning sys.exc_info() to a local. Store sys.exc_info()[:2] instead.	2000-04-28 20:39:58 +00:00
Fred Drake	e0243e24be	M.-A. Lemburg <mal@lemburg.com>: Added test for Unicode string concatenation.	2000-04-13 14:11:56 +00:00
Guido van Rossum	7ee801d6af	Marc-Andre Lemburg: Modified .splitlines() tests according to the changes in unicodeobject.c.	2000-04-11 15:37:02 +00:00
Guido van Rossum	9706486b9f	Marc-Andre Lemburg: * '...%s...' % u"abc" now coerces to Unicode just like string methods. Care is taken not to reevaluate already formatted arguments -- only the first Unicode object appearing in the argument mapping is looked up twice. Added test cases for this to test_unicode.py.	2000-04-10 13:52:48 +00:00
Guido van Rossum	9e896b37c7	Marc-Andre's third try at this bulk patch seems to work (except that his copy of test_contains.py seems to be broken -- the lines he deleted were already absent). Checkin messages: New Unicode support for int(), float(), complex() and long(). - new APIs PyInt_FromUnicode() and PyLong_FromUnicode() - added support for Unicode to PyFloat_FromString() - new encoding API PyUnicode_EncodeDecimal() which converts Unicode to a decimal char* string (used in the above new APIs) - shortcuts for calls like int(<int object>) and float(<float obj>) - tests for all of the above Unicode compares and contains checks: - comparing Unicode and non-string types now works; TypeErrors are masked, all other errors such as ValueError during Unicode coercion are passed through (note that PyUnicode_Compare does not implement the masking -- PyObject_Compare does this) - contains now works for non-string types too; TypeErrors are masked and 0 returned; all other errors are passed through Better testing support for the standard codecs. Misc minor enhancements, such as an alias dbcs for the mbcs codec. Changes: - PyLong_FromString() now applies the same error checks as does PyInt_FromString(): trailing garbage is reported as error and no longer silently ignored. The only characters which may be trailing the digits are 'L' and 'l' -- these are still silently ignored. - string.ato?() now directly interface to int(), long() and float(). The error strings are now a little different, but the type still remains the same. These functions are now ready to get declared obsolete ;-) - PyNumber_Int() now also does a check for embedded NULL chars in the input string; PyNumber_Long() already did this (and still does) Followed by: Looks like I've gone a step too far there... (and test_contains.py seems to have a bug too). I've changed back to reporting all errors in PyUnicode_Contains() and added a few more test cases to test_contains.py (plus corrected the join() NameError).	2000-04-05 20:11:21 +00:00
Guido van Rossum	24bdb0474f	Marc-Andre Lemburg: The attached patch set includes a workaround to get Python with Unicode compile on BSDI 4.x (courtesy Thomas Wouters; the cause is a bug in the BSDI wchar.h header file) and Python interfaces for the MBCS codec donated by Mark Hammond. Also included are some minor corrections w/r to the docs of the new "es" and "es#" parser markers (use PyMem_Free() instead of free(); thanks to Mark Hammond for finding these). The unicodedata tests are now in a separate file (test_unicodedata.py) to avoid problems if the module cannot be found.	2000-03-28 20:29:59 +00:00
Guido van Rossum	d8855fde88	Marc-Andre Lemburg: Attached you find the latest update of the Unicode implementation. The patch is against the current CVS version. It includes the fix I posted yesterday for the core dump problem in codecs.c (was introduced by my previous patch set -- sorry), adds more tests for the codecs and two new parser markers "es" and "es#".	2000-03-24 22:14:19 +00:00
Barry Warsaw	51ac58039f	On 17-Mar-2000, Marc-Andre Lemburg said: Attached you find an update of the Unicode implementation. The patch is against the current CVS version. I would appreciate if someone with CVS checkin permissions could check the changes in. The patch contains all bugs and patches sent this week and also fixes a leak in the codecs code and a bug in the free list code for Unicode objects (which only shows up when compiling Python with Py_DEBUG; thanks to MarkH for spotting this one).	2000-03-20 16:36:48 +00:00
Guido van Rossum	d4d2684240	Marc-Andre Lemburg: Add tests for mixed use of char in string.	2000-03-13 23:21:48 +00:00
Guido van Rossum	a831cac7a8	Marc-Andre Lemburg: test script for Unicode implementation.	2000-03-10 23:23:21 +00:00

169 Commits