Commit Graph

71 Commits

Author SHA1 Message Date
Martin v. Löwis 1ce4ae3268 Don't test whether surrogate sequences round-trip in UTF-8. 2.2.2 candidate. 2002-09-14 09:19:53 +00:00
Martin v. Löwis 766e300eaa Use integer above sys.maxunicode for range test. Fixes #608884.
2.2.2 candidate.
2002-09-14 09:10:04 +00:00
Walter Dörwald 5c1ee17742 Change the unicode.translate docstring to document that
Unicode strings (with arbitrary length) are allowed
as entries in the unicode.translate mapping.

Add a test case for multicharacter replacements.

(Multicharacter replacements were enabled by the
PEP 293 patch)
2002-09-04 20:31:32 +00:00
Guido van Rossum 2023c9b84a Fix SF bug 599128, submitted by Inyeol Lee: .replace() would do the
wrong thing for a unicode subclass when there were zero string
replacements.  The example given in the SF bug report was only one way
to trigger this; replacing a string of length >= 2 that's not found is
another.  The code would actually write outside allocated memory if
replacement string was longer than the search string.

(I wonder how many more of these are lurking?  The unicode code base
is full of wonders.)

Bugfix candidate; this same bug is present in 2.2.1.
2002-08-23 18:50:21 +00:00
Guido van Rossum 8b1a6d694f Code by Inyeol Lee, submitted to SF bug 595350, to implement
the string/unicode method .replace() with a zero-lengt first argument.
Inyeol contributed tests for this too.
2002-08-23 18:21:28 +00:00
Guido van Rossum 76afbd9aa4 Fix some endcase bugs in unicode rfind()/rindex() and endswith().
These were reported and fixed by Inyeol Lee in SF bug 595350.  The
endswith() bug was already fixed in 2.3, but this adds some more test
cases.
2002-08-20 17:29:29 +00:00
Marc-André Lemburg cc8764ca9d Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level.
u'%c' will now raise a ValueError in case the argument is an
integer outside the valid range of Unicode code point ordinals.

Closes SF bug #593581.
2002-08-11 12:23:04 +00:00
Guido van Rossum f36921c4b0 Unicode replace() method with empty pattern argument should fail, like
it does for 8-bit strings.
2002-08-09 15:36:48 +00:00
Raymond Hettinger ca84d65ca7 Expanded the unittests for the new width sensitive PyUnicode_Contains(). 2002-08-06 23:08:51 +00:00
Barry Warsaw e06741704e Added a test for PyUnicode_Contains() taking into account the width of
Py_UNICODE.
2002-08-06 19:03:56 +00:00
Barry Warsaw 817918cc3c Committing patch #591250 which provides "str1 in str2" when str1 is a
string of longer than 1 character.
2002-08-06 16:58:21 +00:00
Martin v. Löwis a729daf2e4 Add encoding declaration. 2002-08-04 17:28:33 +00:00
Barry Warsaw 04f357cffe Get rid of relative imports in all unittests. Now anything that
imports e.g. test_support must do so using an absolute package name
such as "import test.test_support" or "from test import test_support".

This also updates the README in Lib/test, and gets rid of the
duplicate data dirctory in Lib/test/data (replaced by
Lib/email/test/data).

Now Tim and Jack can have at it. :)
2002-07-23 19:04:11 +00:00
Tim Peters 8ac1495a6a Whitespace normalization. 2002-05-23 15:15:30 +00:00
Walter Dörwald de02bcb265 Apply patch diff.txt from SF feature request
http://www.python.org/sf/444708

This adds the optional argument for str.strip
to unicode.strip too and makes it possible
to call str.strip with a unicode argument
and unicode.strip with a str argument.
2002-04-22 17:42:37 +00:00
Walter Dörwald 2ee4be0775 Apply diff3.txt from SF patch http://www.python.org/sf/536241
If a str or unicode method returns the original object,
make sure that for str and unicode subclasses the original
will not be returned.

This should prevent SF bug http://www.python.org/sf/460020
from reappearing.
2002-04-17 21:34:05 +00:00
Tim Peters 863ac44b74 Whitespace normalization. 2002-04-16 01:38:40 +00:00
Walter Dörwald 068325ef92 Apply the second version of SF patch http://www.python.org/sf/536241
Add a method zfill to str, unicode and UserString and change
Lib/string.py accordingly.

This activates the zfill version in unicodeobject.c that was
commented out and implements the same in stringobject.c. It also
adds the test for unicode support in Lib/string.py back in and
uses repr() instead() of str() (as it was before Lib/string.py 1.62)
2002-04-15 13:36:47 +00:00
Marc-André Lemburg ce0b664af2 Added test case for UTF-8 encoding bug #541828. 2002-04-10 17:18:02 +00:00
Guido van Rossum 77f6a65eb0 Add the 'bool' type and its values 'False' and 'True', as described in
PEP 285.  Everything described in the PEP is here, and there is even
some documentation.  I had to fix 12 unit tests; all but one of these
were printing Boolean outcomes that changed from 0/1 to False/True.
(The exception is test_unicode.py, which did a type(x) == type(y)
style comparison.  I could've fixed that with a single line using
issubtype(x, type(y)), but instead chose to be explicit about those
places where a bool is expected.

Still to do: perhaps more documentation; change standard library
modules to return False/True from predicates.
2002-04-03 22:41:51 +00:00
Andrew M. Kuchling eddd68d56c As part of fixing bug #536241, add a test case for string.zfill() with Unicode 2002-03-29 16:21:44 +00:00
Martin v. Löwis 047c05ebc4 Do not insert characters for unicode-escape decoders if the error mode
is "ignore". Fixes #529104.
2002-03-21 08:55:28 +00:00
Marc-André Lemburg bd3be8f0ca Fix to the UTF-8 encoder: it failed on 0-length input strings.
Fix for the UTF-8 decoder: it will now accept isolated surrogates
(previously it raised an exception which causes round-trips to
fail).

Added new tests for UTF-8 round-trip safety (we rely on UTF-8 for
marshalling Unicode objects, so we better make sure it works for
all Unicode code points, including isolated surrogates).

Bumped the PYC magic in a non-standard way -- please review. This
was needed because the old PYC format used illegal UTF-8 sequences
for isolated high surrogates which now raise an exception.
2002-02-07 11:33:49 +00:00
Marc-André Lemburg 3688a882d3 Fix for the UTF-8 memory allocation bug and the UTF-8 encoding
bug related to lone high surrogates.
2002-02-06 18:09:02 +00:00
Finn Bock 2b29cb2593 Skipping some tests by adding the usual jython conditional test around:
- the repr of unicode. Jython only add the u'' if the string contains char
  values > 255.
- A unicode arg to unicode() is perfectly valid in jython.
- A test buffer() test. No buffer() on Jython

This closes patch "[ #490920 ] Jython and test_unicode".
2001-12-10 20:57:34 +00:00
Tim Peters 82285dad8e Whitespace normalization. 2001-12-01 04:11:16 +00:00
Marc-André Lemburg 41f01994c4 Adding test for Unicode repr()-output. 2001-11-28 14:03:14 +00:00
Marc-André Lemburg 72f8213ba4 Fix for bug #438164: %-formatting using Unicode objects.
This patch also does away with an incompatibility between Jython
and CPython.
2001-11-20 15:18:49 +00:00
Marc-André Lemburg 0c4d8d05a8 Fix for bug #480188: printing unicode objects 2001-11-20 15:17:25 +00:00
Marc-André Lemburg b5507ecd3c Additional test and documentation for the unicode() changes.
This patch should also be applied to the 2.2b1 trunk.
2001-10-19 12:02:29 +00:00
Tim Peters 527e64fd68 Whitespace normalization. 2001-10-04 05:36:56 +00:00
Guido van Rossum 11310bf867 Add tests for repr() of strings containing string quotes as well. 2001-09-21 15:46:41 +00:00
Guido van Rossum e4874aeab0 Test basic functioning of unicode repr(). (If this breaks Jython,
please let me know and we'll figure out how to fix the test.)
2001-09-21 15:36:41 +00:00
Marc-André Lemburg 6871f6ac57 Implement the changes proposed in patch #413333. unicode(obj) now
works just like str(obj) in that it tries __str__/tp_str on the object
in case it finds that the object is not a string or buffer.
2001-09-20 12:53:16 +00:00
Marc-André Lemburg c60e6f7771 Patch #435971: UTF-7 codec by Brian Quinlan. 2001-09-20 10:35:46 +00:00
Marc-André Lemburg 80d1dd5f3b Fix for bug #444493: u'\U00010001' segfaults with current CVS on
wide builds.
2001-07-25 16:05:59 +00:00
Marc-André Lemburg 6c6bfb7c70 Make the unicode-escape and the UTF-16 codecs handle surrogates
correctly and thus roundtrip-safe.

Some minor cleanups of the code.

Added tests for the roundtrip-safety.
2001-07-20 17:39:11 +00:00
Martin v. Löwis ce9b5a55e1 Encode surrogates in UTF-8 even for a wide Py_UNICODE.
Implement sys.maxunicode.
Explicitly wrap around upper/lower computations for wide Py_UNICODE.
When decoding large characters with UTF-8, represent expected test
results using the \U notation.
2001-06-27 06:28:56 +00:00
Tim Peters 2f228e75e4 Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
The comment following used to say:
	/* We use ~hash instead of hash, as degenerate hash functions, such
	   as for ints <sigh>, can have lots of leading zeros. It's not
	   really a performance risk, but better safe than sorry.
	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
	   what's the gain? */
That is, there was never a good reason for doing it.  And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.

Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.

Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items().  A number
of std tests suffered bogus failures as a result.  For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,

>>> d = {}
>>> for i in range(10):
...    d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>

Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.

test_support.py
    Moved test_extcall's sortdict() into test_support, made it stronger,
    and imported sortdict into other std tests that needed it.
test_unicode.py
    Excluced cp875 from the "roundtrip over range(128)" test, because
    cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
    See Python-Dev for excruciating details.
Cookie.py
    Chaged various output functions to sort dicts before building
    strings from them.
test_extcall
    Fiddled the expected-result file.  This remains sensitive to native
    dict ordering, because, e.g., if there are multiple errors in a
    keyword-arg dict (and test_extcall sets up many cases like that), the
    specific error Python complains about first depends on native dict
    ordering.
2001-05-13 00:19:31 +00:00
Marc-André Lemburg 542fe56cb9 Fix for bug #417030: "print '%*s' fails for unicode string" 2001-05-02 14:21:53 +00:00
Marc-André Lemburg ef0a032883 Patch by Finn Bock to make test_unicode.py work for Jython. 2001-02-10 14:09:31 +00:00
Marc-André Lemburg fde66e1bcc Fixed .capitalize() method of Unicode objects to work like the
corresponding string method. Added tests for this too.

Patch written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
2001-01-29 11:14:16 +00:00
Guido van Rossum a1374e429b Change verify() function to raise TestFailed, not AssertionError.
(I realize that I didn't really test this, because all the tests
succeed, so verify() never raised an AssertionError -- but the test
suite still succeeds, so I'm not too worried.)
2001-01-19 19:01:56 +00:00
Tim Peters d2bf3b7ca6 Whitespace normalization. Leaving tokenize_tests.py alone for now. 2001-01-18 02:22:22 +00:00
Marc-André Lemburg 3661908a6a This patch removes all uses of "assert" in the regression test suite
and replaces them with a new API verify(). As a result the regression
suite will also perform its tests in optimization mode.

Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
2001-01-17 19:11:13 +00:00
Marc-André Lemburg 3a645e4dd4 Added checks to prevent PyUnicode_Count() from dumping core
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.

This patch adds Python style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).

The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
2001-01-16 11:54:12 +00:00
Marc-André Lemburg a866df806d This patch changes the default behaviour of the builtin charmap
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.

The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.

The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.

This patch closes the bugs #116285 and #119960.
2001-01-03 21:29:14 +00:00
Guido van Rossum 8b26454273 Test more split argument combinations:
1) multi-char separator
2) multi-char separator that only occurs at last position
3) all of the above with mixed Unicode and 8-bit-string arguments
2000-12-19 02:22:31 +00:00
Guido van Rossum 15ffc71c0f Slight improvement to Unicode test suite, inspired by patch #102563:
also test join method of 8-bit strings.

Also changed the test() function to (1) compare the types of the
expected and actual result, and (2) in verbose mode, print the repr()
of the output.
2000-11-29 12:13:59 +00:00
Fred Drake 004d5e6880 Make reindent.py happy (convert everything to 4-space indents!). 2000-10-23 17:22:08 +00:00