Commit Graph

138 Commits

Author SHA1 Message Date
Ezio Melotti 370d85cee4 Python 2 can encode/decode surrogates to utf-8. Add a test for this. 2011-02-28 01:42:29 +00:00
Antoine Pitrou b27ddc72ea Merged revisions 85861 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r85861 | antoine.pitrou | 2010-10-27 20:52:48 +0200 (mer., 27 oct. 2010) | 3 lines

  Recode modules from latin-1 to utf-8
........
2010-10-27 18:58:04 +00:00
Florent Xicluna c0c0b14671 Strengthen test_unicode with explicit type checking for assertEqual tests. 2010-09-13 08:53:00 +00:00
Florent Xicluna 60d512c3b0 Check PendingDeprecationWarning after issue #7994. 2010-09-13 08:21:43 +00:00
Florent Xicluna 9b90cd1f7b Merged revisions 84470-84471,84566-84567,84759 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r84470 | florent.xicluna | 2010-09-03 22:00:37 +0200 (ven., 03 sept. 2010) | 1 line

  Strengthen BytesWarning tests.
........
  r84471 | florent.xicluna | 2010-09-03 22:23:40 +0200 (ven., 03 sept. 2010) | 1 line

  Typo
........
  r84566 | florent.xicluna | 2010-09-06 22:27:15 +0200 (lun., 06 sept. 2010) | 1 line

  typo
........
  r84567 | florent.xicluna | 2010-09-06 22:27:55 +0200 (lun., 06 sept. 2010) | 1 line

  typo
........
  r84759 | florent.xicluna | 2010-09-13 04:28:18 +0200 (lun., 13 sept. 2010) | 1 line

  Reenable test_ucs4 and remove some duplicated lines.
........
2010-09-13 07:46:37 +00:00
Stefan Krah 0b9201fa1c Sub-issue of #9036: Fix incorrect use of Py_CHARMASK. 2010-07-19 18:06:46 +00:00
Benjamin Peterson eabdeba25e use unicode literals 2010-06-07 22:33:09 +00:00
Benjamin Peterson 13e934acc0 correctly overflow when indexes are too large 2010-06-07 22:23:23 +00:00
Ezio Melotti ab2eb0ee84 Add a NEWS entry for r81758 and clarify a comment. 2010-06-05 19:21:32 +00:00
Ezio Melotti e57e50c8e7 Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
   valid continuation bytes are now replaced by U+FFFD, instead of replacing
   the number of bytes specified by the start byte.
   See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
   in behavior);
3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
   RFC 3629, but leave it commented out since it's not backward compatible;
4) Change the error messages "unexpected code byte" to "invalid start byte"
   and "invalid data" to "invalid continuation byte";
5) Add an extensive set of tests in test_unicode;
6) Fix test_codeccallbacks because it was failing after this change.
2010-06-05 17:51:07 +00:00
Georg Brandl f0757a2937 #8016: add the CP858 codec (approved by Benjamin). (Also add CP720 to the tests, it was missing there.) 2010-05-24 21:29:07 +00:00
Victor Stinner c7790ed163 Fix the NEWS about my last commit: an unicode subclass can now override the
__unicode__ method (and not the __str__ method).

Simplify also the testcase.
2010-03-22 12:36:28 +00:00
Victor Stinner 95affc4449 Issue #1583863: An unicode subclass can now override the __str__ method 2010-03-22 12:24:37 +00:00
Florent Xicluna 6de9e938a5 Issue #7849: Now the utility ``check_warnings`` verifies if the warnings are
effectively raised.  A new utility ``check_py3k_warnings`` deals with py3k warnings.
2010-03-07 12:18:33 +00:00
Victor Stinner f20f9c299e Issue #7649: Fix u'%c' % char for character in range 0x80..0xFF
=> raise an UnicodeDecodeError. Patch written by Ezio Melotti.
2010-02-23 23:16:07 +00:00
Ezio Melotti aa98058cc4 use assert[Not]In where appropriate 2010-01-23 23:04:36 +00:00
Antoine Pitrou 5b7139aab4 Issue #7462: Implement the stringlib fast search algorithm for the `rfind`,
`rindex`, `rsplit` and `rpartition` methods.  Patch by Florent Xicluna.
2010-01-02 21:12:58 +00:00
R. David Murray 0a0a1a842c Issue #1680159: unicode coercion during an 'in' operation was masking
any errors that might occur during coercion of the left operand and
turning them into a TypeError with a message text that was confusing in
the given context.  This patch lets any errors through, as was already
done during coercion of the right hand side.
2009-12-14 16:28:26 +00:00
Benjamin Peterson 332d721750 add keyword arguments support to str/unicode encode and decode #6300 2009-09-18 21:14:55 +00:00
Benjamin Peterson 5c8da86f3a convert usage of fail* to assert* 2009-06-30 22:57:08 +00:00
Eric Smith 4b94b192ff Issue 6089: str.format raises SystemError. 2009-05-23 13:56:13 +00:00
Antoine Pitrou 653dece278 Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences.
Patch by Nick Barnes and Victor Stinner.
2009-05-04 18:32:32 +00:00
Eric Smith 2ace4cf813 Unicode format tests weren't actually testing unicode. This was probably due to the original backport from py3k. 2009-03-14 14:37:38 +00:00
Eric Smith 6f42edb682 Issue 5237, Allow auto-numbered replacement fields in str.format() strings.
For simple uses for str.format(), this makes the typing easier. Hopfully this
will help in the adoption of str.format().

For example:
'The {} is {}'.format('sky', 'blue')

You can mix and matcth auto-numbering and named replacement fields:
'The {} is {color}'.format('sky', color='blue')

But you can't mix and match auto-numbering and specified numbering:
'The {0} is {}'.format('sky', 'blue')
ValueError: cannot switch from manual field specification to automatic field numbering

Will port to 3.1.
2009-03-14 11:57:26 +00:00
Antoine Pitrou 187ac1bda4 #3601: test_unicode.test_raiseMemError fails in UCS4
Reviewed by Benjamin Peterson on IRC.
2008-09-05 22:04:54 +00:00
Antoine Pitrou fd7c43e7be #3556: test_raiseMemError consumes an insane amount of memory 2008-08-17 17:01:49 +00:00
Amaury Forgeot d'Arc 06847b13ca Correct a crash when two successive unicode allocations fail with a MemoryError:
the freelist contained half-initialized objects with freed pointers.

The comment
/* XXX UNREF/NEWREF interface should be more symmetrical */
was copied from tupleobject.c, and appears in some other places.
I sign the petition.
2008-07-31 23:39:05 +00:00
Antoine Pitrou 4982d5d04a #2242: utf7 decoding crashes on bogus input on some Windows/MSVC versions 2008-07-25 17:45:59 +00:00
Amaury Forgeot d'Arc 9a0d3462fc #1477: ur'\U0010FFFF' raised in narrow unicode builds.
Corrected the raw-unicode-escape codec to use UTF-16 surrogates in
this case, just like the unicode-escape codec.
2008-03-23 09:55:29 +00:00
Christian Heimes c5f05e45cf Patch #2167 from calvin: Remove unused imports 2008-02-23 17:40:11 +00:00
Eric Smith bc32fee029 Added code to correct combining str and unicode in ''.format(). Added test case. 2008-02-18 18:02:34 +00:00
Eric Smith a9f7d62480 Backport of PEP 3101, Advanced String Formatting, from py3k.
Highlights:
 - Adding PyObject_Format.
 - Adding string.Format class.
 - Adding __format__ for str, unicode, int, long, float, datetime.
 - Adding builtin format.
 - Adding ''.format and u''.format.
 - str/unicode fixups for formatters.

The files in Objects/stringlib that implement PEP 3101 (stringdefs.h,
unicodedefs.h, formatter.h, string_format.h) are identical in trunk
and py3k.  Any changes from here on should be made to trunk, and
changes will propogate to py3k).
2008-02-17 19:46:49 +00:00
Kurt B. Kaiser db98f3632a Fix failing unicode test caused by change to ast.c at r56441 2007-07-18 19:58:42 +00:00
Neal Norwitz ba965deea8 Prevent these tests from running on Win64 since they don\'t apply there either 2007-06-11 02:14:39 +00:00
Neal Norwitz 7dbd2a3720 Prevent expandtabs() on string and unicode objects from causing a segfault when
a large width is passed on 32-bit platforms.  Found by Google.

It would be good for people to review this especially carefully and verify
I don't have an off by one error and there is no other way to cause overflow.
2007-06-09 03:36:34 +00:00
Collin Winter c2898c5a67 Standardize on test.test_support.run_unittest() (as opposed to a mix of run_unittest() and run_suite()). Also, add functionality to run_unittest() that admits usage of unittest.TestLoader.loadTestsFromModule(). 2007-04-25 17:29:52 +00:00
Neal Norwitz 17753ecbfa Patch #1541585: fix buffer overrun when performing repr() on
a unicode string in a build with wide unicode (UCS-4) support.

This code could be improved, so add an XXX comment.
2006-08-21 22:21:19 +00:00
Tim Peters 4511a713d5 Whitespace normalization. 2006-05-03 04:46:14 +00:00
Georg Brandl de9b624fb9 Bug #1473625: stop cPickle making float dumps locale dependent in protocol 0.
On the way, add a decorator to test_support to facilitate running single
test functions in different locales with automatic cleanup.
2006-04-30 11:13:56 +00:00
Anthony Baxter 67b6d516ce Fixed bug #1459029 - unicode reprs were double-escaped. 2006-03-30 10:54:07 +00:00
Georg Brandl da6b107745 Checkin the test of patch #1400181. 2006-01-20 17:48:54 +00:00
Hye-Shik Chang 835b243c71 Bug #1379994: Fix *unicode_escape codecs to encode r'\' as r'\\'
just like string codecs.
2005-12-17 04:38:31 +00:00
Neal Norwitz 430f68b447 Move registration of the codec search function to the module scope
so it is only executed once.  Otherwise the same search function is
repeated added to the codec search path when regrtest is run with -R
and leaks are reported.
2005-11-24 22:00:56 +00:00
Neil Schemenauer cf52c07843 Change the %s format specifier for str objects so that it returns a
unicode instance if the argument is not an instance of basestring and
calling __str__ on the argument returns a unicode instance.
2005-08-12 17:34:58 +00:00
Brett Cannon c3647ac93e Make subclasses of int, long, complex, float, and unicode perform type
conversion using the proper magic slot (e.g., __int__()).  Also move conversion
code out of PyNumber_*() functions in the C API into the nb_* function.

Applied patch #1109424.  Thanks Walter Doewald.
2005-04-26 03:45:26 +00:00
Walter Dörwald 57d88e5abd Move test_bug1001011() to string_tests.MixinStrUnicodeTest so that
it can be used for str and unicode. Drop the test for
   "".join([s]) is s
because this is an implementation detail (and doesn't work for unicode)
2004-08-26 16:53:04 +00:00
Hye-Shik Chang e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Marc-André Lemburg d25c650461 Let u'%s' % obj try obj.__unicode__() first and fallback to obj.__str__(). 2004-07-23 16:13:25 +00:00
Hye-Shik Chang 3c145449da Reuse width/iswide tests from strings_test. (Suggested by Walter Dörwald) 2004-06-04 04:24:54 +00:00
Hye-Shik Chang 7bd860655f Fix typo. 2004-06-04 03:19:17 +00:00