makeunicodedata.py: download all data files from unicode.org,
switch to extracting Unihan data from zip file.
Read linebreakprops and derivednormalizationprops even for
old versions, even though they are not used in delta records.
test:unicode.py: U+11000 is now assigned, use U+14000 instead.
if the format string is not empty. Manually merge r79596 and r84772
from 2.x.
Also, apparently test_format() from test_builtin never made it into
3.x. I've added it as well. It tests the basic format()
infrastructure.
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).
The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81499 | georg.brandl | 2010-05-24 16:29:07 -0500 (Mon, 24 May 2010) | 1 line
#8016: add the CP858 codec (approved by Benjamin). (Also add CP720 to the tests, it was missing there.)
........
r81506 | benjamin.peterson | 2010-05-24 17:04:53 -0500 (Mon, 24 May 2010) | 1 line
set svn:eol-style
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81820 | benjamin.peterson | 2010-06-07 17:23:23 -0500 (Mon, 07 Jun 2010) | 1 line
correctly overflow when indexes are too large
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79278 | victor.stinner | 2010-03-22 13:24:37 +0100 (lun., 22 mars 2010) | 2 lines
Issue #1583863: An unicode subclass can now override the __str__ method
........
r79280 | victor.stinner | 2010-03-22 13:36:28 +0100 (lun., 22 mars 2010) | 5 lines
Fix the NEWS about my last commit: an unicode subclass can now override the
__unicode__ method (and not the __str__ method).
Simplify also the testcase.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r72283 | antoine.pitrou | 2009-05-04 20:32:32 +0200 (lun., 04 mai 2009) | 4 lines
Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences.
Patch by Nick Barnes and Victor Stinner.
........
r72284 | antoine.pitrou | 2009-05-04 20:32:50 +0200 (lun., 04 mai 2009) | 3 lines
Add Nick Barnes to ACKS.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r70364 | eric.smith | 2009-03-14 07:57:26 -0400 (Sat, 14 Mar 2009) | 17 lines
Issue 5237, Allow auto-numbered replacement fields in str.format() strings.
For simple uses for str.format(), this makes the typing easier. Hopfully this
will help in the adoption of str.format().
For example:
'The {} is {}'.format('sky', 'blue')
You can mix and matcth auto-numbering and named replacement fields:
'The {} is {color}'.format('sky', color='blue')
But you can't mix and match auto-numbering and specified numbering:
'The {0} is {}'.format('sky', 'blue')
ValueError: cannot switch from manual field specification to automatic field numbering
Will port to 3.1.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r65339 | amaury.forgeotdarc | 2008-07-31 23:28:03 +0200 (jeu., 31 juil. 2008) | 5 lines
#3479: unichr(2**32) used to return u'\x00'.
The argument was fetched in a long, but PyUnicode_FromOrdinal takes an int.
(why doesn't gcc issue a truncation warning in this case?)
........
r65340 | amaury.forgeotdarc | 2008-07-31 23:35:03 +0200 (jeu., 31 juil. 2008) | 2 lines
Remove a dummy test that was checked in by mistake
........
r65342 | amaury.forgeotdarc | 2008-08-01 01:39:05 +0200 (ven., 01 août 2008) | 8 lines
Correct a crash when two successive unicode allocations fail with a MemoryError:
the freelist contained half-initialized objects with freed pointers.
The comment
/* XXX UNREF/NEWREF interface should be more symmetrical */
was copied from tupleobject.c, and appears in some other places.
I sign the petition.
........
Added '#' formatting to integers. This adds the 0b, 0o, or 0x prefix for bin, oct, hex. There's still one failing case, and I need to finish the docs. I hope to finish those today.
The repr() of a string now contains printable Unicode characters unescaped.
The new ascii() builtin can be used to get a repr() with only ASCII characters in it.
PEP and patch were written by Atsuo Ishimoto.