the locale encoding. If the LANG (and LC_ALL and LC_CTYPE) environment variable
is not set, the locale encoding is ISO-8859-1, whereas most programs (including
Python) expect UTF-8. Python already uses UTF-8 for the filesystem encoding and
to encode command line arguments on this OS.
_Py_char2wchar() callers usually need the result size in characters. Since it's
trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add
an option to get it.
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
Py_FileSystemDefaultEncoding is NULL
* redecode_filenames() functions and _Py_code_object_list (issue #9630)
are no more needed: remove them
makeunicodedata.py: download all data files from unicode.org,
switch to extracting Unihan data from zip file.
Read linebreakprops and derivednormalizationprops even for
old versions, even though they are not used in delta records.
test:unicode.py: U+11000 is now assigned, use U+14000 instead.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r85154 | benjamin.peterson | 2010-10-01 19:03:31 -0500 (Fri, 01 Oct 2010) | 1 line
type.__abstractmethods__ should raise an AttributeError #10006
........
r85182 | benjamin.peterson | 2010-10-02 12:55:47 -0500 (Sat, 02 Oct 2010) | 1 line
add a test and a note about metaclasses now being abcs
........
Redecode the filenames of:
- all modules: __file__ and __path__ attributes
- all code objects: co_filename attribute
- sys.path
- sys.meta_path
- sys.executable
- sys.path_importer_cache (keys)
Keep weak references to all code objects until initfsencoding() is called, to
be able to redecode co_filename attribute of all code objects.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r84984 | mark.dickinson | 2010-09-23 21:11:19 +0100 (Thu, 23 Sep 2010) | 5 lines
Issue #9930: Remove an unnecessary type check in wrap_binaryfunc_r;
this was causing reversed method calls like float.__radd__(3.0, 1) to
return NotImplemented instead of the expected numeric value.
........
if the format string is not empty. Manually merge r79596 and r84772
from 2.x.
Also, apparently test_format() from test_builtin never made it into
3.x. I've added it as well. It tests the basic format()
infrastructure.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r84629 | senthil.kumaran | 2010-09-08 18:20:29 +0530 (Wed, 08 Sep 2010) | 3 lines
Revert the doc change done in r83880. str.replace with negative count value is not a feature.
........
Database (Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others) now accept
and return characters from the full Unicode range (Py_UCS4).
The differences from Python code are few:
- unicodedata.numeric(), unicodedata.decimal() and unicodedata.digit()
now return the correct value for large code points
- repr() may consider more characters as printable.
... to get the filename as a unicode object, instead of a byte string. Function
needed to support unencodable filenames. Deprecate PyModule_GetFilename() in
favor on the new function.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r84070 | antoine.pitrou | 2010-08-15 19:12:55 +0200 (dim., 15 août 2010) | 5 lines
Fix some compilation warnings under 64-bit Windows (issue #9566).
Some of these are genuine bugs with objects bigger than 2GB, but
my system doesn't allow me to write tests for it.
........
r84074 | antoine.pitrou | 2010-08-15 19:41:31 +0200 (dim., 15 août 2010) | 3 lines
Fix (harmless) warning with MSVC.
........
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
* Don't specify surrogateescape error handler in the comments nor the
documentation, but PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_EncodeFSDefault() because these functions use strict error handler
for the mbcs encoding (on Windows).
* Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
inconsistency with unicodeobject.h.
svn+ssh://svn.python.org/python/branches/py3k
........
r83444 | georg.brandl | 2010-08-01 22:51:02 +0200 (So, 01 Aug 2010) | 1 line
Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway.
........
svn+ssh://svn.python.org/python/branches/py3k
........
r83395 | georg.brandl | 2010-08-01 10:49:18 +0200 (So, 01 Aug 2010) | 1 line
#8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character.
........
r83417 | georg.brandl | 2010-08-01 20:38:26 +0200 (So, 01 Aug 2010) | 1 line
#5776: fix mistakes in python specfile. (Nobody probably uses it anyway.)
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r83400 | mark.dickinson | 2010-08-01 11:41:49 +0100 (Sun, 01 Aug 2010) | 7 lines
Issue #9416: Fix some issues with complex formatting where the
output with no type specifier failed to match the str output:
- format(complex(-0.0, 2.0), '-') omitted the real part from the output,
- format(complex(0.0, 2.0), '-') included a sign and parentheses.
........
output with no type specifier failed to match the str output:
- format(complex(-0.0, 2.0), '-') omitted the real part from the output,
- format(complex(0.0, 2.0), '-') included a sign and parentheses.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r82413 | ezio.melotti | 2010-07-01 10:32:02 +0300 (Thu, 01 Jul 2010) | 13 lines
Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
........
r82468 | ezio.melotti | 2010-07-03 07:52:19 +0300 (Sat, 03 Jul 2010) | 1 line
Update comment about surrogates.
........
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81465 | georg.brandl | 2010-05-22 06:29:19 -0500 (Sat, 22 May 2010) | 2 lines
Issue #3924: Ignore cookies with invalid "version" field in cookielib.
........
r81466 | georg.brandl | 2010-05-22 06:31:16 -0500 (Sat, 22 May 2010) | 1 line
Underscore the name of an internal utility function.
........
r81468 | georg.brandl | 2010-05-22 06:43:25 -0500 (Sat, 22 May 2010) | 1 line
#8635: document enumerate() start parameter in docstring.
........
r81679 | benjamin.peterson | 2010-06-03 16:21:03 -0500 (Thu, 03 Jun 2010) | 1 line
use a set for membership testing
........
r81735 | michael.foord | 2010-06-05 06:46:59 -0500 (Sat, 05 Jun 2010) | 1 line
Extract error message truncating into a method (unittest.TestCase._truncateMessage).
........
r81760 | michael.foord | 2010-06-05 14:38:42 -0500 (Sat, 05 Jun 2010) | 1 line
Issue 8302. SkipTest exception is setUpClass or setUpModule is now reported as a skip rather than an error.
........
r81868 | benjamin.peterson | 2010-06-09 14:45:04 -0500 (Wed, 09 Jun 2010) | 1 line
fix code formatting
........
r82183 | benjamin.peterson | 2010-06-23 15:29:26 -0500 (Wed, 23 Jun 2010) | 1 line
cpython only gc tests
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r82157 | benjamin.peterson | 2010-06-22 14:16:37 -0500 (Tue, 22 Jun 2010) | 1 line
remove INT_MAX assertions; they can fail with large Py_ssize_t #9058
........
mode raises unicode errors. The encoder only supports "strict" and "replace"
error handlers, the decoder only supports "strict" and "ignore" error handlers.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (ven., 11 juin 2010) | 5 lines
Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
the interpreter with characters outside the Basic Multilingual Plane
(higher than 0x10000).
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81820 | benjamin.peterson | 2010-06-07 17:23:23 -0500 (Mon, 07 Jun 2010) | 1 line
correctly overflow when indexes are too large
........
(instances of int, float, complex, decimal.Decimal and
fractions.Fraction) that makes it easy to maintain the invariant that
hash(x) == hash(y) whenever x and y have equal value.
objects. (1) The comparison could incorrectly return True in some cases
(2**53+1 == complex(2**53) == 2**53), breaking transivity of equality.
(2) The comparison raised an OverflowError for large integers, leading
to unpredictable exceptions when combining integers and complex objects
in sets or dicts.
Patch by Meador Inge.
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r81250 | victor.stinner | 2010-05-17 03:13:37 +0200 (lun., 17 mai 2010) | 2 lines
Issue #6697: Fix a crash if code of "python -c code" contains surrogates
........
r81251 | victor.stinner | 2010-05-17 03:26:01 +0200 (lun., 17 mai 2010) | 3 lines
PyObject_Dump() encodes unicode objects to utf8 with backslashreplace (instead
of strict) error handler to escape surrogates
........
r81252 | victor.stinner | 2010-05-17 10:58:51 +0200 (lun., 17 mai 2010) | 6 lines
handle_system_exit() flushs files to warranty the output order
PyObject_Print() writes into the C object stderr, whereas PySys_WriteStderr()
writes into the Python object sys.stderr. Each object has its own buffer, so
call sys.stderr.flush() and fflush(stderr).
........
r81253 | victor.stinner | 2010-05-17 11:33:42 +0200 (lun., 17 mai 2010) | 6 lines
Fix refleak in internal_print() introduced by myself in r81251
_PyUnicode_AsDefaultEncodedString() uses a magical PyUnicode attribute to
automatically destroy PyUnicode_EncodeUTF8() result when the unicode string is
destroyed.
........
_PyUnicode_AsDefaultEncodedString() uses a magical PyUnicode attribute to
automatically destroy PyUnicode_EncodeUTF8() result when the unicode string is
destroyed.
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer is a dynamic data
race detector that runs on top of valgrind. With this patch, the binaries at
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer#Binaries pass many
but not all of the Python tests. All of regrtest still passes outside of tsan.
I've implemented part of the C1x atomic types so that we can explicitly mark
variables that are used across threads, and get defined behavior as compilers
advance.
I've added tsan's client header and implementation to the codebase in
dynamic_annotations.{h,c} (docs at
http://code.google.com/p/data-race-test/wiki/DynamicAnnotations).
Unfortunately, I haven't been able to get helgrind and drd to give sensible
error messages, even when I use their client annotations, so I'm not supporting
them.
This function is only used to decode Python module filenames, but Python
doesn't support surrogates in modules filenames yet. So nobody noticed this
minor bug.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79843 | mark.dickinson | 2010-04-06 17:46:09 +0100 (Tue, 06 Apr 2010) | 4 lines
Issue #8259: Get rid of 'outrageous left shift count' error when
left-shifting an integer by more than 2**31 on a 64-bit machine. Also
convert shift counts to a Py_ssize_t instead of a C long.
........
r79844 | mark.dickinson | 2010-04-06 17:47:55 +0100 (Tue, 06 Apr 2010) | 1 line
Misc/NEWS entry for r79843.
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r78918 | mark.dickinson | 2010-03-13 11:34:40 +0000 (Sat, 13 Mar 2010) | 4 lines
Issue #8014: Fix PyLong_As<c-integer-type> methods not to produce an
internal error on non-integer input: they now raise TypeError instead.
This is needed for attributes declared via PyMemberDefs.
........
r78920 | mark.dickinson | 2010-03-13 13:23:05 +0000 (Sat, 13 Mar 2010) | 3 lines
Issue #8014: Fix incorrect error checks in structmember.c, and re-enable
previously failing test_structmember.py tests.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79809 | mark.dickinson | 2010-04-05 19:54:51 +0100 (Mon, 05 Apr 2010) | 1 line
Use a better NaN test in _Py_HashDouble as well.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79804 | mark.dickinson | 2010-04-05 19:07:51 +0100 (Mon, 05 Apr 2010) | 5 lines
Use a more robust infinity check in _Py_HashDouble.
This fixes a test_decimal failure on FreeBSD 8.0. (modf apparently
doesn't follow C99 Annex F on FreeBSD.)
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (mar, 30 mar 2010) | 2 lines
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (mar, 30 mar 2010) | 2 lines
Highlight the change of behavior related to r79494. Now VT and FF are linebreaks.
........