subinterpreter to fix a bootstrap issue with codecs implemented in Python, as
the ISO-8859-15 codec.
Add fscodec_initialized attribute to the PyInterpreterState structure.
subinterpreter to fix a bootstrap issue with codecs implemented in Python, as
the ISO-8859-15 codec.
Add fscodec_initialized attribute to the PyInterpreterState structure.
When a string is encoded to UTF-8 in strict mode, the result is cached into the
object. Examples: str.encode(), str.encode('utf-8'), PyUnicode_AsUTF8String()
and PyUnicode_AsEncodedString(unicode, "utf-8", NULL).
Issue #11187: Remove bootstrap code (use ASCII) of
PyUnicode_AsEncodedString(), it was replaced by a better fallback (use
the locale encoding) in PyUnicode_EncodeFSDefault().
Prepare also empty sections in NEWS.
types. Added a new API function, PyUnicode_TransformDecimalToASCII(),
which transforms non-ASCII decimal digits in a Unicode string to their
ASCII equivalents.
* Specify the default encoding: write 'utf-8' instead of
sys.getdefaultencoding(), because the default encoding is now constant
* Specify the default errors value
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r86277 | eric.smith | 2010-11-06 15:27:37 -0400 (Sat, 06 Nov 2010) | 1 line
Added more to docstrings for str.format, format_map, and __format__.
........
the locale encoding. If the LANG (and LC_ALL and LC_CTYPE) environment variable
is not set, the locale encoding is ISO-8859-1, whereas most programs (including
Python) expect UTF-8. Python already uses UTF-8 for the filesystem encoding and
to encode command line arguments on this OS.
_Py_char2wchar() callers usually need the result size in characters. Since it's
trivial to compute it in _Py_char2wchar() (O(1) whereas wcslen() is O(n)), add
an option to get it.
* PyUnicode_EncodeFSDefault(), PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_DecodeFSDefault() use the locale encoding instead of UTF-8 if
Py_FileSystemDefaultEncoding is NULL
* redecode_filenames() functions and _Py_code_object_list (issue #9630)
are no more needed: remove them
Redecode the filenames of:
- all modules: __file__ and __path__ attributes
- all code objects: co_filename attribute
- sys.path
- sys.meta_path
- sys.executable
- sys.path_importer_cache (keys)
Keep weak references to all code objects until initfsencoding() is called, to
be able to redecode co_filename attribute of all code objects.
It's a ParseTuple converter: decode bytes objects to unicode using
PyUnicode_DecodeFSDefaultAndSize(); str objects are output as-is.
* Don't specify surrogateescape error handler in the comments nor the
documentation, but PyUnicode_DecodeFSDefaultAndSize() and
PyUnicode_EncodeFSDefault() because these functions use strict error handler
for the mbcs encoding (on Windows).
* Remove PyUnicode_FSConverter() comment in unicodeobject.c to avoid
inconsistency with unicodeobject.h.
svn+ssh://svn.python.org/python/branches/py3k
........
r83444 | georg.brandl | 2010-08-01 22:51:02 +0200 (So, 01 Aug 2010) | 1 line
Revert r83395, it introduces test failures and is not necessary anyway since we now have to nul-terminate the string anyway.
........
svn+ssh://svn.python.org/python/branches/py3k
........
r83395 | georg.brandl | 2010-08-01 10:49:18 +0200 (So, 01 Aug 2010) | 1 line
#8821: do not rely on Unicode strings being terminated with a \u0000, rather explicitly check range before looking for a second surrogate character.
........
r83417 | georg.brandl | 2010-08-01 20:38:26 +0200 (So, 01 Aug 2010) | 1 line
#5776: fix mistakes in python specfile. (Nobody probably uses it anyway.)
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r82413 | ezio.melotti | 2010-07-01 10:32:02 +0300 (Thu, 01 Jul 2010) | 13 lines
Update PyUnicode_DecodeUTF8 from RFC 2279 to RFC 3629.
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
........
r82468 | ezio.melotti | 2010-07-03 07:52:19 +0300 (Sat, 03 Jul 2010) | 1 line
Update comment about surrogates.
........
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-bytes-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
4) Add an extensive set of tests in test_unicode;
5) Fix test_codeccallbacks because it was failing after this change.
mode raises unicode errors. The encoder only supports "strict" and "replace"
error handlers, the decoder only supports "strict" and "ignore" error handlers.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r81907 | antoine.pitrou | 2010-06-11 23:42:26 +0200 (ven., 11 juin 2010) | 5 lines
Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
the interpreter with characters outside the Basic Multilingual Plane
(higher than 0x10000).
........
object to Py_FileSystemDefaultEncoding with the "surrogateescape" error
handler, return a bytes object. If Py_FileSystemDefaultEncoding is not set,
fall back to UTF-8.
This function is only used to decode Python module filenames, but Python
doesn't support surrogates in modules filenames yet. So nobody noticed this
minor bug.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79494 | florent.xicluna | 2010-03-30 10:24:06 +0200 (mar, 30 mar 2010) | 2 lines
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
........
r79496 | florent.xicluna | 2010-03-30 18:29:03 +0200 (mar, 30 mar 2010) | 2 lines
Highlight the change of behavior related to r79494. Now VT and FF are linebreaks.
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
................
r79281 | victor.stinner | 2010-03-22 13:50:40 +0100 (lun., 22 mars 2010) | 16 lines
Merged revisions 79278,79280 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79278 | victor.stinner | 2010-03-22 13:24:37 +0100 (lun., 22 mars 2010) | 2 lines
Issue #1583863: An unicode subclass can now override the __str__ method
........
r79280 | victor.stinner | 2010-03-22 13:36:28 +0100 (lun., 22 mars 2010) | 5 lines
Fix the NEWS about my last commit: an unicode subclass can now override the
__unicode__ method (and not the __str__ method).
Simplify also the testcase.
........
................
svn+ssh://pythondev@svn.python.org/python/trunk
........
r79278 | victor.stinner | 2010-03-22 13:24:37 +0100 (lun., 22 mars 2010) | 2 lines
Issue #1583863: An unicode subclass can now override the __str__ method
........
r79280 | victor.stinner | 2010-03-22 13:36:28 +0100 (lun., 22 mars 2010) | 5 lines
Fix the NEWS about my last commit: an unicode subclass can now override the
__unicode__ method (and not the __str__ method).
Simplify also the testcase.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r77461 | antoine.pitrou | 2010-01-13 08:55:48 +0100 (mer., 13 janv. 2010) | 5 lines
Issue #7622: Improve the split(), rsplit(), splitlines() and replace()
methods of bytes, bytearray and unicode objects by using a common
implementation based on stringlib's fast search. Patch by Florent Xicluna.
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r77395 | benjamin.peterson | 2010-01-09 15:45:28 -0600 (Sat, 09 Jan 2010) | 2 lines
Python strings ending with '\0' should not be equivalent to their C counterparts in PyUnicode_CompareWithASCIIString
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r73871 | alexandre.vassalotti | 2009-07-06 22:17:30 -0400 (Mon, 06 Jul 2009) | 7 lines
Grow the allocated buffer in PyUnicode_EncodeUTF7 to avoid buffer overrun.
Without this change, test_unicode.UnicodeTest.test_codecs_utf7 crashes in
debug mode. What happens is the unicode string u'\U000abcde' with a length
of 1 encodes to the string '+2m/c3g-' of length 8. Since only 5 bytes is
reserved in the buffer, a buffer overrun occurs.
........
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r73698 | amaury.forgeotdarc | 2009-06-30 00:36:49 +0200 (mar., 30 juin 2009) | 7 lines
#6373: SystemError in str.encode('latin1', 'surrogateescape')
if the string contains unpaired surrogates.
(In debug build, crash in assert())
This can happen with normal processing, if python starts with utf-8,
then calls sys.setfilesystemencoding('latin-1')
........
if the string contains unpaired surrogates.
(In debug build, crash in assert())
This can happen with normal processing, if python starts with utf-8,
then calls sys.setfilesystemencoding('latin-1')
svn+ssh://pythondev@svn.python.org/python/trunk
........
r73190 | georg.brandl | 2009-06-04 01:23:45 +0200 (Do, 04 Jun 2009) | 2 lines
Avoid PendingDeprecationWarnings emitted by deprecated unittest methods.
........
r73213 | georg.brandl | 2009-06-04 12:15:57 +0200 (Do, 04 Jun 2009) | 1 line
#5967: note that the C slicing APIs do not support negative indices.
........
r73257 | georg.brandl | 2009-06-06 19:50:05 +0200 (Sa, 06 Jun 2009) | 1 line
#6211: elaborate a bit on ways to call the function.
........
r73258 | georg.brandl | 2009-06-06 19:51:31 +0200 (Sa, 06 Jun 2009) | 1 line
#6204: use a real reference instead of "see later".
........
r73260 | georg.brandl | 2009-06-06 20:21:58 +0200 (Sa, 06 Jun 2009) | 1 line
#6224: s/JPython/Jython/, and remove one link to a module nine years old.
........
r73275 | georg.brandl | 2009-06-07 22:37:52 +0200 (So, 07 Jun 2009) | 1 line
Add Ezio.
........
r73294 | georg.brandl | 2009-06-08 15:34:52 +0200 (Mo, 08 Jun 2009) | 1 line
#6194: O_SHLOCK/O_EXLOCK are not really more platform independent than lockf().
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r72283 | antoine.pitrou | 2009-05-04 20:32:32 +0200 (lun., 04 mai 2009) | 4 lines
Issue #4426: The UTF-7 decoder was too strict and didn't accept some legal sequences.
Patch by Nick Barnes and Victor Stinner.
........
r72284 | antoine.pitrou | 2009-05-04 20:32:50 +0200 (lun., 04 mai 2009) | 3 lines
Add Nick Barnes to ACKS.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r72260 | walter.doerwald | 2009-05-04 00:36:33 +0200 (Mo, 04 Mai 2009) | 5 lines
Issue #5108: Handle %s like %S and %R in PyUnicode_FromFormatV(): Call
PyUnicode_DecodeUTF8() once, remember the result and output it in a second
step. This avoids problems with counting UTF-8 bytes that ignores the effect
of using the replace error handler in PyUnicode_DecodeUTF8().
........
Addresses the float -> string conversion, using David Gay's code which
was added in Mark Dickinson's checkin r71663.
Also addresses these, which are intertwined with the short repr
changes:
- Issue #5772: format(1e100, '<') produces '1e+100', not '1.0e+100'
- Issue #5515: 'n' formatting with commas no longer works poorly
with leading zeros.
- PEP 378 Format Specifier for Thousands Separator: implemented
for floats.
This is incomplete, but I want to get some version into the next alpha. I am still working on:
Documentation.
More tests.
Implement for floats.
In addition, there's an existing bug with 'n' formatting that carries forward to thousands grouping (issue 5515).
svn+ssh://pythondev@svn.python.org/python/trunk
........
r70499 | hirokazu.yamamoto | 2009-03-21 19:32:52 +0900 | 1 line
There is no macro named SIZEOF_SSIZE_T. Should use SIZEOF_SIZE_T instead.
........
sizeof(Py_UNICODE) == 2, PyUnicode_FromWideChar now converts
each character outside the BMP to the appropriate surrogate pair.
Thanks Victor Stinner for the patch.
common cases are optimized thanks to a dedicated fast path and a moderate
amount of loop unrolling.
This will especially help text I/O (we already register a 30% speedup on large
reads on the io-c branch).
Also fix len() to return number of items rather than length in bytes.
I'm sorry it was not possible for me to work on this without reindenting
a bit some stuff around. The indentation in memoryobject.c is a mess,
I'll open a separate bug for it.
The approach used is similiar to what is currently used in the version
of unicodeobject.c in Python 2.x. The only difference is we use
_PyBytes_Resize instead of _PyString_Resize.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r67932 | alexandre.vassalotti | 2008-12-27 01:36:10 -0500 (Sat, 27 Dec 2008) | 5 lines
Remove unnecessary casts related to unicode_decode_call_errorhandler.
Make the _PyUnicode_Resize macro a static function.
These changes are needed to avoid breaking strict aliasing rules.
........
The patch fixes several issues with Py_NewInterpreter as well as the demo for multiple subinterpreters.
Most of the patch was written by MvL with help from Benjamin, Amaury and me. Graham Dumpleton has verified that this patch fixes an issue with mod_wsgi.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r66748 | christian.heimes | 2008-10-02 21:47:50 +0200 (Thu, 02 Oct 2008) | 1 line
Fixed a couple more C99 comments and one occurence of inline.
........
+ another // comment in bytesobject
svn+ssh://pythondev@svn.python.org/python/trunk
........
r65654 | martin.v.loewis | 2008-08-12 16:49:50 +0200 (Tue, 12 Aug 2008) | 6 lines
Issue #3139: Make buffer-interface thread-safe wrt. PyArg_ParseTuple,
by denying s# to parse objects that have a releasebuffer procedure,
and introducing s*.
More module might need to get converted to use s*.
........
PyUnicode_AsStringAndSize -> _PyUnicode_AsStringAndSize to mark
them for interpreter internal use only.
We'll have to rework these APIs or create new ones for the
purpose of accessing the UTF-8 representation of Unicode objects
for 3.1.
svn+ssh://pythondev@svn.python.org/python/trunk
........
r65339 | amaury.forgeotdarc | 2008-07-31 23:28:03 +0200 (jeu., 31 juil. 2008) | 5 lines
#3479: unichr(2**32) used to return u'\x00'.
The argument was fetched in a long, but PyUnicode_FromOrdinal takes an int.
(why doesn't gcc issue a truncation warning in this case?)
........
r65340 | amaury.forgeotdarc | 2008-07-31 23:35:03 +0200 (jeu., 31 juil. 2008) | 2 lines
Remove a dummy test that was checked in by mistake
........
r65342 | amaury.forgeotdarc | 2008-08-01 01:39:05 +0200 (ven., 01 août 2008) | 8 lines
Correct a crash when two successive unicode allocations fail with a MemoryError:
the freelist contained half-initialized objects with freed pointers.
The comment
/* XXX UNREF/NEWREF interface should be more symmetrical */
was copied from tupleobject.c, and appears in some other places.
I sign the petition.
........
The repr() of a string now contains printable Unicode characters unescaped.
The new ascii() builtin can be used to get a repr() with only ASCII characters in it.
PEP and patch were written by Atsuo Ishimoto.
Use faster PyUnicode_FromEncodedObject() for bytes/bytearray.decode().
Add new PyCodec_KnownEncoding() API.
Add new PyUnicode_AsDecodedUnicode() and PyUnicode_AsEncodedUnicode() APIs.
Add missing PyUnicode_AsDecodedObject() to unicodeobject.h
Fix punicode codec to also work on memoryviews.
svn+ssh://pythondev@svn.python.org/python/trunk
When forward porting this, I added _PyUnicode_InsertThousandsGrouping.
........
r63078 | eric.smith | 2008-05-11 15:52:48 -0400 (Sun, 11 May 2008) | 14 lines
Addresses issue 2802: 'n' formatting for integers.
Adds 'n' as a format specifier for integers, to mirror the same
specifier which is already available for floats. 'n' is the same as
'd', but inserts the current locale-specific thousands grouping.
I added this as a stringlib function, but it's only used by str type,
not unicode. This is because of an implementation detail in
unicode.format(), which does its own str->unicode conversion. But the
unicode version will be needed in 3.0, and it may be needed by other
code eventually in 2.6 (maybe decimal?), so I left it as a stringlib
implementation. As long as the unicode version isn't instantiated,
there's no overhead for this.
........
svn+ssh://pythondev@svn.python.org/python/trunk
........
r62260 | gregory.p.smith | 2008-04-10 01:11:56 +0200 (Thu, 10 Apr 2008) | 2 lines
better diagnostics
........
r62261 | gregory.p.smith | 2008-04-10 01:16:37 +0200 (Thu, 10 Apr 2008) | 3 lines
Raise SystemError when size < 0 is passed into PyString_FromStringAndSize,
PyBytes_FromStringAndSize or PyUnicode_FromStringAndSize. [issue2587]
........
r62266 | neal.norwitz | 2008-04-10 07:46:39 +0200 (Thu, 10 Apr 2008) | 5 lines
Remove the test file before writing it in case there is no write permission.
This might help fix some of the failures on Windows box(es). It doesn't hurt
either way and ensure the tests are a little more self contained (ie have
less assumptions).
........
r62271 | gregory.p.smith | 2008-04-10 21:50:36 +0200 (Thu, 10 Apr 2008) | 2 lines
get rid of assert (size >= 0) now that an explicit if (size < 0) is in the code.
........
r62277 | andrew.kuchling | 2008-04-10 23:27:10 +0200 (Thu, 10 Apr 2008) | 1 line
Remove forward-looking statement
........
r62278 | andrew.kuchling | 2008-04-10 23:28:51 +0200 (Thu, 10 Apr 2008) | 1 line
Add punctuation
........
r62279 | andrew.kuchling | 2008-04-10 23:29:01 +0200 (Thu, 10 Apr 2008) | 1 line
Use issue directive
........
r62289 | thomas.heller | 2008-04-11 15:05:38 +0200 (Fri, 11 Apr 2008) | 3 lines
Move backwards compatibility macro to the correct place;
PyIndex_Check() was introduced in Python 2.5.
........
r62290 | thomas.heller | 2008-04-11 16:20:26 +0200 (Fri, 11 Apr 2008) | 2 lines
Performance improvements.
........
r62293 | christian.heimes | 2008-04-12 15:03:03 +0200 (Sat, 12 Apr 2008) | 2 lines
Applied patch #2617 from Frank Wierzbicki wit some extras from me
-J and -X are now reserved for Jython and non-standard arguments (e.g. IronPython). I've added some extra comments to make sure the reservation don't get missed in the future.
........
r62294 | georg.brandl | 2008-04-12 20:11:18 +0200 (Sat, 12 Apr 2008) | 2 lines
Use absolute path in sys.path.
........
r62295 | georg.brandl | 2008-04-12 20:36:09 +0200 (Sat, 12 Apr 2008) | 2 lines
#2615: small consistency update by Jeroen Ruigrok van der Werven.
........
r62296 | georg.brandl | 2008-04-12 21:00:20 +0200 (Sat, 12 Apr 2008) | 2 lines
Add Jeroen.
........
r62297 | georg.brandl | 2008-04-12 21:05:37 +0200 (Sat, 12 Apr 2008) | 2 lines
Don't offend snake lovers.
........
r62298 | gregory.p.smith | 2008-04-12 22:37:48 +0200 (Sat, 12 Apr 2008) | 2 lines
fix compiler warnings
........
r62302 | gregory.p.smith | 2008-04-13 00:24:04 +0200 (Sun, 13 Apr 2008) | 3 lines
socket.error inherits from IOError, it no longer needs listing in
the all_errors tuple.
........
r62303 | brett.cannon | 2008-04-13 01:44:07 +0200 (Sun, 13 Apr 2008) | 8 lines
Re-implement the 'warnings' module in C. This allows for usage of the
'warnings' code in places where it was previously not possible (e.g., the
parser). It could also potentially lead to a speed-up in interpreter start-up
if the C version of the code (_warnings) is imported over the use of the
Python version in key places.
Closes issue #1631171.
........
r62304 | gregory.p.smith | 2008-04-13 02:03:25 +0200 (Sun, 13 Apr 2008) | 3 lines
Adds a profile-opt target for easy compilation of a python binary using
gcc's profile guided optimization.
........
r62305 | brett.cannon | 2008-04-13 02:18:44 +0200 (Sun, 13 Apr 2008) | 3 lines
Fix a bug in PySys_HasWarnOption() where it was not properly checking the
length of the list storing the warning options.
........
r62306 | brett.cannon | 2008-04-13 02:25:15 +0200 (Sun, 13 Apr 2008) | 2 lines
Fix an accidental bug of an non-existent init function.
........
r62308 | andrew.kuchling | 2008-04-13 03:05:59 +0200 (Sun, 13 Apr 2008) | 1 line
Mention -J, -X
........
r62311 | benjamin.peterson | 2008-04-13 04:20:05 +0200 (Sun, 13 Apr 2008) | 2 lines
Give the "Interactive Interpreter Changes" section in 2.6 whatsnew a unique link name
........
r62313 | brett.cannon | 2008-04-13 04:42:36 +0200 (Sun, 13 Apr 2008) | 3 lines
Fix test_warnings by making the state of things more consistent for each test
when it is run.
........
r62314 | skip.montanaro | 2008-04-13 05:17:30 +0200 (Sun, 13 Apr 2008) | 2 lines
spelling
........
r62315 | georg.brandl | 2008-04-13 09:07:44 +0200 (Sun, 13 Apr 2008) | 2 lines
Fix markup.
........
r62319 | christian.heimes | 2008-04-13 11:30:17 +0200 (Sun, 13 Apr 2008) | 1 line
Fix compiler warning Include/warnings.h:19:28: warning: no newline at end of file
........
r62320 | christian.heimes | 2008-04-13 11:33:24 +0200 (Sun, 13 Apr 2008) | 1 line
Use PyString_InternFromString instead of PyString_FromString for static vars
........
r62321 | christian.heimes | 2008-04-13 11:37:05 +0200 (Sun, 13 Apr 2008) | 1 line
Added new files to the pcbuild files
........