The fix for issue 4050 caused a regression: before that fix, source
lines in the linecache would eventually be found by inspect. After the
fix, inspect reports an error earlier, and the source isn't found.
The fix for the fix is to have getsourcefile look in the linecache for
the file and return the pseudo-filename if the source is there, just as
it already returns it if there is a PEP 302 loader.
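A minimal sketch of the behaviour described above, assuming a current
Python 3 interpreter. Source compiled under a pseudo-filename is registered
in linecache by hand, after which inspect can retrieve it. The filename
"<linecache-demo>", the module name and the function add are hypothetical,
and writing to linecache.cache directly is a widely used trick rather than
a documented API.

```python
import inspect
import linecache

source = "def add(a, b):\n    return a + b\n"
filename = "<linecache-demo>"  # hypothetical pseudo-filename, not a real file

# Register the source lines by hand. An mtime of None keeps
# linecache.checkcache() from discarding the entry.
linecache.cache[filename] = (len(source), None, source.splitlines(True), filename)

namespace = {"__name__": "linecache_demo"}  # hypothetical module name
exec(compile(source, filename, "exec"), namespace)

# getsourcefile() returns the pseudo-filename because it is in the
# linecache, so getsource() can read the lines from there.
source_file = inspect.getsourcefile(namespace["add"])
recovered = inspect.getsource(namespace["add"])
print(source_file)
print(recovered)
```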
1) #8271: when a byte sequence is invalid, only the start byte and all the
valid continuation bytes are now replaced by U+FFFD, instead of replacing
the number of bytes specified by the start byte.
See http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf (pages 94-95);
2) 5- and 6-byte-long UTF-8 sequences are now considered invalid (no changes
in behavior);
3) Add code and tests to reject surrogates (U+D800-U+DFFF) as defined in
RFC 3629, but leave it commented out since it's not backward compatible;
4) Change the error messages "unexpected code byte" to "invalid start byte"
and "invalid data" to "invalid continuation byte";
5) Add an extensive set of tests in test_unicode;
6) Fix test_codeccallbacks because it was failing after this change.
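The decoder changes listed above can be observed with a short sketch. It
assumes a current Python 3 interpreter, whose UTF-8 decoder follows the
same replacement rule and uses the same error messages; the exact output
of the interpreter this change was written for may differ in details. The
helper decode_reason is hypothetical.

```python
# 0xF1 announces a 4-byte sequence and 0x80 is a valid continuation
# byte, but "A" is not, so only the first two bytes are replaced.
data = b"\xf1\x80AB"
decoded = data.decode("utf-8", "replace")
print(ascii(decoded))


def decode_reason(raw):
    """Return the reason given by a strict UTF-8 decode failure, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


print(decode_reason(b"\xff"))                  # a byte that can never start a sequence
print(decode_reason(b"\xc3("))                 # start byte followed by a non-continuation byte
print(decode_reason(b"\xf8\x88\x80\x80\x80"))  # 5-byte form, treated as invalid
```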
Previously, unexpected results occurred when email was passed, for example,
'utf8' as a charset name, since email would accept it but would *not* use
the 'utf-8' codec for it, even though Python itself recognises that as
an alias for utf-8. Now Charset checks with codecs for aliases as well
as its own internal table. Issue 8898 has been opened to change this
further in py3k so that all aliasing is routed through the codecs module.
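The lookup order described above (internal table first, then the codecs
module) might look roughly like the sketch below. INTERNAL_ALIASES and
canonical_charset are hypothetical names for illustration and are not the
email package's actual code; only codecs.lookup() is a real API here.

```python
import codecs

# Hypothetical stand-in for a package-internal alias table.
INTERNAL_ALIASES = {"latin_1": "iso-8859-1", "ascii": "us-ascii"}


def canonical_charset(name):
    """Resolve a charset name via the internal table, then via codecs."""
    name = name.lower()
    if name in INTERNAL_ALIASES:
        return INTERNAL_ALIASES[name]
    try:
        # codecs knows that 'utf8' is an alias for 'utf-8'.
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to codecs as well: leave the name unchanged.
        return name


print(canonical_charset("utf8"))
print(canonical_charset("latin_1"))
```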
FreeBSD doesn't have socket.EAI_NODATA. I rewrote the routine because
there's no easy way to conditionally include a context manager in a
with statement. As a side benefit, instead of a stack of context
managers there's now only one.
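One way to get a single context manager that copes with platform-specific
constants is sketched below. It is an illustration under stated
assumptions, not the rewritten routine itself: the names
transient_lookup_errors and ResourceUnavailable, and the choice of error
codes other than EAI_NODATA, are hypothetical.

```python
import contextlib
import socket

# Only include the codes this platform's socket module actually defines;
# some platforms lack EAI_NODATA.
_CANDIDATES = ("EAI_NONAME", "EAI_NODATA", "EAI_AGAIN")
TRANSIENT_GAI_CODES = {
    getattr(socket, name) for name in _CANDIDATES if hasattr(socket, name)
}


class ResourceUnavailable(Exception):
    """Hypothetical exception signalling that a network resource is unusable."""


@contextlib.contextmanager
def transient_lookup_errors(resource):
    """Turn the known transient lookup failures into ResourceUnavailable."""
    try:
        yield
    except socket.gaierror as err:
        if err.errno in TRANSIENT_GAI_CODES:
            raise ResourceUnavailable(resource) from err
        raise  # anything else propagates unchanged
```

Because the set of codes is computed once with hasattr(), the with
statement itself never has to be assembled conditionally.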
If a body part ended with \r\n, feedparser, using '$' to terminate its
search for the newline, would match on the \r\n, and think that it needed
to strip two characters in order to account for the line end before the
boundary. That made it chop one too many characters off the end of
the body part. Using \Z makes the match correct.
Patch and test by Tony Nelson.
the curses module must be linked against ncurses as well. Otherwise it
is not safe to load both the readline and curses modules in an application.
Thanks Thomas Dickey for answering questions about ncurses/ncursesw
and readline!
for extracting symbolic and hard link entries as regular files as a
work-around on platforms that do not support filesystem links.
This stopped working reliably after a change in r74571. I also added
a few tests for this functionality.
honor the MacOSX SDK when one is specified.
This is needed to be able to build using the 10.4u SDK while running
on OSX 10.6.
This is a fixed version of the patch in r80963; I've tested this patch
on OSX and Linux.
Fixes (mysterious, to the end user) UnicodeErrors when using utf-8 as
the charset and unicode as the _text argument. Also makes the way in
which unicode gets encoded to quoted printable for other charsets more
sane (it only worked by accident previously). The _payload now is encoded
to the charset.output_charset if it is unicode.
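For illustration, the round trip that this change makes reliable looks as
follows on a current Python 3 interpreter, where str plays the role of
unicode. This is a sketch of the expected behaviour, not a reproduction of
the original failure, and the sample text is arbitrary.

```python
from email.mime.text import MIMEText

text = "caf\u00e9"  # non-ASCII text passed as the _text argument

msg = MIMEText(text, "plain", "utf-8")

# The payload is stored encoded in the message's charset; decoding the
# transfer encoding gives back the UTF-8 bytes of the original text.
raw = msg.get_payload(decode=True)
print(msg.get_content_charset())
print(raw.decode("utf-8"))
```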
unquote is duplicated in the two files to avoid a circular reference.
(This is fixed in Python3.) Updates keep getting made to the public unquote
without fixing the urlparse one, however, so this fix syncs the two
and adds a comment to both to make sure changes are applied to both.
* Fix seek() method of codecs.open(): don't write the BOM twice after seek(0)
* Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes
* test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by
Solaris or Windows, but does it really exist? I found it in the issue.
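The BOM behaviour can be checked without touching the filesystem. The
sketch below assumes a current Python 3 interpreter and wraps an in-memory
buffer in codecs.StreamReaderWriter, the wrapper type that codecs.open()
returns; the sample data is arbitrary.

```python
import codecs
import io

data = "1234567890"
buffer = io.BytesIO()

# Same wrapper type that codecs.open() returns, built on an in-memory stream.
stream = codecs.StreamReaderWriter(
    buffer, codecs.getreader("utf-16"), codecs.getwriter("utf-16")
)

# Two consecutive writes must produce a single BOM at the start.
stream.write(data)
stream.write(data)

stream.seek(0)
round_trip = stream.read()

raw = buffer.getvalue()
bom_count = raw.count(codecs.BOM_UTF16)
print(round_trip == data * 2, bom_count)
```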
(e.g. from .os import sep) and it failed, import would still try the implicit
relative import semantics of an absolute import (from os import sep). That's
not right, so when level is positive, only do explicit relative import
semantics.
Fixes issue #7902. Thanks to Meador Inge for the patch.
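The intended outcome can be sketched on a current Python 3 interpreter,
where a failed explicit relative import never falls back to a top-level
module. The package name relimport_demo and its module uses_relative are
hypothetical and are created in a temporary directory.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg = pathlib.Path(tmp) / "relimport_demo"  # hypothetical package
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # "os" exists as a top-level module, but not inside this package.
    (pkg / "uses_relative.py").write_text("from .os import sep\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        importlib.import_module("relimport_demo.uses_relative")
        outcome = "imported"
    except ImportError as exc:
        # The explicit relative import fails instead of finding the stdlib os.
        outcome = "ImportError: %s" % exc
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("relimport_demo", None)

print(outcome)
```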
Forward port some code from Python3:
* join surrogate pairs if sizeof(Py_UNICODE)==2
* Enable non-BMP test on narrow builds using u"\U0001D121" instead of
unichr(0x1D121)
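Joining a surrogate pair is plain arithmetic on the two 16-bit code units.
The sketch below uses the code point from the test mentioned above; the
function name join_surrogates is hypothetical and the code is an
illustration, not the forward-ported C code.

```python
def join_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D121 is stored as the pair D834 DD21 when code units are 16 bits wide.
code_point = join_surrogates(0xD834, 0xDD21)
print(hex(code_point))
```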
interpreter shutdown semantics. Same issue goes for the methods that __del__
called. Now all the methods capture the global objects they need as default
values to private parameters (could have stuck them on the class object itself,
but since the objects have nothing directly to do with the class that seemed
wrong).
There is no test as making one that works is hard. This patch was
verified against a consistently failing test in Mercurial's test suite, though,
so it has been tested in some regard.
Closes issue #5099. Thanks to Mary Stern for the bug report and Gabriel
Genellina for writing another patch for the same issue and attempting to write
a test.
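The default-value technique described above can be sketched as follows.
ScratchFile is a hypothetical class used only for illustration; it is not
the class this patch changed.

```python
import os
import tempfile


class ScratchFile:
    """Hypothetical class that deletes its file when it is collected."""

    path = None  # lets __del__ run safely even if __init__ failed

    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    # The needed globals are bound as default values of private parameters
    # when the method is defined, so the method keeps working even if the
    # module's globals have been cleared during interpreter shutdown.
    def remove(self, _unlink=os.unlink, _exists=os.path.exists):
        if self.path is not None and _exists(self.path):
            _unlink(self.path)
        self.path = None

    def __del__(self):
        self.remove()
```

Callers are expected to leave the underscore-prefixed parameters alone;
they exist only to hold the references.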