- compile() didn't return a (empty) list of objects. Fixed.
- the various _fix_xxx_args() methods weren't called (are they new or did I overlook them?). Fixed.
write a BOM at the start of the stream and also to only read it as
BOM at the start of a stream.
Subsequent reading/writing of BOMs will read/write the BOM as ZWNBSP
character. This is in sync with the Unicode specifications.
Note that UTF-16 files will now *have* to start with a BOM mark
in order to be readable by the codec.
addition of interface for consistency with base64 module. Namely,
encodestring(), decodestring(): New functions which accept a string
object and return a string object. They just wrap the string in
StringIOs and pass them to the encode() and decode() methods
respectively. encodestring() accepts a default argument of quotetabs,
defaulting to zero, which is passed on straight through to encode().
encode(): Fix the bug where an extra newline would always be added to
the output, which prevented an idempotent roundtrip through
encode->decode. Now, if the source string doesn't end in a newline,
then the result string won't end in a newline.
Also, extend the quotetabs argument semantics to include quoting
embedded strings, which is also optional according to the RFC.
test() -> main()
"from quopri import *" also imports encodestring() and decodestring().
i_divmod: New and simpler algorithm. Old one returned gibberish on most
boxes when the numerator was -sys.maxint-1. Oddly enough, it worked in the
release (not debug) build on Windows, because the compiler optimized away
some tricky sign manipulations that were incorrect in this case.
Makes you wonder <wink> ...
Bugfix candidate.
Summary: NAMESPACE support in imaplib.py
Initial Comment:
Support for the IMAP NAMESPACE extension defined in rfc
2342. This is almost a necessity for working with
modern IMAP servers.
Unfortunately, the std-mode bBhHIL codes don't do any range-checking; if
and when some of those get fixed, remove their letters from the
IntTester.BUGGY_RANGE_CHECK string. In the meantime, a msg saying that
range-tests are getting skipped is printed to stdout whenever one is
skipped.
This completes the q/Q project.
longobject.c _PyLong_AsByteArray: The original code had a gross bug:
the most-significant Python digit doesn't necessarily have SHIFT
significant bits, and you really need to count how many copies of the sign
bit it has else spurious overflow errors result.
test_struct.py: This now does exhaustive std q/Q testing at, and on both
sides of, all relevant power-of-2 boundaries, both positive and negative.
NEWS: Added brief dict news while I was at it.
native mode, and only when config #defines HAVE_LONG_LONG. Standard mode
will eventually treat them as 8-byte ints across all platforms, but that
likely requires a new set of routines in longobject.c first (while
sizeof(long) >= 4 is guaranteed by C, there's nothing in C we can rely
on x-platform to hold 8 bytes of int, so we'll have to roll our own;
I'm thinking of a simple pair of conversion functions, Python long
to/from sized vector of unsigned bytes; that may be useful for GMP
conversions too; std q/Q would call them with size fixed at 8).
test_struct.py: In addition to adding some native-mode 'q' and 'Q' tests,
got rid of unused code, and repaired a non-portable assumption about
native sizeof(short) (it isn't 2 on some Cray boxes).
libstruct.tex: In addition to adding a bit of 'q'/'Q' docs (more needed
later), removed an erroneous footnote about 'I' behavior.
Patch from Michael Hundson.
format_exception_only() blew up when trying to report a SyntaxError
from a string input (line is None in this case, but it assumed a string).
Bugfix candidate.
Armin Rigo pointed out that the way the line-# table got built didn't work
for lines generating more than 255 bytes of bytecode. Fixed as he
suggested, plus corresponding changes to pyassem.py, plus added some
long overdue docs about this subtle table to compile.c.
Bugfix candidate (line numbers may be off in tracebacks under -O).
case of objects with equal types which support tp_compare. Give
type objects a tp_compare function.
Also add c<0 tests before a few PyErr_Occurred tests.
Ensure that all the default timers are called as functions, not an
expensive method wrapper around a variety of different functions.
Agressively avoid dictionary lookups.
Modify the dispatch scheme (Profile.trace_dispatch_*(), where * is not
'call', 'exception' or 'return') so that the callables dispatched to
are simple functions and not bound methods -- this reduces the number
of layers of Python call machinery that gets touched.
Remove a couple of duplicate imports from the "if __name__ == ..."
section.
This closes SF patch #430948.
be possible to provoke unbounded recursion now, but leaving that to someone
else to provoke and repair.
Bugfix candidate -- although this is getting harder to backstitch, and the
cases it's protecting against are mondo contrived.
random inputs: if you ran the test 100 times, you could expect it to
report a bogus failure. So loosened its expectations.
Also changed the way failing tests are printed, so that when run under
regrtest.py we get enough info to reproduce the failure.
exactly once. But the test code can't know that, as the number of times
__cmp__ is called depends on internal details of the dict implementation.
This is especially nasty because the __hash__ method returns the address
of the class object, so the hash codes seen by the dict can vary across
runs, causing the dict to use a different probe order across runs. I
just happened to see this test fail about 1 run in 7 today, but only
under a release build and when passing -O to Python. So, changed the test
to be predictable across runs.
name of the test, only write the output file if it already exists (and
tell the user to consider removing it). This avoids the generation of
unnecessary turds.
updatecache(): When using imputil, sys.path may contain things other than
strings. Ignore such things instead of blowing up.
Hard to say whether this is a bugfix or a feature ...
dictresize() was too aggressive about never ever resizing small dicts.
If a small dict is entirely full, it needs to rebuild it despite that
it won't actually resize it, in order to purge old dummy entries thus
creating at least one virgin slot (lookdict assumes at least one such
exists).
Also took the opportunity to add some high-level comments to dictresize.
setdefault() the empty string. In setdefault(), use + to join the value
to create the entry for the headers attribute so that TypeError is raised
if the value is of the wrong type.
When regrtest.py finds an attribute "test_main" in a test it imports,
regrtest runs the test's test_main after the import. test_threaded_import
needs this else the cross-thread import lock prevents it from making
progress. Other tests can use this hack too, but I doubt it will ever be
popular.
ICK ALERT: read the long comment block before run_the_test(). It was
almost impossible to get this to run without instant deadlock, and the
solution here sucks on several counts. If you can dream up a better way,
let me know!
basically accept <!...> where the dots can be single- or double-quoted
strings or any other character except >.
Background: I found a real-life example that failed to parse with
the old assumption: http://www.opensource.org/licenses/jabberpl.html
contains a few constructs of the form <![if !supportLists]>...<![endif]>.
derived from but not quite compatible with that of sgmllib, so it's a
new file. I suppose it needs documentation, and htmllib needs to be
changed to use this instead of sgmllib, and sgmllib needs to be
declared obsolete. But that can all be done later.
This code was first published as part of TAL (part of Zope Page
Templates), but that was strongly based on sgmllib anyway. Authors
are Fred drake and Guido van Rossum.
codec files to codecs.py and added logic so that multi mappings
in the decoding maps now result in mappings to None (undefined mapping)
in the encoding maps.
and introduces a new method .decode().
The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.
Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII rendering this auto-conversion mechanism useless for
most Unicode encodings.
The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.
As demonstration of the new feature, the patch includes a few new
codecs which allow string to string encoding and decoding (rot13,
hex, zip, uu, base64).
Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
*are* obsolete; three variables and the maketrans() function are not
(yet) obsolete.
Add a compensating warnings.filterwarnings() call to test_strop.py.
Add this to the NEWS.
elements when crunching a list, dict or tuple. Now takes linear time
instead -- huge speedup for even moderately large containers, and the
code is notably simpler too.
Added some basic "is the output correct?" tests to test_pprint.
The comment following used to say:
/* We use ~hash instead of hash, as degenerate hash functions, such
as for ints <sigh>, can have lots of leading zeros. It's not
really a performance risk, but better safe than sorry.
12-Dec-00 tim: so ~hash produces lots of leading ones instead --
what's the gain? */
That is, there was never a good reason for doing it. And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.
Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.
Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items(). A number
of std tests suffered bogus failures as a result. For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,
>>> d = {}
>>> for i in range(10):
... d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>
Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.
test_support.py
Moved test_extcall's sortdict() into test_support, made it stronger,
and imported sortdict into other std tests that needed it.
test_unicode.py
Excluced cp875 from the "roundtrip over range(128)" test, because
cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
See Python-Dev for excruciating details.
Cookie.py
Chaged various output functions to sort dicts before building
strings from them.
test_extcall
Fiddled the expected-result file. This remains sensitive to native
dict ordering, because, e.g., if there are multiple errors in a
keyword-arg dict (and test_extcall sets up many cases like that), the
specific error Python complains about first depends on native dict
ordering.
A Mystery: test_mutants ran amazingly slowly even before dictobject.c
"got fixed". I don't have a clue as to why. dict comparison was and
remains linear-time in the size of the dicts, and test_mutants only tries
100 dict pairs, of size averaging just 50. So "it should" run in less than
an eyeblink; but it takes at least a second on this 800MHz box.
Fixed a half dozen ways in which general dict comparison could crash
Python (even cause Win98SE to reboot) in the presence of kay and/or
value comparison routines that mutate the dict during dict comparison.
Bugfix candidate.
meaning infinity -- but at least warn about it in the code! I pissed
away a couple hours on this today, and don't wish the same on the next
in line.
Bugfix candidate.
means "replace everything". But the string module, string.replace()
amd test_string.py believe a 0 count means "replace nothing".
"Nothing" wins, strop loses.
Bugfix candidate.
Platform blew up on "123".replace("123", ""). Michael Hudson pinned the
blame on platform malloc(0) returning NULL.
This is a candidate for all bugfix releases.
Store floats and doubles to full precision in marshal.
Test that floats read from .pyc/.pyo closely match those read from .py.
Declare PyFloat_AsString() in floatobject header file.
Add new PyFloat_AsReprString() API function.
Document the functions declared in floatobject.h.
d1 == d2 and d1 != d2 now work even if the keys and values in d1 and d2
don't support comparisons other than ==, and testing dicts for equality
is faster now (especially when inequality obtains).
another change (to test_import.py, which simply imports the new file). I'm
checking this piece in now, though, to make it easier to distribute a patch
for x-platform checking.
NEEDS DOC CHANGES.
More AttributeErrors transmuted into TypeErrors, in test_b2.py, and,
again, this strikes me as a good thing.
This checkin completes the iterator generalization work that obviously
needed to be done. Can anyone think of others that should be changed?
safely together and don't duplicate logic (the common logic was factored
out into new private API function _PySequence_IterContains()).
Visible change:
some_complex_number in some_instance
no longer blows up if some_instance has __getitem__ but neither
__contains__ nor __iter__. test_iter changed to ensure that remains true.
NEEDS DOC CHANGES
A few more AttributeErrors turned into TypeErrors, but in test_contains
this time.
The full story for instance objects is pretty much unexplainable, because
instance_contains() tries its own flavor of iteration-based containment
testing first, and PySequence_Contains doesn't get a chance at it unless
instance_contains() blows up. A consequence is that
some_complex_number in some_instance
dies with a TypeError unless some_instance.__class__ defines __iter__ but
does not define __getitem__.
to string.join(), so that when the latter figures out in midstream that
it really needs unicode.join() instead, unicode.join() can actually get
all the sequence elements (i.e., there's no guarantee that the sequence
passed to string.join() can be iterated over *again* by unicode.join(),
so string.join() must not pass on the original sequence object anymore).
NEEDS DOC CHANGES.
This one surprised me! While I expected tuple() to be a no-brainer, turns
out it's actually dripping with consequences:
1. It will *allow* the popular PySequence_Fast() to work with any iterable
object (code for that not yet checked in, but should be trivial).
2. It caused two std tests to fail. This because some places used
PyTuple_Sequence() (the C spelling of tuple()) as an indirect way to test
whether something *is* a sequence. But tuple() code only looked for the
existence of sq->item to determine that, and e.g. an instance passed
that test whether or not it supported the other operations tuple()
needed (e.g., __len__). So some things the tests *expected* to fail
with an AttributeError now fail with a TypeError instead. This looks
like an improvement to me; e.g., test_coercion used to produce 559
TypeErrors and 2 AttributeErrors, and now they're all TypeErrors. The
error details are more informative too, because the places calling this
were *looking* for TypeErrors in order to replace the generic tuple()
"not a sequence" msg with their own more specific text, and
AttributeErrors snuck by that.
NEEDS DOC CHANGES.
Possibly contentious: The first time s.next() yields StopIteration (for
a given map argument s) is the last time map() *tries* s.next(). That
is, if other sequence args are longer, s will never again contribute
anything but None values to the result, even if trying s.next() again
could yield another result. This is the same behavior map() used to have
wrt IndexError, so it's the only way to be wholly backward-compatible.
I'm not a fan of letting StopIteration mean "try again later" anyway.