Commit Graph

12 Commits

Author SHA1 Message Date
Fred Drake 2a3d7db93e Added character data buffering to pyexpat parser objects.
Setting the buffer_text attribute to true causes the parser to collect
character data, waiting as long as possible to report it to the Python
callback.  This can save an enormous number of callbacks from C to
Python, which can be a substantial performance improvement.

buffer_text defaults to false.
2002-06-28 22:56:48 +00:00
Fred Drake 1add023b88 Integrate the tests for name interning from PyXML (test_pyexpat.py
revision 1.12 in PyXML).
2002-06-27 19:41:51 +00:00
Jeremy Hylton 3c19ec4eab Fix when pyexpat not built
Import pyexpat first so that import error occurs when it is not
available.
2001-07-30 21:47:25 +00:00
Tim Peters 2f228e75e4 Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
The comment following used to say:
	/* We use ~hash instead of hash, as degenerate hash functions, such
	   as for ints <sigh>, can have lots of leading zeros. It's not
	   really a performance risk, but better safe than sorry.
	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
	   what's the gain? */
That is, there was never a good reason for doing it.  And to the contrary,
as explained on Python-Dev last December, it tended to make the *sum*
(i + incr) & mask (which is the first table index examined in case of
collison) the same "too often" across distinct hashes.

Changing to the simpler "i = hash & mask" reduced the number of string-dict
collisions (== # number of times we go around the lookup for-loop) from about
6 million to 5 million during a full run of the test suite (these are
approximate because the test suite does some random stuff from run to run).
The number of collisions in non-string dicts also decreased, but not as
dramatically.

Note that this may, for a given dict, change the order (wrt previous
releases) of entries exposed by .keys(), .values() and .items().  A number
of std tests suffered bogus failures as a result.  For dicts keyed by
small ints, or (less so) by characters, the order is much more likely to be
in increasing order of key now; e.g.,

>>> d = {}
>>> for i in range(10):
...    d[i] = i
...
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>>

Unfortunately. people may latch on to that in small examples and draw a
bogus conclusion.

test_support.py
    Moved test_extcall's sortdict() into test_support, made it stronger,
    and imported sortdict into other std tests that needed it.
test_unicode.py
    Excluced cp875 from the "roundtrip over range(128)" test, because
    cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
    See Python-Dev for excruciating details.
Cookie.py
    Chaged various output functions to sort dicts before building
    strings from them.
test_extcall
    Fiddled the expected-result file.  This remains sensitive to native
    dict ordering, because, e.g., if there are multiple errors in a
    keyword-arg dict (and test_extcall sets up many cases like that), the
    specific error Python complains about first depends on native dict
    ordering.
2001-05-13 00:19:31 +00:00
Fred Drake 8f42e2b1fa Update test to accomodate the change to the namespace_separator parameter
of ParserCreate().

Added assignment tests for the ordered_attributes and specified_attributes
values, similar to the checks for the returns_unicode attribute.
2001-04-25 16:03:54 +00:00
Fred Drake 1e0611b208 The "context" parameter to the ExternalEntityRefParameter exposes internal
information from the Expat library that is not part of its public API.
Do not print this information as the format of the string may (and will)
change as Expat evolves.

Add additional tests to make sure the ParserCreate() function raises the
right exceptions on illegal parameters.
2000-12-23 22:12:07 +00:00
Fred Drake 004d5e6880 Make reindent.py happy (convert everything to 4-space indents!). 2000-10-23 17:22:08 +00:00
Fred Drake 7fbc85c5c5 Rename the public interface from "pyexpat" to "xml.parsers.expat". 2000-09-23 04:47:56 +00:00
Fred Drake 265a804af2 Revise the test case for pyexpat to avoid using asserts. Conform better
to the Python style guide, and remove unneeded imports.
2000-09-21 20:32:13 +00:00
Andrew M. Kuchling 7fd7e36b08 Change pyexpat test suite to exercise the .returns_unicode attribute,
parsing the sample data once with 8-bit strings and once with Unicode.
2000-06-27 00:37:25 +00:00
Andrew M. Kuchling e188d52a7e Untabified file to fix problems reported by tabnanny 2000-04-02 05:15:38 +00:00
Andrew M. Kuchling b17664ddf0 Added test case for pyexpat module that tries to exercise all the handlers 2000-03-31 15:44:52 +00:00