Small: Always generate a NL or NEWLINE token following
a COMMENT token. The old code did not generate an NL token if
the comment was on a line by itself.
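A quick sketch of the fixed behavior, using the modern generate_tokens()
API (the expected output in the trailing comment is an approximation):

    import io
    import tokenize

    # A comment on a line by itself: the fix guarantees that the
    # COMMENT token is followed by an NL token.
    src = "# a comment on its own line\nx = 1\n"
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))
    # Roughly: COMMENT, NL, NAME, OP, NUMBER, NEWLINE, ENDMARKER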
Large: The output of untokenize() will now match the
input exactly if it is passed the full token sequence. The
old, crufty output is still generated if a limited input
sequence is provided, where "limited" means the tokens do not
include position information.
Remaining bug: There is no CONTINUATION token for a backslash line
continuation, so there is no way for untokenize() to reproduce such code.
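A minimal round-trip sketch, assuming the modern API; "full token
sequence" means the five-element tokens (with position information)
that generate_tokens() yields:

    import io
    import tokenize

    src = "def f(x):\n    return x + 1\n"
    toks = list(tokenize.generate_tokens(io.StringIO(src).readline))

    # Full tokens carry positions, so the source round-trips exactly.
    assert tokenize.untokenize(toks) == src

    # Limited (type, string) pairs fall back to the old, crufty output,
    # which preserves the token stream but not the original spacing.
    crufty = tokenize.untokenize((t.type, t.string) for t in toks)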
Also, expanded the number of doctests in hopes of eventually removing
the old-style tests that compare against a golden file.
Bug fix candidate for Python 2.5.1. (Sigh.)
Should significantly enhance the utility of the module by supporting
the creation of tools that modify the token stream and write back the
modified result.
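For example, a sketch of the kind of tool this enables (rename_identifier
is an illustrative name, not part of the module):

    import io
    import tokenize

    def rename_identifier(source, old, new):
        """Return `source` with every NAME token equal to `old` renamed."""
        toks = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NAME and tok.string == old:
                tok = tok._replace(string=new)
            toks.append(tok)
        return tokenize.untokenize(toks)

    print(rename_identifier("total = total + 1\n", "total", "subtotal"))
    # -> subtotal = subtotal + 1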
Added tokenizer support for the yield statement. I figure we have to have
this in before I can release 2.2a1 on Wednesday.
Note: test_generators is currently broken, I'm counting on Tim to fix
this.
Turns out Neil didn't intend for *all* of his gen-branch work to get
committed.
tokenize.py: I like these changes, and have tested them extensively
without even realizing it, so I just updated the docstring and the docs.
tabnanny.py: Also liked this, but did a little code fiddling. I should
really rewrite this to *exploit* generators, but that's near the bottom
of my effort/benefit scale so I doubt I'll get to it anytime soon (it
would be most useful as a non-trivial example of ideal use of generators;
but test_generators.py has already grown plenty of food-for-thought
examples).
inspect.py: I'm sure Ping intended for this to continue running even
under 1.5.2, so I reverted this to the last pre-gen-branch version. The
"bugfix" I checked in in-between was actually repairing a bug *introduced*
by the conversion to generators, so it's OK that the reverted version
doesn't reflect that checkin.
tokenize.py has always used naive regexps for matching string literals,
and that appears to trigger the sre recursion limit on Skip's platform (he
has very long single-line string literals). Replaced all of tokenize.py's
string regexps with the "unrolled" forms used in IDLE, where they're known to
handle even absurd (multi-megabyte!) string literals without trouble. See
Friedl's book for explanation (at heart, the naive regexps create a backtracking
choice point for each character in the literal, while the unrolled forms create
none).
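To illustrate (the pattern spellings below are representative of the two
styles, not copied verbatim from tokenize.py):

    import re

    # Naive form: each character of the literal is a backtracking
    # choice point, which is what blew the old recursive sre engine's
    # recursion limit on very long literals.
    naive = re.compile(r'"(?:[^"\\\n]|\\.)*"')

    # "Unrolled" form (Friedl): normal* (special normal*)* -- one linear
    # scan with no per-character choice points, so arbitrarily long
    # literals match without pathological backtracking.
    unrolled = re.compile(r'"[^"\\\n]*(?:\\.[^"\\\n]*)*"')

    literal = '"' + "x" * 2_000_000 + '"'  # a multi-megabyte literal
    assert unrolled.match(literal)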
# Ignore now accepts \f as whitespace. Operator now includes '**'.
# Ignore and Special now accept \n or \r\n at the end of a line.
# Imagnumber is new. Expfloat is corrected to reject '0e4'.
* test_grammar.py, testall.out: added test for funny things in string literals
* token.py, symbol.py: definitions used with built-in parser module.
* tokenize.py: added double-quote recognition