Turns out Neil didn't intend for *all* of his gen-branch work to get
committed.

tokenize.py:  I like these changes, and have tested them extensively
without even realizing it, so I just updated the docstring and the docs.

tabnanny.py:  Also liked this, but did a little code fiddling.  I should
really rewrite this to *exploit* generators, but that's near the bottom
of my effort/benefit scale so doubt I'll get to it anytime soon (it
would be most useful as a non-trivial example of ideal use of generators;
but test_generators.py has already grown plenty of food-for-thought
examples).

inspect.py:  I'm sure Ping intended for this to continue running even
under 1.5.2, so I reverted this to the last pre-gen-branch version.  The
"bugfix" I checked in in-between was actually repairing a bug *introduced*
by the conversion to generators, so it's OK that the reverted version
doesn't reflect that checkin.
Tim Peters 2001-06-29 23:51:08 +00:00
parent 88e66254f9
commit 4efb6e9643
4 changed files with 79 additions and 47 deletions

Doc/lib/libtokenize.tex

@@ -12,12 +12,33 @@ source code, implemented in Python.  The scanner in this module
 returns comments as tokens as well, making it useful for implementing
 ``pretty-printers,'' including colorizers for on-screen displays.
 
-The scanner is exposed by a single function:
+The primary entry point is a generator:
+
+\begin{funcdesc}{generate_tokens}{readline}
+  The \function{generate_tokens()} generator requires one argument,
+  \var{readline}, which must be a callable object which
+  provides the same interface as the \method{readline()} method of
+  built-in file objects (see section~\ref{bltin-file-objects}).  Each
+  call to the function should return one line of input as a string.
+
+  The generator produces 5-tuples with these members:
+  the token type;
+  the token string;
+  a 2-tuple \code{(\var{srow}, \var{scol})} of ints specifying the
+  row and column where the token begins in the source;
+  a 2-tuple \code{(\var{erow}, \var{ecol})} of ints specifying the
+  row and column where the token ends in the source;
+  and the line on which the token was found.
+  The line passed is the \emph{logical} line;
+  continuation lines are included.
+  \versionadded{2.2}
+\end{funcdesc}
+
+An older entry point is retained for backward compatibility:
 
 \begin{funcdesc}{tokenize}{readline\optional{, tokeneater}}
   The \function{tokenize()} function accepts two parameters: one
   representing the input stream, and one providing an output mechanism
   for \function{tokenize()}.
 
   The first parameter, \var{readline}, must be a callable object which

@@ -26,17 +47,13 @@ The scanner is exposed by a single function:
   call to the function should return one line of input as a string.
 
   The second parameter, \var{tokeneater}, must also be a callable
-  object.  It is called with five parameters: the token type, the
-  token string, a tuple \code{(\var{srow}, \var{scol})} specifying the
-  row and column where the token begins in the source, a tuple
-  \code{(\var{erow}, \var{ecol})} giving the ending position of the
-  token, and the line on which the token was found.  The line passed
-  is the \emph{logical} line; continuation lines are included.
+  object.  It is called once for each token, with five arguments,
+  corresponding to the tuples generated by \function{generate_tokens()}.
 \end{funcdesc}
 
 All constants from the \refmodule{token} module are also exported from
 \module{tokenize}, as are two additional token type values that might be
 passed to the \var{tokeneater} function by \function{tokenize()}:
 
 \begin{datadesc}{COMMENT}
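
To make the documented interface above concrete, here is a rough usage
sketch of the new generator entry point (Python 2.2-era code; the sample
source string and variable names are illustrative, not part of the commit):

    import token, tokenize
    from StringIO import StringIO          # any readline-like object works

    source = "for x in range(3):\n    print x\n"

    # generate_tokens() takes a readline-style callable and yields 5-tuples:
    # (type, string, (srow, scol), (erow, ecol), logical_line).
    tokens = tokenize.generate_tokens(StringIO(source).readline)
    for tok_type, tok_string, (srow, scol), (erow, ecol), logical_line in tokens:
        print token.tok_name[tok_type], repr(tok_string), (srow, scol), (erow, ecol)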

Lib/inspect.py

@@ -349,28 +349,32 @@ class ListReader:
             return self.lines[i]
         else: return ''
 
+class EndOfBlock(Exception): pass
+
+class BlockFinder:
+    """Provide a tokeneater() method to detect the end of a code block."""
+    def __init__(self):
+        self.indent = 0
+        self.started = 0
+        self.last = 0
+
+    def tokeneater(self, type, token, (srow, scol), (erow, ecol), line):
+        if not self.started:
+            if type == tokenize.NAME: self.started = 1
+        elif type == tokenize.NEWLINE:
+            self.last = srow
+        elif type == tokenize.INDENT:
+            self.indent = self.indent + 1
+        elif type == tokenize.DEDENT:
+            self.indent = self.indent - 1
+            if self.indent == 0: raise EndOfBlock, self.last
+
 def getblock(lines):
     """Extract the block of code at the top of the given list of lines."""
-    indent = 0
-    started = 0
-    last = 0
-    tokens = tokenize.generate_tokens(ListReader(lines).readline)
-
-    for (type, token, (srow, scol), (erow, ecol), line) in tokens:
-        if not started:
-            if type == tokenize.NAME:
-                started = 1
-        elif type == tokenize.NEWLINE:
-            last = srow
-        elif type == tokenize.INDENT:
-            indent = indent + 1
-        elif type == tokenize.DEDENT:
-            indent = indent - 1
-            if indent == 0:
-                return lines[:last]
-    else:
-        raise ValueError, "unable to find block"
+    try:
+        tokenize.tokenize(ListReader(lines).readline, BlockFinder().tokeneater)
+    except EndOfBlock, eob:
+        return lines[:eob.args[0]]
 
 def getsourcelines(object):
     """Return a list of source lines and starting line number for an object.

Lib/tabnanny.py

@@ -14,6 +14,8 @@ import os
 import sys
 import getopt
 import tokenize
+if not hasattr(tokenize, 'NL'):
+    raise ValueError("tokenize.NL doesn't exist -- tokenize module too old")
 
 __all__ = ["check"]

@@ -243,15 +245,11 @@ def format_witnesses(w):
         prefix = prefix + "s"
     return prefix + " " + string.join(firsts, ', ')
 
-# Need Guido's enhancement
-assert hasattr(tokenize, 'NL'), "tokenize module too old"
-
-def process_tokens(tokens,
-                   INDENT=tokenize.INDENT,
-                   DEDENT=tokenize.DEDENT,
-                   NEWLINE=tokenize.NEWLINE,
-                   JUNK=(tokenize.COMMENT, tokenize.NL)):
+def process_tokens(tokens):
+    INDENT = tokenize.INDENT
+    DEDENT = tokenize.DEDENT
+    NEWLINE = tokenize.NEWLINE
+    JUNK = tokenize.COMMENT, tokenize.NL
 
     indents = [Whitespace("")]
     check_equal = 0
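
With this fiddling, process_tokens() takes the token stream itself rather
than binding the token constants as default arguments, so check() can feed it
tokenize.generate_tokens() directly. A rough sketch of driving it by hand
(the file path is hypothetical; the NannyNag exception and its accessors are
what check() itself catches):

    import tokenize, tabnanny

    f = open("some_module.py")              # hypothetical path to check
    try:
        # Walks the 5-tuples and raises tabnanny.NannyNag on ambiguous
        # indentation; tabnanny.check() normally does this for us.
        tabnanny.process_tokens(tokenize.generate_tokens(f.readline))
        print "indentation looks consistent"
    except tabnanny.NannyNag, nag:
        print "offending line", nag.get_lineno(), repr(nag.get_line())
    f.close()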

Lib/tokenize.py

@@ -1,13 +1,26 @@
 """Tokenization help for Python programs.
 
-This module exports a function called 'tokenize()' that breaks a stream of
+generate_tokens(readline) is a generator that breaks a stream of
 text into Python tokens.  It accepts a readline-like method which is called
-repeatedly to get the next line of input (or "" for EOF) and a "token-eater"
-function which is called once for each token found.  The latter function is
-passed the token type, a string containing the token, the starting and
-ending (row, column) coordinates of the token, and the original line.  It is
-designed to match the working of the Python tokenizer exactly, except that
-it produces COMMENT tokens for comments and gives type OP for all operators."""
+repeatedly to get the next line of input (or "" for EOF).  It generates
+5-tuples with these members:
+
+    the token type (see token.py)
+    the token (a string)
+    the starting (row, column) indices of the token (a 2-tuple of ints)
+    the ending (row, column) indices of the token (a 2-tuple of ints)
+    the original line (string)
+
+It is designed to match the working of the Python tokenizer exactly, except
+that it produces COMMENT tokens for comments and gives type OP for all
+operators
+
+Older entry points
+    tokenize_loop(readline, tokeneater)
+    tokenize(readline, tokeneater=printtoken)
+are the same, except instead of generating tokens, tokeneater is a callback
+function to which the 5 fields described above are passed as 5 arguments,
+each time a new token is found."""
 
 __author__ = 'Ka-Ping Yee <ping@lfw.org>'
 __credits__ = \

@@ -111,7 +124,7 @@ def tokenize(readline, tokeneater=printtoken):
     except StopTokenizing:
         pass
 
-# backwards compatible interface, probably not used
+# backwards compatible interface
 def tokenize_loop(readline, tokeneater):
     for token_info in generate_tokens(readline):
         apply(tokeneater, token_info)
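
To illustrate the docstring's claim that the older entry points deliver the
same five fields as the generator, here is a small equivalence sketch
(Python 2-era code; the sample source string is arbitrary):

    import tokenize
    from StringIO import StringIO

    source = "x = 1 + 2\n"

    # Old style: tokenize() calls a tokeneater callback once per token.
    def eater(ttype, tstring, start, end, line):
        print "callback: ", ttype, repr(tstring), start, end
    tokenize.tokenize(StringIO(source).readline, eater)

    # New style: iterate over the same 5-tuples from generate_tokens().
    reader = StringIO(source).readline
    for ttype, tstring, start, end, line in tokenize.generate_tokens(reader):
        print "generator:", ttype, repr(tstring), start, end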