Turns out Neil didn't intend for *all* of his gen-branch work to get
committed.

tokenize.py:  I like these changes, and have tested them extensively
without even realizing it, so I just updated the docstring and the docs.

tabnanny.py:  Also liked this, but did a little code fiddling.  I should
really rewrite this to *exploit* generators, but that's near the bottom
of my effort/benefit scale so I doubt I'll get to it anytime soon (it
would be most useful as a non-trivial example of ideal use of generators;
but test_generators.py has already grown plenty of food-for-thought
examples).

inspect.py:  I'm sure Ping intended for this to continue running even
under 1.5.2, so I reverted this to the last pre-gen-branch version.  The
"bugfix" I checked in in-between was actually repairing a bug *introduced*
by the conversion to generators, so it's OK that the reverted version
doesn't reflect that checkin.
Tim Peters 2001-06-29 23:51:08 +00:00
parent 88e66254f9
commit 4efb6e9643
4 changed files with 79 additions and 47 deletions

View File

@ -12,12 +12,33 @@ source code, implemented in Python. The scanner in this module
returns comments as tokens as well, making it useful for implementing
``pretty-printers,'' including colorizers for on-screen displays.
The scanner is exposed by a single function:
The primary entry point is a generator:
\begin{funcdesc}{generate_tokens}{readline}
The \function{generate_tokens()} generator requires one argument,
\var{readline}, which must be a callable object that
provides the same interface as the \method{readline()} method of
built-in file objects (see section~\ref{bltin-file-objects}). Each
call to the function should return one line of input as a string.
The generator produces 5-tuples with these members:
the token type;
the token string;
a 2-tuple \code{(\var{srow}, \var{scol})} of ints specifying the
row and column where the token begins in the source;
a 2-tuple \code{(\var{erow}, \var{ecol})} of ints specifying the
row and column where the token ends in the source;
and the line on which the token was found.
The line passed is the \emph{logical} line;
continuation lines are included.
\versionadded{2.2}
\end{funcdesc}
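(Editorial aside, not part of this commit: a minimal usage sketch of the generator interface described above, assuming Python 2.2-era syntax and the standard StringIO module.)

    # Illustrative sketch only -- not part of the patch.
    import tokenize
    from StringIO import StringIO

    source = "x = 1 + 2\n"
    readline = StringIO(source).readline
    for toktype, tokstring, (srow, scol), (erow, ecol), line in \
            tokenize.generate_tokens(readline):
        print toktype, repr(tokstring), (srow, scol), (erow, ecol)
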
An older entry point is retained for backward compatibility:
\begin{funcdesc}{tokenize}{readline\optional{, tokeneater}}
The \function{tokenize()} function accepts two parameters: one
representing the input stream, and one providing an output mechanism
for \function{tokenize()}.
The first parameter, \var{readline}, must be a callable object which
@ -26,17 +47,13 @@ The scanner is exposed by a single function:
call to the function should return one line of input as a string.
The second parameter, \var{tokeneater}, must also be a callable
object. It is called with five parameters: the token type, the
token string, a tuple \code{(\var{srow}, \var{scol})} specifying the
row and column where the token begins in the source, a tuple
\code{(\var{erow}, \var{ecol})} giving the ending position of the
token, and the line on which the token was found. The line passed
is the \emph{logical} line; continuation lines are included.
object. It is called once for each token, with five arguments,
corresponding to the tuples generated by \function{generate_tokens()}.
\end{funcdesc}
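(Editorial aside, not part of this commit: the same tokens delivered through the older callback interface, using a hypothetical tokeneater that just prints its five arguments.)

    # Illustrative sketch only -- not part of the patch.
    import tokenize
    from StringIO import StringIO

    def eater(toktype, tokstring, start, end, line):
        print toktype, repr(tokstring), start, end

    tokenize.tokenize(StringIO("x = 1 + 2\n").readline, eater)
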
All constants from the \refmodule{token} module are also exported from
\module{tokenize}, as are two additional token type values that might be
passed to the \var{tokeneater} function by \function{tokenize()}:
\begin{datadesc}{COMMENT}

View File

@ -349,28 +349,32 @@ class ListReader:
            return self.lines[i]
        else: return ''

class EndOfBlock(Exception): pass

class BlockFinder:
    """Provide a tokeneater() method to detect the end of a code block."""
    def __init__(self):
        self.indent = 0
        self.started = 0
        self.last = 0

    def tokeneater(self, type, token, (srow, scol), (erow, ecol), line):
        if not self.started:
            if type == tokenize.NAME: self.started = 1
        elif type == tokenize.NEWLINE:
            self.last = srow
        elif type == tokenize.INDENT:
            self.indent = self.indent + 1
        elif type == tokenize.DEDENT:
            self.indent = self.indent - 1
            if self.indent == 0: raise EndOfBlock, self.last

def getblock(lines):
    """Extract the block of code at the top of the given list of lines."""
    indent = 0
    started = 0
    last = 0
    tokens = tokenize.generate_tokens(ListReader(lines).readline)
    for (type, token, (srow, scol), (erow, ecol), line) in tokens:
        if not started:
            if type == tokenize.NAME:
                started = 1
        elif type == tokenize.NEWLINE:
            last = srow
        elif type == tokenize.INDENT:
            indent = indent + 1
        elif type == tokenize.DEDENT:
            indent = indent - 1
            if indent == 0:
                return lines[:last]
    else:
        raise ValueError, "unable to find block"
    try:
        tokenize.tokenize(ListReader(lines).readline, BlockFinder().tokeneater)
    except EndOfBlock, eob:
        return lines[:eob.args[0]]

def getsourcelines(object):
    """Return a list of source lines and starting line number for an object.

View File

@ -14,6 +14,8 @@ import os
import sys
import getopt
import tokenize
if not hasattr(tokenize, 'NL'):
    raise ValueError("tokenize.NL doesn't exist -- tokenize module too old")
__all__ = ["check"]
@ -243,15 +245,11 @@ def format_witnesses(w):
        prefix = prefix + "s"
    return prefix + " " + string.join(firsts, ', ')
# Need Guido's enhancement
assert hasattr(tokenize, 'NL'), "tokenize module too old"
def process_tokens(tokens,
                   INDENT=tokenize.INDENT,
                   DEDENT=tokenize.DEDENT,
                   NEWLINE=tokenize.NEWLINE,
                   JUNK=(tokenize.COMMENT, tokenize.NL)):
def process_tokens(tokens):
    INDENT = tokenize.INDENT
    DEDENT = tokenize.DEDENT
    NEWLINE = tokenize.NEWLINE
    JUNK = tokenize.COMMENT, tokenize.NL
    indents = [Whitespace("")]
    check_equal = 0
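(Editorial aside, not part of the diff: with the new signature, process_tokens() consumes an iterable of token 5-tuples, so the call site in check(), which is outside this hunk, can presumably pass it tokenize.generate_tokens(f.readline) directly. The module's user-level entry point is unchanged:)

    # Illustrative sketch only; the file name is hypothetical.
    import tabnanny
    tabnanny.check("some_script.py")   # prints a complaint only if the file's
                                       # indentation is ambiguous
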

View File

@ -1,13 +1,26 @@
"""Tokenization help for Python programs.
This module exports a function called 'tokenize()' that breaks a stream of
generate_tokens(readline) is a generator that breaks a stream of
text into Python tokens. It accepts a readline-like method which is called
repeatedly to get the next line of input (or "" for EOF) and a "token-eater"
function which is called once for each token found. The latter function is
passed the token type, a string containing the token, the starting and
ending (row, column) coordinates of the token, and the original line. It is
designed to match the working of the Python tokenizer exactly, except that
it produces COMMENT tokens for comments and gives type OP for all operators."""
repeatedly to get the next line of input (or "" for EOF). It generates
5-tuples with these members:

    the token type (see token.py)
    the token (a string)
    the starting (row, column) indices of the token (a 2-tuple of ints)
    the ending (row, column) indices of the token (a 2-tuple of ints)
    the original line (string)

It is designed to match the working of the Python tokenizer exactly, except
that it produces COMMENT tokens for comments and gives type OP for all
operators.

Older entry points
    tokenize_loop(readline, tokeneater)
    tokenize(readline, tokeneater=printtoken)
are the same, except instead of generating tokens, tokeneater is a callback
function to which the 5 fields described above are passed as 5 arguments,
each time a new token is found."""
__author__ = 'Ka-Ping Yee <ping@lfw.org>'
__credits__ = \
@ -111,7 +124,7 @@ def tokenize(readline, tokeneater=printtoken):
    except StopTokenizing:
        pass
# backwards compatible interface, probably not used
# backwards compatible interface
def tokenize_loop(readline, tokeneater):
    for token_info in generate_tokens(readline):
        apply(tokeneater, token_info)
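
(Editorial aside, not part of the diff: with the default tokeneater, printtoken, the backward-compatible tokenize() call simply dumps every token to stdout.)

    # Illustrative sketch only; the file name is hypothetical.
    import tokenize
    f = open("some_module.py")
    tokenize.tokenize(f.readline)   # printtoken is the default tokeneater
    f.close()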