Remove second parser module example; it referred to non-readily-available example files, and this kind of discovery is much better done with the AST nowadays anyway.

2010-10-17 10:22:28 +00:00 · 2010-10-17 10:22:28 +00:00 · 047e486c45
parent fc9794a8fc
commit 047e486c45
1 changed files with 2 additions and 333 deletions
--- a/Doc/library/parser.rst
+++ b/Doc/library/parser.rst
@ -317,22 +317,8 @@ ST objects have the following methods:
   Same as ``st2tuple(st, line_info, col_info)``.


-.. _st-examples:
-
-Examples
--------
-
-.. index:: builtin: compile
-
-The parser modules allows operations to be performed on the parse tree of Python
-source code before the :term:`bytecode` is generated, and provides for inspection of the
-parse tree for information gathering purposes. Two examples are presented.  The
-simple example demonstrates emulation of the :func:`compile` built-in function
-and the complex example shows the use of a parse tree for information discovery.
-
-
-Emulation of :func:`compile`
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Example: Emulation of :func:`compile`
+-------------------------------------

 While many useful operations may take place between parsing and bytecode
 generation, the simplest operation is to do nothing.  For this purpose, using
@ -366,320 +352,3 @@ readily available functions::
   def load_expression(source_string):
       st = parser.expr(source_string)
       return st, st.compile()
-
-
-Information Discovery
-^^^^^^^^^^^^^^^^^^^^^
-
-.. index::
-   single: string; documentation
-   single: docstrings
-
-Some applications benefit from direct access to the parse tree.  The remainder
-of this section demonstrates how the parse tree provides access to module
-documentation defined in docstrings without requiring that the code being
-examined be loaded into a running interpreter via :keyword:`import`.  This can
-be very useful for performing analyses of untrusted code.
-
-Generally, the example will demonstrate how the parse tree may be traversed to
-distill interesting information.  Two functions and a set of classes are
-developed which provide programmatic access to high level function and class
-definitions provided by a module.  The classes extract information from the
-parse tree and provide access to the information at a useful semantic level, one
-function provides a simple low-level pattern matching capability, and the other
-function defines a high-level interface to the classes by handling file
-operations on behalf of the caller.  All source files mentioned here which are
-not part of the Python installation are located in the :file:`Demo/parser/`
-directory of the distribution.
-
-The dynamic nature of Python allows the programmer a great deal of flexibility,
-but most modules need only a limited measure of this when defining classes,
-functions, and methods.  In this example, the only definitions that will be
-considered are those which are defined in the top level of their context, e.g.,
-a function defined by a :keyword:`def` statement at column zero of a module, but
-not a function defined within a branch of an :keyword:`if` ... :keyword:`else`
-construct, though there are some good reasons for doing so in some situations.
-Nesting of definitions will be handled by the code developed in the example.
-
-To construct the upper-level extraction methods, we need to know what the parse
-tree structure looks like and how much of it we actually need to be concerned
-about.  Python uses a moderately deep parse tree so there are a large number of
-intermediate nodes.  It is important to read and understand the formal grammar
-used by Python.  This is specified in the file :file:`Grammar/Grammar` in the
-distribution. Consider the simplest case of interest when searching for
-docstrings: a module consisting of a docstring and nothing else.  (See file
-:file:`docstring.py`.) ::
-
-   """Some documentation.
-   """
-
-Using the interpreter to take a look at the parse tree, we find a bewildering
-mass of numbers and parentheses, with the documentation buried deep in nested
-tuples. ::
-
-   >>> import parser
-   >>> import pprint
-   >>> st = parser.suite(open('docstring.py').read())
-   >>> tup = st.totuple()
-   >>> pprint.pprint(tup)
-   (257,
-    (264,
-     (265,
-      (266,
-       (267,
-        (307,
-         (287,
-          (288,
-           (289,
-            (290,
-             (292,
-              (293,
-               (294,
-                (295,
-                 (296,
-                  (297,
-                   (298,
-                    (299,
-                     (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
-      (4, ''))),
-    (4, ''),
-    (0, ''))
-
-The numbers at the first element of each node in the tree are the node types;
-they map directly to terminal and non-terminal symbols in the grammar.
-Unfortunately, they are represented as integers in the internal representation,
-and the Python structures generated do not change that.  However, the
-:mod:`symbol` and :mod:`token` modules provide symbolic names for the node types
-and dictionaries which map from the integers to the symbolic names for the node
-types.
-
-In the output presented above, the outermost tuple contains four elements: the
-integer ``257`` and three additional tuples.  Node type ``257`` has the symbolic
-name :const:`file_input`.  Each of these inner tuples contains an integer as the
-first element; these integers, ``264``, ``4``, and ``0``, represent the node
-types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively.
-Note that these values may change depending on the version of Python you are
-using; consult :file:`symbol.py` and :file:`token.py` for details of the
-mapping.  It should be fairly clear that the outermost node is related primarily
-to the input source rather than the contents of the file, and may be disregarded
-for the moment.  The :const:`stmt` node is much more interesting.  In
-particular, all docstrings are found in subtrees which are formed exactly as
-this node is formed, with the only difference being the string itself.  The
-association between the docstring in a similar tree and the defined entity
-(class, function, or module) which it describes is given by the position of the
-docstring subtree within the tree defining the described structure.
-
-By replacing the actual docstring with something to signify a variable component
-of the tree, we allow a simple pattern matching approach to check any given
-subtree for equivalence to the general pattern for docstrings.  Since the
-example demonstrates information extraction, we can safely require that the tree
-be in tuple form rather than list form, allowing a simple variable
-representation to be ``['variable_name']``.  A simple recursive function can
-implement the pattern matching, returning a Boolean and a dictionary of variable
-name to value mappings.  (See file :file:`example.py`.) ::
-
-   def match(pattern, data, vars=None):
-       if vars is None:
-           vars = {}
-       if isinstance(pattern, list):
-           vars[pattern[0]] = data
-           return True, vars
-       if not instance(pattern, tuple):
-           return (pattern == data), vars
-       if len(data) != len(pattern):
-           return False, vars
-       for pattern, data in zip(pattern, data):
-           same, vars = match(pattern, data, vars)
-           if not same:
-               break
-       return same, vars
-
-Using this simple representation for syntactic variables and the symbolic node
-types, the pattern for the candidate docstring subtrees becomes fairly readable.
-(See file :file:`example.py`.) ::
-
-   import symbol
-   import token
-
-   DOCSTRING_STMT_PATTERN = (
-       symbol.stmt,
-       (symbol.simple_stmt,
-        (symbol.small_stmt,
-         (symbol.expr_stmt,
-          (symbol.testlist,
-           (symbol.test,
-            (symbol.and_test,
-             (symbol.not_test,
-              (symbol.comparison,
-               (symbol.expr,
-                (symbol.xor_expr,
-                 (symbol.and_expr,
-                  (symbol.shift_expr,
-                   (symbol.arith_expr,
-                    (symbol.term,
-                     (symbol.factor,
-                      (symbol.power,
-                       (symbol.atom,
-                        (token.STRING, ['docstring'])
-                        )))))))))))))))),
-        (token.NEWLINE, '')
-        ))
-
-Using the :func:`match` function with this pattern, extracting the module
-docstring from the parse tree created previously is easy::
-
-   >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
-   >>> found
-   True
-   >>> vars
-   {'docstring': '"""Some documentation.\n"""'}
-
-Once specific data can be extracted from a location where it is expected, the
-question of where information can be expected needs to be answered.  When
-dealing with docstrings, the answer is fairly simple: the docstring is the first
-:const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node
-types).  A module consists of a single :const:`file_input` node, and class and
-function definitions each contain exactly one :const:`suite` node.  Classes and
-functions are readily identified as subtrees of code block nodes which start
-with ``(stmt, (compound_stmt, (classdef, ...`` or ``(stmt, (compound_stmt,
-(funcdef, ...``.  Note that these subtrees cannot be matched by :func:`match`
-since it does not support multiple sibling nodes to match without regard to
-number.  A more elaborate matching function could be used to overcome this
-limitation, but this is sufficient for the example.
-
-Given the ability to determine whether a statement might be a docstring and
-extract the actual string from the statement, some work needs to be performed to
-walk the parse tree for an entire module and extract information about the names
-defined in each context of the module and associate any docstrings with the
-names.  The code to perform this work is not complicated, but bears some
-explanation.
-
-The public interface to the classes is straightforward and should probably be
-somewhat more flexible.  Each "major" block of the module is described by an
-object providing several methods for inquiry and a constructor which accepts at
-least the subtree of the complete parse tree which it represents.  The
-:class:`ModuleInfo` constructor accepts an optional *name* parameter since it
-cannot otherwise determine the name of the module.
-
-The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and
-:class:`ModuleInfo`.  All objects provide the methods :meth:`get_name`,
-:meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`.  The
-:class:`ClassInfo` objects support :meth:`get_method_names` and
-:meth:`get_method_info` while the other classes provide
-:meth:`get_function_names` and :meth:`get_function_info`.
-
-Within each of the forms of code block that the public classes represent, most
-of the required information is in the same form and is accessed in the same way,
-with classes having the distinction that functions defined at the top level are
-referred to as "methods." Since the difference in nomenclature reflects a real
-semantic distinction from functions defined outside of a class, the
-implementation needs to maintain the distinction. Hence, most of the
-functionality of the public classes can be implemented in a common base class,
-:class:`SuiteInfoBase`, with the accessors for function and method information
-provided elsewhere. Note that there is only one class which represents function
-and method information; this parallels the use of the :keyword:`def` statement
-to define both types of elements.
-
-Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not
-need to be overridden by subclasses.  More importantly, the extraction of most
-information from a parse tree is handled through a method called by the
-:class:`SuiteInfoBase` constructor.  The example code for most of the classes is
-clear when read alongside the formal grammar, but the method which recursively
-creates new information objects requires further examination.  Here is the
-relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`::
-
-   class SuiteInfoBase:
-       _docstring = ''
-       _name = ''
-
-       def __init__(self, tree = None):
-           self._class_info = {}
-           self._function_info = {}
-           if tree:
-               self._extract_info(tree)
-
-       def _extract_info(self, tree):
-           # extract docstring
-           if len(tree) == 2:
-               found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
-           else:
-               found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
-           if found:
-               self._docstring = eval(vars['docstring'])
-           # discover inner definitions
-           for node in tree[1:]:
-               found, vars = match(COMPOUND_STMT_PATTERN, node)
-               if found:
-                   cstmt = vars['compound']
-                   if cstmt[0] == symbol.funcdef:
-                       name = cstmt[2][1]
-                       self._function_info[name] = FunctionInfo(cstmt)
-                   elif cstmt[0] == symbol.classdef:
-                       name = cstmt[2][1]
-                       self._class_info[name] = ClassInfo(cstmt)
-
-After initializing some internal state, the constructor calls the
-:meth:`_extract_info` method.  This method performs the bulk of the information
-extraction which takes place in the entire example.  The extraction has two
-distinct phases: the location of the docstring for the parse tree passed in, and
-the discovery of additional definitions within the code block represented by the
-parse tree.
-
-The initial :keyword:`if` test determines whether the nested suite is of the
-"short form" or the "long form."  The short form is used when the code block is
-on the same line as the definition of the code block, as in ::
-
-   def square(x): "Square an argument."; return x ** 2
-
-while the long form uses an indented block and allows nested definitions::
-
-   def make_power(exp):
-       "Make a function that raises an argument to the exponent `exp`."
-       def raiser(x, y=exp):
-           return x ** y
-       return raiser
-
-When the short form is used, the code block may contain a docstring as the
-first, and possibly only, :const:`small_stmt` element.  The extraction of such a
-docstring is slightly different and requires only a portion of the complete
-pattern used in the more common case.  As implemented, the docstring will only
-be found if there is only one :const:`small_stmt` node in the
-:const:`simple_stmt` node. Since most functions and methods which use the short
-form do not provide a docstring, this may be considered sufficient.  The
-extraction of the docstring proceeds using the :func:`match` function as
-described above, and the value of the docstring is stored as an attribute of the
-:class:`SuiteInfoBase` object.
-
-After docstring extraction, a simple definition discovery algorithm operates on
-the :const:`stmt` nodes of the :const:`suite` node.  The special case of the
-short form is not tested; since there are no :const:`stmt` nodes in the short
-form, the algorithm will silently skip the single :const:`simple_stmt` node and
-correctly not discover any nested definitions.
-
-Each statement in the code block is categorized as a class definition, function
-or method definition, or something else.  For the definition statements, the
-name of the element defined is extracted and a representation object appropriate
-to the definition is created with the defining subtree passed as an argument to
-the constructor.  The representation objects are stored in instance variables
-and may be retrieved by name using the appropriate accessor methods.
-
-The public classes provide any accessors required which are more specific than
-those provided by the :class:`SuiteInfoBase` class, but the real extraction
-algorithm remains common to all forms of code blocks.  A high-level function can
-be used to extract the complete set of information from a source file.  (See
-file :file:`example.py`.) ::
-
-   def get_docs(fileName):
-       import os
-       import parser
-
-       source = open(fileName).read()
-       basename = os.path.basename(os.path.splitext(fileName)[0])
-       st = parser.suite(source)
-       return ModuleInfo(st.totuple(), basename)
-
-This provides an easy-to-use interface to the documentation of a module.  If
-information is required which is not extracted by the code of this example, the
-code may be extended at clearly defined points to provide additional
-capabilities.
-