Just another intermediate version...

Guido van Rossum 1992-01-16 17:49:21 +00:00
parent 1c462adaa8
commit 7b632a6073
2 changed files with 178 additions and 78 deletions

@ -1,5 +1,5 @@
% Format this file with latex. % Format this file with latex.
\documentstyle[myformat]{report} \documentstyle[myformat]{report}
\title{\bf \title{\bf
@ -65,17 +65,18 @@ rather than formal specifications for everything except syntax and
lexical analysis. This should make the document better understandable lexical analysis. This should make the document better understandable
to the average reader, but will leave room for ambiguities. to the average reader, but will leave room for ambiguities.
Consequently, if you were coming from Mars and tried to re-implement Consequently, if you were coming from Mars and tried to re-implement
Python from this document alone, you might in fact be implementing Python from this document alone, you might have to guess things and in
quite a different language. On the other hand, if you are using fact you would be implementing quite a different language.
On the other hand, if you are using
Python and wonder what the precise rules about a particular area of Python and wonder what the precise rules about a particular area of
the language are, you should be able to find it here. the language are, you should definitely be able to find it here.
It is dangerous to add too many implementation details to a language It is dangerous to add too many implementation details to a language
reference document -- the implementation may change, and other reference document -- the implementation may change, and other
implementations of the same language may work differently. On the implementations of the same language may work differently. On the
other hand, there is currently only one Python implementation, and other hand, there is currently only one Python implementation, and
particular quirks of it are sometimes worth mentioning, especially its particular quirks are sometimes worth mentioning, especially
where it differs from the ``ideal'' specification. where the implementation imposes additional limitations.
Every Python implementation comes with a number of built-in and Every Python implementation comes with a number of built-in and
standard modules. These are not documented here, but in the separate standard modules. These are not documented here, but in the separate
@ -93,20 +94,20 @@ name: lcletter (lcletter | "_")*
lcletter: "a"..."z" lcletter: "a"..."z"
\end{verbatim} \end{verbatim}
The first line says that a \verb\name\ is a \verb\lcletter\ followed by The first line says that a \verb\name\ is an \verb\lcletter\ followed by
a sequence of zero or more \verb\lcletter\s and underscores. A a sequence of zero or more \verb\lcletter\s and underscores. An
\verb\lcletter\ in turn is any of the single characters `a' through `z'. \verb\lcletter\ in turn is any of the single characters `a' through `z'.
(This rule is actually adhered to for the names defined in syntax and (This rule is actually adhered to for the names defined in syntax and
grammar rules in this document.) grammar rules in this document.)
Each rule begins with a name (which is the name defined by the rule) Each rule begins with a name (which is the name defined by the rule)
followed by a colon. Each rule is wholly contained on one line. A and a colon, and is wholly contained on one line. A vertical bar
vertical bar (\verb\|\) is used to separate alternatives, it is the (\verb\|\) is used to separate alternatives; it is the least binding
least binding operator in this notation. A star (\verb\*\) means zero operator in this notation. A star (\verb\*\) means zero or more
or more repetitions of the preceding item; likewise, a plus (\verb\+\) repetitions of the preceding item; likewise, a plus (\verb\+\) means
means one or more repetitions and a question mark (\verb\?\) zero or one or more repetitions, and a question mark (\verb\?\) zero or one
one (in other words, the preceding item is optional). These three (in other words, the preceding item is optional). These three
operators bind as tight as possible; parentheses are used for operators bind as tightly as possible; parentheses are used for
grouping. Literal strings are enclosed in double quotes. White space grouping. Literal strings are enclosed in double quotes. White space
is only meaningful to separate tokens. is only meaningful to separate tokens.
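
As a rough illustration of the notation, the sample rules for \verb\name\ and \verb\lcletter\ can be transcribed into a regular expression. This is only a sketch: the use of Python's \verb\re\ module and the helper name \verb\is_name\ are assumptions made for the example, not something the reference prescribes.

```python
import re

# Hypothetical transcription of the sample rules:
#   name:     lcletter (lcletter | "_")*
#   lcletter: "a"..."z"
LCLETTER = "[a-z]"
NAME = re.compile(LCLETTER + "(?:" + LCLETTER + "|_)*")


def is_name(text):
    """Return True if the whole of text matches the `name` rule."""
    return NAME.fullmatch(text) is not None
```

The star binds to the parenthesized group only, which is why the first \verb\lcletter\ stays outside the repeated part.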
@ -117,7 +118,7 @@ characters. A phrase between angular brackets (\verb\<...>\) gives an
informal description of the symbol defined; e.g., this could be used informal description of the symbol defined; e.g., this could be used
to describe the notion of `control character' if needed. to describe the notion of `control character' if needed.
Although the notation used is almost the same, there is a big Even though the notation used is almost the same, there is a big
difference between the meaning of lexical and syntactic definitions: difference between the meaning of lexical and syntactic definitions:
a lexical definition operates on the individual characters of the a lexical definition operates on the individual characters of the
input source, while a syntax definition operates on the stream of input source, while a syntax definition operates on the stream of
@ -131,22 +132,22 @@ chapter describes how the lexical analyzer breaks a file into tokens.
\section{Line structure} \section{Line structure}
A Python program is divided into a number of logical lines. Statements A Python program is divided into a number of logical lines. The end of
do not straddle logical line boundaries except where explicitly a logical line is represented by the token NEWLINE. Statements cannot
indicated by the syntax (i.e., for compound statements). To this cross logical line boundaries except where NEWLINE is allowed by the
purpose, the end of a logical line is represented by the token syntax (e.g., between statements in compound statements).
NEWLINE.
\subsection{Comments} \subsection{Comments}
A comment starts with a hash character (\verb\#\) that is not part of A comment starts with a hash character (\verb\#\) that is not part of
a string literal, and ends at the end of the physical line. Comments a string literal, and ends at the end of the physical line. A comment
are ignored by the syntax. always signifies the end of the logical line. Comments are ignored by
the syntax.
\subsection{Line joining} \subsection{Line joining}
Two or more physical lines may be joined into logical lines using Two or more physical lines may be joined into logical lines using
backslash characters (\verb/\/), as follows: When physical line ends backslash characters (\verb/\/), as follows: when a physical line ends
in a backslash that is not part of a string literal or comment, it is in a backslash that is not part of a string literal or comment, it is
joined with the following forming a single logical line, deleting the joined with the following forming a single logical line, deleting the
backslash and the following end-of-line character. backslash and the following end-of-line character.
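
The joining rule can be sketched as follows. This is a simplified illustration, assuming the input is a list of physical lines with their end-of-line characters already stripped; the function name \verb\join_lines\ is hypothetical, and the sketch does not check whether a backslash sits inside a string literal or comment.

```python
def join_lines(physical_lines):
    """Join physical lines ending in a backslash into logical lines.

    Simplification (assumption of this sketch): string literals and
    comments are not analysed, so every trailing backslash joins.
    """
    logical = []
    pending = ""
    for line in physical_lines:
        if line.endswith("\\"):
            # Delete the backslash and the end-of-line; keep accumulating.
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # The last physical line ended in a backslash: emit what we have.
        logical.append(pending)
    return logical
```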
@ -160,13 +161,14 @@ terminates a multi-line statement.
\subsection{Indentation} \subsection{Indentation}
Spaces and tabs at the beginning of a logical line are used to compute Leading whitespace (spaces and tabs) at the beginning of a logical
the indentation level of the line, which in turn is used to determine line is used to compute the indentation level of the line, which in
the grouping of statements. turn is used to determine the grouping of statements.
First, each tab is replaced by one to eight spaces such that the total First, tabs are replaced (from left to right) by one to eight spaces
number of spaces up to that point is a multiple of eight. The total such that the total number of characters up to there is a multiple of
number of spaces preceding the first non-blank character then eight (this is intended to be the same rule as used by UNIX). The
total number of spaces preceding the first non-blank character then
determines the line's indentation. Indentation cannot be split over determines the line's indentation. Indentation cannot be split over
multiple physical lines using backslashes. multiple physical lines using backslashes.
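
A minimal sketch of the tab rule, assuming a tab stop every eight columns as described above. The function name \verb\indentation\ and the \verb\tabsize\ parameter are hypothetical and introduced only for this illustration.

```python
def indentation(line, tabsize=8):
    """Return the indentation of `line` after expanding leading tabs."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # Advance to the next multiple of tabsize (adds 1 to 8 columns).
            column = (column // tabsize + 1) * tabsize
        else:
            # First non-blank character: the indentation is complete.
            break
    return column
```

Two spaces followed by a tab therefore give the same indentation as a single tab, which is the behaviour the rule is meant to capture.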
@ -185,6 +187,38 @@ popped off, and for each number popped off a DEDENT token is
generated. At the end of the file, a DEDENT token is generated for generated. At the end of the file, a DEDENT token is generated for
each number remaining on the stack that is larger than zero. each number remaining on the stack that is larger than zero.
Here is an example of a correctly (though confusingly) indented piece
of Python code:
\begin{verbatim}
def perm(l):
if len(l) <= 1:
return [l]
r = []
for i in range(len(l)):
s = l[:i] + l[i+1:]
p = perm(s)
for x in p:
r.append(l[i:i+1] + x)
return r
\end{verbatim}
The following example shows various indentation errors:
\begin{verbatim}
def perm(l): # error: first line indented
for i in range(len(l)): # error: not indented
s = l[:i] + l[i+1:]
p = perm(l[:i] + l[i+1:]) # error: unexpected indent
for x in p:
r.append(l[i:i+1] + x)
return r # error: inconsistent indent
\end{verbatim}
(Actually, the first three errors are detected by the parser; only the
last error is found by the lexical analyzer -- the indentation of
\verb\return r\ does not match a level popped off the stack.)
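
The stack discipline behind INDENT and DEDENT can be sketched as below. This is an illustration under stated assumptions, not the actual lexical analyzer: it takes a list of already computed indentation widths (one per logical line), assumes the stack starts out holding a single zero and that a larger indentation is pushed with an INDENT token, and emits a placeholder \verb\LINE\ token for each line. The name \verb\indent_tokens\ is hypothetical.

```python
def indent_tokens(indents):
    """Turn per-line indentation widths into INDENT/DEDENT tokens."""
    stack = [0]  # assumed initial state: a single zero
    tokens = []
    for width in indents:
        if width > stack[-1]:
            # Deeper than the current level: push it and emit INDENT.
            stack.append(width)
            tokens.append("INDENT")
        else:
            # Pop every larger level, emitting one DEDENT per pop.
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
            if width != stack[-1]:
                # The width matches no level on the stack.
                raise IndentationError("inconsistent dedent")
        tokens.append("LINE")  # placeholder for the line's own tokens
    # At end of file, one DEDENT per remaining number larger than zero.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

With widths such as \verb\[0, 8, 4]\ the sketch raises an error on the last line, which corresponds to the ``inconsistent indent'' case in the example above.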
\section{Other tokens} \section{Other tokens}
Besides NEWLINE, INDENT and DEDENT, the following categories of tokens Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
@ -205,12 +239,13 @@ uppercase: "A"..."Z"
digit: "0"..."9" digit: "0"..."9"
\end{verbatim} \end{verbatim}
Identifiers are unlimited in length. Case is significant. Identifiers are unlimited in length. Case is significant. Keywords
are not identifiers.
\section{Keywords} \section{Keywords}
The following identifiers are used as reserved words, or {\em The following identifiers are used as reserved words, or {\em
keywords} of the language, and may not be used as ordinary keywords} of the language, and cannot be used as ordinary
identifiers. They must be spelled exactly as written here: identifiers. They must be spelled exactly as written here:
\begin{verbatim} \begin{verbatim}
@ -260,7 +295,7 @@ are:
\verb/\'/ & Single quote (\verb/'/) \\ \verb/\'/ & Single quote (\verb/'/) \\
\verb/\a/ & ASCII Bell (BEL) \\ \verb/\a/ & ASCII Bell (BEL) \\
\verb/\b/ & ASCII Backspace (BS) \\ \verb/\b/ & ASCII Backspace (BS) \\
\verb/\E/ & ASCII Escape (ESC) \\ %\verb/\E/ & ASCII Escape (ESC) \\
\verb/\f/ & ASCII Formfeed (FF) \\ \verb/\f/ & ASCII Formfeed (FF) \\
\verb/\n/ & ASCII Linefeed (LF) \\ \verb/\n/ & ASCII Linefeed (LF) \\
\verb/\r/ & ASCII Carriage Return (CR) \\ \verb/\r/ & ASCII Carriage Return (CR) \\
@ -272,13 +307,13 @@ are:
\end{tabular} \end{tabular}
\end{center} \end{center}
For compatibility with Standard C, up to three octal digits are In strict compatibility with Standard C, up to three octal digits are
accepted, but an unlimited number of hex digits is taken to be part of accepted, but an unlimited number of hex digits is taken to be part of
the hex escape (and then the lower 8 bits of the resulting hex number the hex escape (and then the lower 8 bits of the resulting hex number
are used...). are used in all current implementations...).
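
The ``lower 8 bits'' rule amounts to a mask. The following sketch models only the rule as stated here; the helper name \verb\hex_escape_value\ is hypothetical, and later Python versions read exactly two hex digits, so this does not describe their behaviour.

```python
def hex_escape_value(digits):
    """Character code of a hex escape, given all hex digits after it."""
    # Any number of digits is accepted; only the low 8 bits are kept.
    return int(digits, 16) & 0xFF
```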
All unrecognized escape sequences are left in the string {\em All unrecognized escape sequences are left in the string unchanged,
unchanged}, i.e., the backslash is left in the string. (This rule is i.e., {\em the backslash is left in the string.} (This rule is
useful when debugging: if an escape sequence is mistyped, the useful when debugging: if an escape sequence is mistyped, the
resulting output is more easily recognized as broken. It also helps a resulting output is more easily recognized as broken. It also helps a
great deal for string literals used as regular expressions or great deal for string literals used as regular expressions or
@ -313,6 +348,18 @@ fraction: "." digit+
exponent: ("e"|"E") ["+"|"-"] digit+ exponent: ("e"|"E") ["+"|"-"] digit+
\end{verbatim} \end{verbatim}
Some examples of numeric literals:
\begin{verbatim}
1 1234567890 0177777 0x80000
\end{verbatim}
Note that the definitions for literals do not include a sign; a phrase
like \verb\-1\ is actually an expression composed of the operator
\verb\-\ and the literal \verb\1\.
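
This can be observed with the \verb\tokenize\ module of present-day CPython, which is an assumption of this sketch since that module is not part of the document under revision. The helper name \verb\token_kinds\ is hypothetical.

```python
import io
import tokenize


def token_kinds(source):
    """Return (token type name, text) pairs for operators and numbers."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.OP, tokenize.NUMBER)
    ]
```

For the input \verb\-1\ this yields an operator token followed by a number token, rather than a single signed literal.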
\section{Operators} \section{Operators}
The following tokens are operators: The following tokens are operators:
@ -336,13 +383,16 @@ meaning:
; , : . ` = ; , : . ` =
\end{verbatim} \end{verbatim}
The following printing ASCII characters are currently not used; The following printing ASCII characters are not used in Python (except
their occurrence is an unconditional error: in string literals and in comments). Their occurrence is an
unconditional error:
\begin{verbatim} \begin{verbatim}
! @ $ " ? ! @ $ " ?
\end{verbatim} \end{verbatim}
They may be used by future versions of the language though!
\chapter{Execution model} \chapter{Execution model}
(XXX This chapter should explain the general model of the execution of (XXX This chapter should explain the general model of the execution of
