mirror of https://github.com/python/cpython
First round of corrections (lexer only).
parent
01ebbb80ab
commit
4fc43bc377
263
Doc/ref.tex
|
@ -42,9 +42,8 @@ and MS-DOS.
|
|||
This reference manual describes the syntax and ``core semantics'' of
|
||||
the language. It is terse, but exact and complete. The semantics of
|
||||
non-essential built-in object types and of the built-in functions and
|
||||
modules are described in the {\em Library Reference} document. For an
|
||||
informal introduction to the language, see the {\em Tutorial}
|
||||
document.
|
||||
modules are described in the {\em Python Library Reference}. For an
|
||||
informal introduction to the language, see the {\em Python Tutorial}.
|
||||
|
||||
\end{abstract}
|
||||
|
||||
|
@ -63,132 +62,119 @@ It is not intended as a tutorial.
|
|||
|
||||
\chapter{Lexical analysis}
|
||||
|
||||
A Python program is read by a {\em parser}.
|
||||
Input to the parser is a stream of {\em tokens}, generated
|
||||
by the {\em lexical analyzer}.
|
||||
A Python program is read by a {\em parser}. Input to the parser is a
|
||||
stream of {\em tokens}, generated by the {\em lexical analyzer}. This
|
||||
chapter describes how the lexical analyzer breaks a file into tokens.
|
||||
|
||||
\section{Line structure}
|
||||
|
||||
A Python program is divided in a number of logical lines.
|
||||
Statements may not straddle logical line boundaries except where
|
||||
explicitly allowed by the syntax.
|
||||
To this purpose, the end of a logical line
|
||||
is represented by the token NEWLINE.
|
||||
A Python program is divided in a number of logical lines. Statements
|
||||
do not straddle logical line boundaries except where explicitly
|
||||
indicated by the syntax (i.e., for compound statements). To this
|
||||
purpose, the end of a logical line is represented by the token
|
||||
NEWLINE.
|
||||
|
||||
\subsection{Comments}
|
||||
|
||||
A comment starts with a hash character (\verb/#/) and ends at the end
|
||||
of the physical line. Comments are ignored by the syntax.
|
||||
A hash character in a string literal does not start a comment.
|
||||
A comment starts with a hash character (\verb\#\) that is not part of
|
||||
a string literal, and ends at the end of the physical line. Comments
|
||||
are ignored by the syntax.
|
||||
|
||||
\subsection{Line joining}
|
||||
|
||||
Physical lines may be joined into logical lines using backslash
|
||||
characters (\verb/\/), as follows.
|
||||
If a physical line ends in a backslash that is not part of a string
|
||||
literal or comment, it is joined with
|
||||
the following forming a single logical line, deleting the backslash
|
||||
and the following end-of-line character. More than two physical
|
||||
lines may be joined together in this way.
|
||||
Two or more physical lines may be joined into logical lines using
|
||||
backslash characters (\verb/\/), as follows: When a physical line ends
|
||||
in a backslash that is not part of a string literal or comment, it is
|
||||
joined with the following forming a single logical line, deleting the
|
||||
backslash and the following end-of-line character.
|
||||
|
||||
\subsection{Blank lines}
|
||||
|
||||
A physical line that is not the continuation of the previous line
|
||||
and contains only spaces, tabs and possibly a comment, is ignored
|
||||
(i.e., no NEWLINE token is generated),
|
||||
except that during interactive input of statements, an empty
|
||||
physical line terminates a multi-line statement.
|
||||
A logical line that contains only spaces, tabs, and possibly a
|
||||
comment, is ignored (i.e., no NEWLINE token is generated), except that
|
||||
during interactive input of statements, an entirely blank logical line
|
||||
terminates a multi-line statement.
|
||||
|
||||
\subsection{Indentation}
|
||||
|
||||
Spaces and tabs at the beginning of a line are used to compute
|
||||
Spaces and tabs at the beginning of a logical line are used to compute
|
||||
the indentation level of the line, which in turn is used to determine
|
||||
the grouping of statements.
|
||||
|
||||
First, each tab is replaced by one to eight spaces such that the column number
|
||||
of the next character is a multiple of eight (counting from zero).
|
||||
The column number of the first non-space character then defines the
|
||||
line's indentation.
|
||||
Indentation cannot be split over multiple physical lines using
|
||||
backslashes.
|
||||
First, each tab is replaced by one to eight spaces such that the total
|
||||
number of spaces up to that point is a multiple of eight. The total
|
||||
number of spaces preceding the first non-blank character then
|
||||
determines the line's indentation. Indentation cannot be split over
|
||||
multiple physical lines using backslashes.
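The tab rule above can be sketched in a few lines of Python. This is only an illustration of the rule as worded here, not the interpreter's actual code; the function name \verb\indentation\ and the \verb\tabsize\ parameter are hypothetical.

```python
def indentation(line, tabsize=8):
    """Return the indentation of a line, following the rule in the text.

    Hypothetical helper: each tab advances the running count of spaces
    to the next multiple of `tabsize` (adding one to eight spaces), and
    the count at the first non-blank character is the indentation.
    """
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            # Round up to the next multiple of tabsize.
            col = (col // tabsize + 1) * tabsize
        else:
            break
    return col
```

For instance, two spaces followed by a tab give the same indentation as a single tab, because the tab rounds the count up to eight in both cases.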
|
||||
|
||||
The indentation levels of consecutive lines are used to generate
|
||||
INDENT and DEDENT tokens, using a stack, as follows.
|
||||
|
||||
Before the first line of the file is read, a single zero is pushed on
|
||||
the stack; this will never be popped off again. The numbers pushed
|
||||
on the stack will always be strictly increasing from bottom to top.
|
||||
At the beginning of each logical line, the line's indentation level
|
||||
is compared to the top of the stack.
|
||||
If it is equal, nothing happens.
|
||||
If it is larger, it is pushed on the stack, and one INDENT token is generated.
|
||||
If it is smaller, it {\em must} be one of the numbers occurring on the
|
||||
stack; all numbers on the stack that are larger are popped off,
|
||||
and for each number popped off a DEDENT token is generated.
|
||||
At the end of the file, a DEDENT token is generated for each number
|
||||
remaining on the stack that is larger than zero.
|
||||
the stack; this will never be popped off again. The numbers pushed on
|
||||
the stack will always be strictly increasing from bottom to top. At
|
||||
the beginning of each logical line, the line's indentation level is
|
||||
compared to the top of the stack. If it is equal, nothing happens.
|
||||
If it is larger, it is pushed on the stack, and one INDENT token is
|
||||
generated. If it is smaller, it {\em must} be one of the numbers
|
||||
occurring on the stack; all numbers on the stack that are larger are
|
||||
popped off, and for each number popped off a DEDENT token is
|
||||
generated. At the end of the file, a DEDENT token is generated for
|
||||
each number remaining on the stack that is larger than zero.
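The stack procedure just described can be sketched as follows. This is a minimal illustration that assumes the indentation level of every logical line has already been computed; the function name \verb\indent_tokens\, the placeholder token \verb\LINE\ and the choice of \verb\ValueError\ for an inconsistent dedent are assumptions made for the example.

```python
def indent_tokens(levels):
    """Turn per-line indentation levels into INDENT/DEDENT tokens.

    Hypothetical sketch: "LINE" stands in for the tokens of each
    logical line, which are not modelled here.
    """
    stack = [0]      # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            # A smaller level must already occur on the stack.
            if level not in stack:
                raise ValueError("inconsistent dedent to level %d" % level)
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")
    # End of file: one DEDENT for each remaining number larger than zero.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

Because a level is pushed only when it exceeds the top of the stack, the numbers on the stack stay strictly increasing from bottom to top, as the text requires.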
|
||||
|
||||
\section{Other tokens}
|
||||
|
||||
Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
|
||||
exist: identifiers, keywords, literals, operators, and delimiters.
|
||||
Spaces and tabs are not tokens, but serve to delimit tokens.
|
||||
Where ambiguity exists, a token comprises the longest possible
|
||||
string that forms a legal token, when reading from left to right.
|
||||
Spaces and tabs are not tokens, but serve to delimit tokens. Where
|
||||
ambiguity exists, a token comprises the longest possible string that
|
||||
forms a legal token, when read from left to right.
|
||||
|
||||
Tokens are described using an extended regular expression notation.
|
||||
This is similar to the extended BNF notation used later, except that
|
||||
the notation <...> is used to give an informal description of a character,
|
||||
and that spaces and tabs are not to be ignored.
|
||||
the notation \verb\<...>\ is used to give an informal description of a
|
||||
character, and that spaces and tabs are not to be ignored.
|
||||
|
||||
\section{Identifiers}
|
||||
|
||||
Identifiers are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
identifier: (letter|'_') (letter|digit|'_')*
|
||||
identifier: (letter|"_") (letter|digit|"_")*
|
||||
letter: lowercase | uppercase
|
||||
lowercase: 'a'|'b'|...|'z'
|
||||
uppercase: 'A'|'B'|...|'Z'
|
||||
digit: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
|
||||
lowercase: "a"|"b"|...|"z"
|
||||
uppercase: "A"|"B"|...|"Z"
|
||||
digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
|
||||
\end{verbatim}
|
||||
|
||||
Identifiers are unlimited in length.
|
||||
Upper and lower case letters are different.
|
||||
Identifiers are unlimited in length. Case is significant.
|
||||
|
||||
\section{Keywords}
|
||||
|
||||
The following tokens are used as reserved words,
|
||||
or keywords of the language,
|
||||
and may not be used as ordinary identifiers.
|
||||
They must be spelled exactly as written here:
|
||||
The following identifiers are used as reserved words, or {\em
|
||||
keywords} of the language, and may not be used as ordinary
|
||||
identifiers. They must be spelled exactly as written here:
|
||||
|
||||
{\tt
|
||||
and
|
||||
break
|
||||
class
|
||||
continue
|
||||
def
|
||||
del
|
||||
elif
|
||||
else
|
||||
except
|
||||
finally
|
||||
for
|
||||
from
|
||||
if
|
||||
import
|
||||
in
|
||||
is
|
||||
not
|
||||
or
|
||||
pass
|
||||
print
|
||||
raise
|
||||
return
|
||||
try
|
||||
while
|
||||
}
|
||||
\begin{verbatim}
|
||||
and del for is raise
|
||||
break elif from not return
|
||||
class else if or try
|
||||
continue except import pass while
|
||||
def finally in print
|
||||
\end{verbatim}
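As an illustration only, the identifier grammar and the keyword list above can be checked with Python's \verb\re\ module. The names \verb\IDENTIFIER\, \verb\KEYWORDS\ and \verb\is_ordinary_identifier\ are hypothetical; the keyword set is copied from the list in this section.

```python
import re

# identifier: (letter|"_") (letter|digit|"_")*
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Keywords exactly as listed in this section.
KEYWORDS = frozenset("""
    and del for is raise
    break elif from not return
    class else if or try
    continue except import pass while
    def finally in print
""".split())


def is_ordinary_identifier(text):
    """True if text matches the identifier grammar and is not a keyword."""
    return IDENTIFIER.fullmatch(text) is not None and text not in KEYWORDS
```

Since case is significant, \verb\While\ passes this check although \verb\while\ does not.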
|
||||
|
||||
% import string
|
||||
% l = []
|
||||
% try:
|
||||
% while 1:
|
||||
% l = l + string.split(raw_input())
|
||||
% except EOFError:
|
||||
% pass
|
||||
% l.sort()
|
||||
% for i in range((len(l)+4)/5):
|
||||
% for j in range(i, len(l), 5):
|
||||
% print string.ljust(l[j], 10),
|
||||
% print
|
||||
|
||||
\section{Literals}
|
||||
|
||||
|
@ -197,24 +183,47 @@ They must be spelled exactly as written here:
|
|||
String literals are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
stringliteral: '\'' stringitem* '\''
|
||||
stringliteral: "'" stringitem* "'"
|
||||
stringitem: stringchar | escapeseq
|
||||
stringchar: <any character except newline or '\\' or '\''>
|
||||
escapeseq: '\\' <any character except newline>
|
||||
stringchar: <any character except newline or "\" or "'">
|
||||
escapeseq: "\" <any character except newline>
|
||||
\end{verbatim}
|
||||
|
||||
String literals cannot span physical line boundaries.
|
||||
Escape sequences in strings are actually interpreted according to almost the
|
||||
same rules as used by Standard C
|
||||
(XXX which should be made explicit here),
|
||||
except that \verb/\E/ is equivalent to \verb/\033/,
|
||||
\verb/\"/ is not recognized,
|
||||
newline characters cannot be escaped, and
|
||||
{\em all unrecognized escape sequences are left in the string unchanged}.
|
||||
(The latter rule is useful when debugging: if an escape sequence is
|
||||
mistyped, the resulting output is more easily recognized as broken.
|
||||
It also helps somewhat for string literals used as regular expressions
|
||||
or otherwise passed to other modules that do their own escape handling.)
|
||||
String literals cannot span physical line boundaries. Escape
|
||||
sequences in strings are actually interpreted according to rules
|
||||
similar to those used by Standard C. The recognized escape sequences
|
||||
are:
|
||||
|
||||
\begin{center}
|
||||
\begin{tabular}{|l|l|}
|
||||
\hline
|
||||
\verb/\\/ & Backslash (\verb/\/) \\
|
||||
\verb/\'/ & Single quote (\verb/'/) \\
|
||||
\verb/\a/ & ASCII Bell (BEL) \\
|
||||
\verb/\b/ & ASCII Backspace (BS) \\
|
||||
\verb/\E/ & ASCII Escape (ESC) \\
|
||||
\verb/\f/ & ASCII Formfeed (FF) \\
|
||||
\verb/\n/ & ASCII Linefeed (LF) \\
|
||||
\verb/\r/ & ASCII Carriage Return (CR) \\
|
||||
\verb/\t/ & ASCII Horizontal Tab (TAB) \\
|
||||
\verb/\v/ & ASCII Vertical Tab (VT) \\
|
||||
\verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\
|
||||
\verb/\x/{\em xx...} & ASCII character with hex value {\em xx...} \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{center}
|
||||
|
||||
For compatibility with Standard C, up to three octal digits are
|
||||
accepted, but an unlimited number of hex digits is taken to be part of
|
||||
the hex escape (and then the lower 8 bits of the resulting hex number
|
||||
are used...).
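The digit-counting rule for octal and hex escapes can be sketched as below. This follows the rule as worded in this text and makes no claim about what any particular interpreter does; the function name \verb\scan_numeric_escape\ and the use of \verb\ValueError\ for other input are assumptions made for the example.

```python
import string

OCTDIGITS = "01234567"


def scan_numeric_escape(text):
    """Decode one octal or hex escape at the start of text.

    Hypothetical helper. Returns (character code, characters consumed).
    """
    if text[:2] == "\\x":
        end = 2
        # An unlimited number of hex digits belongs to the escape ...
        while end < len(text) and text[end] in string.hexdigits:
            end += 1
        if end == 2:
            raise ValueError("no hex digits after \\x")
        # ... but only the lower 8 bits of the value are used.
        return int(text[2:end], 16) & 0xFF, end
    if len(text) >= 2 and text[0] == "\\" and text[1] in OCTDIGITS:
        end = 1
        # At most three octal digits are accepted.
        while end < len(text) and end < 4 and text[end] in OCTDIGITS:
            end += 1
        return int(text[1:end], 8), end
    raise ValueError("not an octal or hex escape")
```

Under this rule \verb/\x141/ and \verb/\x41/ denote the same character, since only the lower 8 bits of hex 141 are kept.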
|
||||
|
||||
All unrecognized escape sequences are left in the string {\em
|
||||
unchanged}, i.e., the backslash is left in the string. (This rule is
|
||||
useful when debugging: if an escape sequence is mistyped, the
|
||||
resulting output is more easily recognized as broken. It also helps
|
||||
somewhat for string literals used as regular expressions or otherwise
|
||||
passed to other modules that do their own escape handling.)
|
||||
|
||||
\subsection{Numeric literals}
|
||||
|
||||
|
@ -224,24 +233,24 @@ and floating point numbers.
|
|||
Integers and long integers are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
longinteger: integer ('l'|'L')
|
||||
longinteger: integer ("l"|"L")
|
||||
integer: decimalinteger | octinteger | hexinteger
|
||||
decimalinteger: nonzerodigit digit* | '0'
|
||||
octinteger: '0' octdigit+
|
||||
hexinteger: '0' ('x'|'X') hexdigit+
|
||||
decimalinteger: nonzerodigit digit* | "0"
|
||||
octinteger: "0" octdigit+
|
||||
hexinteger: "0" ("x"|"X") hexdigit+
|
||||
|
||||
nonzerodigit: '1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
|
||||
octdigit: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'
|
||||
hexdigit: digit|'a'|'b'|'c'|'d'|'e'|'f'|'A'|'B'|'C'|'D'|'E'|'F'
|
||||
nonzerodigit: "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
|
||||
octdigit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
|
||||
hexdigit: digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
|
||||
\end{verbatim}
|
||||
|
||||
Floating point numbers are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
floatnumber: [intpart] fraction [exponent] | intpart ['.'] exponent
|
||||
floatnumber: [intpart] fraction [exponent] | intpart ["."] exponent
|
||||
intpart: digit+
|
||||
fraction: '.' digit+
|
||||
exponent: ('e'|'E') ['+'|'-'] digit+
|
||||
fraction: "." digit+
|
||||
exponent: ("e"|"E") ["+"|"-"] digit+
|
||||
\end{verbatim}
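The numeric literal grammars above translate fairly directly into patterns for Python's \verb\re\ module. The sketch below is an illustration under that assumption; the names \verb\INTEGER\, \verb\LONGINTEGER\, \verb\FLOATNUMBER\ and \verb\classify\ are hypothetical.

```python
import re

# integer: decimalinteger | octinteger | hexinteger
INTEGER = r"(?:[1-9][0-9]*|0[xX][0-9a-fA-F]+|0[0-7]+|0)"
# longinteger: integer ("l"|"L")
LONGINTEGER = INTEGER + r"[lL]"
# exponent: ("e"|"E") ["+"|"-"] digit+
EXPONENT = r"[eE][+-]?[0-9]+"
# floatnumber: [intpart] fraction [exponent] | intpart ["."] exponent
FLOATNUMBER = r"(?:[0-9]*\.[0-9]+(?:%s)?|[0-9]+\.?%s)" % (EXPONENT, EXPONENT)


def classify(text):
    """Return the grammar rule that matches all of text, or None."""
    rules = (
        ("floatnumber", FLOATNUMBER),
        ("longinteger", LONGINTEGER),
        ("integer", INTEGER),
    )
    for name, pattern in rules:
        if re.fullmatch(pattern, text):
            return name
    return None
```

Note that, read literally, the grammar rejects \verb\1.\ (a fraction needs at least one digit after the point) and \verb\08\ (8 is not an octal digit), and the sketch does the same.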
|
||||
|
||||
\section{Operators}
|
||||
|
@ -292,15 +301,15 @@ conditions. Conditions are a superset of expressions, and a condition
|
|||
may be used where an expression is required by enclosing it in
|
||||
parentheses. The only place where an unparenthesized condition
|
||||
is not allowed is on the right-hand side of the assignment operator,
|
||||
because this operator is the same token (\verb/'='/) as used for
|
||||
because this operator is the same token (\verb\=\) as used for
|
||||
comparisons.
|
||||
|
||||
The comma plays a somewhat special role in Python's syntax.
|
||||
It is an operator with a lower precedence than all others, but
|
||||
occasionally serves other purposes as well (e.g., it has special
|
||||
semantics in print statements). When a comma is accepted by the
|
||||
syntax, one of the syntactic categories \verb/expression_list/
|
||||
or \verb/condition_list/ is always used.
|
||||
syntax, one of the syntactic categories \verb\expression_list\
|
||||
or \verb\condition_list\ is always used.
|
||||
|
||||
When (one alternative of) a syntax rule has the form
|
||||
|
||||
|
@ -308,8 +317,8 @@ When (one alternative of) a syntax rule has the form
|
|||
name: othername
|
||||
\end{verbatim}
|
||||
|
||||
and no semantics are given, the semantics of this form of \verb/name/
|
||||
are the same as for \verb/othername/.
|
||||
and no semantics are given, the semantics of this form of \verb\name\
|
||||
are the same as for \verb\othername\.
|
||||
|
||||
\section{Arithmetic conversions}
|
||||
|
||||
|
@ -414,11 +423,11 @@ key value prevails.
|
|||
A string conversion evaluates the contained condition list and converts the
|
||||
resulting object into a string according to rules specific to its type.
|
||||
|
||||
If the object is a string, a number, \verb/None/, or a tuple, list or
|
||||
If the object is a string, a number, \verb\None\, or a tuple, list or
|
||||
dictionary containing only objects whose type is in this list,
|
||||
the resulting
|
||||
string is a valid Python expression which can be passed to the
|
||||
built-in function \verb/eval()/ to yield an expression with the
|
||||
built-in function \verb\eval()\ to yield an expression with the
|
||||
same value (or an approximation, if floating point numbers are
|
||||
involved).
|
||||
|
||||
|
@ -459,11 +468,11 @@ Their syntax is:
|
|||
factor: primary | '-' factor | '+' factor | '~' factor
|
||||
\end{verbatim}
|
||||
|
||||
The unary \verb/'-'/ operator yields the negative of its numeric argument.
|
||||
The unary \verb\-\ operator yields the negative of its numeric argument.
|
||||
|
||||
The unary \verb/'+'/ operator yields its numeric argument unchanged.
|
||||
The unary \verb\+\ operator yields its numeric argument unchanged.
|
||||
|
||||
The unary \verb/'~'/ operator yields the bit-wise negation of its
|
||||
The unary \verb\~\ operator yields the bit-wise negation of its
|
||||
integral numerical argument.
|
||||
|
||||
In all three cases, if the argument does not have the proper type,
|
||||
|
@ -477,7 +486,7 @@ Terms represent the most tightly binding binary operators:
|
|||
term: factor | term '*' factor | term '/' factor | term '%' factor
|
||||
\end{verbatim}
|
||||
|
||||
The \verb/'*'/ operator yields the product of its arguments.
|
||||
The \verb\*\ operator yields the product of its arguments.
|
||||
The arguments must either both be numbers, or one argument must be
|
||||
a (short) integer and the other must be a string.
|
||||
In the former case, the numbers are converted to a common type
|
||||
|
@ -572,7 +581,7 @@ it is optional in all other cases (a single expression without
|
|||
a trailing comma doesn't create a tuple, but rather yields the
|
||||
value of that expression).
|
||||
|
||||
To create an empty tuple, use an empty pair of parentheses: \verb/()/.
|
||||
To create an empty tuple, use an empty pair of parentheses: \verb\()\.
|
||||
|
||||
\section{Comparisons}
|
||||
|
||||
|
@ -597,8 +606,8 @@ Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison
|
|||
between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal.
|
||||
|
||||
For the benefit of C programmers,
|
||||
the comparison operators \verb/=/ and \verb/==/ are equivalent,
|
||||
and so are \verb/<>/ and \verb/!=/.
|
||||
the comparison operators \verb\=\ and \verb\==\ are equivalent,
|
||||
and so are \verb\<>\ and \verb\!=\.
|
||||
Use of the C variants is discouraged.
|
||||
|
||||
The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare
|
||||
|
@ -610,7 +619,7 @@ the value \verb\None\ compares smaller than the values of any other type.
|
|||
|
||||
(This unusual
|
||||
definition of comparison is done to simplify the definition of
|
||||
operations like sorting and the \verb/in/ and \verb/not in/ operators.)
|
||||
operations like sorting and the \verb\in\ and \verb\not in\ operators.)
|
||||
|
||||
Comparison of objects of the same type depends on the type:
|
||||
|
||||
|
@ -869,12 +878,12 @@ A space is written before each object is (converted and) written,
|
|||
unless the output system believes it is positioned at the beginning
|
||||
of a line. This is the case: (1) when no characters have been written
|
||||
to standard output; or (2) when the last character written to
|
||||
standard output is \verb/'\n'/;
|
||||
standard output is \verb/\n/;
|
||||
or (3) when the last I/O operation
|
||||
on standard output was not a \verb\print\ statement.
|
||||
|
||||
Finally,
|
||||
a \verb/'\n'/ character is written at the end,
|
||||
a \verb/\n/ character is written at the end,
|
||||
unless the \verb\print\ statement ends with a comma.
|
||||
This is the only action if the statement contains just the keyword
|
||||
\verb\print\.
|
||||
|
|
263
Doc/ref/ref.tex
|
@ -42,9 +42,8 @@ and MS-DOS.
|
|||
This reference manual describes the syntax and ``core semantics'' of
|
||||
the language. It is terse, but exact and complete. The semantics of
|
||||
non-essential built-in object types and of the built-in functions and
|
||||
modules are described in the {\em Library Reference} document. For an
|
||||
informal introduction to the language, see the {\em Tutorial}
|
||||
document.
|
||||
modules are described in the {\em Python Library Reference}. For an
|
||||
informal introduction to the language, see the {\em Python Tutorial}.
|
||||
|
||||
\end{abstract}
|
||||
|
||||
|
@ -63,132 +62,119 @@ It is not intended as a tutorial.
|
|||
|
||||
\chapter{Lexical analysis}
|
||||
|
||||
A Python program is read by a {\em parser}.
|
||||
Input to the parser is a stream of {\em tokens}, generated
|
||||
by the {\em lexical analyzer}.
|
||||
A Python program is read by a {\em parser}. Input to the parser is a
|
||||
stream of {\em tokens}, generated by the {\em lexical analyzer}. This
|
||||
chapter describes how the lexical analyzer breaks a file into tokens.
|
||||
|
||||
\section{Line structure}
|
||||
|
||||
A Python program is divided in a number of logical lines.
|
||||
Statements may not straddle logical line boundaries except where
|
||||
explicitly allowed by the syntax.
|
||||
To this purpose, the end of a logical line
|
||||
is represented by the token NEWLINE.
|
||||
A Python program is divided in a number of logical lines. Statements
|
||||
do not straddle logical line boundaries except where explicitly
|
||||
indicated by the syntax (i.e., for compound statements). To this
|
||||
purpose, the end of a logical line is represented by the token
|
||||
NEWLINE.
|
||||
|
||||
\subsection{Comments}
|
||||
|
||||
A comment starts with a hash character (\verb/#/) and ends at the end
|
||||
of the physical line. Comments are ignored by the syntax.
|
||||
A hash character in a string literal does not start a comment.
|
||||
A comment starts with a hash character (\verb\#\) that is not part of
|
||||
a string literal, and ends at the end of the physical line. Comments
|
||||
are ignored by the syntax.
|
||||
|
||||
\subsection{Line joining}
|
||||
|
||||
Physical lines may be joined into logical lines using backslash
|
||||
characters (\verb/\/), as follows.
|
||||
If a physical line ends in a backslash that is not part of a string
|
||||
literal or comment, it is joined with
|
||||
the following forming a single logical line, deleting the backslash
|
||||
and the following end-of-line character. More than two physical
|
||||
lines may be joined together in this way.
|
||||
Two or more physical lines may be joined into logical lines using
|
||||
backslash characters (\verb/\/), as follows: When a physical line ends
|
||||
in a backslash that is not part of a string literal or comment, it is
|
||||
joined with the following forming a single logical line, deleting the
|
||||
backslash and the following end-of-line character.
|
||||
|
||||
\subsection{Blank lines}
|
||||
|
||||
A physical line that is not the continuation of the previous line
|
||||
and contains only spaces, tabs and possibly a comment, is ignored
|
||||
(i.e., no NEWLINE token is generated),
|
||||
except that during interactive input of statements, an empty
|
||||
physical line terminates a multi-line statement.
|
||||
A logical line that contains only spaces, tabs, and possibly a
|
||||
comment, is ignored (i.e., no NEWLINE token is generated), except that
|
||||
during interactive input of statements, an entirely blank logical line
|
||||
terminates a multi-line statement.
|
||||
|
||||
\subsection{Indentation}
|
||||
|
||||
Spaces and tabs at the beginning of a line are used to compute
|
||||
Spaces and tabs at the beginning of a logical line are used to compute
|
||||
the indentation level of the line, which in turn is used to determine
|
||||
the grouping of statements.
|
||||
|
||||
First, each tab is replaced by one to eight spaces such that the column number
|
||||
of the next character is a multiple of eight (counting from zero).
|
||||
The column number of the first non-space character then defines the
|
||||
line's indentation.
|
||||
Indentation cannot be split over multiple physical lines using
|
||||
backslashes.
|
||||
First, each tab is replaced by one to eight spaces such that the total
|
||||
number of spaces up to that point is a multiple of eight. The total
|
||||
number of spaces preceding the first non-blank character then
|
||||
determines the line's indentation. Indentation cannot be split over
|
||||
multiple physical lines using backslashes.
|
||||
|
||||
The indentation levels of consecutive lines are used to generate
|
||||
INDENT and DEDENT tokens, using a stack, as follows.
|
||||
|
||||
Before the first line of the file is read, a single zero is pushed on
|
||||
the stack; this will never be popped off again. The numbers pushed
|
||||
on the stack will always be strictly increasing from bottom to top.
|
||||
At the beginning of each logical line, the line's indentation level
|
||||
is compared to the top of the stack.
|
||||
If it is equal, nothing happens.
|
||||
If it is larger, it is pushed on the stack, and one INDENT token is generated.
|
||||
If it is smaller, it {\em must} be one of the numbers occurring on the
|
||||
stack; all numbers on the stack that are larger are popped off,
|
||||
and for each number popped off a DEDENT token is generated.
|
||||
At the end of the file, a DEDENT token is generated for each number
|
||||
remaining on the stack that is larger than zero.
|
||||
the stack; this will never be popped off again. The numbers pushed on
|
||||
the stack will always be strictly increasing from bottom to top. At
|
||||
the beginning of each logical line, the line's indentation level is
|
||||
compared to the top of the stack. If it is equal, nothing happens.
|
||||
If it is larger, it is pushed on the stack, and one INDENT token is
|
||||
generated. If it is smaller, it {\em must} be one of the numbers
|
||||
occurring on the stack; all numbers on the stack that are larger are
|
||||
popped off, and for each number popped off a DEDENT token is
|
||||
generated. At the end of the file, a DEDENT token is generated for
|
||||
each number remaining on the stack that is larger than zero.
|
||||
|
||||
\section{Other tokens}
|
||||
|
||||
Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
|
||||
exist: identifiers, keywords, literals, operators, and delimiters.
|
||||
Spaces and tabs are not tokens, but serve to delimit tokens.
|
||||
Where ambiguity exists, a token comprises the longest possible
|
||||
string that forms a legal token, when reading from left to right.
|
||||
Spaces and tabs are not tokens, but serve to delimit tokens. Where
|
||||
ambiguity exists, a token comprises the longest possible string that
|
||||
forms a legal token, when read from left to right.
|
||||
|
||||
Tokens are described using an extended regular expression notation.
|
||||
This is similar to the extended BNF notation used later, except that
|
||||
the notation <...> is used to give an informal description of a character,
|
||||
and that spaces and tabs are not to be ignored.
|
||||
the notation \verb\<...>\ is used to give an informal description of a
|
||||
character, and that spaces and tabs are not to be ignored.
|
||||
|
||||
\section{Identifiers}
|
||||
|
||||
Identifiers are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
identifier: (letter|'_') (letter|digit|'_')*
|
||||
identifier: (letter|"_") (letter|digit|"_")*
|
||||
letter: lowercase | uppercase
|
||||
lowercase: 'a'|'b'|...|'z'
|
||||
uppercase: 'A'|'B'|...|'Z'
|
||||
digit: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
|
||||
lowercase: "a"|"b"|...|"z"
|
||||
uppercase: "A"|"B"|...|"Z"
|
||||
digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
|
||||
\end{verbatim}
|
||||
|
||||
Identifiers are unlimited in length.
|
||||
Upper and lower case letters are different.
|
||||
Identifiers are unlimited in length. Case is significant.
|
||||
|
||||
\section{Keywords}
|
||||
|
||||
The following tokens are used as reserved words,
|
||||
or keywords of the language,
|
||||
and may not be used as ordinary identifiers.
|
||||
They must be spelled exactly as written here:
|
||||
The following identifiers are used as reserved words, or {\em
|
||||
keywords} of the language, and may not be used as ordinary
|
||||
identifiers. They must be spelled exactly as written here:
|
||||
|
||||
{\tt
|
||||
and
|
||||
break
|
||||
class
|
||||
continue
|
||||
def
|
||||
del
|
||||
elif
|
||||
else
|
||||
except
|
||||
finally
|
||||
for
|
||||
from
|
||||
if
|
||||
import
|
||||
in
|
||||
is
|
||||
not
|
||||
or
|
||||
pass
|
||||
print
|
||||
raise
|
||||
return
|
||||
try
|
||||
while
|
||||
}
|
||||
\begin{verbatim}
|
||||
and del for is raise
|
||||
break elif from not return
|
||||
class else if or try
|
||||
continue except import pass while
|
||||
def finally in print
|
||||
\end{verbatim}
|
||||
|
||||
% import string
|
||||
% l = []
|
||||
% try:
|
||||
% while 1:
|
||||
% l = l + string.split(raw_input())
|
||||
% except EOFError:
|
||||
% pass
|
||||
% l.sort()
|
||||
% for i in range((len(l)+4)/5):
|
||||
% for j in range(i, len(l), 5):
|
||||
% print string.ljust(l[j], 10),
|
||||
% print
|
||||
|
||||
\section{Literals}
|
||||
|
||||
|
@ -197,24 +183,47 @@ They must be spelled exactly as written here:
|
|||
String literals are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
stringliteral: '\'' stringitem* '\''
|
||||
stringliteral: "'" stringitem* "'"
|
||||
stringitem: stringchar | escapeseq
|
||||
stringchar: <any character except newline or '\\' or '\''>
|
||||
escapeseq: '\\' <any character except newline>
|
||||
stringchar: <any character except newline or "\" or "'">
|
||||
escapeseq: "\" <any character except newline>
|
||||
\end{verbatim}
|
||||
|
||||
String literals cannot span physical line boundaries.
|
||||
Escape sequences in strings are actually interpreted according to almost the
|
||||
same rules as used by Standard C
|
||||
(XXX which should be made explicit here),
|
||||
except that \verb/\E/ is equivalent to \verb/\033/,
|
||||
\verb/\"/ is not recognized,
|
||||
newline characters cannot be escaped, and
|
||||
{\em all unrecognized escape sequences are left in the string unchanged}.
|
||||
(The latter rule is useful when debugging: if an escape sequence is
|
||||
mistyped, the resulting output is more easily recognized as broken.
|
||||
It also helps somewhat for string literals used as regular expressions
|
||||
or otherwise passed to other modules that do their own escape handling.)
|
||||
String literals cannot span physical line boundaries. Escape
|
||||
sequences in strings are actually interpreted according to rules
|
||||
similar to those used by Standard C. The recognized escape sequences
|
||||
are:
|
||||
|
||||
\begin{center}
|
||||
\begin{tabular}{|l|l|}
|
||||
\hline
|
||||
\verb/\\/ & Backslash (\verb/\/) \\
|
||||
\verb/\'/ & Single quote (\verb/'/) \\
|
||||
\verb/\a/ & ASCII Bell (BEL) \\
|
||||
\verb/\b/ & ASCII Backspace (BS) \\
|
||||
\verb/\E/ & ASCII Escape (ESC) \\
|
||||
\verb/\f/ & ASCII Formfeed (FF) \\
|
||||
\verb/\n/ & ASCII Linefeed (LF) \\
|
||||
\verb/\r/ & ASCII Carriage Return (CR) \\
|
||||
\verb/\t/ & ASCII Horizontal Tab (TAB) \\
|
||||
\verb/\v/ & ASCII Vertical Tab (VT) \\
|
||||
\verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\
|
||||
\verb/\x/{\em xx...} & ASCII character with hex value {\em xx...} \\
|
||||
\hline
|
||||
\end{tabular}
|
||||
\end{center}
|
||||
|
||||
For compatibility with Standard C, up to three octal digits are
|
||||
accepted, but an unlimited number of hex digits is taken to be part of
|
||||
the hex escape (and then the lower 8 bits of the resulting hex number
|
||||
are used...).
|
||||
|
||||
All unrecognized escape sequences are left in the string {\em
|
||||
unchanged}, i.e., the backslash is left in the string. (This rule is
|
||||
useful when debugging: if an escape sequence is mistyped, the
|
||||
resulting output is more easily recognized as broken. It also helps
|
||||
somewhat for string literals used as regular expressions or otherwise
|
||||
passed to other modules that do their own escape handling.)
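
The escape rules above can be sketched as a small decoder. This is an illustration only, not the lexer's actual code: the names \verb/decode_escapes/, \verb/SIMPLE_ESCAPES/ and \verb/ESCAPE_RE/ are hypothetical, and masking octal escapes to 8 bits is an assumption (the text states the mask only for hex escapes).

```python
import re

# Single-character escapes from the table above (note the nonstandard \E).
SIMPLE_ESCAPES = {
    "\\": "\\", "'": "'", "a": "\a", "b": "\b", "E": "\033",
    "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
}

# A backslash followed by 1-3 octal digits, by x and any number of
# hex digits, or by any other single character.
ESCAPE_RE = re.compile(r"\\([0-7]{1,3}|x[0-9a-fA-F]+|.)", re.DOTALL)


def decode_escapes(body):
    """Decode the body of a string literal (hypothetical helper)."""
    def replace(match):
        esc = match.group(1)
        if esc[0] in "01234567":
            # Octal escape; the 8-bit mask here is an assumption.
            return chr(int(esc, 8) & 0xFF)
        if esc[0] == "x" and len(esc) > 1:
            # Hex escape: unlimited digits, only the lower 8 bits are used.
            return chr(int(esc[1:], 16) & 0xFF)
        # Unrecognized escapes are left unchanged, backslash included.
        return SIMPLE_ESCAPES.get(esc, match.group(0))
    return ESCAPE_RE.sub(replace, body)
```

For instance, the mistyped escape \verb/\q/ comes back from this sketch with its backslash intact, which is what makes it easy to spot in the output.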
|
||||
|
||||
\subsection{Numeric literals}
|
||||
|
||||
|
@ -224,24 +233,24 @@ and floating point numbers.
|
|||
Integers and long integers are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
longinteger: integer ('l'|'L')
|
||||
longinteger: integer ("l"|"L")
|
||||
integer: decimalinteger | octinteger | hexinteger
|
||||
decimalinteger: nonzerodigit digit* | '0'
|
||||
octinteger: '0' octdigit+
|
||||
hexinteger: '0' ('x'|'X') hexdigit+
|
||||
decimalinteger: nonzerodigit digit* | "0"
|
||||
octinteger: "0" octdigit+
|
||||
hexinteger: "0" ("x"|"X") hexdigit+
|
||||
|
||||
nonzerodigit: '1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
|
||||
octdigit: '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'
|
||||
hexdigit: digit|'a'|'b'|'c'|'d'|'e'|'f'|'A'|'B'|'C'|'D'|'E'|'F'
|
||||
nonzerodigit: "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
|
||||
octdigit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
|
||||
hexdigit: digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
|
||||
\end{verbatim}
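
As a rough cross-check, the integer rules above can be transcribed into regular expressions for Python's \verb/re/ module. This is a sketch under the assumption that a literal must match a rule in full; \verb/classify_integer/ and the pattern names are hypothetical and not part of the language.

```python
import re

# Building blocks, transcribed from the grammar above.
DECIMALINTEGER = r"(?:[1-9][0-9]*|0)"
OCTINTEGER = r"0[0-7]+"
HEXINTEGER = r"0[xX][0-9a-fA-F]+"
INTEGER = rf"(?:{DECIMALINTEGER}|{OCTINTEGER}|{HEXINTEGER})"
LONGINTEGER = rf"{INTEGER}[lL]"


def classify_integer(text):
    """Return the name of the rule matching all of text, or None."""
    rules = [
        ("longinteger", LONGINTEGER),
        ("hexinteger", HEXINTEGER),
        ("octinteger", OCTINTEGER),
        ("decimalinteger", DECIMALINTEGER),
    ]
    for name, pattern in rules:
        if re.fullmatch(pattern, text):
            return name
    return None
```

Under these rules \verb/09/ matches nothing: a decimal integer other than \verb/0/ cannot start with a zero, and \verb/9/ is not an octal digit.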
|
||||
|
||||
Floating point numbers are described by the following regular expressions:
|
||||
|
||||
\begin{verbatim}
|
||||
floatnumber: [intpart] fraction [exponent] | intpart ['.'] exponent
|
||||
floatnumber: [intpart] fraction [exponent] | intpart ["."] exponent
|
||||
intpart: digit+
|
||||
fraction: '.' digit+
|
||||
exponent: ('e'|'E') ['+'|'-'] digit+
|
||||
fraction: "." digit+
|
||||
exponent: ("e"|"E") ["+"|"-"] digit+
|
||||
\end{verbatim}
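
The floating point rules can be checked the same way. The sketch below transcribes them into a single pattern; \verb/is_floatnumber/ and the constant names are hypothetical.

```python
import re

INTPART = r"[0-9]+"
FRACTION = r"\.[0-9]+"
EXPONENT = r"[eE][+-]?[0-9]+"

# floatnumber: [intpart] fraction [exponent] | intpart ["."] exponent
FLOATNUMBER = re.compile(
    rf"(?:{INTPART})?{FRACTION}(?:{EXPONENT})?|{INTPART}\.?{EXPONENT}"
)


def is_floatnumber(text):
    """True if all of text matches the floatnumber rule above."""
    return FLOATNUMBER.fullmatch(text) is not None
```

Note that the grammar as written rejects \verb/1./ (a fraction needs at least one digit after the point), while \verb/1.e5/ is accepted through the second alternative.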
|
||||
|
||||
\section{Operators}
|
||||
|
@ -292,15 +301,15 @@ conditions. Conditions are a superset of expressions, and a condition
|
|||
may be used where an expression is required by enclosing it in
|
||||
parentheses. The only place where an unparenthesized condition
|
||||
is not allowed is on the right-hand side of the assignment operator,
|
||||
because this operator is the same token (\verb/'='/) as used for
|
||||
because this operator is the same token (\verb\=\) as used for
|
||||
comparisons.
|
||||
|
||||
The comma plays a somewhat special role in Python's syntax.
|
||||
It is an operator with a lower precedence than all others, but
|
||||
occasionally serves other purposes as well (e.g., it has special
|
||||
semantics in print statements). When a comma is accepted by the
|
||||
syntax, one of the syntactic categories \verb/expression_list/
|
||||
or \verb/condition_list/ is always used.
|
||||
syntax, one of the syntactic categories \verb\expression_list\
|
||||
or \verb\condition_list\ is always used.
|
||||
|
||||
When (one alternative of) a syntax rule has the form
|
||||
|
||||
|
@ -308,8 +317,8 @@ When (one alternative of) a syntax rule has the form
|
|||
name: othername
|
||||
\end{verbatim}
|
||||
|
||||
and no semantics are given, the semantics of this form of \verb/name/
|
||||
are the same as for \verb/othername/.
|
||||
and no semantics are given, the semantics of this form of \verb\name\
|
||||
are the same as for \verb\othername\.
|
||||
|
||||
\section{Arithmetic conversions}
|
||||
|
||||
|
@ -414,11 +423,11 @@ key value prevails.
|
|||
A string conversion evaluates the contained condition list and converts the
|
||||
resulting object into a string according to rules specific to its type.
|
||||
|
||||
If the object is a string, a number, \verb/None/, or a tuple, list or
|
||||
If the object is a string, a number, \verb\None\, or a tuple, list or
|
||||
dictionary containing only objects whose type is in this list,
|
||||
the resulting
|
||||
string is a valid Python expression which can be passed to the
|
||||
built-in function \verb/eval()/ to yield an expression with the
|
||||
built-in function \verb\eval()\ to yield an expression with the
|
||||
same value (or an approximation, if floating point numbers are
|
||||
involved).
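
The round trip described above can be illustrated as follows. This sketch assumes a present-day Python interpreter and uses the built-in \verb/repr()/ as a stand-in for the string conversion; the sample values are arbitrary.

```python
# Values built only from the types listed above.
samples = ["spam", 42, 2.5, None, (1, "two"), [3, None], {"key": [1, 2]}]

# Convert each value to a string, evaluate that string, and compare.
roundtrip_ok = all(eval(repr(value)) == value for value in samples)
```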
|
||||
|
||||
|
@ -459,11 +468,11 @@ Their syntax is:
|
|||
factor: primary | '-' factor | '+' factor | '~' factor
|
||||
\end{verbatim}
|
||||
|
||||
The unary \verb/'-'/ operator yields the negative of its numeric argument.
|
||||
The unary \verb\-\ operator yields the negative of its numeric argument.
|
||||
|
||||
The unary \verb/'+'/ operator yields its numeric argument unchanged.
|
||||
The unary \verb\+\ operator yields its numeric argument unchanged.
|
||||
|
||||
The unary \verb/'~'/ operator yields the bit-wise negation of its
|
||||
The unary \verb\~\ operator yields the bit-wise negation of its
|
||||
integral numerical argument.
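
A short illustration of the three unary operators, assuming a present-day Python interpreter (where an argument of the wrong type to \verb\~\ raises \verb/TypeError/):

```python
x = 5
negated = -x     # the negative of x
unchanged = +x   # x itself
inverted = ~x    # bit-wise negation; equals -x - 1 for integers

# The ~ operator needs an integral argument; a float is rejected.
try:
    ~2.5
    float_rejected = False
except TypeError:
    float_rejected = True
```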
|
||||
|
||||
In all three cases, if the argument does not have the proper type,
|
||||
|
@ -477,7 +486,7 @@ Terms represent the most tightly binding binary operators:
|
|||
term: factor | term '*' factor | term '/' factor | term '%' factor
|
||||
\end{verbatim}
|
||||
|
||||
The \verb/'*'/ operator yields the product of its arguments.
|
||||
The \verb\*\ operator yields the product of its arguments.
|
||||
The arguments must either both be numbers, or one argument must be
|
||||
a (short) integer and the other must be a string.
|
||||
In the former case, the numbers are converted to a common type
|
||||
|
@ -572,7 +581,7 @@ it is optional in all other cases (a single expression without
|
|||
a trailing comma doesn't create a tuple, but rather yields the
|
||||
value of that expression).
|
||||
|
||||
To create an empty tuple, use an empty pair of parentheses: \verb/()/.
|
||||
To create an empty tuple, use an empty pair of parentheses: \verb\()\.
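
For example (the variable names are illustrative only):

```python
empty = ()         # an empty pair of parentheses is the empty tuple
single = (1,)      # the trailing comma makes a one-element tuple
not_a_tuple = (1)  # without the comma this is just the value 1
```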
|
||||
|
||||
\section{Comparisons}
|
||||
|
||||
|
@ -597,8 +606,8 @@ Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison
|
|||
between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal.
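
A small example of such a chain, with arbitrarily chosen values:

```python
x, y, z = 2, 3, 1

# Both adjacent comparisons hold, so the chain as a whole is true ...
chained = x < y > z

# ... even though x and z, which are never compared by the chain,
# do not satisfy x < z.
x_less_than_z = x < z
```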
|
||||
|
||||
For the benefit of C programmers,
|
||||
the comparison operators \verb/=/ and \verb/==/ are equivalent,
|
||||
and so are \verb/<>/ and \verb/!=/.
|
||||
the comparison operators \verb\=\ and \verb\==\ are equivalent,
|
||||
and so are \verb\<>\ and \verb\!=\.
|
||||
Use of the C variants is discouraged.
|
||||
|
||||
The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare
|
||||
|
@ -610,7 +619,7 @@ the value \verb\None\ compares smaller than the values of any other type.
|
|||
|
||||
(This unusual
|
||||
definition of comparison is used to simplify the definition of
|
||||
operations like sorting and the \verb/in/ and \verb/not in/ operators.)
|
||||
operations like sorting and the \verb\in\ and \verb\not in\ operators.)
|
||||
|
||||
Comparison of objects of the same type depends on the type:
|
||||
|
||||
|
@ -869,12 +878,12 @@ A space is written before each object is (converted and) written,
|
|||
unless the output system believes it is positioned at the beginning
|
||||
of a line. This is the case: (1) when no characters have been written
|
||||
to standard output; or (2) when the last character written to
|
||||
standard output is \verb/'\n'/;
|
||||
standard output is \verb/\n/;
|
||||
or (3) when the last I/O operation
|
||||
on standard output was not a \verb\print\ statement.
|
||||
|
||||
Finally,
|
||||
a \verb/'\n'/ character is written at the end,
|
||||
a \verb/\n/ character is written at the end,
|
||||
unless the \verb\print\ statement ends with a comma.
|
||||
This is the only action if the statement contains just the keyword
|
||||
\verb\print\.
|
||||
|
|