Fix return value for m.group() for groups not in the part of the RE that
matched; reported by Paul Moore. Wrapped several long lines.
parent
f8c7c20ba5
commit
f4bdb57e15
|
@ -74,16 +74,16 @@ further information and a gentler presentation, consult the Regular
|
||||||
Expression HOWTO, accessible from \url{http://www.python.org/doc/howto/}.
|
Expression HOWTO, accessible from \url{http://www.python.org/doc/howto/}.
|
||||||
|
|
||||||
Regular expressions can contain both special and ordinary characters.
|
Regular expressions can contain both special and ordinary characters.
|
||||||
Most ordinary characters, like \character{A}, \character{a}, or \character{0},
|
Most ordinary characters, like \character{A}, \character{a}, or
|
||||||
are the simplest regular expressions; they simply match themselves.
|
\character{0}, are the simplest regular expressions; they simply match
|
||||||
You can concatenate ordinary characters, so \regexp{last} matches the
|
themselves. You can concatenate ordinary characters, so \regexp{last}
|
||||||
string \code{'last'}. (In the rest of this section, we'll write RE's in
|
matches the string \code{'last'}. (In the rest of this section, we'll
|
||||||
\regexp{this special style}, usually without quotes, and strings to be
|
write RE's in \regexp{this special style}, usually without quotes, and
|
||||||
matched \code{'in single quotes'}.)
|
strings to be matched \code{'in single quotes'}.)
|
||||||
|
|
||||||
Some characters, like \character{|} or \character{(}, are special. Special
|
Some characters, like \character{|} or \character{(}, are special.
|
||||||
characters either stand for classes of ordinary characters, or affect
|
Special characters either stand for classes of ordinary characters, or
|
||||||
how the regular expressions around them are interpreted.
|
affect how the regular expressions around them are interpreted.
|
||||||
|
|
||||||
The special characters are:
|
The special characters are:
|
||||||
|
|
||||||
|
@ -114,15 +114,16 @@ will not match just 'a'.
|
||||||
\item[\character{?}] Causes the resulting RE to
|
\item[\character{?}] Causes the resulting RE to
|
||||||
match 0 or 1 repetitions of the preceding RE. \regexp{ab?} will
|
match 0 or 1 repetitions of the preceding RE. \regexp{ab?} will
|
||||||
match either 'a' or 'ab'.
|
match either 'a' or 'ab'.
|
||||||
\item[\code{*?}, \code{+?}, \code{??}] The \character{*}, \character{+}, and
|
|
||||||
\character{?} qualifiers are all \dfn{greedy}; they match as much text as
|
\item[\code{*?}, \code{+?}, \code{??}] The \character{*},
|
||||||
possible. Sometimes this behaviour isn't desired; if the RE
|
\character{+}, and \character{?} qualifiers are all \dfn{greedy}; they
|
||||||
\regexp{<.*>} is matched against \code{'<H1>title</H1>'}, it will match the
|
match as much text as possible. Sometimes this behaviour isn't
|
||||||
entire string, and not just \code{'<H1>'}.
|
desired; if the RE \regexp{<.*>} is matched against
|
||||||
Adding \character{?} after the qualifier makes it perform the match in
|
\code{'<H1>title</H1>'}, it will match the entire string, and not just
|
||||||
\dfn{non-greedy} or \dfn{minimal} fashion; as \emph{few} characters as
|
\code{'<H1>'}. Adding \character{?} after the qualifier makes it
|
||||||
possible will be matched. Using \regexp{.*?} in the previous
|
perform the match in \dfn{non-greedy} or \dfn{minimal} fashion; as
|
||||||
expression will match only \code{'<H1>'}.
|
\emph{few} characters as possible will be matched. Using \regexp{.*?}
|
||||||
|
in the previous expression will match only \code{'<H1>'}.
|
||||||
|
|
||||||
\item[\code{\{\var{m},\var{n}\}}] Causes the resulting RE to match from
|
\item[\code{\{\var{m},\var{n}\}}] Causes the resulting RE to match from
|
||||||
\var{m} to \var{n} repetitions of the preceding RE, attempting to
|
\var{m} to \var{n} repetitions of the preceding RE, attempting to
|
||||||
|
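The difference between the greedy and non-greedy qualifiers described in this hunk can be checked directly. This is a minimal sketch, assuming a current Python 3 interpreter and its standard `re` module; the variable names are illustrative only.

```python
import re

text = '<H1>title</H1>'

# Greedy: .* takes as much text as possible, so the match runs to the last '>'.
greedy = re.match(r'<.*>', text).group()

# Non-greedy: .*? takes as little text as possible, so the match stops at
# the first '>'.
minimal = re.match(r'<.*?>', text).group()

# The ? qualifier on its own: 'ab?' matches 'a' or 'ab'.
optional = [re.match(r'ab?', s).group() for s in ('a', 'ab', 'abb')]

print(greedy)    # <H1>title</H1>
print(minimal)   # <H1>
print(optional)  # ['a', 'ab', 'ab']
```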
@ -167,10 +168,10 @@ backslash, or place it as the first character. The
|
||||||
pattern \regexp{[]]} will match \code{']'}, for example.
|
pattern \regexp{[]]} will match \code{']'}, for example.
|
||||||
|
|
||||||
You can match the characters not within a range by \dfn{complementing}
|
You can match the characters not within a range by \dfn{complementing}
|
||||||
the set. This is indicated by including a
|
the set. This is indicated by including a \character{\^} as the first
|
||||||
\character{\^} as the first character of the set; \character{\^} elsewhere will
|
character of the set; \character{\^} elsewhere will simply match the
|
||||||
simply match the \character{\^} character. For example, \regexp{[{\^}5]}
|
\character{\^} character. For example, \regexp{[{\^}5]} will match
|
||||||
will match any character except \character{5}.
|
any character except \character{5}.
|
||||||
|
|
||||||
\item[\character{|}]\code{A|B}, where A and B can be arbitrary REs,
|
\item[\character{|}]\code{A|B}, where A and B can be arbitrary REs,
|
||||||
creates a regular expression that will match either A or B. An
|
creates a regular expression that will match either A or B. An
|
||||||
|
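The set rules in this hunk (a leading caret complements the set, a caret elsewhere is literal, and a closing bracket placed first is literal) can be illustrated as follows. This is a sketch under the assumption of Python 3's `re` module; the sample strings are made up for the example.

```python
import re

# A caret as the first character complements the set:
# match any character except '5'.
not_five = re.findall(r'[^5]', '1525')

# A caret anywhere else in the set simply matches a literal '^'.
five_or_caret = re.findall(r'[5^]', '5^6')

# A ']' placed first in the set is taken literally.
close_bracket = re.findall(r'[]]', 'a]b')

print(not_five)       # ['1', '2']
print(five_or_caret)  # ['5', '^']
print(close_bracket)  # [']']
```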
@ -399,8 +400,9 @@ expression will be used several times in a single program.
|
||||||
|
|
||||||
\begin{datadesc}{I}
|
\begin{datadesc}{I}
|
||||||
\dataline{IGNORECASE}
|
\dataline{IGNORECASE}
|
||||||
Perform case-insensitive matching; expressions like \regexp{[A-Z]} will match
|
Perform case-insensitive matching; expressions like \regexp{[A-Z]}
|
||||||
lowercase letters, too. This is not affected by the current locale.
|
will match lowercase letters, too. This is not affected by the
|
||||||
|
current locale.
|
||||||
\end{datadesc}
|
\end{datadesc}
|
||||||
|
|
||||||
\begin{datadesc}{L}
|
\begin{datadesc}{L}
|
||||||
|
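A short illustration of the IGNORECASE flag described above, assuming Python 3's `re` module (where the flag is spelled `re.IGNORECASE` or `re.I`); the sample text is arbitrary.

```python
import re

text = 'Hello World'

# Without the flag, [A-Z] matches uppercase letters only.
strict = re.findall(r'[A-Z]+', text)

# With IGNORECASE, the same class also matches lowercase letters.
relaxed = re.findall(r'[A-Z]+', text, re.IGNORECASE)

print(strict)   # ['H', 'W']
print(relaxed)  # ['Hello', 'World']
```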
@ -414,11 +416,11 @@ Make \regexp{\e w}, \regexp{\e W}, \regexp{\e b}, and
|
||||||
When specified, the pattern character \character{\^} matches at the
|
When specified, the pattern character \character{\^} matches at the
|
||||||
beginning of the string and at the beginning of each line
|
beginning of the string and at the beginning of each line
|
||||||
(immediately following each newline); and the pattern character
|
(immediately following each newline); and the pattern character
|
||||||
\character{\$} matches at the end of the string and at the end of each line
|
\character{\$} matches at the end of the string and at the end of each
|
||||||
(immediately preceding each newline).
|
line (immediately preceding each newline). By default, \character{\^}
|
||||||
By default, \character{\^} matches only at the beginning of the string, and
|
matches only at the beginning of the string, and \character{\$} only
|
||||||
\character{\$} only at the end of the string and immediately before the
|
at the end of the string and immediately before the newline (if any)
|
||||||
newline (if any) at the end of the string.
|
at the end of the string.
|
||||||
\end{datadesc}
|
\end{datadesc}
|
||||||
|
|
||||||
\begin{datadesc}{S}
|
\begin{datadesc}{S}
|
||||||
|
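The effect of the MULTILINE flag on the two anchors can be sketched like this. The example assumes Python 3's `re` module; the two-line sample string is invented for illustration.

```python
import re

text = 'first\nsecond\n'

# Default: ^ matches only at the beginning of the string.
starts_default = re.findall(r'^\w+', text)

# MULTILINE: ^ also matches immediately after each newline.
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)

# Default: $ matches at the end of the string and just before a
# trailing newline, so only the last line qualifies.
ends_default = re.findall(r'\w+$', text)

# MULTILINE: $ also matches immediately before each newline.
ends_multiline = re.findall(r'\w+$', text, re.MULTILINE)

print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(ends_default)      # ['second']
print(ends_multiline)    # ['first', 'second']
```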
@ -440,9 +442,10 @@ Make \regexp{\e w}, \regexp{\e W}, \regexp{\e b}, and
|
||||||
This flag allows you to write regular expressions that look nicer.
|
This flag allows you to write regular expressions that look nicer.
|
||||||
Whitespace within the pattern is ignored,
|
Whitespace within the pattern is ignored,
|
||||||
except when in a character class or preceded by an unescaped
|
except when in a character class or preceded by an unescaped
|
||||||
backslash, and, when a line contains a \character{\#} neither in a character
|
backslash, and, when a line contains a \character{\#} neither in a
|
||||||
class nor preceded by an unescaped backslash, all characters from the
|
character class nor preceded by an unescaped backslash, all characters
|
||||||
leftmost such \character{\#} through the end of the line are ignored.
|
from the leftmost such \character{\#} through the end of the line are
|
||||||
|
ignored.
|
||||||
% XXX should add an example here
|
% XXX should add an example here
|
||||||
\end{datadesc}
|
\end{datadesc}
|
||||||
|
|
||||||
|
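The source text leaves a note that an example of this flag is still missing. One possible illustration, not taken from the original documentation, is sketched below. It assumes Python 3's `re` module, where the flag is spelled `re.VERBOSE` or `re.X`; the pattern and names are hypothetical.

```python
import re

# With VERBOSE, whitespace and '#' comments in the pattern are ignored,
# so the expression can be laid out one piece per line.
decimal_verbose = re.compile(r"""
    \d+     # integer part
    \.      # literal decimal point
    \d*     # optional fractional digits
    """, re.VERBOSE)

# The same expression written compactly, without the flag.
decimal_compact = re.compile(r'\d+\.\d*')

sample = '3.14 is roughly pi'
verbose_result = decimal_verbose.match(sample).group()
compact_result = decimal_compact.match(sample).group()

# Whitespace inside a character class is still significant.
spaced = re.compile(r'a [ ] b', re.VERBOSE)
spaced_result = spaced.match('a b') is not None

print(verbose_result)  # 3.14
print(compact_result)  # 3.14
print(spaced_result)   # True
```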
@ -521,17 +524,18 @@ embedded modifiers in a pattern; for example,
|
||||||
\samp{sub("(?i)b+", "x", "bbbb BBBB")} returns \code{'x x'}.
|
\samp{sub("(?i)b+", "x", "bbbb BBBB")} returns \code{'x x'}.
|
||||||
|
|
||||||
The optional argument \var{count} is the maximum number of pattern
|
The optional argument \var{count} is the maximum number of pattern
|
||||||
occurrences to be replaced; \var{count} must be a non-negative integer, and
|
occurrences to be replaced; \var{count} must be a non-negative
|
||||||
the default value of 0 means to replace all occurrences.
|
integer, and the default value of 0 means to replace all occurrences.
|
||||||
|
|
||||||
Empty matches for the pattern are replaced only when not adjacent to a
|
Empty matches for the pattern are replaced only when not adjacent to a
|
||||||
previous match, so \samp{sub('x*', '-', 'abc')} returns \code{'-a-b-c-'}.
|
previous match, so \samp{sub('x*', '-', 'abc')} returns
|
||||||
|
\code{'-a-b-c-'}.
|
||||||
|
|
||||||
If \var{repl} is a string, any backslash escapes in it are processed.
|
If \var{repl} is a string, any backslash escapes in it are processed.
|
||||||
That is, \samp{\e n} is converted to a single newline character,
|
That is, \samp{\e n} is converted to a single newline character,
|
||||||
\samp{\e r} is converted to a carriage return, and so forth.  Unknown escapes
|
\samp{\e r} is converted to a carriage return, and so forth.  Unknown escapes
|
||||||
such as \samp{\e j} are left alone. Backreferences, such as \samp{\e 6}, are
|
such as \samp{\e j} are left alone. Backreferences, such as \samp{\e
|
||||||
replaced with the substring matched by group 6 in the pattern.
|
6}, are replaced with the substring matched by group 6 in the pattern.
|
||||||
|
|
||||||
In addition to character escapes and backreferences as described
|
In addition to character escapes and backreferences as described
|
||||||
above, \samp{\e g<name>} will use the substring matched by the group
|
above, \samp{\e g<name>} will use the substring matched by the group
|
||||||
|
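The behaviour of `sub()` described in this hunk (embedded modifiers, the `count` limit, empty matches, and backreferences in the replacement) can be sketched as follows. The example assumes Python 3's `re` module and passes `count` by keyword; the input strings other than those quoted in the text are made up.

```python
import re

# Embedded modifier: (?i) makes the pattern case-insensitive.
embedded = re.sub('(?i)b+', 'x', 'bbbb BBBB')

# count limits the number of replacements; the default of 0 replaces all.
limited = re.sub('b+', 'x', 'b bb bbb', count=2)

# Empty matches are replaced only when not adjacent to a previous match.
empty = re.sub('x*', '-', 'abc')

# Numeric backreferences in the replacement string.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')

# Named backreferences use the \g<name> form.
swapped_named = re.sub(r'(?P<first>\w+) (?P<second>\w+)',
                       r'\g<second> \g<first>', 'hello world')

print(embedded)       # x x
print(limited)        # x x bbb
print(empty)          # -a-b-c-
print(swapped)        # world hello
print(swapped_named)  # world hello
```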
@ -641,15 +645,16 @@ The pattern string from which the RE object was compiled.
|
||||||
|
|
||||||
\subsection{Match Objects \label{match-objects}}
|
\subsection{Match Objects \label{match-objects}}
|
||||||
|
|
||||||
\class{MatchObject} instances support the following methods and attributes:
|
\class{MatchObject} instances support the following methods and
|
||||||
|
attributes:
|
||||||
|
|
||||||
\begin{methoddesc}[MatchObject]{expand}{template}
|
\begin{methoddesc}[MatchObject]{expand}{template}
|
||||||
Return the string obtained by doing backslash substitution on the
|
Return the string obtained by doing backslash substitution on the
|
||||||
template string \var{template}, as done by the \method{sub()} method.
|
template string \var{template}, as done by the \method{sub()} method.
|
||||||
Escapes such as \samp{\e n} are converted to the appropriate
|
Escapes such as \samp{\e n} are converted to the appropriate
|
||||||
characters, and numeric backreferences (\samp{\e 1}, \samp{\e 2}) and named
|
characters, and numeric backreferences (\samp{\e 1}, \samp{\e 2}) and
|
||||||
backreferences (\samp{\e g<1>}, \samp{\e g<name>}) are replaced by the contents of the
|
named backreferences (\samp{\e g<1>}, \samp{\e g<name>}) are replaced
|
||||||
corresponding group.
|
by the contents of the corresponding group.
|
||||||
\end{methoddesc}
|
\end{methoddesc}
|
||||||
|
|
||||||
\begin{methoddesc}[MatchObject]{group}{\optional{group1, \moreargs}}
|
\begin{methoddesc}[MatchObject]{group}{\optional{group1, \moreargs}}
|
||||||
|
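A brief sketch of `expand()` on a match object, assuming Python 3's `re` module; the pattern, the group name `first`, and the sample name are illustrative.

```python
import re

m = re.match(r'(?P<first>\w+) (\w+)', 'Isaac Newton')

# Numeric backreferences are replaced by the contents of the groups.
by_number = m.expand(r'\2, \1')

# \g<number> and \g<name> refer to the same groups by number or by name.
by_name = m.expand(r'\g<2>, \g<first>')

# Character escapes in the template are converted as well.
with_newline = m.expand(r'\1\n')

print(by_number)           # Newton, Isaac
print(by_name)             # Newton, Isaac
print(repr(with_newline))  # 'Isaac\n'
```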
@ -664,7 +669,7 @@ the string matching the corresponding parenthesized group. If a
|
||||||
group number is negative or larger than the number of groups defined
|
group number is negative or larger than the number of groups defined
|
||||||
in the pattern, an \exception{IndexError} exception is raised.
|
in the pattern, an \exception{IndexError} exception is raised.
|
||||||
If a group is contained in a part of the pattern that did not match,
|
If a group is contained in a part of the pattern that did not match,
|
||||||
the corresponding result is \code{-1}. If a group is contained in a
|
the corresponding result is \code{None}. If a group is contained in a
|
||||||
part of the pattern that matched multiple times, the last match is
|
part of the pattern that matched multiple times, the last match is
|
||||||
returned.
|
returned.
|
||||||
|
|
||||||
|
|
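The corrected return value documented by this commit can be checked with a short sketch. It assumes Python 3's `re` module; the patterns are chosen only to exercise each case.

```python
import re

# Group 2 sits in the alternative that did not take part in the match,
# so group(2) is None rather than a string.
m = re.match(r'(a)|(b)', 'a')
matched = m.group(1)
unmatched = m.group(2)

# A group that matched several times reports its last match.
repeated = re.match(r'(..)+', 'a1b2c3').group(1)

# A group number larger than the number of groups raises IndexError.
try:
    m.group(3)
    out_of_range = False
except IndexError:
    out_of_range = True

print(matched)       # a
print(unmatched)     # None
print(repeated)      # c3
print(out_of_range)  # True
```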