#6538: fix regex documentation again -- use fictional class names "regex" and "match" but do not document them as classes, remove 1.5 compat info and use new default argument syntax where possible.

Georg Brandl 2010-07-29 11:49:05 +00:00
parent ebeb44d8d3
commit c62a704189
1 changed file with 198 additions and 200 deletions


@@ -33,8 +33,9 @@ newline. Usually patterns will be expressed in Python code using this raw
 string notation.
 It is important to note that most regular expression operations are available as
-module-level functions and :class:`RegexObject` methods. The functions are
-shortcuts that don't require you to compile a regex object first, but miss some
+module-level functions and methods on
+:ref:`compiled regular expressions <re-objects>`. The functions are shortcuts
+that don't require you to compile a regex object first, but miss some
 fine-tuning parameters.
 .. seealso::
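The hunk above distinguishes module-level functions from methods on compiled patterns. The short Python 3 sketch below illustrates that distinction. It is not part of the commit, and the sample string and pattern are made up. Both forms find the same match, but only the method form accepts a start position:

```python
import re

text = "tel: 555-1234, fax: 555-9876"

# Module-level shortcut: no explicit compile step.
first = re.search(r"\d{3}-\d{4}", text)

# Compiled form: build the pattern object once and reuse it.
phone = re.compile(r"\d{3}-\d{4}")
same = phone.search(text)
print(first.span(), same.span())  # (5, 13) (5, 13)

# The method form also takes pos (one of the "fine-tuning parameters"),
# so the search can resume where the previous match ended.
second = phone.search(text, first.end())
print(second.group())  # 555-9876
```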
@@ -546,21 +547,21 @@ form.
-.. function:: search(pattern, string[, flags])
+.. function:: search(pattern, string, flags=0)
 Scan through *string* looking for a location where the regular expression
-*pattern* produces a match, and return a corresponding :class:`MatchObject`
-instance. Return ``None`` if no position in the string matches the pattern; note
-that this is different from finding a zero-length match at some point in the
-string.
+*pattern* produces a match, and return a corresponding :ref:`match object
+<match-objects>`. Return ``None`` if no position in the string matches the
+pattern; note that this is different from finding a zero-length match at some
+point in the string.
 .. function:: match(pattern, string, flags=0)
 If zero or more characters at the beginning of *string* match the regular
-expression *pattern*, return a corresponding :class:`MatchObject` instance.
-Return ``None`` if the string does not match the pattern; note that this is
-different from a zero-length match.
+expression *pattern*, return a corresponding :ref:`match object
+<match-objects>`. Return ``None`` if the string does not match the pattern;
+note that this is different from a zero-length match.
 .. note::
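The two docstrings in this hunk separate "no match" (``None``) from "a zero-length match". The Python 3 sketch below illustrates that difference. It is not part of the commit, and the inputs are invented. A pattern that can match the empty string still returns a match object, and that object is truthy:

```python
import re

# search() looks anywhere in the string; match() only at the beginning.
print(re.search(r"cat", "concatenate").span())  # (3, 6)
print(re.match(r"cat", "concatenate"))          # None

# x* can match the empty string, so this is a real (zero-length) match,
# not a failure.
m = re.search(r"x*", "abc")
print(m is None, m.span())  # False (0, 0)

# A failed search returns None; test for it explicitly.
miss = re.search(r"x+", "abc")
print(miss is None)  # True
```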
@@ -620,9 +621,9 @@ form.
 .. function:: finditer(pattern, string, flags=0)
-Return an :term:`iterator` yielding :class:`MatchObject` instances over all
-non-overlapping matches for the RE *pattern* in *string*. The *string* is
-scanned left-to-right, and matches are returned in the order found. Empty
+Return an :term:`iterator` yielding :ref:`match objects <match-objects>` over
+all non-overlapping matches for the RE *pattern* in *string*. The *string*
+is scanned left-to-right, and matches are returned in the order found. Empty
 matches are included in the result unless they touch the beginning of another
 match.
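The ``finditer()`` description above can be illustrated with a minimal Python 3 sketch. It is not part of the commit, and the input strings are invented. The sketch shows that the iterator yields match objects in left-to-right order and never overlaps matches. It deliberately avoids empty-match edge cases, whose handling changed in later Python versions:

```python
import re

text = "a1b22c333"

# Each item is a match object, yielded left to right.
for m in re.finditer(r"\d+", text):
    print(m.start(), m.end(), m.group())
# 1 2 1
# 3 5 22
# 6 9 333

# Matches never overlap: "aa" is found at 0 and 2, not at 1.
print([m.span() for m in re.finditer(r"aa", "aaaa")])  # [(0, 2), (2, 4)]
```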
@@ -710,107 +711,107 @@ form.
 Regular Expression Objects
 --------------------------
-.. class:: RegexObject
-The :class:`RegexObject` class supports the following methods and attributes:
-.. method:: RegexObject.search(string[, pos[, endpos]])
+Compiled regular expression objects support the following methods and
+attributes.
+.. method:: regex.search(string[, pos[, endpos]])
 Scan through *string* looking for a location where this regular expression
-produces a match, and return a corresponding :class:`MatchObject` instance.
-Return ``None`` if no position in the string matches the pattern; note that this
-is different from finding a zero-length match at some point in the string.
+produces a match, and return a corresponding :ref:`match object
+<match-objects>`. Return ``None`` if no position in the string matches the
+pattern; note that this is different from finding a zero-length match at some
+point in the string.
 The optional second parameter *pos* gives an index in the string where the
 search is to start; it defaults to ``0``. This is not completely equivalent to
 slicing the string; the ``'^'`` pattern character matches at the real beginning
 of the string and at positions just after a newline, but not necessarily at the
 index where the search is to start.
 The optional parameter *endpos* limits how far the string will be searched; it
 will be as if the string is *endpos* characters long, so only the characters
 from *pos* to ``endpos - 1`` will be searched for a match. If *endpos* is less
 than *pos*, no match will be found, otherwise, if *rx* is a compiled regular
 expression object, ``rx.search(string, 0, 50)`` is equivalent to
 ``rx.search(string[:50], 0)``.
 >>> pattern = re.compile("d")
 >>> pattern.search("dog") # Match at index 0
 <_sre.SRE_Match object at ...>
 >>> pattern.search("dog", 1) # No match; search doesn't include the "d"
-.. method:: RegexObject.match(string[, pos[, endpos]])
+.. method:: regex.match(string[, pos[, endpos]])
 If zero or more characters at the *beginning* of *string* match this regular
-expression, return a corresponding :class:`MatchObject` instance. Return
-``None`` if the string does not match the pattern; note that this is different
-from a zero-length match.
+expression, return a corresponding :ref:`match object <match-objects>`.
+Return ``None`` if the string does not match the pattern; note that this is
+different from a zero-length match.
 The optional *pos* and *endpos* parameters have the same meaning as for the
-:meth:`~RegexObject.search` method.
+:meth:`~regex.search` method.
 .. note::
 If you want to locate a match anywhere in *string*, use
-:meth:`~RegexObject.search` instead.
+:meth:`~regex.search` instead.
 >>> pattern = re.compile("o")
 >>> pattern.match("dog") # No match as "o" is not at the start of "dog".
 >>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog".
 <_sre.SRE_Match object at ...>
-.. method:: RegexObject.split(string[, maxsplit=0])
+.. method:: regex.split(string, maxsplit=0)
 Identical to the :func:`split` function, using the compiled pattern.
-.. method:: RegexObject.findall(string[, pos[, endpos]])
+.. method:: regex.findall(string[, pos[, endpos]])
 Similar to the :func:`findall` function, using the compiled pattern, but
 also accepts optional *pos* and *endpos* parameters that limit the search
 region like for :meth:`match`.
-.. method:: RegexObject.finditer(string[, pos[, endpos]])
+.. method:: regex.finditer(string[, pos[, endpos]])
 Similar to the :func:`finditer` function, using the compiled pattern, but
 also accepts optional *pos* and *endpos* parameters that limit the search
 region like for :meth:`match`.
-.. method:: RegexObject.sub(repl, string[, count=0])
+.. method:: regex.sub(repl, string, count=0)
 Identical to the :func:`sub` function, using the compiled pattern.
-.. method:: RegexObject.subn(repl, string[, count=0])
+.. method:: regex.subn(repl, string, count=0)
 Identical to the :func:`subn` function, using the compiled pattern.
-.. attribute:: RegexObject.flags
+.. attribute:: regex.flags
 The flags argument used when the RE object was compiled, or ``0`` if no flags
 were provided.
-.. attribute:: RegexObject.groups
+.. attribute:: regex.groups
 The number of capturing groups in the pattern.
-.. attribute:: RegexObject.groupindex
+.. attribute:: regex.groupindex
 A dictionary mapping any symbolic group names defined by ``(?P<id>)`` to group
 numbers. The dictionary is empty if no symbolic groups were used in the
 pattern.
-.. attribute:: RegexObject.pattern
+.. attribute:: regex.pattern
 The pattern string from which the RE object was compiled.
 .. _match-objects:
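A few of the compiled-pattern details documented in this hunk are easy to misread: ``^`` ignores *pos*, *endpos* acts as truncation, and the ``flags``/``groups``/``groupindex``/``pattern`` attributes. The Python 3 sketch below shows them in action. It is illustrative only and not part of the commit, and the patterns and strings are hypothetical:

```python
import re

s = "alpha beta"

# "^" anchors to the real start of the string, even when pos is given.
head = re.compile(r"^\w+")
print(head.search(s).group())      # alpha
print(head.search(s, 6))           # None: pos=6 is not the start of s
print(head.search(s[6:]).group())  # beta: slicing makes a new string

# endpos behaves as if the string were only endpos characters long.
word = re.compile(r"\w+")
print(word.search(s, 0, 3).group())  # alp

# Attributes of a compiled pattern (hypothetical date pattern).
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d\d)", re.IGNORECASE)
print(date.pattern)              # (?P<year>\d{4})-(?P<month>\d\d)
print(date.groups)               # 2
print(dict(date.groupindex))     # {'year': 1, 'month': 2}
# In Python 3, str patterns also report re.UNICODE in .flags, so check
# the bit of interest instead of comparing the whole value.
print(bool(date.flags & re.IGNORECASE))  # True
```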
@@ -818,178 +819,176 @@ Regular Expression Objects
 Match Objects
 -------------
-.. class:: MatchObject
-Match Objects always have a boolean value of :const:`True`, so that you can test
+Match objects always have a boolean value of :const:`True`, so that you can test
 whether e.g. :func:`match` resulted in a match with a simple if statement. They
 support the following methods and attributes:
-.. method:: MatchObject.expand(template)
+.. method:: match.expand(template)
 Return the string obtained by doing backslash substitution on the template
-string *template*, as done by the :meth:`~RegexObject.sub` method. Escapes
-such as ``\n`` are converted to the appropriate characters, and numeric
-backreferences (``\1``, ``\2``) and named backreferences (``\g<1>``,
-``\g<name>``) are replaced by the contents of the corresponding group.
+string *template*, as done by the :meth:`~regex.sub` method.
+Escapes such as ``\n`` are converted to the appropriate characters,
+and numeric backreferences (``\1``, ``\2``) and named backreferences
+(``\g<1>``, ``\g<name>``) are replaced by the contents of the
+corresponding group.
-.. method:: MatchObject.group([group1, ...])
+.. method:: match.group([group1, ...])
 Returns one or more subgroups of the match. If there is a single argument, the
 result is a single string; if there are multiple arguments, the result is a
 tuple with one item per argument. Without arguments, *group1* defaults to zero
 (the whole match is returned). If a *groupN* argument is zero, the corresponding
 return value is the entire matching string; if it is in the inclusive range
 [1..99], it is the string matching the corresponding parenthesized group. If a
 group number is negative or larger than the number of groups defined in the
 pattern, an :exc:`IndexError` exception is raised. If a group is contained in a
 part of the pattern that did not match, the corresponding result is ``None``.
 If a group is contained in a part of the pattern that matched multiple times,
 the last match is returned.
 >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
 >>> m.group(0) # The entire match
 'Isaac Newton'
 >>> m.group(1) # The first parenthesized subgroup.
 'Isaac'
 >>> m.group(2) # The second parenthesized subgroup.
 'Newton'
 >>> m.group(1, 2) # Multiple arguments give us a tuple.
 ('Isaac', 'Newton')
 If the regular expression uses the ``(?P<name>...)`` syntax, the *groupN*
 arguments may also be strings identifying groups by their group name. If a
 string argument is not used as a group name in the pattern, an :exc:`IndexError`
 exception is raised.
 A moderately complicated example:
 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
 >>> m.group('first_name')
 'Malcolm'
 >>> m.group('last_name')
 'Reynolds'
 Named groups can also be referred to by their index:
 >>> m.group(1)
 'Malcolm'
 >>> m.group(2)
 'Reynolds'
 If a group matches multiple times, only the last match is accessible:
 >>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times.
 >>> m.group(1) # Returns only the last match.
 'c3'
-.. method:: MatchObject.groups(default=None)
+.. method:: match.groups(default=None)
 Return a tuple containing all the subgroups of the match, from 1 up to however
 many groups are in the pattern. The *default* argument is used for groups that
 did not participate in the match; it defaults to ``None``.
 For example:
 >>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
 >>> m.groups()
 ('24', '1632')
 If we make the decimal place and everything after it optional, not all groups
 might participate in the match. These groups will default to ``None`` unless
 the *default* argument is given:
 >>> m = re.match(r"(\d+)\.?(\d+)?", "24")
 >>> m.groups() # Second group defaults to None.
 ('24', None)
 >>> m.groups('0') # Now, the second group defaults to '0'.
 ('24', '0')
-.. method:: MatchObject.groupdict([default])
+.. method:: match.groupdict(default=None)
 Return a dictionary containing all the *named* subgroups of the match, keyed by
 the subgroup name. The *default* argument is used for groups that did not
 participate in the match; it defaults to ``None``. For example:
 >>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
 >>> m.groupdict()
 {'first_name': 'Malcolm', 'last_name': 'Reynolds'}
-.. method:: MatchObject.start([group])
-MatchObject.end([group])
+.. method:: match.start([group])
+match.end([group])
 Return the indices of the start and end of the substring matched by *group*;
 *group* defaults to zero (meaning the whole matched substring). Return ``-1`` if
 *group* exists but did not contribute to the match. For a match object *m*, and
 a group *g* that did contribute to the match, the substring matched by group *g*
 (equivalent to ``m.group(g)``) is ::
 m.string[m.start(g):m.end(g)]
 Note that ``m.start(group)`` will equal ``m.end(group)`` if *group* matched a
 null string. For example, after ``m = re.search('b(c?)', 'cba')``,
 ``m.start(0)`` is 1, ``m.end(0)`` is 2, ``m.start(1)`` and ``m.end(1)`` are both
 2, and ``m.start(2)`` raises an :exc:`IndexError` exception.
 An example that will remove *remove_this* from email addresses:
 >>> email = "tony@tiremove_thisger.net"
 >>> m = re.search("remove_this", email)
 >>> email[:m.start()] + email[m.end():]
 'tony@tiger.net'
-.. method:: MatchObject.span([group])
-For :class:`MatchObject` *m*, return the 2-tuple ``(m.start(group),
-m.end(group))``. Note that if *group* did not contribute to the match, this is
-``(-1, -1)``. *group* defaults to zero, the entire match.
+.. method:: match.span([group])
+For a match *m*, return the 2-tuple ``(m.start(group), m.end(group))``. Note
+that if *group* did not contribute to the match, this is ``(-1, -1)``.
+*group* defaults to zero, the entire match.
-.. attribute:: MatchObject.pos
-The value of *pos* which was passed to the :meth:`~RegexObject.search` or
-:meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the
-index into the string at which the RE engine started looking for a match.
+.. attribute:: match.pos
+The value of *pos* which was passed to the :meth:`~regex.search` or
+:meth:`~regex.match` method of a :ref:`match object <match-objects>`. This
+is the index into the string at which the RE engine started looking for a
+match.
-.. attribute:: MatchObject.endpos
-The value of *endpos* which was passed to the :meth:`~RegexObject.search` or
-:meth:`~RegexObject.match` method of the :class:`RegexObject`. This is the
-index into the string beyond which the RE engine will not go.
+.. attribute:: match.endpos
+The value of *endpos* which was passed to the :meth:`~regex.search` or
+:meth:`~regex.match` method of a :ref:`match object <match-objects>`. This
+is the index into the string beyond which the RE engine will not go.
-.. attribute:: MatchObject.lastindex
+.. attribute:: match.lastindex
 The integer index of the last matched capturing group, or ``None`` if no group
 was matched at all. For example, the expressions ``(a)b``, ``((a)(b))``, and
 ``((ab))`` will have ``lastindex == 1`` if applied to the string ``'ab'``, while
 the expression ``(a)(b)`` will have ``lastindex == 2``, if applied to the same
 string.
-.. attribute:: MatchObject.lastgroup
+.. attribute:: match.lastgroup
 The name of the last matched capturing group, or ``None`` if the group didn't
 have a name, or if no group was matched at all.
-.. attribute:: MatchObject.re
-The regular expression object whose :meth:`~RegexObject.match` or
-:meth:`~RegexObject.search` method produced this :class:`MatchObject`
-instance.
+.. attribute:: match.re
+The regular expression object whose :meth:`~regex.match` or
+:meth:`~regex.search` method produced this match instance.
-.. attribute:: MatchObject.string
-The string passed to :meth:`~RegexObject.match` or
-:meth:`~RegexObject.search`.
+.. attribute:: match.string
+The string passed to :meth:`~regex.match` or :meth:`~regex.search`.
 Examples
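The Python 3 sketch below exercises several of the match-object methods and attributes whose entries this hunk rewrites: ``span()``, ``expand()``, ``lastindex``/``lastgroup``, ``pos``/``endpos``, ``re``/``string`` and ``groupdict()`` with a default. It is illustrative only and not part of the commit, and the ``key=value`` pattern and inputs are hypothetical. The second half uses an optional group that does not take part in the match:

```python
import re

# Hypothetical "key=value" pattern; the value group is optional.
rx = re.compile(r"(?P<key>\w+)=(?P<value>\d+)?")

m = rx.search("  id=42", 2)               # start searching at pos=2
print(m.span(), m.span("value"))          # (2, 7) (5, 7)
print(m.expand(r"\g<value> is \g<key>"))  # 42 is id
print(m.lastindex, m.lastgroup)           # 2 value
print(m.pos, m.endpos)                    # 2 7
print(m.re is rx, repr(m.string))         # True '  id=42'

# Here the optional group does not take part in the match.
n = rx.search("name=")
print(n.span("value"))                    # (-1, -1)
print(n.groupdict())                      # {'key': 'name', 'value': None}
print(n.groupdict(""))                    # {'key': 'name', 'value': ''}
print(n.lastindex, n.lastgroup)           # 1 key
```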
@@ -1035,8 +1034,7 @@ To match this with a regular expression, one could use backreferences as such:
 "<Match: '354aa', groups=('a',)>"
 To find out what card the pair consists of, one could use the
-:meth:`~MatchObject.group` method of :class:`MatchObject` in the following
-manner:
+:meth:`~match.group` method of the match object in the following manner:
 .. doctest::
@@ -1250,10 +1248,10 @@ Finding all Adverbs and their Positions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 If one wants more information about all matches of a pattern than the matched
-text, :func:`finditer` is useful as it provides instances of
-:class:`MatchObject` instead of strings. Continuing with the previous example,
-if one was a writer who wanted to find all of the adverbs *and their positions*
-in some text, he or she would use :func:`finditer` in the following manner:
+text, :func:`finditer` is useful as it provides :ref:`match objects
+<match-objects>` instead of strings. Continuing with the previous example, if
+one was a writer who wanted to find all of the adverbs *and their positions* in
+some text, he or she would use :func:`finditer` in the following manner:
 >>> text = "He was carefully disguised but captured quickly by police."
 >>> for m in re.finditer(r"\w+ly", text):