Issue #10875: Update Regular Expression HOWTO; patch by 'SilentGhost'.

Terry Reedy 2011-01-10 21:27:49 +00:00
parent eb70b90097
commit aacfd4a0cf
2 changed files with 18 additions and 28 deletions

@@ -5,7 +5,6 @@
**************************** ****************************
:Author: A.M. Kuchling <amk@amk.ca> :Author: A.M. Kuchling <amk@amk.ca>
:Release: 0.05
.. TODO: .. TODO:
Document lookbehind assertions Document lookbehind assertions
@@ -24,11 +23,6 @@
Introduction Introduction
============ ============
The :mod:`re` module was added in Python 1.5, and provides Perl-style regular
expression patterns. Earlier versions of Python came with the :mod:`regex`
module, which provided Emacs-style patterns. The :mod:`regex` module was
removed completely in Python 2.5.
Regular expressions (called REs, or regexes, or regex patterns) are essentially Regular expressions (called REs, or regexes, or regex patterns) are essentially
a tiny, highly specialized programming language embedded inside Python and made a tiny, highly specialized programming language embedded inside Python and made
available through the :mod:`re` module. Using this little language, you specify available through the :mod:`re` module. Using this little language, you specify
@@ -264,7 +258,7 @@ performing string substitutions. ::
>>> import re >>> import re
>>> p = re.compile('ab*') >>> p = re.compile('ab*')
>>> p >>> p
<_sre.SRE_Pattern object at 80b4150> <_sre.SRE_Pattern object at 0x...>
:func:`re.compile` also accepts an optional *flags* argument, used to enable :func:`re.compile` also accepts an optional *flags* argument, used to enable
various special features and syntax variations. We'll go over the available various special features and syntax variations. We'll go over the available
@@ -362,8 +356,8 @@ information about the match: where it starts and ends, the substring it matched,
and more. and more.
You can learn about this by interactively experimenting with the :mod:`re` You can learn about this by interactively experimenting with the :mod:`re`
module. If you have Tkinter available, you may also want to look at module. If you have :mod:`tkinter` available, you may also want to look at
:file:`Tools/scripts/redemo.py`, a demonstration program included with the :file:`Tools/demo/redemo.py`, a demonstration program included with the
Python distribution. It allows you to enter REs and strings, and displays Python distribution. It allows you to enter REs and strings, and displays
whether the RE matches or fails. :file:`redemo.py` can be quite useful when whether the RE matches or fails. :file:`redemo.py` can be quite useful when
trying to debug a complicated RE. Phil Schwartz's `Kodos trying to debug a complicated RE. Phil Schwartz's `Kodos
@@ -373,11 +367,10 @@ testing RE patterns.
This HOWTO uses the standard Python interpreter for its examples. First, run the This HOWTO uses the standard Python interpreter for its examples. First, run the
Python interpreter, import the :mod:`re` module, and compile a RE:: Python interpreter, import the :mod:`re` module, and compile a RE::
Python 2.2.2 (#1, Feb 10 2003, 12:57:01)
>>> import re >>> import re
>>> p = re.compile('[a-z]+') >>> p = re.compile('[a-z]+')
>>> p >>> p
<_sre.SRE_Pattern object at 80c3c28> <_sre.SRE_Pattern object at 0x...>
Now, you can try matching various strings against the RE ``[a-z]+``. An empty Now, you can try matching various strings against the RE ``[a-z]+``. An empty
string shouldn't match at all, since ``+`` means 'one or more repetitions'. string shouldn't match at all, since ``+`` means 'one or more repetitions'.
@@ -395,7 +388,7 @@ result in a variable for later use. ::
>>> m = p.match('tempo') >>> m = p.match('tempo')
>>> m >>> m
<_sre.SRE_Match object at 80c4f68> <_sre.SRE_Match object at 0x...>
Now you can query the :class:`MatchObject` for information about the matching Now you can query the :class:`MatchObject` for information about the matching
string. :class:`MatchObject` instances also have several methods and string. :class:`MatchObject` instances also have several methods and
@@ -434,7 +427,7 @@ case. ::
>>> print(p.match('::: message')) >>> print(p.match('::: message'))
None None
>>> m = p.search('::: message') ; print(m) >>> m = p.search('::: message') ; print(m)
<re.MatchObject instance at 80c9650> <_sre.SRE_Match object at 0x...>
>>> m.group() >>> m.group()
'message' 'message'
>>> m.span() >>> m.span()
@@ -459,11 +452,11 @@ Two pattern methods return all of the matches for a pattern.
:meth:`findall` has to create the entire list before it can be returned as the :meth:`findall` has to create the entire list before it can be returned as the
result. The :meth:`finditer` method returns a sequence of :class:`MatchObject` result. The :meth:`finditer` method returns a sequence of :class:`MatchObject`
instances as an :term:`iterator`. [#]_ :: instances as an :term:`iterator`::
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...') >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator >>> iterator
<callable-iterator object at 0x401833ac> <callable_iterator object at 0x...>
>>> for match in iterator: >>> for match in iterator:
... print(match.span()) ... print(match.span())
... ...
@@ -485,7 +478,7 @@ the RE string added as the first argument, and still return either ``None`` or a
>>> print(re.match(r'From\s+', 'Fromage amk')) >>> print(re.match(r'From\s+', 'Fromage amk'))
None None
>>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
<re.MatchObject instance at 80c5978> <_sre.SRE_Match object at 0x...>
Under the hood, these functions simply create a pattern object for you Under the hood, these functions simply create a pattern object for you
and call the appropriate method on it. They also store the compiled object in a and call the appropriate method on it. They also store the compiled object in a
@@ -687,7 +680,7 @@ given location, they can obviously be matched an infinite number of times.
line, the RE to use is ``^From``. :: line, the RE to use is ``^From``. ::
>>> print(re.search('^From', 'From Here to Eternity')) >>> print(re.search('^From', 'From Here to Eternity'))
<re.MatchObject instance at 80c1520> <_sre.SRE_Match object at 0x...>
>>> print(re.search('^From', 'Reciting From Memory')) >>> print(re.search('^From', 'Reciting From Memory'))
None None
@@ -699,11 +692,11 @@ given location, they can obviously be matched an infinite number of times.
or any location followed by a newline character. :: or any location followed by a newline character. ::
>>> print(re.search('}$', '{block}')) >>> print(re.search('}$', '{block}'))
<re.MatchObject instance at 80adfa8> <_sre.SRE_Match object at 0x...>
>>> print(re.search('}$', '{block} ')) >>> print(re.search('}$', '{block} '))
None None
>>> print(re.search('}$', '{block}\n')) >>> print(re.search('}$', '{block}\n'))
<re.MatchObject instance at 80adfa8> <_sre.SRE_Match object at 0x...>
To match a literal ``'$'``, use ``\$`` or enclose it inside a character class, To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
as in ``[$]``. as in ``[$]``.
@@ -728,7 +721,7 @@ given location, they can obviously be matched an infinite number of times.
>>> p = re.compile(r'\bclass\b') >>> p = re.compile(r'\bclass\b')
>>> print(p.search('no class at all')) >>> print(p.search('no class at all'))
<re.MatchObject instance at 80c8f28> <_sre.SRE_Match object at 0x...>
>>> print(p.search('the declassified algorithm')) >>> print(p.search('the declassified algorithm'))
None None
>>> print(p.search('one subclass is')) >>> print(p.search('one subclass is'))
@@ -746,7 +739,7 @@ given location, they can obviously be matched an infinite number of times.
>>> print(p.search('no class at all')) >>> print(p.search('no class at all'))
None None
>>> print(p.search('\b' + 'class' + '\b') ) >>> print(p.search('\b' + 'class' + '\b') )
<re.MatchObject instance at 80c3ee0> <_sre.SRE_Match object at 0x...>
Second, inside a character class, where there's no use for this assertion, Second, inside a character class, where there's no use for this assertion,
``\b`` represents the backspace character, for compatibility with Python's ``\b`` represents the backspace character, for compatibility with Python's
@@ -1316,8 +1309,8 @@ a regular expression that handles all of the possible cases, the patterns will
be *very* complicated. Use an HTML or XML parser module for such tasks.) be *very* complicated. Use an HTML or XML parser module for such tasks.)
Not Using re.VERBOSE Using re.VERBOSE
-------------------- ----------------
By now you've probably noticed that regular expressions are a very compact By now you've probably noticed that regular expressions are a very compact
notation, but they're not terribly readable. REs of moderate complexity can notation, but they're not terribly readable. REs of moderate complexity can
@@ -1366,8 +1359,3 @@ reference for programming in Python. (The first edition covered Python's
now-removed :mod:`regex` module, which won't help you much.) Consider checking now-removed :mod:`regex` module, which won't help you much.) Consider checking
it out from your library. it out from your library.
.. rubric:: Footnotes
.. [#] Introduced in Python 2.2.2.

@@ -40,6 +40,8 @@ Core and Builtins
Library Library
------- -------
- Issue #10875: Update Regular Expression HOWTO; patch by 'SilentGhost'.
- Issue #10872: The repr() of TextIOWrapper objects now includes the mode - Issue #10872: The repr() of TextIOWrapper objects now includes the mode
if available. if available.