Issue #10875: Update Regular Expression HOWTO; patch by 'SilentGhost'.
parent eb70b90097
commit aacfd4a0cf
@@ -5,7 +5,6 @@
 ****************************
 
 :Author: A.M. Kuchling <amk@amk.ca>
-:Release: 0.05
 
 .. TODO:
 Document lookbehind assertions
@@ -24,11 +23,6 @@
 Introduction
 ============
 
-The :mod:`re` module was added in Python 1.5, and provides Perl-style regular
-expression patterns. Earlier versions of Python came with the :mod:`regex`
-module, which provided Emacs-style patterns. The :mod:`regex` module was
-removed completely in Python 2.5.
-
 Regular expressions (called REs, or regexes, or regex patterns) are essentially
 a tiny, highly specialized programming language embedded inside Python and made
 available through the :mod:`re` module. Using this little language, you specify
@@ -264,7 +258,7 @@ performing string substitutions. ::
 >>> import re
 >>> p = re.compile('ab*')
 >>> p
-<_sre.SRE_Pattern object at 80b4150>
+<_sre.SRE_Pattern object at 0x...>
 
 :func:`re.compile` also accepts an optional *flags* argument, used to enable
 various special features and syntax variations. We'll go over the available
@@ -362,8 +356,8 @@ information about the match: where it starts and ends, the substring it matched,
 and more.
 
 You can learn about this by interactively experimenting with the :mod:`re`
-module. If you have Tkinter available, you may also want to look at
-:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+module. If you have :mod:`tkinter` available, you may also want to look at
+:file:`Tools/demo/redemo.py`, a demonstration program included with the
 Python distribution. It allows you to enter REs and strings, and displays
 whether the RE matches or fails. :file:`redemo.py` can be quite useful when
 trying to debug a complicated RE. Phil Schwartz's `Kodos
@@ -373,11 +367,10 @@ testing RE patterns.
 This HOWTO uses the standard Python interpreter for its examples. First, run the
 Python interpreter, import the :mod:`re` module, and compile a RE::
 
-Python 2.2.2 (#1, Feb 10 2003, 12:57:01)
 >>> import re
 >>> p = re.compile('[a-z]+')
 >>> p
-<_sre.SRE_Pattern object at 80c3c28>
+<_sre.SRE_Pattern object at 0x...>
 
 Now, you can try matching various strings against the RE ``[a-z]+``. An empty
 string shouldn't match at all, since ``+`` means 'one or more repetitions'.
@@ -395,7 +388,7 @@ result in a variable for later use. ::
 
 >>> m = p.match('tempo')
 >>> m
-<_sre.SRE_Match object at 80c4f68>
+<_sre.SRE_Match object at 0x...>
 
 Now you can query the :class:`MatchObject` for information about the matching
 string. :class:`MatchObject` instances also have several methods and
@@ -434,7 +427,7 @@ case. ::
 >>> print(p.match('::: message'))
 None
 >>> m = p.search('::: message') ; print(m)
-<re.MatchObject instance at 80c9650>
+<_sre.SRE_Match object at 0x...>
 >>> m.group()
 'message'
 >>> m.span()
@@ -459,11 +452,11 @@ Two pattern methods return all of the matches for a pattern.
 
 :meth:`findall` has to create the entire list before it can be returned as the
 result. The :meth:`finditer` method returns a sequence of :class:`MatchObject`
-instances as an :term:`iterator`. [#]_ ::
+instances as an :term:`iterator`::
 
 >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
 >>> iterator
-<callable-iterator object at 0x401833ac>
+<callable_iterator object at 0x...>
 >>> for match in iterator:
 ... print(match.span())
 ...
@@ -485,7 +478,7 @@ the RE string added as the first argument, and still return either ``None`` or a
 >>> print(re.match(r'From\s+', 'Fromage amk'))
 None
 >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
-<re.MatchObject instance at 80c5978>
+<_sre.SRE_Match object at 0x...>
 
 Under the hood, these functions simply create a pattern object for you
 and call the appropriate method on it. They also store the compiled object in a
@@ -687,7 +680,7 @@ given location, they can obviously be matched an infinite number of times.
 line, the RE to use is ``^From``. ::
 
 >>> print(re.search('^From', 'From Here to Eternity'))
-<re.MatchObject instance at 80c1520>
+<_sre.SRE_Match object at 0x...>
 >>> print(re.search('^From', 'Reciting From Memory'))
 None
 
@@ -699,11 +692,11 @@ given location, they can obviously be matched an infinite number of times.
 or any location followed by a newline character. ::
 
 >>> print(re.search('}$', '{block}'))
-<re.MatchObject instance at 80adfa8>
+<_sre.SRE_Match object at 0x...>
 >>> print(re.search('}$', '{block} '))
 None
 >>> print(re.search('}$', '{block}\n'))
-<re.MatchObject instance at 80adfa8>
+<_sre.SRE_Match object at 0x...>
 
 To match a literal ``'$'``, use ``\$`` or enclose it inside a character class,
 as in ``[$]``.
@@ -728,7 +721,7 @@ given location, they can obviously be matched an infinite number of times.
 
 >>> p = re.compile(r'\bclass\b')
 >>> print(p.search('no class at all'))
-<re.MatchObject instance at 80c8f28>
+<_sre.SRE_Match object at 0x...>
 >>> print(p.search('the declassified algorithm'))
 None
 >>> print(p.search('one subclass is'))
@@ -746,7 +739,7 @@ given location, they can obviously be matched an infinite number of times.
 >>> print(p.search('no class at all'))
 None
 >>> print(p.search('\b' + 'class' + '\b') )
-<re.MatchObject instance at 80c3ee0>
+<_sre.SRE_Match object at 0x...>
 
 Second, inside a character class, where there's no use for this assertion,
 ``\b`` represents the backspace character, for compatibility with Python's
@@ -1316,8 +1309,8 @@ a regular expression that handles all of the possible cases, the patterns will
 be *very* complicated. Use an HTML or XML parser module for such tasks.)
 
 
-Not Using re.VERBOSE
---------------------
+Using re.VERBOSE
+----------------
 
 By now you've probably noticed that regular expressions are a very compact
 notation, but they're not terribly readable. REs of moderate complexity can
@@ -1366,8 +1359,3 @@ reference for programming in Python. (The first edition covered Python's
 now-removed :mod:`regex` module, which won't help you much.) Consider checking
 it out from your library.
 
-
-.. rubric:: Footnotes
-
-.. [#] Introduced in Python 2.2.2.
-