Merged revisions 53538-53622 via svnmerge from

svn+ssh://pythondev@svn.python.org/python/trunk

........
  r53545 | andrew.kuchling | 2007-01-24 21:06:41 +0100 (Wed, 24 Jan 2007) | 1 line

  Strengthen warning about using lock()
........
  r53556 | thomas.heller | 2007-01-25 19:34:14 +0100 (Thu, 25 Jan 2007) | 3 lines

  Fix for #1643874: When calling SysAllocString, create a PyCObject
  which will eventually call SysFreeString to free the BSTR resource.
........
  r53563 | andrew.kuchling | 2007-01-25 21:02:13 +0100 (Thu, 25 Jan 2007) | 1 line

  Add item
........
  r53564 | brett.cannon | 2007-01-25 21:22:02 +0100 (Thu, 25 Jan 2007) | 8 lines

  Fix time.strptime's %U support.  Basically rewrote the algorithm to be more
  generic so that one only has to shift certain values based on whether the week
  was specified to start on Monday or Sunday.  Cut out a lot of edge case code
  compared to the previous version.  Also broke algorithm out into its own
  function (that is private to the module).

  Fixes bug #1643943 (thanks Biran Nahas for the report).
........
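For illustration only, the `%U` behaviour that r53564 addresses can be exercised through the public `time.strptime` call. This is a minimal sketch that assumes a current Python 3 interpreter rather than the 2.x trunk the commit targets; the helper name `parse_week` is made up for this example.

```python
import time

def parse_week(text, week_directive):
    """Parse 'YEAR WEEK WEEKDAY' using %U (Sunday-first) or %W (Monday-first)."""
    result = time.strptime(text, "%Y " + week_directive + " %w")
    return result.tm_mon, result.tm_mday

# 1 January 2007 was a Monday, so the first Sunday of the year fell on 7 January.
sunday_based = parse_week("2007 01 1", "%U")  # Monday of the first Sunday-based week
monday_based = parse_week("2007 01 1", "%W")  # Monday of the first Monday-based week
```

The only difference between the two calls is which day starts the week, which is the shift the commit message describes.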
  r53570 | brett.cannon | 2007-01-26 00:30:39 +0100 (Fri, 26 Jan 2007) | 4 lines

  Remove specific mention of my name and email address from modules.  Not really
  needed and all bug reports should go to the bug tracker, not directly to me.
  Plus I am not the only person to have edited these files at this point.
........
  r53573 | fred.drake | 2007-01-26 17:28:44 +0100 (Fri, 26 Jan 2007) | 1 line

  fix typo (extraneous ")")
........
  r53575 | georg.brandl | 2007-01-27 18:43:02 +0100 (Sat, 27 Jan 2007) | 4 lines

  Patch #1638243: the compiler package is now able to correctly compile
  a with statement; previously, executing code containing a with statement
  compiled by the compiler package crashed the interpreter.
........
  r53578 | georg.brandl | 2007-01-27 18:59:42 +0100 (Sat, 27 Jan 2007) | 3 lines

  Patch #1634778: add missing encoding aliases for iso8859_15 and
  iso8859_16.
........
  r53579 | georg.brandl | 2007-01-27 20:38:50 +0100 (Sat, 27 Jan 2007) | 2 lines

  Bug #1645944: os.access now returns bool but docstring is not updated
........
  r53590 | brett.cannon | 2007-01-28 21:58:00 +0100 (Sun, 28 Jan 2007) | 2 lines

  Use the thread lock's context manager instead of a try/finally statement.
........
  r53591 | brett.cannon | 2007-01-29 05:41:44 +0100 (Mon, 29 Jan 2007) | 2 lines

  Add a test for slicing an exception.
........
  r53594 | andrew.kuchling | 2007-01-29 21:21:43 +0100 (Mon, 29 Jan 2007) | 1 line

  Minor edits to the curses HOWTO
........
  r53596 | andrew.kuchling | 2007-01-29 21:55:40 +0100 (Mon, 29 Jan 2007) | 1 line

  Various minor edits
........
  r53597 | andrew.kuchling | 2007-01-29 22:28:48 +0100 (Mon, 29 Jan 2007) | 1 line

  More edits
........
  r53601 | tim.peters | 2007-01-30 04:03:46 +0100 (Tue, 30 Jan 2007) | 2 lines

  Whitespace normalization.
........
  r53603 | georg.brandl | 2007-01-30 21:21:30 +0100 (Tue, 30 Jan 2007) | 2 lines

  Bug #1648191: typo in docs.
........
  r53605 | brett.cannon | 2007-01-30 22:34:36 +0100 (Tue, 30 Jan 2007) | 8 lines

  No more raising of string exceptions!

  The next step of PEP 352 (for 2.6) causes raising a string exception to trigger
  a TypeError.  Trying to catch a string exception raises a DeprecationWarning.
  References to string exceptions have been removed from the docs since they are
  now just an error.
........
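A short sketch of what r53605 means for calling code, assuming a current Python 3 interpreter (where raising a string is likewise rejected with a `TypeError`). The class `ConfigError` and both function names are hypothetical.

```python
class ConfigError(Exception):
    """Hypothetical class-based replacement for a string exception."""

def raise_string():
    """Try the old style; the interpreter rejects it with TypeError."""
    try:
        raise "config error"
    except TypeError as exc:
        return type(exc).__name__

def raise_class():
    """The class-based equivalent carries its message as an argument."""
    try:
        raise ConfigError("missing section")
    except ConfigError as exc:
        return str(exc)
```

Handlers then catch the class by name instead of comparing string objects.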
  r53618 | raymond.hettinger | 2007-02-01 22:02:59 +0100 (Thu, 01 Feb 2007) | 1 line

  Bug #1648179:  set.update() not recognizing __iter__ overrides in dict subclasses.
........
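The last entry above (r53618) can be illustrated with a small sketch, assuming a current CPython 3 where `set.update()` honours an overridden `__iter__` on dict subclasses. `UpperKeys` is a hypothetical class invented for the example.

```python
class UpperKeys(dict):
    """Hypothetical dict subclass whose iteration yields upper-cased keys."""

    def __iter__(self):
        # Iterate the underlying keys directly to avoid recursing into __iter__.
        return (key.upper() for key in dict.keys(self))

collected = set()
collected.update(UpperKeys(a=1, b=2))  # should use UpperKeys.__iter__, not the raw keys
```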
This commit is contained in:
Thomas Wouters 2007-02-05 01:24:16 +00:00
parent 08f00467b9
commit 9fe394c1be
46 changed files with 464 additions and 344 deletions

@ -1,7 +1,7 @@
Short-term tasks:
Quick revision pass to make HOWTOs match the current state of Python:
curses doanddont regex sockets sorting
Quick revision pass to make HOWTOs match the current state of Python
doanddont regex sockets
Medium-term tasks:
Revisit the regex howto.

@ -2,7 +2,7 @@
\title{Curses Programming with Python}
\release{2.01}
\release{2.02}
\author{A.M. Kuchling, Eric S. Raymond}
\authoraddress{\email{amk@amk.ca}, \email{esr@thyrsus.com}}
@ -147,10 +147,10 @@ makes using the shell difficult.
In Python you can avoid these complications and make debugging much
easier by importing the module \module{curses.wrapper}. It supplies a
function \function{wrapper} that takes a hook argument. It does the
\function{wrapper()} function that takes a callable. It does the
initializations described above, and also initializes colors if color
support is present. It then runs your hook, and then finally
deinitializes appropriately. The hook is called inside a try-catch
support is present. It then runs your provided callable and finally
deinitializes appropriately. The callable is called inside a try-catch
clause which catches exceptions, performs curses deinitialization, and
then passes the exception upwards. Thus, your terminal won't be left
in a funny state on exception.
@ -229,8 +229,8 @@ ordinary windows and support the same methods.
If you have multiple windows and pads on screen there is a more
efficient way to go, which will prevent annoying screen flicker at
refresh time. Use the methods \method{noutrefresh()} and/or
\method{noutrefresh()} of each window to update the data structure
refresh time. Use the \method{noutrefresh()} method
of each window to update the data structure
representing the desired state of the screen; then change the physical
screen to match the desired state in one go with the function
\function{doupdate()}. The normal \method{refresh()} method calls
@ -254,9 +254,9 @@ four different forms.
\begin{tableii}{|c|l|}{textrm}{Form}{Description}
\lineii{\var{str} or \var{ch}}{Display the string \var{str} or
character \var{ch}}
character \var{ch} at the current position}
\lineii{\var{str} or \var{ch}, \var{attr}}{Display the string \var{str} or
character \var{ch}, using attribute \var{attr}}
character \var{ch}, using attribute \var{attr} at the current position}
\lineii{\var{y}, \var{x}, \var{str} or \var{ch}}
{Move to position \var{y,x} within the window, and display \var{str}
or \var{ch}}
@ -271,7 +271,7 @@ in more detail in the next subsection.
The \function{addstr()} function takes a Python string as the value to
be displayed, while the \function{addch()} functions take a character,
which can be either a Python string of length 1, or an integer. If
which can be either a Python string of length 1 or an integer. If
it's a string, you're limited to displaying characters between 0 and
255. SVr4 curses provides constants for extension characters; these
constants are integers greater than 255. For example,
@ -331,15 +331,15 @@ The curses library also supports color on those terminals that
provide it. The most common such terminal is probably the Linux
console, followed by color xterms.
To use color, you must call the \function{start_color()} function
soon after calling \function{initscr()}, to initialize the default
color set (the \function{curses.wrapper.wrapper()} function does this
To use color, you must call the \function{start_color()} function soon
after calling \function{initscr()}, to initialize the default color
set (the \function{curses.wrapper.wrapper()} function does this
automatically). Once that's done, the \function{has_colors()}
function returns TRUE if the terminal in use can actually display
color. (Note from AMK: curses uses the American spelling
'color', instead of the Canadian/British spelling 'colour'. If you're
like me, you'll have to resign yourself to misspelling it for the sake
of these functions.)
color. (Note: curses uses the American spelling 'color', instead of
the Canadian/British spelling 'colour'. If you're used to the British
spelling, you'll have to resign yourself to misspelling it for the
sake of these functions.)
The curses library maintains a finite number of color pairs,
containing a foreground (or text) color and a background color. You
@ -400,18 +400,19 @@ Python's support adds a text-input widget that makes up some of the
lack.
The most common way to get input to a window is to use its
\method{getch()} method. that pauses, and waits for the user to hit
a key, displaying it if \function{echo()} has been called earlier.
You can optionally specify a coordinate to which the cursor should be
moved before pausing.
\method{getch()} method. \method{getch()} pauses and waits for the
user to hit a key, displaying it if \function{echo()} has been called
earlier. You can optionally specify a coordinate to which the cursor
should be moved before pausing.
It's possible to change this behavior with the method
\method{nodelay()}. After \method{nodelay(1)}, \method{getch()} for
the window becomes non-blocking and returns ERR (-1) when no input is
ready. There's also a \function{halfdelay()} function, which can be
used to (in effect) set a timer on each \method{getch()}; if no input
becomes available within the number of milliseconds specified as the
argument to \function{halfdelay()}, curses throws an exception.
the window becomes non-blocking and returns \code{curses.ERR} (a value
of -1) when no input is ready. There's also a \function{halfdelay()}
function, which can be used to (in effect) set a timer on each
\method{getch()}; if no input becomes available within the number of
milliseconds specified as the argument to \function{halfdelay()},
curses raises an exception.
The \method{getch()} method returns an integer; if it's between 0 and
255, it represents the ASCII code of the key pressed. Values greater

@ -32,7 +32,7 @@ plain dangerous.
\subsubsection{Inside Function Definitions}
\code{from module import *} is {\em invalid} inside function definitions.
While many versions of Python do no check for the invalidity, it does not
While many versions of Python do not check for the invalidity, it does not
make it more valid, no more than having a smart lawyer makes a man innocent.
Do not use it like that ever. Even in versions where it was accepted, it made
the function execution slower, because the compiler could not be certain

@ -34,17 +34,18 @@ This document is available from
The \module{re} module was added in Python 1.5, and provides
Perl-style regular expression patterns. Earlier versions of Python
came with the \module{regex} module, which provided Emacs-style
patterns. \module{regex} module was removed in Python 2.5.
patterns. The \module{regex} module was removed completely in Python 2.5.
Regular expressions (or REs) are essentially a tiny, highly
specialized programming language embedded inside Python and made
available through the \module{re} module. Using this little language,
you specify the rules for the set of possible strings that you want to
match; this set might contain English sentences, or e-mail addresses,
or TeX commands, or anything you like. You can then ask questions
such as ``Does this string match the pattern?'', or ``Is there a match
for the pattern anywhere in this string?''. You can also use REs to
modify a string or to split it apart in various ways.
Regular expressions (called REs, or regexes, or regex patterns) are
essentially a tiny, highly specialized programming language embedded
inside Python and made available through the \module{re} module.
Using this little language, you specify the rules for the set of
possible strings that you want to match; this set might contain
English sentences, or e-mail addresses, or TeX commands, or anything
you like. You can then ask questions such as ``Does this string match
the pattern?'', or ``Is there a match for the pattern anywhere in this
string?''. You can also use REs to modify a string or to split it
apart in various ways.
Regular expression patterns are compiled into a series of bytecodes
which are then executed by a matching engine written in C. For
@ -80,11 +81,12 @@ example, the regular expression \regexp{test} will match the string
would let this RE match \samp{Test} or \samp{TEST} as well; more
about this later.)
There are exceptions to this rule; some characters are
special, and don't match themselves. Instead, they signal that some
out-of-the-ordinary thing should be matched, or they affect other
portions of the RE by repeating them. Much of this document is
devoted to discussing various metacharacters and what they do.
There are exceptions to this rule; some characters are special
\dfn{metacharacters}, and don't match themselves. Instead, they
signal that some out-of-the-ordinary thing should be matched, or they
affect other portions of the RE by repeating them or changing their
meaning. Much of this document is devoted to discussing various
metacharacters and what they do.
Here's a complete list of the metacharacters; their meanings will be
discussed in the rest of this HOWTO.
@ -111,9 +113,10 @@ Metacharacters are not active inside classes. For example,
usually a metacharacter, but inside a character class it's stripped of
its special nature.
You can match the characters not within a range by \dfn{complementing}
the set. This is indicated by including a \character{\^} as the first
character of the class; \character{\^} elsewhere will simply match the
You can match the characters not listed within the class by
\dfn{complementing} the set. This is indicated by including a
\character{\^} as the first character of the class; \character{\^}
outside a character class will simply match the
\character{\^} character. For example, \verb|[^5]| will match any
character except \character{5}.
@ -176,7 +179,7 @@ or more times, instead of exactly once.
For example, \regexp{ca*t} will match \samp{ct} (0 \samp{a}
characters), \samp{cat} (1 \samp{a}), \samp{caaat} (3 \samp{a}
characters), and so forth. The RE engine has various internal
limitations stemming from the size of C's \code{int} type, that will
limitations stemming from the size of C's \code{int} type that will
prevent it from matching over 2 billion \samp{a} characters; you
probably don't have enough memory to construct a string that large, so
you shouldn't run into that limit.
@ -238,9 +241,9 @@ will match \samp{a/b}, \samp{a//b}, and \samp{a///b}. It won't match
You can omit either \var{m} or \var{n}; in that case, a reasonable
value is assumed for the missing value. Omitting \var{m} is
interpreted as a lower limit of 0, while omitting \var{n} results in an
upper bound of infinity --- actually, the 2 billion limit mentioned
earlier, but that might as well be infinity.
interpreted as a lower limit of 0, while omitting \var{n} results in
an upper bound of infinity --- actually, the upper bound is the
2-billion limit mentioned earlier, but that might as well be infinity.
Readers of a reductionist bent may notice that the three other qualifiers
can all be expressed using this notation. \regexp{\{0,\}} is the same
@ -285,7 +288,7 @@ them. (There are applications that don't need REs at all, so there's
no need to bloat the language specification by including them.)
Instead, the \module{re} module is simply a C extension module
included with Python, just like the \module{socket} or \module{zlib}
module.
modules.
Putting REs in strings keeps the Python language simpler, but has one
disadvantage which is the topic of the next section.
@ -326,7 +329,7 @@ expressions; backslashes are not handled in any special way in
a string literal prefixed with \character{r}, so \code{r"\e n"} is a
two-character string containing \character{\e} and \character{n},
while \code{"\e n"} is a one-character string containing a newline.
Frequently regular expressions will be expressed in Python
Regular expressions will often be written in Python
code using this raw string notation.
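A small sketch of the difference between the two notations, written for a current Python 3 interpreter (an assumption; the HOWTO itself predates it). All variable names are made up for the example.

```python
import re

regular = "\n"   # one character: a newline
raw = r"\n"      # two characters: a backslash followed by the letter n
lengths = (len(regular), len(raw))

# Matching a literal backslash followed by "section":
with_raw = re.search(r"\\section", r"\section{Intro}")
without_raw = re.search("\\\\section", r"\section{Intro}")
```

Both searches find the same text; the raw form simply needs half as many backslashes.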
\begin{tableii}{c|c}{code}{Regular String}{Raw string}
@ -368,9 +371,9 @@ strings, and displays whether the RE matches or fails.
\file{redemo.py} can be quite useful when trying to debug a
complicated RE. Phil Schwartz's
\ulink{Kodos}{http://www.phil-schwartz.com/kodos.spy} is also an interactive
tool for developing and testing RE patterns. This HOWTO will use the
standard Python interpreter for its examples.
tool for developing and testing RE patterns.
This HOWTO uses the standard Python interpreter for its examples.
First, run the Python interpreter, import the \module{re} module, and
compile a RE:
@ -472,9 +475,9 @@ Two \class{RegexObject} methods return all of the matches for a pattern.
\end{verbatim}
\method{findall()} has to create the entire list before it can be
returned as the result. In Python 2.2, the \method{finditer()} method
is also available, returning a sequence of \class{MatchObject} instances
as an iterator.
returned as the result. The \method{finditer()} method returns a
sequence of \class{MatchObject} instances as an
iterator.\footnote{Introduced in Python 2.2.2.}
\begin{verbatim}
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
@ -491,13 +494,13 @@ as an iterator.
\subsection{Module-Level Functions}
You don't have to produce a \class{RegexObject} and call its methods;
You don't have to create a \class{RegexObject} and call its methods;
the \module{re} module also provides top-level functions called
\function{match()}, \function{search()}, \function{sub()}, and so
forth. These functions take the same arguments as the corresponding
\class{RegexObject} method, with the RE string added as the first
argument, and still return either \code{None} or a \class{MatchObject}
instance.
\function{match()}, \function{search()}, \function{findall()},
\function{sub()}, and so forth. These functions take the same
arguments as the corresponding \class{RegexObject} method, with the RE
string added as the first argument, and still return either
\code{None} or a \class{MatchObject} instance.
\begin{verbatim}
>>> print re.match(r'From\s+', 'Fromage amk')
@ -514,7 +517,7 @@ RE are faster.
Should you use these module-level functions, or should you get the
\class{RegexObject} and call its methods yourself? That choice
depends on how frequently the RE will be used, and on your personal
coding style. If a RE is being used at only one point in the code,
coding style. If the RE is being used at only one point in the code,
then the module functions are probably more convenient. If a program
contains a lot of regular expressions, or re-uses the same ones in
several locations, then it might be worthwhile to collect all the
@ -537,7 +540,7 @@ as I am.
Compilation flags let you modify some aspects of how regular
expressions work. Flags are available in the \module{re} module under
two names, a long name such as \constant{IGNORECASE}, and a short,
two names, a long name such as \constant{IGNORECASE} and a short,
one-letter form such as \constant{I}. (If you're familiar with Perl's
pattern modifiers, the one-letter forms use the same letters; the
short form of \constant{re.VERBOSE} is \constant{re.X}, for example.)
@ -617,7 +620,7 @@ that are more readable by granting you more flexibility in how you can
format them. When this flag has been specified, whitespace within the
RE string is ignored, except when the whitespace is in a character
class or preceded by an unescaped backslash; this lets you organize
and indent the RE more clearly. It also enables you to put comments
and indent the RE more clearly. This flag also lets you put comments
within a RE that will be ignored by the engine; comments are marked by
a \character{\#} that's neither in a character class nor preceded by an
unescaped backslash.
@ -629,18 +632,19 @@ much easier it is to read?
charref = re.compile(r"""
&[#] # Start of a numeric entity reference
(
[0-9]+[^0-9] # Decimal form
| 0[0-7]+[^0-7] # Octal form
| x[0-9a-fA-F]+[^0-9a-fA-F] # Hexadecimal form
0[0-7]+ # Octal form
| [0-9]+ # Decimal form
| x[0-9a-fA-F]+ # Hexadecimal form
)
; # Trailing semicolon
""", re.VERBOSE)
\end{verbatim}
Without the verbose setting, the RE would look like this:
\begin{verbatim}
charref = re.compile("&#([0-9]+[^0-9]"
"|0[0-7]+[^0-7]"
"|x[0-9a-fA-F]+[^0-9a-fA-F])")
charref = re.compile("&#(0[0-7]+"
"|[0-9]+"
"|x[0-9a-fA-F]+);")
\end{verbatim}
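To try the revised pattern, the verbose form can be compiled and run against a few inputs. This is a sketch under the assumption of a current Python 3 interpreter; the helper `reference_body` is hypothetical and not part of the HOWTO.

```python
import re

charref = re.compile(r"""
 &[#]                # Start of a numeric entity reference
 (
     0[0-7]+         # Octal form
   | [0-9]+          # Decimal form
   | x[0-9a-fA-F]+   # Hexadecimal form
 )
 ;                   # Trailing semicolon
""", re.VERBOSE)

def reference_body(text):
    """Return the digits of the first numeric reference in text, or None."""
    m = charref.search(text)
    return m.group(1) if m else None
```

For example, `reference_body("&#x41;")` returns the captured group `x41`, while a reference with no trailing semicolon is not matched at all.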
In the above example, Python's automatic concatenation of string
@ -722,8 +726,8 @@ inside a character class, as in \regexp{[\$]}.
\item[\regexp{\e A}] Matches only at the start of the string. When
not in \constant{MULTILINE} mode, \regexp{\e A} and \regexp{\^} are
effectively the same. In \constant{MULTILINE} mode, however, they're
different; \regexp{\e A} still matches only at the beginning of the
effectively the same. In \constant{MULTILINE} mode, they're
different: \regexp{\e A} still matches only at the beginning of the
string, but \regexp{\^} may match at any location inside the string
that follows a newline character.
@ -782,14 +786,23 @@ RE matched or not. Regular expressions are often used to dissect
strings by writing a RE divided into several subgroups which
match different components of interest. For example, an RFC-822
header line is divided into a header name and a value, separated by a
\character{:}. This can be handled by writing a regular expression
\character{:}, like this:
\begin{verbatim}
From: author@example.com
User-Agent: Thunderbird 1.5.0.9 (X11/20061227)
MIME-Version: 1.0
To: editor@example.com
\end{verbatim}
This can be handled by writing a regular expression
which matches an entire header line, and has one group which matches the
header name, and another group which matches the header's value.
Groups are marked by the \character{(}, \character{)} metacharacters.
\character{(} and \character{)} have much the same meaning as they do
in mathematical expressions; they group together the expressions
contained inside them. For example, you can repeat the contents of a
contained inside them, and you can repeat the contents of a
group with a repeating qualifier, such as \regexp{*}, \regexp{+},
\regexp{?}, or \regexp{\{\var{m},\var{n}\}}. For example,
\regexp{(ab)*} will match zero or more repetitions of \samp{ab}.
@ -882,11 +895,12 @@ syntax for regular expression extensions, so we'll look at that first.
Perl 5 added several additional features to standard regular
expressions, and the Python \module{re} module supports most of them.
It would have been difficult to choose new single-keystroke
metacharacters or new special sequences beginning with \samp{\e} to
represent the new features without making Perl's regular expressions
confusingly different from standard REs. If you chose \samp{\&} as a
new metacharacter, for example, old expressions would be assuming that
It would have been difficult to choose new
single-keystroke metacharacters or new special sequences beginning
with \samp{\e} to represent the new features without making Perl's
regular expressions confusingly different from standard REs. If you
chose \samp{\&} as a new metacharacter, for example, old expressions
would be assuming that
\samp{\&} was a regular character and wouldn't have escaped it by
writing \regexp{\e \&} or \regexp{[\&]}.
@ -913,15 +927,15 @@ Now that we've looked at the general extension syntax, we can return
to the features that simplify working with groups in complex REs.
Since groups are numbered from left to right and a complex expression
may use many groups, it can become difficult to keep track of the
correct numbering, and modifying such a complex RE is annoying.
Insert a new group near the beginning, and you change the numbers of
correct numbering. Modifying such a complex RE is annoying, too:
insert a new group near the beginning and you change the numbers of
everything that follows it.
First, sometimes you'll want to use a group to collect a part of a
regular expression, but aren't interested in retrieving the group's
contents. You can make this fact explicit by using a non-capturing
group: \regexp{(?:...)}, where you can put any other regular
expression inside the parentheses.
Sometimes you'll want to use a group to collect a part of a regular
expression, but aren't interested in retrieving the group's contents.
You can make this fact explicit by using a non-capturing group:
\regexp{(?:...)}, where you can replace the \regexp{...}
with any other regular expression.
\begin{verbatim}
>>> m = re.match("([abc])+", "abc")
@ -937,23 +951,23 @@ group matched, a non-capturing group behaves exactly the same as a
capturing group; you can put anything inside it, repeat it with a
repetition metacharacter such as \samp{*}, and nest it within other
groups (capturing or non-capturing). \regexp{(?:...)} is particularly
useful when modifying an existing group, since you can add new groups
useful when modifying an existing pattern, since you can add new groups
without changing how all the other groups are numbered. It should be
mentioned that there's no performance difference in searching between
capturing and non-capturing groups; neither form is any faster than
the other.
The second, and more significant, feature is named groups; instead of
A more significant feature is named groups: instead of
referring to them by numbers, groups can be referenced by a name.
The syntax for a named group is one of the Python-specific extensions:
\regexp{(?P<\var{name}>...)}. \var{name} is, obviously, the name of
the group. Except for associating a name with a group, named groups
also behave identically to capturing groups. The \class{MatchObject}
methods that deal with capturing groups all accept either integers, to
refer to groups by number, or a string containing the group name.
Named groups are still given numbers, so you can retrieve information
about a group in two ways:
the group. Named groups also behave exactly like capturing groups,
and additionally associate a name with a group. The
\class{MatchObject} methods that deal with capturing groups all accept
either integers that refer to the group by number or strings that
contain the desired group's name. Named groups are still given
numbers, so you can retrieve information about a group in two ways:
\begin{verbatim}
>>> p = re.compile(r'(?P<word>\b\w+\b)')
@ -980,11 +994,11 @@ InternalDate = re.compile(r'INTERNALDATE "'
It's obviously much easier to retrieve \code{m.group('zonem')},
instead of having to remember to retrieve group 9.
Since the syntax for backreferences, in an expression like
\regexp{(...)\e 1}, refers to the number of the group there's
The syntax for backreferences in an expression such as
\regexp{(...)\e 1} refers to the number of the group. There's
naturally a variant that uses the group name instead of the number.
This is also a Python extension: \regexp{(?P=\var{name})} indicates
that the contents of the group called \var{name} should again be found
This is another Python extension: \regexp{(?P=\var{name})} indicates
that the contents of the group called \var{name} should again be matched
at the current point. The regular expression for finding doubled
words, \regexp{(\e b\e w+)\e s+\e 1} can also be written as
\regexp{(?P<word>\e b\e w+)\e s+(?P=word)}:
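A minimal sketch comparing the numbered and named spellings of the doubled-word pattern, assuming a current Python 3 interpreter. The sample sentence and variable names are chosen only for illustration.

```python
import re

numbered = re.compile(r"(\b\w+)\s+\1")
named = re.compile(r"(?P<word>\b\w+)\s+(?P=word)")

text = "Paris in the the spring"
by_number = numbered.search(text)
by_name = named.search(text)

# A named group keeps its number, so both lookups address the same group.
same_group = by_name.group("word") == by_name.group(1)
```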
@ -1014,11 +1028,11 @@ opposite of the positive assertion; it succeeds if the contained expression
\emph{doesn't} match at the current position in the string.
\end{itemize}
An example will help make this concrete by demonstrating a case
where a lookahead is useful. Consider a simple pattern to match a
filename and split it apart into a base name and an extension,
separated by a \samp{.}. For example, in \samp{news.rc}, \samp{news}
is the base name, and \samp{rc} is the filename's extension.
To make this concrete, let's look at a case where a lookahead is
useful. Consider a simple pattern to match a filename and split it
apart into a base name and an extension, separated by a \samp{.}. For
example, in \samp{news.rc}, \samp{news} is the base name, and
\samp{rc} is the filename's extension.
The pattern to match this is quite simple:
@ -1065,12 +1079,12 @@ read and understand. Worse, if the problem changes and you want to
exclude both \samp{bat} and \samp{exe} as extensions, the pattern
would get even more complicated and confusing.
A negative lookahead cuts through all this:
A negative lookahead cuts through all this confusion:
\regexp{.*[.](?!bat\$).*\$}
% $
The lookahead means: if the expression \regexp{bat} doesn't match at
The negative lookahead means: if the expression \regexp{bat} doesn't match at
this point, try the rest of the pattern; if \regexp{bat\$} does match,
the whole pattern will fail. The trailing \regexp{\$} is required to
ensure that something like \samp{sample.batch}, where the extension
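The negative-lookahead pattern above can be checked with a short sketch, assuming a current Python 3 interpreter. The function name `accepted` is hypothetical.

```python
import re

# Reject names whose extension is exactly "bat"; accept everything else
# that contains a dot.
not_bat = re.compile(r".*[.](?!bat$).*$")

def accepted(filename):
    """Return True when the pattern matches the whole filename."""
    return not_bat.match(filename) is not None

results = {name: accepted(name)
           for name in ("news.rc", "autoexec.bat", "sample.batch")}
```

Here `sample.batch` is accepted because the `$` inside the lookahead demands that `bat` be the entire extension.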
@ -1087,7 +1101,7 @@ filenames that end in either \samp{bat} or \samp{exe}:
\section{Modifying Strings}
Up to this point, we've simply performed searches against a static
string. Regular expressions are also commonly used to modify a string
string. Regular expressions are also commonly used to modify strings
in various ways, using the following \class{RegexObject} methods:
\begin{tableii}{c|l}{code}{Method/Attribute}{Purpose}

@ -10,22 +10,6 @@ module never needs to be imported explicitly: the exceptions are
provided in the built-in namespace as well as the \module{exceptions}
module.
\begin{notice}
In past versions of Python string exceptions were supported. In
Python 1.5 and newer versions, all standard exceptions have been
converted to class objects and users are encouraged to do the same.
String exceptions will raise a \code{DeprecationWarning} in Python 2.5 and
newer.
In future versions, support for string exceptions will be removed.
Two distinct string objects with the same value are considered different
exceptions. This is done to force programmers to use exception names
rather than their string value when specifying exception handlers.
The string value of all built-in exceptions is their name, but this is
not a requirement for user-defined exceptions or exceptions defined by
library modules.
\end{notice}
For class exceptions, in a \keyword{try}\stindex{try} statement with
an \keyword{except}\stindex{except} clause that mentions a particular
class, that clause also handles any exception classes derived from

@ -19,7 +19,7 @@ per pixel, etc.
\begin{funcdesc}{crop}{image, psize, width, height, x0, y0, x1, y1}
Return the selected part of \var{image}, which should by
Return the selected part of \var{image}, which should be
\var{width} by \var{height} in size and consist of pixels of
\var{psize} bytes. \var{x0}, \var{y0}, \var{x1} and \var{y1} are like
the \function{gl.lrectread()} parameters, i.e.\ the boundary is

@ -58,14 +58,18 @@ skipped, though using a key from an iterator may result in a
\exception{KeyError} exception if the corresponding message is subsequently
removed.
Be very cautious when modifying mailboxes that might also be changed
by some other process. The safest mailbox format to use for such
tasks is Maildir; try to avoid using single-file formats such as mbox
for concurrent writing. If you're modifying a mailbox, no matter what
the format, you must lock it by calling the \method{lock()} and
\method{unlock()} methods before making any changes. Failing to lock
the mailbox runs the risk of losing data if some other process makes
changes to the mailbox while your Python code is running.
\begin{notice}[warning]
Be very cautious when modifying mailboxes that might be
simultaneously changed by some other process. The safest mailbox
format to use for such tasks is Maildir; try to avoid using
single-file formats such as mbox for concurrent writing. If you're
modifying a mailbox, you
\emph{must} lock it by calling the \method{lock()} and
\method{unlock()} methods \emph{before} reading any messages in the file
or making any changes by adding or deleting a message. Failing to
lock the mailbox runs the risk of losing messages or corrupting the entire
mailbox.
\end{notice}
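One way to follow the warning above is to bracket every change with `lock()` and `unlock()` in a `try`/`finally` block. The sketch below assumes a current Python 3 interpreter and a throwaway mbox file in a temporary directory; the function name `add_locked`, the file name, and the message text are all made up for the example.

```python
import mailbox
import os
import tempfile

def add_locked(path, message_text):
    """Append one message to an mbox file while holding its lock."""
    box = mailbox.mbox(path)
    box.lock()                 # acquire the lock before reading or changing anything
    try:
        box.add(message_text)
        box.flush()            # write pending changes while the lock is still held
    finally:
        box.unlock()
        box.close()

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.mbox")
    add_locked(demo_path, "From: author@example.com\nSubject: test\n\nHello\n")
    check = mailbox.mbox(demo_path)
    message_count = len(check)
    check.close()
```

The `finally` clause makes sure the lock is released even if adding the message fails.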
\class{Mailbox} instances have the following methods:

@ -197,10 +197,6 @@ Exceptions can also be identified by strings, in which case the
value can be raised along with the identifying string which can be
passed to the handler.
\deprecated{2.5}{String exceptions should not be used in new code.
They will not be supported in a future version of Python. Old code
should be rewritten to use class exceptions instead.}
\begin{notice}[warning]
Messages to exceptions are not part of the Python API. Their contents may
change from one version of Python to the next without warning and should not

@ -1991,7 +1991,7 @@ applied to complex expressions and nested functions:
There is a way to remove an item from a list given its index instead
of its value: the \keyword{del} statement. This differs from the
\method{pop()}) method which returns a value. The \keyword{del}
\method{pop()} method which returns a value. The \keyword{del}
statement can also be used to remove slices from a list or clear the
entire list (which we did earlier by assignment of an empty list to
the slice). For example:
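As a minimal illustration of the difference between `pop()` and `del` (the list contents are made up for this sketch and are not necessarily those used in the tutorial):

```python
a = [-1, 1, 66.25, 333, 333, 1234.5]

popped = a.pop(0)      # removes the item at index 0 and returns it
del a[0]               # removes the new item at index 0, returns nothing
del a[1:3]             # removes a slice (indices 1 and 2)
remaining = list(a)    # snapshot of what is left
del a[:]               # clears the entire list in place
```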

@ -72,6 +72,12 @@ SSL thanks to the addition of the \class{SMTP_SSL} class.
This class supports an interface identical to the existing \class{SMTP}
class. (Contributed by Monty Taylor.)
\item The \module{test.test_support} module now contains a
\function{EnvironmentVarGuard} context manager that
supports temporarily changing environment variables and
automatically restores them to their old values.
(Contributed by Brett Cannon.)
\end{itemize}
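The idea behind such a guard can be sketched with a small stand-alone context manager. The name `env_guard` and the variable `DEMO_VAR` are hypothetical and chosen only for this illustration; this is not the implementation in `test.test_support`, just one way the save-and-restore behaviour could look.

```python
import os
from contextlib import contextmanager


@contextmanager
def env_guard():
    """Hypothetical helper: snapshot os.environ and restore it on exit."""
    saved = dict(os.environ)
    try:
        yield os.environ
    finally:
        # Drop anything added inside the block and put old values back.
        os.environ.clear()
        os.environ.update(saved)


os.environ.pop("DEMO_VAR", None)      # make sure the variable starts unset
with env_guard() as env:
    env["DEMO_VAR"] = "temporary"
    inside = os.environ.get("DEMO_VAR")
after = os.environ.get("DEMO_VAR")    # restored: the variable is gone again
```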

@ -22,9 +22,6 @@ try:
except:
from dummy_thread import allocate_lock as _thread_allocate_lock
__author__ = "Brett Cannon"
__email__ = "brett@python.org"
__all__ = ['strptime']
def _getlang():
@ -273,11 +270,31 @@ _TimeRE_cache = TimeRE()
_CACHE_MAX_SIZE = 5 # Max number of regexes stored in _regex_cache
_regex_cache = {}
def _calc_julian_from_U_or_W(year, week_of_year, day_of_week, week_starts_Mon):
"""Calculate the Julian day based on the year, week of the year, and day of
the week, with week_starts_Mon indicating whether the week of the year
is assumed to start on Monday (True) or Sunday (False)."""
first_weekday = datetime_date(year, 1, 1).weekday()
# If we are dealing with the %U directive (week starts on Sunday), it's
# easier to just shift the view to Sunday being the first day of the
# week.
if not week_starts_Mon:
first_weekday = (first_weekday + 1) % 7
day_of_week = (day_of_week + 1) % 7
# Need to watch out for a week 0 (when the first day of the year is not
# the same as that specified by %U or %W).
week_0_length = (7 - first_weekday) % 7
if week_of_year == 0:
return 1 + day_of_week - first_weekday
else:
days_to_week = week_0_length + (7 * (week_of_year - 1))
return 1 + days_to_week + day_of_week
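The calculation above can be exercised on its own. The sketch below is a stand-alone restatement of the same arithmetic with a lower-case name chosen for this example; `check_year` is a hypothetical helper that compares the result against the day of the year reported by `datetime` for every date in a year, for both `%U` and `%W`.

```python
from datetime import date, timedelta


def calc_julian_from_u_or_w(year, week_of_year, day_of_week, week_starts_mon):
    """Day of the year (1-based) from a %U or %W week number and a weekday.

    day_of_week follows date.weekday(): Monday is 0, Sunday is 6.
    """
    first_weekday = date(year, 1, 1).weekday()
    if not week_starts_mon:
        # For %U, shift the view so that Sunday becomes day 0 of the week.
        first_weekday = (first_weekday + 1) % 7
        day_of_week = (day_of_week + 1) % 7
    # Week 0 holds the days before the first "week start" day of the year.
    week_0_length = (7 - first_weekday) % 7
    if week_of_year == 0:
        return 1 + day_of_week - first_weekday
    days_to_week = week_0_length + 7 * (week_of_year - 1)
    return 1 + days_to_week + day_of_week


def check_year(year):
    """Hypothetical helper: verify every date of a year for %U and %W."""
    day = date(year, 1, 1)
    while day.year == year:
        for directive, starts_mon in (("%U", False), ("%W", True)):
            week = int(day.strftime(directive))
            got = calc_julian_from_u_or_w(year, week, day.weekday(), starts_mon)
            if got != day.timetuple().tm_yday:
                return False
        day += timedelta(days=1)
    return True


# First Sunday of 2007: %U week 1, weekday 6 (Sunday).
first_sunday_2007 = calc_julian_from_u_or_w(2007, 1, 6, False)
```

Shifting both `first_weekday` and `day_of_week` by one for `%U` is what lets a single formula serve both directives.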
def strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
"""Return a time struct based on the input string and the format string."""
global _TimeRE_cache, _regex_cache
_cache_lock.acquire()
try:
with _cache_lock:
time_re = _TimeRE_cache
locale_time = time_re.locale_time
if _getlang() != locale_time.lang:
@ -302,8 +319,6 @@ def strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
except IndexError:
raise ValueError("stray %% in format '%s'" % format)
_regex_cache[format] = format_regex
finally:
_cache_lock.release()
found = format_regex.match(data_string)
if not found:
raise ValueError("time data %r does not match format %r" %
@ -385,10 +400,10 @@ def strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
elif group_key in ('U', 'W'):
week_of_year = int(found_dict[group_key])
if group_key == 'U':
# U starts week on Sunday
# U starts week on Sunday.
week_of_year_start = 6
else:
# W starts week on Monday
# W starts week on Monday.
week_of_year_start = 0
elif group_key == 'Z':
# Since -1 is default value only need to worry about setting tz if
@ -406,42 +421,20 @@ def strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
tz = value
break
# If we know the week of the year and what day of that week, we can figure
# out the Julian day of the year
# Calculations below assume 0 is a Monday
# out the Julian day of the year.
if julian == -1 and week_of_year != -1 and weekday != -1:
# Calculate how many days in week 0
first_weekday = datetime_date(year, 1, 1).weekday()
preceeding_days = 7 - first_weekday
if preceeding_days == 7:
preceeding_days = 0
# Adjust for U directive so that calculations are not dependent on
# directive used to figure out week of year
if weekday == 6 and week_of_year_start == 6:
week_of_year -= 1
# If a year starts and ends on a Monday but a week is specified to
# start on a Sunday we need to up the week to counter-balance the fact
# that with %W that first Monday starts week 1 while with %U that is
# week 0 and thus shifts everything by a week
if weekday == 0 and first_weekday == 0 and week_of_year_start == 6:
week_of_year += 1
# If in week 0, then just figure out how many days from Jan 1 to day of
# week specified, else calculate by multiplying week of year by 7,
# adding in days in week 0, and the number of days from Monday to the
# day of the week
if week_of_year == 0:
julian = 1 + weekday - first_weekday
else:
days_to_week = preceeding_days + (7 * (week_of_year - 1))
julian = 1 + days_to_week + weekday
week_starts_Mon = week_of_year_start == 0
julian = _calc_julian_from_U_or_W(year, week_of_year, weekday,
week_starts_Mon)
# Cannot pre-calculate datetime_date() since can change in Julian
# calculation and thus could have different value for the day of the week
#calculation
# calculation.
if julian == -1:
# Need to add 1 to result since first day of the year is 1, not 0.
julian = datetime_date(year, month, day).toordinal() - \
datetime_date(year, 1, 1).toordinal() + 1
else: # Assume that if they bothered to include Julian day it will
#be accurate
# be accurate.
datetime_result = datetime_date.fromordinal((julian - 1) + datetime_date(year, 1, 1).toordinal())
year = datetime_result.year
month = datetime_result.month

@ -914,6 +914,8 @@ class CodeGenerator:
self.emit('LOAD_CONST', None)
self.nextBlock(final)
self.setups.push((END_FINALLY, final))
self._implicitNameOp('LOAD', exitvar)
self._implicitNameOp('DELETE', exitvar)
self.emit('WITH_CLEANUP')
self.emit('END_FINALLY')
self.setups.pop()

@ -1018,7 +1018,7 @@ class Transformer:
if nodelist[2][0] == token.COLON:
var = None
else:
var = self.com_node(nodelist[2])
var = self.com_assign(nodelist[2][2], OP_ASSIGN)
return With(expr, var, body, lineno=nodelist[0][2])
def com_with_var(self, nodelist):

@ -11,11 +11,8 @@ Suggested usage is::
import dummy_thread as thread
"""
__author__ = "Brett Cannon"
__email__ = "brett@python.org"
# Exports only things specified by thread documentation
# (skipping obsolete synonyms allocate(), start_new(), exit_thread())
# Exports only things specified by thread documentation;
# skipping obsolete synonyms allocate(), start_new(), exit_thread().
__all__ = ['error', 'start_new_thread', 'exit', 'get_ident', 'allocate_lock',
'interrupt_main', 'LockType']

@ -5,11 +5,6 @@ to not have ``threading`` considered imported. Had ``threading`` been
directly imported it would have made all subsequent imports succeed
regardless of whether ``thread`` was available which is not desired.
:Author: Brett Cannon
:Contact: brett@python.org
XXX: Try to get rid of ``_dummy_threading``.
"""
from sys import modules as sys_modules

@ -46,6 +46,7 @@ CHARSETS = {
'iso-8859-13': (QP, QP, None),
'iso-8859-14': (QP, QP, None),
'iso-8859-15': (QP, QP, None),
'iso-8859-16': (QP, QP, None),
'windows-1252':(QP, QP, None),
'viscii': (QP, QP, None),
'us-ascii': (None, None, None),
@ -81,6 +82,8 @@ ALIASES = {
'latin-8': 'iso-8859-14',
'latin_9': 'iso-8859-15',
'latin-9': 'iso-8859-15',
'latin_10':'iso-8859-16',
'latin-10':'iso-8859-16',
'cp949': 'ks_c_5601-1987',
'euc_jp': 'euc-jp',
'euc_kr': 'euc-kr',
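The effect of the new alias entries can be checked through the `email.charset.Charset` class, which resolves its input name through this alias table. A small sketch, assuming a Python version whose `email` package ships these entries:

```python
from email.charset import Charset

# Both spellings of the alias should resolve to the canonical charset name.
resolved_underscore = Charset("latin_10").input_charset
resolved_hyphen = Charset("latin-10").input_charset
```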

@ -301,6 +301,8 @@ aliases = {
# iso8859_13 codec
'iso_8859_13' : 'iso8859_13',
'l7' : 'iso8859_13',
'latin7' : 'iso8859_13',
# iso8859_14 codec
'iso_8859_14' : 'iso8859_14',
@ -312,6 +314,8 @@ aliases = {
# iso8859_15 codec
'iso_8859_15' : 'iso8859_15',
'l9' : 'iso8859_15',
'latin9' : 'iso8859_15',
# iso8859_16 codec
'iso_8859_16' : 'iso8859_16',
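A short sketch of what the added aliases allow, assuming an interpreter that includes them: the alias and the canonical codec name should look up the same codec, and text can be encoded through the alias directly.

```python
import codecs

# The alias and the canonical name should refer to the same codec.
same_latin7 = codecs.lookup("latin7").name == codecs.lookup("iso8859_13").name
same_latin9 = codecs.lookup("latin9").name == codecs.lookup("iso8859_15").name

# ISO 8859-15 places the euro sign at byte 0xA4.
euro_bytes = "\u20ac".encode("latin9")
```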

@ -7,6 +7,12 @@ from random import random
# How much time in seconds can pass before we print a 'Still working' message.
_PRINT_WORKING_MSG_INTERVAL = 5 * 60
class TrivialContext(object):
def __enter__(self):
return self
def __exit__(self, *exc_info):
pass
class CompilerTest(unittest.TestCase):
def testCompileLibrary(self):
@ -157,6 +163,31 @@ class CompilerTest(unittest.TestCase):
exec(c, dct)
self.assertEquals(dct['f'].func_annotations, expected)
def testWith(self):
# SF bug 1638243
c = compiler.compile('from __future__ import with_statement\n'
'def f():\n'
' with TrivialContext():\n'
' return 1\n'
'result = f()',
'<string>',
'exec' )
dct = {'TrivialContext': TrivialContext}
exec(c, dct)
self.assertEquals(dct.get('result'), 1)
def testWithAss(self):
c = compiler.compile('from __future__ import with_statement\n'
'def f():\n'
' with TrivialContext() as tc:\n'
' return 1\n'
'result = f()',
'<string>',
'exec' )
dct = {'TrivialContext': TrivialContext}
exec(c, dct)
self.assertEquals(dct.get('result'), 1)
NOLINENO = (compiler.ast.Module, compiler.ast.Stmt, compiler.ast.Discard)

@ -311,6 +311,13 @@ class ExceptionTests(unittest.TestCase):
'pickled "%r", attribute "%s' %
(e, checkArgName))
def testSlicing(self):
# Test that you can slice an exception directly instead of requiring
# going through the 'args' attribute.
args = (1, 2, 3)
exc = BaseException(*args)
self.failUnlessEqual(exc[:], args)
def testKeywordArgs(self):
# Test that builtin exceptions don't take keyword args,
# but user-defined subclasses can if they want

@ -2,7 +2,7 @@ import unittest
import __builtin__
import exceptions
import warnings
from test.test_support import run_unittest
from test.test_support import run_unittest, guard_warnings_filter
import os
from platform import system as platform_system
@ -113,13 +113,11 @@ class UsageTests(unittest.TestCase):
"""Test usage of exceptions"""
def setUp(self):
self._filters = warnings.filters[:]
def tearDown(self):
warnings.filters = self._filters[:]
def test_raise_new_style_non_exception(self):
# You cannot raise a new-style class that does not inherit from
# BaseException; the ability was not possible until BaseException's
# introduction so no need to support new-style objects that do not
# inherit from it.
class NewStyleClass(object):
pass
try:
@ -127,13 +125,51 @@ class UsageTests(unittest.TestCase):
except TypeError:
pass
except:
self.fail("unable to raise new-style class")
self.fail("able to raise new-style class")
try:
raise NewStyleClass()
except TypeError:
pass
except:
self.fail("unable to raise new-style class instance")
self.fail("able to raise new-style class instance")
def test_raise_string(self):
# Raising a string raises TypeError.
try:
raise "spam"
except TypeError:
pass
except:
self.fail("was able to raise a string exception")
def test_catch_string(self):
# Catching a string should trigger a DeprecationWarning.
with guard_warnings_filter():
warnings.resetwarnings()
warnings.filterwarnings("error")
str_exc = "spam"
try:
try:
raise StandardError
except str_exc:
pass
except DeprecationWarning:
pass
except StandardError:
self.fail("catching a string exception did not raise "
"DeprecationWarning")
# Make sure that even if the string exception is listed in a tuple
# that a warning is raised.
try:
try:
raise StandardError
except (AssertionError, str_exc):
pass
except DeprecationWarning:
pass
except StandardError:
self.fail("catching a string exception specified in a tuple did "
"not raise DeprecationWarning")
def test_main():
run_unittest(ExceptionClassTests, UsageTests)
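The pattern used in `test_catch_string` (temporarily turning warnings into exceptions so that a test can assert on them) can be sketched with the standard `warnings.catch_warnings` context manager, which plays a comparable role to `guard_warnings_filter` by restoring the filter list on exit. The function `legacy_call` is a hypothetical stand-in for code that emits a `DeprecationWarning`.

```python
import warnings


def legacy_call():
    """Hypothetical function that emits a deprecation warning."""
    warnings.warn("catching of string exceptions is deprecated",
                  DeprecationWarning, stacklevel=2)


# Inside the block, every warning is escalated to an exception.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        legacy_call()
        escalated = False
    except DeprecationWarning:
        escalated = True

# Alternatively, record warnings instead of raising them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_call()
```

Because the filters are restored when each block exits, the test does not need `setUp`/`tearDown` methods to save and reset `warnings.filters`.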

@ -463,6 +463,10 @@ class CalculationTests(unittest.TestCase):
"of the year")
test_helper((1917, 12, 31), "Dec 31 on Monday with year starting and "
"ending on Monday")
test_helper((2007, 1, 7), "First Sunday of 2007")
test_helper((2007, 1, 14), "Second Sunday of 2007")
test_helper((2006, 12, 31), "Last Sunday of 2006")
test_helper((2006, 12, 24), "Second to last Sunday of 2006")
class CacheTests(unittest.TestCase):

@ -1432,10 +1432,19 @@ Z_get(void *ptr, unsigned size)
#endif
#ifdef MS_WIN32
/* We cannot pass SysFreeString directly as the destructor argument of
PyCObject_FromVoidPtr because it uses a different calling convention.
*/
static void _my_SysFreeString(void *p)
{
SysFreeString((BSTR)p);
}
static PyObject *
BSTR_set(void *ptr, PyObject *value, unsigned size)
{
BSTR bstr;
PyObject *result;
/* convert value into a PyUnicodeObject or NULL */
if (Py_None == value) {
@ -1463,15 +1472,19 @@ BSTR_set(void *ptr, PyObject *value, unsigned size)
} else
bstr = NULL;
/* free the previous contents, if any */
if (*(BSTR *)ptr)
SysFreeString(*(BSTR *)ptr);
if (bstr) {
result = PyCObject_FromVoidPtr((void *)bstr, _my_SysFreeString);
if (result == NULL) {
SysFreeString(bstr);
return NULL;
}
} else {
result = Py_None;
Py_INCREF(result);
}
/* and store it */
*(BSTR *)ptr = bstr;
/* We don't need to keep any other object */
_RET(value);
return result;
}

@ -1462,7 +1462,7 @@ posix_do_stat(PyObject *self, PyObject *args,
/* POSIX methods */
PyDoc_STRVAR(posix_access__doc__,
"access(path, mode) -> 1 if granted, 0 otherwise\n\n\
"access(path, mode) -> True if granted, False otherwise\n\n\
Use the real uid/gid to test for access to a path. Note that most\n\
operations will use the effective uid/gid, therefore this routine can\n\
be used in a suid/sgid environment to test if the invoking user has the\n\

@ -935,7 +935,7 @@ set_update_internal(PySetObject *so, PyObject *other)
if (PyAnySet_Check(other))
return set_merge(so, other);
if (PyDict_Check(other)) {
if (PyDict_CheckExact(other)) {
PyObject *value;
Py_ssize_t pos = 0;
while (PyDict_Next(other, &pos, &key, &value)) {
@ -1383,7 +1383,7 @@ set_difference(PySetObject *so, PyObject *other)
setentry *entry;
Py_ssize_t pos = 0;
if (!PyAnySet_Check(other) && !PyDict_Check(other)) {
if (!PyAnySet_Check(other) && !PyDict_CheckExact(other)) {
result = set_copy(so);
if (result == NULL)
return NULL;
@ -1397,7 +1397,7 @@ set_difference(PySetObject *so, PyObject *other)
if (result == NULL)
return NULL;
if (PyDict_Check(other)) {
if (PyDict_CheckExact(other)) {
while (set_next(so, &pos, &entry)) {
setentry entrycopy;
entrycopy.hash = entry->hash;
@ -1470,7 +1470,7 @@ set_symmetric_difference_update(PySetObject *so, PyObject *other)
if ((PyObject *)so == other)
return set_clear(so);
if (PyDict_Check(other)) {
if (PyDict_CheckExact(other)) {
PyObject *value;
int rv;
while (PyDict_Next(other, &pos, &key, &value)) {
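The switch from `PyDict_Check` to `PyDict_CheckExact` in these hunks is visible from Python when a `dict` subclass overrides iteration. The sketch below assumes a CPython build that contains this change; the class `UpperKeys` is hypothetical and exists only for the illustration.

```python
class UpperKeys(dict):
    """Hypothetical dict subclass whose iteration yields upper-cased keys."""

    def __iter__(self):
        return (k.upper() for k in dict.keys(self))


plain = set()
plain.update({"a": 1, "b": 2})          # exact dict: keys are read directly

custom = set()
custom.update(UpperKeys(a=1, b=2))      # subclass: its __iter__ is honoured
```

With the exact-type check, only true `dict` instances take the fast path through `PyDict_Next`, so subclasses go through the ordinary iterator protocol.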

@ -2174,8 +2174,9 @@ PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
case SETUP_LOOP:
case SETUP_EXCEPT:
case SETUP_FINALLY:
/* NOTE: If you add any new block-setup opcodes that are not try/except/finally
handlers, you may need to update the PyGen_NeedsFinalizing() function. */
/* NOTE: If you add any new block-setup opcodes that are
not try/except/finally handlers, you may need to
update the PyGen_NeedsFinalizing() function. */
PyFrame_BlockSetup(f, opcode, INSTR_OFFSET() + oparg,
STACK_LEVEL());
@ -4010,6 +4011,35 @@ cmp_outcome(int op, register PyObject *v, register PyObject *w)
res = !res;
break;
case PyCmp_EXC_MATCH:
if (PyTuple_Check(w)) {
Py_ssize_t i, length;
length = PyTuple_Size(w);
for (i = 0; i < length; i += 1) {
PyObject *exc = PyTuple_GET_ITEM(w, i);
if (PyString_Check(exc)) {
int ret_val;
ret_val = PyErr_WarnEx(
PyExc_DeprecationWarning,
"catching of string "
"exceptions is "
"deprecated", 1);
if (ret_val == -1)
return NULL;
}
}
}
else {
if (PyString_Check(w)) {
int ret_val;
ret_val = PyErr_WarnEx(
PyExc_DeprecationWarning,
"catching of string "
"exceptions is deprecated",
1);
if (ret_val == -1)
return NULL;
}
}
res = PyErr_GivenExceptionMatches(v, w);
break;
default: