:mod:`urllib` --- Open arbitrary resources by URL
=================================================

.. module:: urllib
   :synopsis: Open an arbitrary network resource by URL (requires sockets).

.. index::
   single: WWW
   single: World Wide Web
   single: URL

This module provides a high-level interface for fetching data across the World
Wide Web. In particular, the :func:`urlopen` function is similar to the
built-in function :func:`open`, but accepts Uniform Resource Locators (URLs)
instead of filenames. Some restrictions apply: it can only open URLs for
reading, and no seek operations are available.

It defines the following public functions:


.. function:: urlopen(url[, data[, proxies]])

Open a network object denoted by a URL for reading. If the URL does not have a
scheme identifier, or if it has :file:`file:` as its scheme identifier, this
opens a local file (without universal newlines); otherwise it opens a socket to
a server somewhere on the network. If the connection cannot be made, the
:exc:`IOError` exception is raised. If all goes well, a file-like object is
returned. This supports the following methods: :meth:`read`, :meth:`readline`,
:meth:`readlines`, :meth:`fileno`, :meth:`close`, :meth:`info` and
:meth:`geturl`. It also has proper support for the :term:`iterator` protocol. One
caveat: the :meth:`read` method, if the size argument is omitted or negative,
may not read until the end of the data stream; there is no good way to determine
that the entire stream from a socket has been read in the general case.

Except for the :meth:`info` and :meth:`geturl` methods, these methods have the
same interface as for file objects --- see section :ref:`bltin-file-objects` in
this manual. (It is not a built-in file object, however, so it can't be used at
those few places where a true built-in file object is required.)

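As a minimal sketch of the file-like interface described above, the following opens a ``file:`` URL, so it needs no network access. The temporary file and its contents are made up for illustration, and the ``urllib.request`` fallback is an assumption added only so the sketch also runs on Python 3, where these functions live in a different module.

```python
import os
import tempfile

try:
    # Python 2: the module documented here.
    from urllib import urlopen, pathname2url
except ImportError:
    # Assumption: Python 3 fallback, not part of this module's documentation.
    from urllib.request import urlopen, pathname2url

# Create a small local file to stand in for a remote resource.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'first line\nsecond line\n')

url = 'file:' + pathname2url(path)

# read() and readline() behave as they do on an ordinary file object.
handle = urlopen(url)
first = handle.readline()
rest = handle.read()
handle.close()

# The returned object also supports the iterator protocol.
handle = urlopen(url)
lines = [line for line in handle]
handle.close()

os.remove(path)
```

Because the object is only file-like, code that needs a real built-in file object should copy the data to a local file first.
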
.. index:: module: mimetools

The :meth:`info` method returns an instance of the class
:class:`mimetools.Message` containing meta-information associated with the
URL. When the scheme is HTTP, these headers are those returned by the server
at the head of the retrieved resource (including Content-Length and
Content-Type). When the scheme is FTP, a Content-Length header will be
present if (as is now usual) the server passed back a file length in response
to the FTP retrieval request. A Content-Type header will be present if the
MIME type can be guessed. When the URL refers to a local file, the returned
headers will include a Date representing the file's last-modified time, a
Content-Length giving file size, and a Content-Type containing a guess at the
file's type. See also the description of the :mod:`mimetools` module.

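The local-file case can be tried without a server. This is a sketch under stated assumptions: the temporary file is hypothetical, and the ``urllib.request`` fallback is only there so the code runs on Python 3, where :meth:`info` returns a different message class with a similar ``get`` method.

```python
import os
import tempfile

try:
    # Python 2: the module documented here.
    from urllib import urlopen, pathname2url
except ImportError:
    # Assumption: Python 3 fallback, not part of this module's documentation.
    from urllib.request import urlopen, pathname2url

data = b'hello, world\n'
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(data)

handle = urlopen('file:' + pathname2url(path))
headers = handle.info()

# Header lookup is case-insensitive; values come back as strings.
content_length = headers.get('Content-Length')
content_type = headers.get('Content-Type')

handle.close()
os.remove(path)

print('length: %s, type: %s' % (content_length, content_type))
```
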
The :meth:`geturl` method returns the real URL of the page. In some cases, the
HTTP server redirects a client to another URL. The :func:`urlopen` function
handles this transparently, but in some cases the caller needs to know which URL
the client was redirected to. The :meth:`geturl` method can be used to get at
this redirected URL.

If the *url* uses the :file:`http:` scheme identifier, the optional *data*
argument may be given to specify a ``POST`` request (normally the request type
is ``GET``). The *data* argument must be in standard
:mimetype:`application/x-www-form-urlencoded` format; see the :func:`urlencode`
function below.

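A short sketch of preparing the *data* argument follows. The field names and values are invented for illustration, and the request itself is left as a comment with a placeholder URL so the example runs offline. The ``urllib.parse`` fallback is an assumption for Python 3.

```python
try:
    # Python 2: the module documented here.
    from urllib import urlencode
except ImportError:
    # Assumption: Python 3 fallback, not part of this module's documentation.
    from urllib.parse import urlencode

# A list of pairs keeps the field order predictable.
params = [('spam', 1), ('eggs', 2), ('q', 'two words')]
data = urlencode(params)
print(data)

# Passing data turns the request into a POST (placeholder URL, not executed):
#     urllib.urlopen('http://www.example.com/cgi-bin/query', data)
# Under the Python 3 fallback the body would have to be bytes,
# for example data.encode('ascii').
```

Omitting *data* and appending ``'?' + data`` to the URL instead would send the same fields as a ``GET`` request.
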
The :func:`urlopen` function works transparently with proxies which do not
require authentication. In a Unix or Windows environment, set the
:envvar:`http_proxy` or :envvar:`ftp_proxy` environment variables to a URL that
identifies the proxy server before starting the Python interpreter. For example
(the ``'%'`` is the command prompt)::

   % http_proxy="http://www.someproxy.com:3128"
   % export http_proxy
   % python
   ...

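To check what the interpreter picks up from the environment, the variable can also be set from within Python. This sketch relies on ``getproxies_environment()``, a helper that exists in the module but is not described in this section, so treat its use here as an assumption; the ``urllib.request`` fallback is likewise only for Python 3.

```python
import os

try:
    # Python 2: the module documented here.
    from urllib import getproxies_environment
except ImportError:
    # Assumption: Python 3 fallback, not part of this module's documentation.
    from urllib.request import getproxies_environment

# Remember any existing setting so it can be restored afterwards.
saved = os.environ.get('http_proxy')
os.environ['http_proxy'] = 'http://www.someproxy.com:3128'
try:
    # Maps scheme names to proxy URLs, based on <scheme>_proxy variables.
    proxies = getproxies_environment()
finally:
    if saved is None:
        del os.environ['http_proxy']
    else:
        os.environ['http_proxy'] = saved

print(proxies.get('http'))
```
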
In a Windows environment, if no proxy environment variables are set, proxy
settings are obtained from the registry's Internet Settings section.

.. index:: single: Internet Config

In a Macintosh environment, :func:`urlopen` will retrieve proxy information from
Internet Config.

Alternatively, the optional *proxies* argument may be used to explicitly specify
|
|
|
|
proxies. It must be a dictionary mapping scheme names to proxy URLs, where an
|
|
|
|
empty dictionary causes no proxies to be used, and ``None`` (the default value)
|
|
|
|
causes environmental proxy settings to be used as discussed above. For
|
|
|
|
example::
|
|
|
|
|
|
|
|
# Use http://www.someproxy.com:3128 for http proxying
|
|
|
|
proxies = {'http': 'http://www.someproxy.com:3128'}
|
|
|
|
filehandle = urllib.urlopen(some_url, proxies=proxies)
|
|
|
|
# Don't use any proxies
|
|
|
|
filehandle = urllib.urlopen(some_url, proxies={})
|
|
|
|
# Use proxies from environment - both versions are equivalent
|
|
|
|
filehandle = urllib.urlopen(some_url, proxies=None)
|
|
|
|
filehandle = urllib.urlopen(some_url)
|
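The dictionary accepted as *proxies* has the same shape as the one built from
the environment. As a rough illustration, the sketch below sets
:envvar:`http_proxy` from within Python and reads it back with
:func:`getproxies`, a helper that this section does not describe. The fallback
import is an assumption for later Python versions, where the helper lives in
:mod:`urllib.request`.

```python
import os

try:
    from urllib import getproxies            # module layout described here
except ImportError:
    from urllib.request import getproxies    # assumed fallback for later versions

# Normally the variable is set in the shell before Python starts.
os.environ["http_proxy"] = "http://www.someproxy.com:3128"

# Mapping of scheme names to proxy URLs, usable as the *proxies* argument.
proxies = getproxies()
print(proxies.get("http"))
```

Passing ``proxies={}`` instead bypasses this lookup entirely, as described
above.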
|
|
|
|
|
|
|
Environmental proxy settings can also be overridden by creating a
:class:`URLopener`, or a subclass such as :class:`FancyURLopener`, with an
explicit *proxies* argument.

Proxies which require authentication for use are not currently supported; this
is considered an implementation limitation.
|
|
|
|
|
|
|
|
|
|
|
|
.. function:: urlretrieve(url[, filename[, reporthook[, data]]])

Copy a network object denoted by a URL to a local file, if necessary. If the URL
points to a local file, or a valid cached copy of the object exists, the object
is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
local file name under which the object can be found, and *headers* is whatever
the :meth:`info` method of the object returned by :func:`urlopen` returned (for
a remote object, possibly cached). Exceptions are the same as for
:func:`urlopen`.

The second argument, if present, specifies the file location to copy to (if
absent, the location will be a tempfile with a generated name). The third
argument, if present, is a hook function that will be called once on
establishment of the network connection and once after each block read
thereafter. The hook will be passed three arguments: a count of blocks
transferred so far, a block size in bytes, and the total size of the file. The
total size may be ``-1`` on older FTP servers which do not return a file
size in response to a retrieval request.
|
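The hook contract described above can be sketched as follows. The function
names ``progress_percent`` and ``reporthook`` are hypothetical, and treating a
total size of zero like an unknown size is an assumption made only to avoid a
division by zero.

```python
def progress_percent(block_count, block_size, total_size):
    """Return the percentage transferred so far, or None if the size is unknown."""
    if total_size <= 0:
        # -1 is reported when the server gives no file size; treating 0 the
        # same way (an assumption) avoids dividing by zero.
        return None
    transferred = block_count * block_size
    # The last block is usually only partly filled, so clamp to the total.
    return min(transferred, total_size) * 100.0 / total_size


def reporthook(block_count, block_size, total_size):
    """Hook with the three-argument signature described above."""
    percent = progress_percent(block_count, block_size, total_size)
    if percent is None:
        print("%d blocks read" % block_count)
    else:
        print("%.1f%% done" % percent)


# The hook can be exercised by hand, without any network access:
reporthook(0, 8192, 20000)   # the call made when the connection is established
reporthook(3, 8192, 20000)   # after the third block
```

Such a function would be passed as the third argument, for example
``urlretrieve(url, filename, reporthook)``.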
|
|
|
|
|
|
|
If the *url* uses the :file:`http:` scheme identifier, the optional *data*
argument may be given to specify a ``POST`` request (normally the request type
is ``GET``). The *data* argument must be in standard
:mimetype:`application/x-www-form-urlencoded` format; see the :func:`urlencode`
function below.

:func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
the amount of data available was less than the expected amount (which is the
size reported by a *Content-Length* header). This can occur, for example, when
the download is interrupted.

The *Content-Length* is treated as a lower bound: if there's more data to read,
:func:`urlretrieve` reads more data, but if less data is available, it raises
the exception.

You can still retrieve the downloaded data in this case; it is stored in the
:attr:`content` attribute of the exception instance.

If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
the size of the data it has downloaded, and just returns it. In this case you
just have to assume that the download was successful.
|
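A minimal sketch of handling a short download follows. The helper
``retrieve_or_partial``, the stand-in ``fake_truncated_retrieve`` and the URL
are hypothetical; the retrieval function is passed in as a parameter so that
the sketch can be tried without network access. The fallback import is an
assumption for later Python versions, where the exception lives in
:mod:`urllib.error`.

```python
try:
    from urllib import ContentTooShortError          # module layout described here
except ImportError:
    from urllib.error import ContentTooShortError    # assumed fallback for later versions


def retrieve_or_partial(retrieve, url):
    """Call retrieve(url) and return (result, complete_flag).

    *retrieve* is expected to behave like urlretrieve.
    """
    try:
        return retrieve(url), True
    except ContentTooShortError as exc:
        # The exception instance keeps what was obtained before the shortfall.
        return exc.content, False


def fake_truncated_retrieve(url):
    # Stand-in for urlretrieve that always reports a short download.
    raise ContentTooShortError("retrieval incomplete", "partial data")


result, complete = retrieve_or_partial(fake_truncated_retrieve,
                                       "http://www.example.com/")
print(complete)
```

In real use, :func:`urlretrieve` itself would be passed as *retrieve*.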
|
|
|
|
|
|
|
|
|
|
.. data:: _urlopener

The public functions :func:`urlopen` and :func:`urlretrieve` create an instance
of the :class:`FancyURLopener` class and use it to perform their requested
actions. To override this functionality, programmers can create a subclass of
:class:`URLopener` or :class:`FancyURLopener`, then assign an instance of that
class to the ``urllib._urlopener`` variable before calling the desired function.
For example, applications may want to specify a different
:mailheader:`User-Agent` header than :class:`URLopener` defines. This can be
accomplished with the following code::

   import urllib

   class AppURLopener(urllib.FancyURLopener):
       version = "App/1.7"

   urllib._urlopener = AppURLopener()


.. function:: urlcleanup()

Clear the cache that may have been built up by previous calls to
:func:`urlretrieve`.
|
|
|
|
|
|
|
|
|
|
|
|
.. function:: quote(string[, safe])

Replace special characters in *string* using the ``%xx`` escape. Letters,
digits, and the characters ``'_.-'`` are never quoted. The optional *safe*
parameter specifies additional characters that should not be quoted --- its
default value is ``'/'``.

Example: ``quote('/~connolly/')`` yields ``'/%7Econnolly/'``.
|
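The effect of the *safe* parameter can be seen in a short sketch. The sample
path is made up, and the fallback import is an assumption for later Python
versions, where the function lives in :mod:`urllib.parse`.

```python
try:
    from urllib import quote          # module layout described here
except ImportError:
    from urllib.parse import quote    # assumed fallback for later versions

path = "/docs/annual report.txt"      # hypothetical sample path

default_quoted = quote(path)          # '/' is safe by default
fully_quoted = quote(path, "")        # empty *safe*: '/' is escaped as well

print(default_quoted)   # /docs/annual%20report.txt
print(fully_quoted)     # %2Fdocs%2Fannual%20report.txt
```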
|
|
|
|
|
|
|
|
|
|
|
.. function:: quote_plus(string[, safe])

Like :func:`quote`, but also replaces spaces by plus signs, as required for
quoting HTML form values. Plus signs in the original string are escaped unless
they are included in *safe*. Unlike :func:`quote`, *safe* does not default to
``'/'``.


.. function:: unquote(string)

Replace ``%xx`` escapes by their single-character equivalent.

Example: ``unquote('/%7Econnolly/')`` yields ``'/~connolly/'``.


.. function:: unquote_plus(string)

Like :func:`unquote`, but also replaces plus signs by spaces, as required for
unquoting HTML form values.
|
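The difference between the plain and the "plus" variants shows up when a form
value contains spaces and plus signs. The sample value below is made up, and
the fallback import is an assumption for later Python versions, where these
functions live in :mod:`urllib.parse`.

```python
try:
    from urllib import quote_plus, unquote, unquote_plus
except ImportError:
    # assumed fallback for later versions
    from urllib.parse import quote_plus, unquote, unquote_plus

value = "spam & eggs + ham/toast"     # hypothetical form value

encoded = quote_plus(value)
print(encoded)                        # spam+%26+eggs+%2B+ham%2Ftoast

# unquote_plus reverses quote_plus exactly ...
print(unquote_plus(encoded))          # spam & eggs + ham/toast
# ... while unquote leaves the plus signs that stood for spaces in place.
print(unquote(encoded))               # spam+&+eggs+++ham/toast
```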
|
|
|
|
|
|
|
|
|
|
|
.. function:: urlencode(query[, doseq])

Convert a mapping object or a sequence of two-element tuples to a "url-encoded"
string, suitable to pass to :func:`urlopen` above as the optional *data*
argument. This is useful to pass a dictionary of form fields to a ``POST``
request. The resulting string is a series of ``key=value`` pairs separated by
``'&'`` characters, where both *key* and *value* are quoted using
:func:`quote_plus` above. If the optional parameter *doseq* is present and
evaluates to true, a value that is itself a sequence produces an individual
``key=value`` pair for each of its elements. When a sequence of two-element
tuples is used as the *query* argument, the first element of each tuple is a key
and the second is a value. The order of parameters in the encoded string will
match the order of parameter tuples in the sequence. The :mod:`cgi` module
provides the functions :func:`parse_qs` and :func:`parse_qsl` which are used to
parse query strings into Python data structures.
|
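The following sketch illustrates the tuple-sequence form and the effect of
*doseq*. The field names and values are made up, and the fallback import is an
assumption for later Python versions, where the function lives in
:mod:`urllib.parse`.

```python
try:
    from urllib import urlencode          # module layout described here
except ImportError:
    from urllib.parse import urlencode    # assumed fallback for later versions

# Hypothetical form fields; the second value is itself a sequence.
fields = [("name", "Monty Python"), ("tag", ["spam", "eggs"])]

# Without doseq the list is converted to its string form and quoted as a whole.
plain = urlencode(fields)

# With doseq each element of the list gets its own key=value pair.
expanded = urlencode(fields, True)
print(expanded)   # name=Monty+Python&tag=spam&tag=eggs
```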
|
|
|
|
|
|
|
|
|
|
|
.. function:: pathname2url(path)

Convert the pathname *path* from the local syntax for a path to the form used in
the path component of a URL. This does not produce a complete URL. The return
value will already be quoted using the :func:`quote` function.


.. function:: url2pathname(path)

Convert the path component *path* from an encoded URL to the local syntax for a
path. This does not accept a complete URL. This function uses :func:`unquote`
to decode *path*.
|
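The two conversions are meant to be inverses of each other, which a short
round trip can illustrate. The sample path is made up and the exact URL form
depends on the platform, so no output is shown. The fallback import is an
assumption for later Python versions, where these functions live in
:mod:`urllib.request`.

```python
import os

try:
    from urllib import pathname2url, url2pathname
except ImportError:
    # assumed fallback for later versions
    from urllib.request import pathname2url, url2pathname

# Hypothetical absolute path, built with the local separator.
local_path = os.path.join(os.sep, "tmp", "my file.txt")

url_path = pathname2url(local_path)    # quoted, so the space becomes %20
round_trip = url2pathname(url_path)    # back to the local syntax

print(url_path)
print(round_trip == local_path)
```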
|
|
|
|
|
|
|
|
|
|
|
.. class:: URLopener([proxies[, **x509]])

Base class for opening and reading URLs. Unless you need to support opening
objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
you probably want to use :class:`FancyURLopener`.

By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
Applications can define their own :mailheader:`User-Agent` header by subclassing
:class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
:attr:`version` to an appropriate string value in the subclass definition.

The optional *proxies* parameter should be a dictionary mapping scheme names to
proxy URLs, where an empty dictionary turns proxies off completely. Its default
value is ``None``, in which case environmental proxy settings will be used if
present, as discussed in the definition of :func:`urlopen`, above.

Additional keyword parameters, collected in *x509*, may be used for
authentication of the client when using the :file:`https:` scheme. The keywords
*key_file* and *cert_file* are supported to provide an SSL key and certificate;
both are needed to support client authentication.

:class:`URLopener` objects will raise an :exc:`IOError` exception if the server
returns an error code.
|
|
|
|
|
|
|
|
|
|
|
|
.. class:: FancyURLopener(...)

:class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
response codes listed above, the :mailheader:`Location` header is used to fetch
the actual URL. For 401 response codes (authentication required), basic HTTP
authentication is performed. For the 30x response codes, recursion is bounded
by the value of the *maxtries* attribute, which defaults to 10.

For all other response codes, the method :meth:`http_error_default` is called,
which you can override in subclasses to handle the error appropriately.

.. note::

According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
must not be automatically redirected without confirmation by the user. In
reality, browsers do allow automatic redirection of these responses, changing
the POST to a GET, and :mod:`urllib` reproduces this behaviour.

The parameters to the constructor are the same as those for :class:`URLopener`.

.. note::

When performing basic authentication, a :class:`FancyURLopener` instance calls
its :meth:`prompt_user_passwd` method. The default implementation asks the
user for the required information on the controlling terminal. A subclass may
override this method to support more appropriate behavior if needed.
|
|
|
|
|
|
|
|
|
|
|
|
.. exception:: ContentTooShortError(msg[, content])

This exception is raised when the :func:`urlretrieve` function detects that the
amount of the downloaded data is less than the expected amount (given by the
*Content-Length* header). The :attr:`content` attribute stores the downloaded
(and supposedly truncated) data.
|
|
|
|
|
|
|
|
Restrictions:

.. index::
pair: HTTP; protocol
pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
1.0), FTP, and local files.

* The caching feature of :func:`urlretrieve` is currently disabled, pending
proper processing of expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
file can't be opened, the URL is re-interpreted using the FTP protocol. This
can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
long delays while waiting for a network connection to be set up. This means
that it is difficult to build an interactive Web client using these functions
without using threads.
|
|
|
|
|
|
|
|
.. index::
single: HTML
pair: HTTP; protocol
module: htmllib

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
returned by the server. This may be binary data (such as an image), plain text
or (for example) HTML. The HTTP protocol provides type information in the reply
header, which can be inspected by looking at the :mailheader:`Content-Type`
header. If the returned data is HTML, you can use the module :mod:`htmllib` to
parse it.

.. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
directory. This can lead to unexpected behavior when attempting to read a URL
that points to a file that is not accessible. If the URL ends in a ``/``, it is
assumed to refer to a directory and will be handled accordingly. But if an
attempt to read a file leads to a 550 error (meaning the URL cannot be found or
is not accessible, often for permission reasons), then the path is treated as a
directory in order to handle the case when a directory is specified by a URL but
the trailing ``/`` has been left off. This can cause misleading results when
you try to fetch a file whose read permissions make it inaccessible; the FTP
code will try to read it, fail with a 550 error, and then perform a directory
listing for the unreadable file. If fine-grained control is needed, consider
using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
*_urlopener* to meet your needs.

* This module does not support the use of proxies which require authentication.
This may be implemented in the future.

.. index:: module: urlparse

* Although the :mod:`urllib` module contains (undocumented) routines to parse
and unparse URL strings, the recommended interface for URL manipulation is in
module :mod:`urlparse`.
|
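As a brief illustration of the last point, the sketch below splits a URL into
its components with :func:`urlparse`. The sample URL is made up, and the
fallback import is an assumption for later Python versions, where the function
lives in :mod:`urllib.parse`.

```python
try:
    from urlparse import urlparse          # module named in the text above
except ImportError:
    from urllib.parse import urlparse      # assumed fallback for later versions

# Hypothetical URL used only for illustration.
parts = urlparse("http://www.python.org/doc/index.html?lang=en#intro")

# The result unpacks into six components.
scheme, netloc, path, params, query, fragment = parts

print(scheme)     # http
print(netloc)     # www.python.org
print(path)       # /doc/index.html
print(query)      # lang=en
print(fragment)   # intro
```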
|
|
|
|
|
|
|
|
|
|
|
.. _urlopener-objs:

URLopener Objects
-----------------

.. sectionauthor:: Skip Montanaro <skip@mojam.com>


:class:`URLopener` and :class:`FancyURLopener` objects have the following
attributes.
|
|
|
|
|
|
|
|
|
|
|
|
.. method:: URLopener.open(fullurl[, data])

Open *fullurl* using the appropriate protocol. This method sets up cache and
proxy information, then calls the appropriate open method with its input
arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
The *data* argument has the same meaning as the *data* argument of
:func:`urlopen`.


.. method:: URLopener.open_unknown(fullurl[, data])

Overridable interface to open unknown URL types.


.. method:: URLopener.retrieve(url[, filename[, reporthook[, data]]])

Retrieves the contents of *url* and places it in *filename*. The return value
is a tuple consisting of a local filename and either a
:class:`mimetools.Message` object containing the response headers (for remote
URLs) or ``None`` (for local URLs). The caller must then open and read the
contents of *filename*. If *filename* is not given and the URL refers to a
local file, the input filename is returned. If the URL is non-local and
*filename* is not given, the filename is the output of :func:`tempfile.mktemp`
with a suffix that matches the suffix of the last path component of the input
URL. If *reporthook* is given, it must be a function accepting three numeric
parameters. It will be called after each chunk of data is read from the
network. *reporthook* is ignored for local URLs.

If the *url* uses the :file:`http:` scheme identifier, the optional *data*
argument may be given to specify a ``POST`` request (normally the request type
is ``GET``). The *data* argument must be in standard
:mimetype:`application/x-www-form-urlencoded` format; see the :func:`urlencode`
function above.
|
|
|
|
|
|
|
|
|
|
|
|
.. attribute:: URLopener.version

Variable that specifies the user agent of the opener object. To get
:mod:`urllib` to tell servers that it is a particular user agent, set this in a
subclass as a class variable or in the constructor before calling the base
constructor.

The :class:`FancyURLopener` class offers one additional method that should be
overridden to provide the appropriate behavior:


.. method:: FancyURLopener.prompt_user_passwd(host, realm)

Return information needed to authenticate the user at the given host in the
specified security realm. The return value should be a tuple, ``(user,
password)``, which can be used for basic authentication.

The implementation prompts for this information on the terminal; an application
should override this method to use an appropriate interaction model in the local
environment.
|
|
|
|
|
|
|
|
|
|
|
|
.. _urllib-examples:

Examples
--------

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib
   >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print(f.read())
|
|
|
|
|
|
|
The following example uses the ``POST`` method instead::

   >>> import urllib
   >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
   >>> print(f.read())
|
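The two sessions above differ only in where the encoded parameters end up. The
sketch below builds both forms without opening a connection. The names
``base_url``, ``get_url`` and ``post_args`` are hypothetical, a list of pairs
is used instead of a dictionary so the parameter order is predictable, and the
fallback import is an assumption for later Python versions.

```python
try:
    from urllib import urlencode          # module layout described here
except ImportError:
    from urllib.parse import urlencode    # assumed fallback for later versions

# A list of pairs keeps the parameter order predictable.
params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])

base_url = "http://www.musi-cal.com/cgi-bin/query"

get_url = "%s?%s" % (base_url, params)   # GET: parameters travel in the URL
post_args = (base_url, params)           # POST: parameters are passed as *data*

print(get_url)
```

The tuple ``post_args`` matches the ``(url, data)`` arguments of the ``POST``
session above.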
|
|
|
|
|
|
The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read()

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib
   >>> opener = urllib.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read()
|
|
|
|
|