46 Commits

Author SHA1 Message Date
Johannes Gijsbers 41e4faa82b Patch #712317: In URLs such as http://www.example.com?query=spam, treat '?' as
a delimiter. Previously, the 'network location' (<authority> in RFC 2396) would
become 'www.example.com?query=spam', while RFC 2396 does not allow a '?' in
<authority>. See bug #548176 for further discussion.
2005-01-09 15:29:10 +00:00
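A minimal sketch of the behavior this patch describes. It assumes Python 3's urllib.parse, the successor of the urlparse module this log covers, and reuses the example URL from the commit message.

```python
from urllib.parse import urlsplit

# URL taken from the commit message: no path, '?' directly after the host.
parts = urlsplit("http://www.example.com?query=spam")

# The '?' ends the network location, so the query is split off separately.
netloc = parts.netloc
query = parts.query
print(netloc)
print(query)
```

The network location ends at the '?', and the remainder is reported as the query string.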
Raymond Hettinger a617271dbd Use cStringIO where available. 2004-12-31 19:15:26 +00:00
Brett Cannon fbac294d59 rsync is now a recognized protocol that uses "netloc" (i.e. specifies a network
location) in its addressing.

Closes bug #981299.
2004-06-29 04:02:40 +00:00
Raymond Hettinger 156c49ad1c Revert last change. 2004-05-07 05:50:35 +00:00
Raymond Hettinger 6924a00d10 Use sets instead of lists for membership testing 2004-05-06 16:55:07 +00:00
Brett Cannon 8da2a52dd6 See rev. 1.42 for log message 2003-10-12 04:29:10 +00:00
Martin v. Löwis 12a7f96aec Patch #712124: Remove obsolete comment. 2003-03-30 16:28:26 +00:00
Raymond Hettinger ef30dc872b Revert change 1.37.
The nanoseconds saved by using dict.fromkeys aren't
worth the loss in clarity.  Linear searches live on.
2003-01-07 02:09:16 +00:00
Skip Montanaro f09b88ee2f * add mms (windows media) as another scheme
* reformat schemes to 80 columns
2003-01-06 20:27:03 +00:00
Raymond Hettinger f2128b004c Used dictionaries rather than lists for membership testing. 2003-01-06 12:30:53 +00:00
Neal Norwitz 4f442372cc SF feature #618024, urlparse fails on imap:// 2003-01-06 06:51:36 +00:00
Fred Drake f606e8d705 Added missing entries to __all__. 2002-10-16 21:21:39 +00:00
Guido van Rossum bbc0568a5c Fix for 1.33: urlsplit() should only add '//' if scheme != ''.
Will add test and backport.
2002-10-14 19:59:54 +00:00
Neal Norwitz 7dfb6e295b Fix SF # 591713, Fix "file:" URL to have right no. of /'s, by Bruce Atherton
Add a test too.  urljoin() would make file:/tmp/foo instead of file:///tmp/foo

Bugfix candidate, I will backport.
2002-09-25 19:20:12 +00:00
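A small illustration of the fixed behavior, assuming Python 3's urllib.parse in place of the old urlparse module. The base URL is a made-up example, not one taken from the bug report.

```python
from urllib.parse import urljoin

# Hypothetical base URL; the relative reference replaces the last path segment.
base = "file:///tmp/bar"
joined = urljoin(base, "foo")

# The empty network location keeps its slashes in the joined result.
print(joined)
```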
Michael W. Hudson bd3e771a97 amk's fix attached to
[ 516299 ] urlparse can get fragments wrong
2002-03-18 13:06:00 +00:00
Fred Drake 5751a22ede Fix parsing of parameters from a URL; urlparse() did not check that it only
split parameters from the last path segment.  Introduces two new functions,
urlsplit() and urlunsplit(), that do the simpler job of splitting the URL
without monkeying around with the parameters field, since that was not being
handled properly.
This closes bug #478038.
2001-11-16 02:52:57 +00:00
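The difference between the two pairs of functions can be sketched as follows, assuming Python 3's urllib.parse. The URL is invented for illustration and carries a ';' in two path segments.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with ';' in an inner segment and in the last segment.
url = "http://www.example.com/a;x/b;y?q=1"

parsed = urlparse(url)   # splits parameters from the last path segment only
split = urlsplit(url)    # leaves the path untouched, no params field

print(parsed.path, parsed.params)
print(split.path)
```

urlparse() reports only the trailing 'y' as parameters, while urlsplit() returns the whole path including both ';' parts.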
Andrew M. Kuchling 3e44248483 Remove unused variable 2001-08-13 14:38:50 +00:00
Skip Montanaro 40fc16059f final round of __all__ lists (I hope) - skipped urllib2 because Moshe may be
giving it a slight facelift
2001-03-01 04:27:19 +00:00
Tim Peters e119006e7d Whitespace normalization. Top level of Lib now fixed-point for reindent.py! 2001-01-15 03:34:38 +00:00
Fred Drake 867952f6e4 urlunparse(): Do not add a leading slash to the path if it is empty.
urljoin():  Make this conform to RFC 1808 for all examples given in that
            RFC (both "Normal" and "Abnormal"), so long as that RFC does
            not conflict with the older RFC 1630, which also specified
            relative URL resolution.

This closes SF bug #110832 (Jitterbug PR#194).
2001-01-05 05:54:41 +00:00
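A few of the "Normal" examples from RFC 1808 can be checked with a short sketch. It assumes Python 3's urllib.parse rather than the urlparse module of that time, and covers only a handful of cases whose results are the same in both RFCs that define them.

```python
from urllib.parse import urljoin

# Base URL used by the examples in RFC 1808.
BASE = "http://a/b/c/d;p?q"

# Relative references paired with the resolved URLs listed in the RFC.
cases = {
    "g": "http://a/b/c/g",
    "g?y": "http://a/b/c/g?y",
    "/g": "http://a/g",
    "//g": "http://g",
    "../g": "http://a/b/g",
}

results = {ref: urljoin(BASE, ref) for ref in cases}
for ref, resolved in results.items():
    print(ref, "->", resolved)
```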
Guido van Rossum fad81f0838 Be explicit about scheme_chars -- string.letters is locale dependent
so we can't use it.

While I'm at it, got rid of string module use.  (Found several new
hard special cases for a hypothetical conversion tool: from string
import join, find, rfind; and a local assignment "find=string.find".)
2000-12-19 16:48:13 +00:00
Fred Drake bdd44a389b Pekka Pessi <Pekka.Pessi@nokia.com>:
Patch to add support for sip: (Session Initiation Protocol, RFC2543)
URLs.
2000-06-20 18:32:16 +00:00
Fred Drake 0556501a81 Anthony Baxter <anthony@interlink.com.au>:
The following adds support for RTSP (RFC2326) URLs to the standard
urlparse.py module.

(Augmented by FLD to include rtspu:, specified in the same RFC & OK'd
by Anthony.)
2000-04-14 14:01:34 +00:00
Guido van Rossum a25d7ddbf0 Some cleanup -- don't use splitfields/joinfields, standardize
indentation (tabs only), rationalize some code in urljoin...
2000-04-10 17:02:46 +00:00
Guido van Rossum e7b146fb3b The third and final doc-string sweep by Ka-Ping Yee.
The attached patches update the standard library so that all modules
have docstrings beginning with one-line summaries.

A new docstring was added to formatter.  The docstring for os.py
was updated to mention nt, os2, ce in addition to posix, dos, mac.
2000-02-04 15:28:42 +00:00
Guido van Rossum 4f13669cf0 No need to import find(). (Andrew Dalke & kjpylint) 1999-05-03 18:16:23 +00:00
Guido van Rossum f3963b1269 Sjoerd Mullender writes:
If a filename on Windows starts with \\, it is converted to a URL
which starts with ////.  If this URL is passed to urlparse.urlparse
you get a path that starts with // (and an empty netloc).  If you pass
the result back to urlparse.urlunparse, you get a URL that starts with
//, which is parsed differently by urlparse.urlparse.  The fix is to
add the (empty) netloc with accompanying slashes if the path in
urlunparse starts with //.  Do this for all schemes that use a netloc.
1999-03-18 15:10:44 +00:00
Guido van Rossum a2e18051b7 Delete non-standard-conforming code in urljoin() that would use the
netloc from the base url as the default netloc for the resulting url
even if the schemes differ.

Once upon a time, when the web was wild, this was a valuable hack
because some people had a URL referencing an ftp server colocated with
an http server without having the host in the ftp URL (so they could
replicate it or change the hostname easily).

More recently, after the file: scheme got added back to the list of
schemes that accept a netloc, it turns out that this caused weirdness
when joining an http: URL with a file: URL -- the resulting file: URL
would always inherit the host from the http: URL because the file:
scheme supports a netloc but in practice never has one.

There are two reasons to get rid of the old, once-valuable hack,
instead of removing the file: scheme from the uses_netloc list.  One,
the RFC says that file: uses the netloc syntax, and does not endorse
the old hack.  Two, neither netscape 4.5 nor IE 4.0 support the old
hack.
1999-03-17 22:30:10 +00:00
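The effect of removing the hack can be sketched like this, assuming Python 3's urllib.parse. Both URLs are invented for illustration.

```python
from urllib.parse import urljoin

# Hypothetical http: base and a file: reference with a different scheme.
base = "http://www.example.com/docs/index.html"
reference = "file:/tmp/notes.txt"

# Because the schemes differ, the reference is returned as-is and the
# host of the base URL is not carried over.
joined = urljoin(base, reference)
print(joined)
```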
Guido van Rossum 974e32d910 Steve Clift pointed out that 'file' allows a netloc. 1999-02-22 15:38:46 +00:00
Andrew M. Kuchling 5c355201e2 Fixed bug in the common-case code for HTTP URLs; it would lose the query,
fragment, and/or parameter information.
3 cases added to the test suite to check for this bug.
1999-01-06 22:13:09 +00:00
Guido van Rossum c08cc50e00 Add XXX comment about a test that doesn't seem right -- no time to
explore this now.
1998-12-21 18:24:09 +00:00
Jeremy Hylton b85c8479eb Easy optimizations of urlparse for the common case of parsing an http URL.
1. use dict.get instead of try/except KeyError
2. if the url scheme is 'http' then avoid the series of
   'if var in [someseq]:'.  instead, inline all of the code.
3. find = string.find
1998-09-02 21:53:16 +00:00
Jeremy Hylton 4722da6ebf fix typo in keyword argument 'allow_frament' should be 'allow_fragment' 1998-08-25 19:45:24 +00:00
Guido van Rossum f7edadbc58 Add Gopher to list of protocols that support query strings. 1998-01-19 22:27:21 +00:00
Guido van Rossum e612be5926 Patch by Marc Lemburg to fix urljoin("/a", "..") and urljoin("/a", "..#1"). 1997-12-03 22:38:56 +00:00
Guido van Rossum 7449540986 After some discussion with Jeremy and Fred, decided to limit the
default urlparse cache size to 20 instead of 2000.  The main use of
the cache seems to be to gain some speed in Grail, which is calling
urljoin with the same base for each anchor.  2000 is a bit too big for
Jeremy, who doesn't need the cache at all.  20 should keep at least
95% of the Grail speedup while wasting an insignificant amount of
memory in Jeremy's application.
1997-07-14 19:08:15 +00:00
Guido van Rossum 185147f1d0 Test urlparse cache with try/except instead of has_key.
This makes it thread-safe again.
1997-07-11 20:13:10 +00:00
Guido van Rossum b02092a9b2 Added characteristics of shttp, https, and snews. 1997-01-02 18:18:27 +00:00
Guido van Rossum 671dc20efc Crude but effective hack to clear the parser cache every so often.
(Fred Drake.)
1996-12-27 15:26:15 +00:00
Guido van Rossum 3fd32ecd92 optimizations due to Fred Drake; added urldefrag() function 1996-05-28 23:54:24 +00:00
Guido van Rossum 5feb54c461 added hdl protocol properties 1996-05-28 23:10:02 +00:00
Guido van Rossum ededb58c14 Update reference (it's now RFC 1808); added http to list of protocols
that use parameters.
1996-03-29 21:23:25 +00:00
Guido van Rossum 1a16c868d4 remove file: from list of protocols taking host 1995-08-10 19:45:41 +00:00
Guido van Rossum fb1a0cd74f subtle changes to relative url joins 1995-08-04 04:29:32 +00:00
Guido van Rossum a1124700f8 Add hacks for switching protocol and path but leaving host unchanged 1994-12-30 17:18:59 +00:00
Guido van Rossum 23cb2a83a5 New tty/pty modules by Steen; new urlparser. 1994-09-12 10:36:35 +00:00