Commit Graph

451 Commits

Author SHA1 Message Date
Rémi Lapeyre 8047e0e1c6 bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)
Co-Authored-By: Tal Einat <taleinat+github@gmail.com>
2019-06-16 09:48:57 +03:00
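A minimal sketch of the case this fix targets, assuming Python 3.8 or later. `RobotFileParser.parse()` accepts robots.txt lines directly. When no entry matches the agent and there is no `*` entry, `crawl_delay()` and `request_rate()` return `None`. The robots.txt content and the `figtree`/`otherbot` agent names are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only "figtree" is listed, and there is no "*" entry.
ROBOTS_TXT = """\
User-agent: figtree
Crawl-delay: 3
Request-rate: 9/30
Disallow: /tmp
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse() also records the "last checked" time

# The listed agent gets the values from its own entry.
figtree_delay = rp.crawl_delay("figtree")
figtree_rate = rp.request_rate("figtree")  # a RequestRate(requests, seconds) named tuple

# No entry applies to this agent and there is no default entry,
# so there is no relevant crawl delay or request rate: both are None.
other_delay = rp.crawl_delay("otherbot")
other_rate = rp.request_rate("otherbot")

print(figtree_delay, figtree_rate, other_delay, other_rate)
```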
Steve Dower 8d0ef0b5ed bpo-36742: Corrects fix to handle decomposition in usernames (#13812) 2019-06-04 17:55:29 +02:00
Rémi Lapeyre 674ee12600 bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481) 2019-05-27 09:43:45 -04:00
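A short illustration of the now-documented helper, assuming Python 3.8 or later. `unwrap()` strips an optional `<...>` wrapper and an optional `URL:` prefix and returns other strings unchanged. The example URL is arbitrary.

```python
from urllib.parse import unwrap

# Both the angle brackets and the "URL:" prefix are removed.
wrapped = unwrap("<URL:http://www.example.com/index.html>")
# A string that is not wrapped comes back unchanged.
plain = unwrap("http://www.example.com/index.html")

print(wrapped)
print(plain)
```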
Steve Dower b82e17e626 bpo-36842: Implement PEP 578 (GH-12613)
Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
2019-05-23 08:45:22 -07:00
Victor Stinner 0c2b6a3943 bpo-35907, CVE-2019-9948: urllib rejects local_file:// scheme (GH-13474)
CVE-2019-9948: Avoid file reading by disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.

Co-Authored-By: SH <push0ebp@gmail.com>
2019-05-22 22:15:01 +02:00
Xtreak c661b30f89 bpo-36948: Fix NameError in urllib.request.URLopener.retrieve (GH-13389) 2019-05-19 16:40:05 +03:00
Steve Dower d537ab0ff9 bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017) 2019-04-30 12:03:02 +00:00
Jörn Hees 750d74fac5 bpo-12910: update and correct quote docstring (#2568)
Fixes some mistakes and misleading statements in the quote function docstring:
- reserved chars are never actually used by the quote code; unreserved chars are
- the listed reserved chars were wrong and incomplete
- mention that the use case is not minimal quoting w.r.t. the RFC, but cautious quoting
2019-04-09 17:31:18 -07:00
Serhiy Storchaka da0847048a bpo-36431: Use PEP 448 dict unpacking for merging two dicts. (GH-12553) 2019-03-27 08:02:28 +02:00
Steve Dower 16e6f7dee7 bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) 2019-03-07 08:02:26 -08:00
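A minimal sketch of the check, assuming a Python version that includes this fix and its bpo-36742 follow-ups. `urlsplit()` raises `ValueError` when a non-ASCII netloc contains characters that NFKC-normalize to `/`, `?`, `#`, `@` or `:`. U+FF03 (FULLWIDTH NUMBER SIGN) normalizes to `#`. The hosts and the `is_rejected` helper are hypothetical.

```python
from urllib.parse import urlsplit

def is_rejected(url):
    """Hypothetical helper: True if urlsplit() refuses the URL with ValueError."""
    try:
        urlsplit(url)
    except ValueError:
        return True
    return False

# "\uff03" becomes "#" under NFKC, so after IDNA processing this netloc
# could be split differently than it appears to be.
suspicious = "https://example.com\uff03@evil.example/"
ordinary = "https://example.com/path"

print(is_rejected(suspicious), is_rejected(ordinary))
```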
Boštjan Mejak 158695817d closes bpo-35309: cpath should be capath (GH-10699) 2018-11-25 12:32:50 -06:00
matthewbelisle-wf 209144831b bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660)
Adding `max_num_fields` to `cgi.FieldStorage` to make DoS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
2018-10-19 03:52:59 -07:00
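The `cgi` module was removed in Python 3.13, so this sketch shows the same limit on `urllib.parse.parse_qsl()`, which accepts a `max_num_fields` argument in Python 3.8 and later. When the query has more fields than allowed, it raises `ValueError` before splitting. The query string is made up.

```python
from urllib.parse import parse_qsl

QUERY = "a=1&b=2&c=3"

# Within the limit: parsed normally into (name, value) pairs.
pairs = parse_qsl(QUERY, max_num_fields=3)

# Over the limit: ValueError is raised up front, which caps the work
# a hostile query string can cause.
try:
    parse_qsl(QUERY, max_num_fields=2)
    too_many_rejected = False
except ValueError:
    too_many_rejected = True

print(pairs, too_many_rejected)
```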
Christopher Beacham 5db5c0669e bpo-21475: Support the Sitemap extension in robotparser (GH-6883) 2018-05-16 10:52:07 -04:00
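A small sketch of the Sitemap support, assuming Python 3.8 or later. `site_maps()` returns the list of `Sitemap:` URLs, or `None` when the file has none. The directive is independent of any `User-agent` group. The URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private",
    "Sitemap: https://www.example.com/sitemap.xml",
])

# A robots.txt without any Sitemap line.
empty = RobotFileParser()
empty.parse(["User-agent: *", "Disallow:"])

print(rp.site_maps())     # list of Sitemap URLs
print(empty.site_maps())  # None when there are none
```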
Michael Lazar bd08a0af2d bpo-32861: urllib.robotparser fix incomplete __str__ methods. (GH-5711)
The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.
2018-05-14 17:10:41 +03:00
Cheryl Sabella 0250de4819 bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205) 2018-04-25 16:51:54 -07:00
Matt Eaton 2cb4661707 bpo-33034: Improve exception message when cast fails for {Parse,Split}Result.port (GH-6078) 2018-03-20 09:41:37 +03:00
Serhiy Storchaka 3f2e6f15d6 Revert unnecessary changes made in bpo-30296 and apply other improvements. (GH-2624) 2018-02-26 16:50:11 +02:00
INADA Naoki 579e0b80b9 urllib.request: Remove unused import (GH-5268) 2018-01-22 16:45:31 +09:00
Коренберг Марк fbd605151f bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867) 2017-12-21 14:16:17 +02:00
Berker Peksag 3df02dbc8e bpo-31325: Fix usage of namedtuple in RobotFileParser.parse() (#4529) 2017-11-23 15:40:26 -08:00
Oren Milman 8df44ee8e0 remove a redundant lower in urllib.parse.urlsplit (#3008) 2017-09-02 21:51:39 -07:00
postmasters 90e01e50ef urllib: Simplify splithost by calling into urlparse. (#1849)
The current regex based splitting produces a wrong result. For example::

  http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
2017-06-20 15:02:44 +02:00
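The split that the commit message describes can be checked with the public `urlsplit()`. `splithost()` itself is an undocumented helper. The `#` ends the authority, so `@def` is part of the fragment and not userinfo. Note that `urlsplit()` returns an empty path rather than a browser's normalized `/`.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://abc#@def")

# Host is "abc", no userinfo, empty path, fragment "@def".
print(parts.hostname, parts.username, repr(parts.path), parts.fragment)
```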
Jon Dufresne 3972628de3 bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489)
* Replaced list(<generator expression>) with list comprehension
* Replaced dict(<generator expression>) with dict comprehension
* Replaced set(<list literal>) with set literal
* Replaced builtin func(<list comprehension>) with func(<generator
  expression>) when supported (e.g. any(), all(), tuple(), min(), &
  max())
2017-05-18 07:35:54 -07:00
Senthil Kumaran 906f5330b9 bpo-29976: urllib.parse clarify '' in scheme values. (GH-984) 2017-05-17 21:48:59 -07:00
Serhiy Storchaka 55fe1ae970 bpo-30022: Get rid of using EnvironmentError and IOError (except test… (#1051) 2017-04-16 10:46:38 +03:00
Senthil Kumaran 6fab78e902 Remove superfluous comment in urllib.error. (#1076) 2017-04-10 21:08:35 -07:00
Senthil Kumaran 6dfcc81f6b Remove OSError related comment in urllib.request. (#1070) 2017-04-09 19:49:34 -07:00
Senthil Kumaran a2a9ddd923 Remove invalid comment in urllib.request. (#1054) 2017-04-08 23:27:25 -07:00
Senthil Kumaran 257b980b31 correct parse_qs and parse_qsl test case descriptions. (#968)
* correct parse_qs and parse_qsl test case descriptions.
2017-04-04 21:19:43 -07:00
Ratnadeep Debnath 21024f0662 bpo-16285: Update urllib quoting to RFC 3986 (#173)
* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that are not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.
2017-02-25 19:00:28 +10:00
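A quick check of the RFC 3986 behaviour, assuming Python 3.7 or later. `~` is unreserved and passes through. `/` stays unescaped because it is in `quote()`'s default `safe` set, and the space is percent-encoded.

```python
from urllib.parse import quote

tilde = quote("~")
path = quote("/~user/a b")

print(tilde, path)
```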
Xiang Zhang 04c15d5bdc Issue #29142: Merge 3.6. 2017-01-09 11:52:10 +08:00
Xiang Zhang c44d58a77a Issue #29142: Merge 3.5. 2017-01-09 11:50:02 +08:00
Xiang Zhang 959ff7f1c6 Issue #29142: Fix suffixes in no_proxy handling in urllib.
In urllib.request, suffixes in the no_proxy environment variable with
leading dots can match related hostnames again (e.g. .b.c matches a.b.c).
Patch by Milan Oberkirch.
2017-01-09 11:47:55 +08:00
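A sketch of the leading-dot case using `urllib.request.proxy_bypass_environment()`. This is an undocumented helper, and its optional second argument is assumed to be a proxies mapping whose `"no"` key holds the no_proxy value. Passing that mapping avoids touching `os.environ`. The hostnames mirror the example in the message.

```python
from urllib.request import proxy_bypass_environment

# Equivalent to no_proxy=".b.c"; the leading dot is ignored when matching.
no_proxy = {"no": ".b.c"}

subdomain = proxy_bypass_environment("a.b.c", no_proxy)  # suffix match
same_host = proxy_bypass_environment("b.c", no_proxy)    # the domain itself
lookalike = proxy_bypass_environment("xb.c", no_proxy)   # not a ".b.c" suffix

print(subdomain, same_host, lookalike)
```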
Serhiy Storchaka 8cbd3df3ce Issue #28992: Use bytes.fromhex(). 2016-12-21 12:59:28 +02:00
Serhiy Storchaka 70d28a184c Remove unused imports. 2016-12-16 20:00:15 +02:00
Berker Peksag 9a7bbb2e3f Issue #25400: RobotFileParser now correctly returns default values for crawl_delay and request_rate
Initial patch by Peter Wirtz.
2016-09-18 20:17:58 +03:00
Berker Peksag f8479eeb34 Issue #25895: Merge from 3.5 2016-09-16 14:45:15 +03:00
Berker Peksag f676748a05 Issue #25895: Enable WebSocket URL schemes in urllib.parse.urljoin
Patch by Gergely Imreh and Markus Holtermann.
2016-09-16 14:43:58 +03:00
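An illustration of the effect, assuming a Python version with this change. Because `ws` and `wss` are now treated as relative, netloc-using schemes, `urljoin()` resolves references against them the same way it does for `http`. The URLs are made up.

```python
from urllib.parse import urljoin

sibling = urljoin("ws://example.com/chat/room1", "room2")
parent = urljoin("wss://example.com/chat/room1", "../status")

print(sibling)
print(parent)
```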
Christian Heimes d04863771b Issue #28022: Deprecate ssl-related arguments in favor of SSLContext.
The deprecation includes manual creation of SSLSocket and certfile/keyfile
(or similar) in ftplib, httplib, imaplib, smtplib, poplib and urllib.

ssl.wrap_socket() is not marked as deprecated yet.
2016-09-10 23:23:33 +02:00
Raymond Hettinger b7f3c944d1 Merge 2016-09-09 16:44:53 -07:00
Raymond Hettinger ae9e5f032d Issue #22450: Use "Accept: */*" in the default headers for urllib.request 2016-09-09 16:43:48 -07:00
Martin Panter 3c0d0baf2b Issue #12319: Support for chunked encoding of HTTP request bodies
When the body object is a file, its size is no longer determined with
fstat(), since that can report the wrong result (e.g. reading from a pipe).
Instead, determine the size using seek(), or fall back to chunked encoding
for unseekable files.

Also, change the logic for detecting text files to check for TextIOBase
inheritance, rather than inspecting the “mode” attribute, which may not
exist (e.g. BytesIO and StringIO).  The Content-Length for text files is no
longer determined ahead of time, because the original logic could have been
wrong depending on the codec and newline translation settings.

Patch by Demian Brecht and Rolf Krahl, with a few tweaks by me.
2016-08-24 06:33:33 +00:00
Senthil Kumaran cde03fa038 [merge from 3.5] - Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when the REQUEST_METHOD environment variable is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:51:13 -07:00
Senthil Kumaran 17742f2d45 [merge from 3.4] - Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when the REQUEST_METHOD environment variable is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:39:06 -07:00
Senthil Kumaran 436fe5a447 [merge from 3.3] Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when the REQUEST_METHOD environment variable is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:34:34 -07:00
Senthil Kumaran 4cbb23f8f2 Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when the REQUEST_METHOD environment variable is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:24:16 -07:00
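A minimal sketch of the HTTPoxy guard, using the module-level helper `urllib.request.getproxies_environment()` and `unittest.mock.patch.dict` to fake a clean environment. In CGI mode, a client's `Proxy:` request header shows up as `HTTP_PROXY`. For that reason the variable is dropped whenever `REQUEST_METHOD` is set. The proxy URL is hypothetical.

```python
import os
from unittest import mock
from urllib.request import getproxies_environment

PROXY = "http://proxy.example:3128"  # hypothetical proxy

# Not running as CGI: HTTP_PROXY is honoured.
with mock.patch.dict(os.environ, {"HTTP_PROXY": PROXY}, clear=True):
    normal = getproxies_environment()

# Running as CGI (REQUEST_METHOD set): HTTP_PROXY may come from the client,
# so it is ignored.
with mock.patch.dict(os.environ, {"HTTP_PROXY": PROXY, "REQUEST_METHOD": "GET"},
                     clear=True):
    cgi = getproxies_environment()

print(normal.get("http"), cgi.get("http"))
```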
Martin Panter 29f256909f Issue #22797: Synchronize urlopen() doc string with RST documentation 2016-06-04 05:06:34 +00:00
Martin Panter 0f29ad1be5 More typo fixes for 3.6 2016-06-04 05:06:25 +00:00
R David Murray d2367c651e Clean up urlopen doc string.
Clarifies what is returned in which case, and that the methods are common to both.

Patch by Alexander Liu as part of #22797.
2016-06-03 20:16:06 -04:00
Martin Panter 0b39a556e8 Issue #14132, Issue #17214: Merge two redirect handling fixes from 3.5 2016-05-16 07:45:28 +00:00
Martin Panter e6f060903c Issue #17214: Percent-encode non-ASCII bytes in redirect targets
Some servers send Location header fields with non-ASCII bytes, but
"http.client" requires the request target to be ASCII-encodable, otherwise a
UnicodeEncodeError is raised. Based on patch by Christian Heimes.

Python 2 does not suffer any problem because it allows non-ASCII bytes in the
HTTP request target.
2016-05-16 01:14:20 +00:00
Martin Panter ce6e06874b Issue #14132: Fix redirect handling when target is just a query string 2016-05-16 01:07:13 +00:00
Senthil Kumaran 5d1110a952 merge from 3.5
Issue #26892: Honor debuglevel flag in urllib.request.HTTPHandler.

Patch contributed by Chi Hsuan Yen.
2016-05-13 01:35:29 -07:00
Senthil Kumaran 9642eedc0a Issue #26892: Honor debuglevel flag in urllib.request.HTTPHandler.
Patch contributed by Chi Hsuan Yen.
2016-05-13 01:32:42 -07:00
Martin Panter 1ce738e08f Merge typo fixes from 3.5 2016-05-08 14:02:35 +00:00
Martin Panter f0564164ba Fix typos in comments, documentation and test method names 2016-05-08 13:48:10 +00:00
Martin Panter 51b697b7f3 Issue #26864: Merge no_proxy fixes from 3.5 2016-04-30 01:30:57 +00:00
Martin Panter aa27982ffc Issue #26864: Fix case insensitivity and suffix comparison with no_proxy
Patch by Xiang Zhang.
2016-04-30 01:03:40 +00:00
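A sketch of the two behaviours, again using the undocumented `proxy_bypass_environment()` with an explicit proxies mapping (key `"no"`) so that `os.environ` is left alone. First, matching ignores case. Second, a suffix only matches on a dot boundary. The hostnames are illustrative.

```python
from urllib.request import proxy_bypass_environment

no_proxy = {"no": "Example.COM"}

mixed_case = proxy_bypass_environment("www.EXAMPLE.com", no_proxy)  # case-insensitive
lookalike = proxy_bypass_environment("notexample.com", no_proxy)    # no dot boundary

print(mixed_case, lookalike)
```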
Senthil Kumaran 0996fa3bd8 merge 3.5
Issue #26804: urllib.request will prefer lower_case proxy environment variables
over UPPER_CASE or Mixed_Case ones.

Patch contributed by Hans-Peter Jansen. Reviewed by Martin Panter and Senthil Kumaran.
2016-04-25 08:18:07 -07:00
Senthil Kumaran a7c0ff2f0b Issue #26804: urllib.request will prefer lower_case proxy environment variables
over UPPER_CASE or Mixed_Case ones.

Patch contributed by Hans-Peter Jansen. Reviewed by Martin Panter and Senthil Kumaran.
2016-04-25 08:16:23 -07:00
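A sketch of the lower_case preference, assuming a POSIX system where environment variable names are case-sensitive. It uses `getproxies_environment()` with a patched environment. Both proxy URLs are hypothetical.

```python
import os
from unittest import mock
from urllib.request import getproxies_environment

env = {
    "HTTP_PROXY": "http://upper.example:8080",
    "http_proxy": "http://lower.example:8080",
}

with mock.patch.dict(os.environ, env, clear=True):
    proxies = getproxies_environment()

# When both spellings are present, the lower_case variable wins.
print(proxies["http"])
```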
Berker Peksag 48238c7e37 Issue #2202: Fix UnboundLocalError in AbstractDigestAuthHandler.get_algorithm_impls
Raise ValueError if algorithm is not MD5 or SHA.

Initial patch by Mathieu Dupuy.
2016-03-06 16:17:47 +02:00
Berker Peksag e88dd1c32c Issue #2202: Fix UnboundLocalError in AbstractDigestAuthHandler.get_algorithm_impls
Raise ValueError if algorithm is not MD5 or SHA.

Initial patch by Mathieu Dupuy.
2016-03-06 16:16:40 +02:00
Serhiy Storchaka 885bdc4946 Issue #25985: sys.version_info is now used instead of sys.version
to format short Python version.
2016-02-11 13:10:36 +02:00
Martin Panter a3643c280f Issue #12923: Merge FancyURLopener fix from 3.5 2016-02-06 01:08:40 +00:00
Martin Panter a03702252f Issue #12923: Reset FancyURLopener's redirect counter even on exception
Based on patches by Brian Brazil and Daniel Rocco.
2016-02-04 06:01:35 +00:00
Senthil Kumaran 0b57f0adde merge from 3.5
Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases.
2016-01-25 18:54:37 -08:00
Senthil Kumaran d4e51f45a9 Remove unnecessary test case comment in urllib.parse.py. These are asserted as test cases. 2016-01-25 18:53:34 -08:00
Senthil Kumaran 86f7109dad Issue #25822: Add docstrings to the fields of urllib.parse results.
Patch contributed by Swati Jaiswal.
2016-01-14 00:11:39 -08:00
Serhiy Storchaka 3fd4a735d8 Issue #25899: Converted non-ASCII characters in docstrings and manpage
to ASCII replacements.  Removed UTF-8 BOM from Misc/NEWS.
Original patch by Chris Angelico.
2015-12-18 13:10:37 +02:00
Martin Panter f65dd1d4db Issue #25576: Apply fix to new urlopen() doc string 2015-11-24 23:00:37 +00:00
Berker Peksag 960e848f0d Issue #16099: RobotFileParser now supports Crawl-delay and Request-rate
extensions.

Patch by Nikolay Bogoychev.
2015-10-08 12:27:06 +03:00
Raymond Hettinger 507343a2ef Add missing docstring 2015-08-18 00:35:52 -07:00
Robert Collins dfa95c9a8f Issue #20059: urllib.parse raises ValueError on all invalid ports.
Patch by Martin Panter.
2015-08-10 09:53:30 +12:00
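A small sketch of the port validation. `urlsplit()` itself does not check the port, but reading `.port` raises `ValueError` for a non-integer or out-of-range value. The `port_or_error` helper and the URLs are hypothetical.

```python
from urllib.parse import urlsplit

def port_or_error(url):
    """Hypothetical helper: the parsed port, or a 'ValueError: ...' string."""
    try:
        return urlsplit(url).port
    except ValueError as exc:
        return f"ValueError: {exc}"

ok = port_or_error("http://example.com:8080/")
missing = port_or_error("http://example.com/")       # no port -> None
too_big = port_or_error("http://example.com:99999/")  # out of range
not_int = port_or_error("http://example.com:abc/")    # not an integer

print(ok, missing, too_big, not_int, sep="\n")
```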
Robert Collins 1f9a29f31b Issue #24021: docstring for urllib.urlcleanup.
Patch from Daniel Andrade Groppe and Peter Lovett
2015-08-04 12:52:43 +12:00
Robert Collins 2fee5c9367 Issue #24021: docstring for urllib.urlcleanup.
Patch from Daniel Andrade Groppe and Peter Lovett
2015-08-04 12:52:06 +12:00
R David Murray c17686f071 Issue #13866: add *quote_via* argument to urlencode.
Patch by samwyse, completed by Arnon Yaari, and reviewed by
Martin Panter.
2015-05-17 20:44:50 -04:00
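An illustration of *quote_via*. `urlencode()` uses `quote_plus` by default, so spaces become `+`. With `quote_via=quote`, spaces become `%20`. `urlencode()` passes its own `safe` argument (default `''`) to the quoting function, so pass `safe="/"` to keep slashes. The parameter values are made up.

```python
from urllib.parse import quote, urlencode

default = urlencode({"q": "a b"})                    # quote_plus: space -> "+"
via_quote = urlencode({"q": "a b"}, quote_via=quote)  # quote: space -> "%20"
keep_slash = urlencode({"path": "/x y"}, quote_via=quote, safe="/")

print(default, via_quote, keep_slash)
```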
Facundo Batista 244afcf26c Issue #23887: urllib.error.HTTPError now has a proper repr() representation. 2015-04-22 18:35:54 -03:00
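A quick look at the representation. The error is constructed by hand here, with an in-memory body and an empty header mapping, only to avoid a network request. The URL is arbitrary.

```python
import io
from urllib.error import HTTPError

err = HTTPError("http://example.com/missing", 404, "Not Found", {}, io.BytesIO())

print(repr(err))  # shows the status code and reason
print(str(err))
```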
R David Murray 4c7f995e80 #7159: generalize urllib prior auth support.
This fix is a superset of the functionality introduced by the issue #19494
enhancement, and supersedes that fix.  Instead of a new handler, we have a new
password manager that tracks whether we should send the auth for a given uri.
This allows us to say "always send", satisfying #19494, or track that we've
succeeded in auth and send the creds right away on every *subsequent* request.
The support for using the password manager is added to AbstractBasicAuth,
which means the proxy handler also now can handle prior auth if passed
the new password manager.

Patch by Akshit Khurana, docs mostly by me.
2015-04-16 16:36:18 -04:00
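A minimal sketch of the password manager described above. In the released stdlib it is `urllib.request.HTTPPasswordMgrWithPriorAuth`. Calling the handler's `http_request()` directly (normally an opener does this) shows that the Authorization header is added before any 401 is seen. The host and credentials are hypothetical, and no network access happens.

```python
import base64
from urllib.request import HTTPBasicAuthHandler, HTTPPasswordMgrWithPriorAuth, Request

mgr = HTTPPasswordMgrWithPriorAuth()
# is_authenticated=True: always send credentials for this URI tree.
mgr.add_password(None, "https://api.example.com/", "alice", "s3cret",
                 is_authenticated=True)

handler = HTTPBasicAuthHandler(mgr)

req = Request("https://api.example.com/repos")
handler.http_request(req)  # pre-processing step an opener would run
auth = req.get_header("Authorization")

other = Request("https://other.example/")
handler.http_request(other)  # not tracked as authenticated: left untouched

print(auth, other.get_header("Authorization"))
```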
Berker Peksag 20416f7994 Issue #23703: Fix a regression in urljoin() introduced in 901e4e52b20a.
Patch by Demian Brecht.
2015-04-16 02:31:14 +03:00
Serhiy Storchaka 7e7a3dba5f Issue #23865: close() methods in multiple modules now are idempotent and more
robust at shutdown. If they need to release multiple resources, all of them are released
even if errors occur.
2015-04-10 13:24:41 +03:00
Serhiy Storchaka 2116b12da5 Issue #23865: close() methods in multiple modules now are idempotent and more
robust at shutdown. If they need to release multiple resources, all of them are released
even if errors occur.
2015-04-10 13:29:28 +03:00
Serhiy Storchaka 1515450440 Issue #23411: Added DefragResult, ParseResult, SplitResult, DefragResultBytes,
ParseResultBytes, and SplitResultBytes to urllib.parse.__all__.
Patch by Martin Panter.
2015-04-07 19:09:01 +03:00
Victor Stinner a9dd680d23 (Merge 3.4) Issue #23881: urllib.request.ftpwrapper constructor now closes the
socket if the FTP connection failed to fix a ResourceWarning.
2015-04-07 12:50:24 +02:00
Victor Stinner ab73e65032 Issue #23881: urllib.request.ftpwrapper constructor now closes the socket if
the FTP connection failed to fix a ResourceWarning.
2015-04-07 12:49:27 +02:00
Serhiy Storchaka 44eceb6e2a Issue #23563: Optimized utility functions in urllib.parse. 2015-03-03 20:21:35 +02:00
R David Murray 3ab6ba4744 Merge: #23040: Clarify treatment of encoding and errors when component is bytes. 2014-12-24 21:24:07 -05:00
R David Murray 8c4e112afc #23040: Clarify treatment of encoding and errors when component is bytes.
Patch by Wojtek Ruszczewski.
2014-12-24 21:23:18 -05:00
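An illustration of how *encoding* and *errors* interact with bytes input in `quote()`. For `str` input they control how the string is encoded before quoting, with UTF-8 as the default. Bytes are quoted as they are, and passing *encoding* with bytes raises `TypeError`.

```python
from urllib.parse import quote

utf8 = quote("\u00e9")                       # str -> UTF-8 bytes -> "%C3%A9"
latin1 = quote("\u00e9", encoding="latin-1")  # encoding applies to str input only
raw = quote(b"\xe9")                          # bytes are quoted as-is

try:
    quote(b"\xe9", encoding="latin-1")        # not supported for bytes
    bytes_with_encoding_ok = True
except TypeError:
    bytes_with_encoding_ok = False

print(utf8, latin1, raw, bytes_with_encoding_ok)
```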
Benjamin Peterson b666697fa8 use context's check_hostname attribute rather than the HTTPSHandler check_hostname parameter 2014-12-07 13:46:02 -05:00
Benjamin Peterson 074b95da48 merge 3.4 2014-12-07 13:47:39 -05:00
Nick Coghlan c216c48699 Close #19494: add urllib.request.HTTPBasicPriorAuthHandler
This auth handler adds the Authorization header to the first
HTTP request rather than waiting for an HTTP 401 Unauthorized
response from the server as the default HTTPBasicAuthHandler
does.

This allows working with websites like https://api.github.com which do
not follow the strict interpretation of the RFC, but rather the dicta at the
end of section 2 of RFC 2617:

    > A client MAY preemptively send the corresponding Authorization
    > header with requests for resources in that space without receipt
    > of another challenge from the server.  Similarly, when a client
    > sends a request to a proxy, it may reuse a userid and password in
    > the Proxy-Authorization header field without receiving another
    > challenge from the proxy server. See section 4 for security
    > considerations associated with Basic authentication.

Patch by Matej Cepl.
2014-11-12 23:33:50 +10:00
Senthil Kumaran a66e3885fb Issue #22278: Fix urljoin problem with relative urls, a regression observed
after changes to issue22118 were submitted.

Patch contributed by Demian Brecht and reviewed by Antoine Pitrou.
2014-09-22 15:49:16 +08:00
Senthil Kumaran 8b7e161ac3 backport context argument of urlopen (#22366) for pep 476 2014-09-19 15:23:30 +08:00
Senthil Kumaran a5c85b3f5f Issue #22366: urllib.request.urlopen will accept a context object (SSLContext)
as an argument, which will then be used for HTTPS connections.

Patch by Alex Gaynor.
2014-09-19 15:23:30 +08:00
Serhiy Storchaka 91453026ff Issue #19524: Fixed resource leak in the HTTP connection when an invalid
response is received.  Patch by Martin Panter.
2014-09-06 21:43:49 +03:00
Serhiy Storchaka f54c350160 Issue #19524: Fixed resource leak in the HTTP connection when an invalid
response is received.  Patch by Martin Panter.
2014-09-06 21:41:39 +03:00
Antoine Pitrou 55ac5b3f7b Issue #22118: Switch urllib.parse to use RFC 3986 semantics for the resolution of relative URLs, rather than RFCs 1808 and 2396.
Patch by Demian Brecht.
2014-08-21 19:16:17 -04:00
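A few of the reference-resolution examples from RFC 3986, section 5.4.1, run through `urljoin()`. The base URI is the one used in the RFC. The query-only case `?y` is the kind of relative reference affected by the Issue #22278 regression fixed above.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"  # base URI from RFC 3986, section 5.4

REFS = ["g", "../g", "g?y", "?y", "#s"]
results = {ref: urljoin(BASE, ref) for ref in REFS}

for ref, resolved in results.items():
    print(f"{ref!r:8} -> {resolved}")
```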
Senthil Kumaran 2b7ccbda90 merge from 3.4
Fix Issue #8797: Raise HTTPError on failed Basic Authentication immediately. Initial patch by Sam Bull.
2014-08-20 07:55:53 +05:30
Senthil Kumaran 783737625d Fix Issue #8797: Raise HTTPError on failed Basic Authentication immediately. Initial patch by Sam Bull. 2014-08-20 07:53:58 +05:30
Senthil Kumaran e2953e5146 merge 3.4; backout changeset 3435c5865cfc due to buildbot failures. Ref #8797 2014-08-16 22:54:24 +05:30
Senthil Kumaran 402df0975c backout changeset 3435c5865cfc due to buildbot failures. Ref #8797 2014-08-16 22:52:37 +05:30