Commit Graph

402 Commits

Author SHA1 Message Date
Miss Islington (bot) 590ed09a5b
bpo-25068: urllib.request.ProxyHandler now lowercases the dict keys (GH-13489)
(cherry picked from commit b761e3aed1)

Co-authored-by: Zackery Spytz <zspytz@gmail.com>
2019-09-13 07:25:51 -07:00
Miss Islington (bot) 58a1a76bae
bpo-35922: Fix RobotFileParser when robots.txt has no relevant crawl delay or request rate (GH-11791)
Co-Authored-By: Tal Einat <taleinat+github@gmail.com>
(cherry picked from commit 8047e0e1c6)

Co-authored-by: Rémi Lapeyre <remi.lapeyre@henki.fr>
2019-06-16 00:07:54 -07:00
Steve Dower 8d0ef0b5ed bpo-36742: Corrects fix to handle decomposition in usernames (#13812) 2019-06-04 17:55:29 +02:00
Rémi Lapeyre 674ee12600 bpo-35397: Remove deprecation and document urllib.parse.unwrap (GH-11481) 2019-05-27 09:43:45 -04:00
Steve Dower b82e17e626
bpo-36842: Implement PEP 578 (GH-12613)
Adds sys.audit, sys.addaudithook, io.open_code, and associated C APIs.
2019-05-23 08:45:22 -07:00
Victor Stinner 0c2b6a3943
bpo-35907, CVE-2019-9948: urllib rejects local_file:// scheme (GH-13474)
CVE-2019-9948: Avoid file reading as disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.

Co-Authored-By: SH <push0ebp@gmail.com>
2019-05-22 22:15:01 +02:00
Xtreak c661b30f89 bpo-36948: Fix NameError in urllib.request.URLopener.retrieve (GH-13389) 2019-05-19 16:40:05 +03:00
Steve Dower d537ab0ff9
bpo-36742: Fixes handling of pre-normalization characters in urlsplit() (GH-13017) 2019-04-30 12:03:02 +00:00
Jörn Hees 750d74fac5 bpo-12910: update and correct quote docstring (#2568)
Fixes some mistakes and misleadings in the quote function docstring:
- reserved chars are never actually used by quote code, unreserved chars are
- reserved chars were wrong and incomplete
- mentioned that use-case is not minimal quoting wrt. RFC, but cautious quoting
2019-04-09 17:31:18 -07:00
Serhiy Storchaka da0847048a
bpo-36431: Use PEP 448 dict unpacking for merging two dicts. (GH-12553) 2019-03-27 08:02:28 +02:00
Steve Dower 16e6f7dee7
bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) 2019-03-07 08:02:26 -08:00
Boštjan Mejak 158695817d closes bpo-35309: cpath should be capath (GH-10699) 2018-11-25 12:32:50 -06:00
matthewbelisle-wf 209144831b bpo-34866: Adding max_num_fields to cgi.FieldStorage (GH-9660)
Adding `max_num_fields` to `cgi.FieldStorage` to make DOS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
2018-10-19 03:52:59 -07:00
Christopher Beacham 5db5c0669e bpo-21475: Support the Sitemap extension in robotparser (GH-6883) 2018-05-16 10:52:07 -04:00
Michael Lazar bd08a0af2d bpo-32861: urllib.robotparser fix incomplete __str__ methods. (GH-5711)
The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.
2018-05-14 17:10:41 +03:00
Cheryl Sabella 0250de4819 bpo-27485: Rename and deprecate undocumented functions in urllib.parse (GH-2205) 2018-04-25 16:51:54 -07:00
Matt Eaton 2cb4661707 bpo-33034: Improve exception message when cast fails for {Parse,Split}Result.port (GH-6078) 2018-03-20 09:41:37 +03:00
Serhiy Storchaka 3f2e6f15d6
Revert unneccessary changes made in bpo-30296 and apply other improvements. (GH-2624) 2018-02-26 16:50:11 +02:00
INADA Naoki 579e0b80b9
urllib.request: Remove unused import (GH-5268) 2018-01-22 16:45:31 +09:00
Коренберг Марк fbd605151f bpo-32323: urllib.parse.urlsplit() must not lowercase() IPv6 scope value (#4867) 2017-12-21 14:16:17 +02:00
Berker Peksag 3df02dbc8e bpo-31325: Fix usage of namedtuple in RobotFileParser.parse() (#4529) 2017-11-23 15:40:26 -08:00
Oren Milman 8df44ee8e0 remove a redundant lower in urllib.parse.urlsplit (#3008) 2017-09-02 21:51:39 -07:00
postmasters 90e01e50ef urllib: Simplify splithost by calling into urlparse. (#1849)
The current regex based splitting produces a wrong result. For example::

  http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
2017-06-20 15:02:44 +02:00
Jon Dufresne 3972628de3 bpo-30296 Remove unnecessary tuples, lists, sets, and dicts (#1489)
* Replaced list(<generator expression>) with list comprehension
* Replaced dict(<generator expression>) with dict comprehension
* Replaced set(<list literal>) with set literal
* Replaced builtin func(<list comprehension>) with func(<generator
  expression>) when supported (e.g. any(), all(), tuple(), min(), &
  max())
2017-05-18 07:35:54 -07:00
Senthil Kumaran 906f5330b9 bpo-29976: urllib.parse clarify '' in scheme values. (GH-984) 2017-05-17 21:48:59 -07:00
Serhiy Storchaka 55fe1ae970 bpo-30022: Get rid of using EnvironmentError and IOError (except test… (#1051) 2017-04-16 10:46:38 +03:00
Senthil Kumaran 6fab78e902 Remove superfluous comment in urllib.error. (#1076) 2017-04-10 21:08:35 -07:00
Senthil Kumaran 6dfcc81f6b Remove OSError related comment in urllib.request. (#1070) 2017-04-09 19:49:34 -07:00
Senthil Kumaran a2a9ddd923 Remove invalid comment in urllib.request. (#1054) 2017-04-08 23:27:25 -07:00
Senthil Kumaran 257b980b31 correct parse_qs and parse_qsl test case descriptions. (#968)
* correct parse_qs and parse_qsl test case descriptions.
2017-04-04 21:19:43 -07:00
Ratnadeep Debnath 21024f0662 bpo-16285: Update urllib quoting to RFC 3986 (#173)
* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.
2017-02-25 19:00:28 +10:00
Xiang Zhang 04c15d5bdc Issue #29142: Merge 3.6. 2017-01-09 11:52:10 +08:00
Xiang Zhang c44d58a77a Issue #29142: Merge 3.5. 2017-01-09 11:50:02 +08:00
Xiang Zhang 959ff7f1c6 Issue #29142: Fix suffixes in no_proxy handling in urllib.
In urllib.request, suffixes in no_proxy environment variable with
leading dots could match related hostnames again (e.g. .b.c matches a.b.c).
Patch by Milan Oberkirch.
2017-01-09 11:47:55 +08:00
Serhiy Storchaka 8cbd3df3ce Issue #28992: Use bytes.fromhex(). 2016-12-21 12:59:28 +02:00
Serhiy Storchaka 70d28a184c Remove unused imports. 2016-12-16 20:00:15 +02:00
Berker Peksag 9a7bbb2e3f Issue #25400: RobotFileParser now correctly returns default values for crawl_delay and request_rate
Initial patch by Peter Wirtz.
2016-09-18 20:17:58 +03:00
Berker Peksag f8479eeb34 Issue #25895: Merge from 3.5 2016-09-16 14:45:15 +03:00
Berker Peksag f676748a05 Issue #25895: Enable WebSocket URL schemes in urllib.parse.urljoin
Patch by Gergely Imreh and Markus Holtermann.
2016-09-16 14:43:58 +03:00
Christian Heimes d04863771b Issue #28022: Deprecate ssl-related arguments in favor of SSLContext.
The deprecation include manual creation of SSLSocket and certfile/keyfile
(or similar) in ftplib, httplib, imaplib, smtplib, poplib and urllib.

ssl.wrap_socket() is not marked as deprecated yet.
2016-09-10 23:23:33 +02:00
Raymond Hettinger b7f3c944d1 Merge 2016-09-09 16:44:53 -07:00
Raymond Hettinger ae9e5f032d Issue #22450: Use "Accept: */*" in the default headers for urllib.request 2016-09-09 16:43:48 -07:00
Martin Panter 3c0d0baf2b Issue #12319: Support for chunked encoding of HTTP request bodies
When the body object is a file, its size is no longer determined with
fstat(), since that can report the wrong result (e.g. reading from a pipe).
Instead, determine the size using seek(), or fall back to chunked encoding
for unseekable files.

Also, change the logic for detecting text files to check for TextIOBase
inheritance, rather than inspecting the “mode” attribute, which may not
exist (e.g. BytesIO and StringIO).  The Content-Length for text files is no
longer determined ahead of time, because the original logic could have been
wrong depending on the codec and newline translation settings.

Patch by Demian Brecht and Rolf Krahl, with a few tweaks by me.
2016-08-24 06:33:33 +00:00
Senthil Kumaran cde03fa038 [merge from 3.5] - Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:51:13 -07:00
Senthil Kumaran 17742f2d45 [merge from 3.4] - Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:39:06 -07:00
Senthil Kumaran 436fe5a447 [merge from 3.3] Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:34:34 -07:00
Senthil Kumaran 4cbb23f8f2 Prevent HTTPoxy attack (CVE-2016-1000110)
Ignore the HTTP_PROXY variable when REQUEST_METHOD environment is set, which
indicates that the script is in CGI mode.

Issue #27568 Reported and patch contributed by Rémi Rampin.
2016-07-30 23:24:16 -07:00
Martin Panter 29f256909f Issue #22797: Synchronize urlopen() doc string with RST documentation 2016-06-04 05:06:34 +00:00
Martin Panter 0f29ad1be5 More typo fixes for 3.6 2016-06-04 05:06:25 +00:00
R David Murray d2367c651e Clean up urlopen doc string.
Clarifies what is returned when and that the methods are common between the two.

Patch by Alexander Liu as part of #22797.
2016-06-03 20:16:06 -04:00