* bpo-27657: Fix urlparse() with numeric paths
Revert parsing decision from bpo-754016 in favor of the documented
consensus in bpo-16932 of how to treat strings without a // to
designate the netloc.
* bpo-22891: Remove urlsplit() optimization for 'http' prefixed inputs.
(cherry picked from commit 5a88d50ff0)
Co-authored-by: Tim Graham <timograham@gmail.com>
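As a minimal sketch of the rule about strings without a ``//``, the snippet below compares two inputs that differ only in the leading slashes. The host name is an illustrative assumption, not taken from the issues above.

```python
from urllib.parse import urlparse

# With a leading "//", the text up to the next "/" is the network location.
with_slashes = urlparse("//www.example.com:80/index.html")
print(with_slashes.netloc, with_slashes.path)

# Without "//", nothing is treated as a netloc; the whole string is the path.
without_slashes = urlparse("www.example.com/index.html")
print(repr(without_slashes.netloc), without_slashes.path)
```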
CVE-2019-9948: Avoid file reading by disallowing the unnecessary URL
scheme in URLopener().open() and URLopener().retrieve()
of urllib.request.
Co-Authored-By: SH <push0ebp@gmail.com>
(cherry picked from commit 0c2b6a3943)
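Callers that accept URLs from untrusted input can apply the same idea on their side by checking the scheme before opening anything. This is a sketch only: ``has_allowed_scheme`` and ``ALLOWED_SCHEMES`` are hypothetical names, and it is not the code of the fix itself.

```python
from urllib.parse import urlsplit

# Assumption: this caller only ever wants to fetch over HTTP(S).
ALLOWED_SCHEMES = {"http", "https"}

def has_allowed_scheme(url):
    """Hypothetical helper: True only if the URL scheme is explicitly permitted."""
    return urlsplit(url).scheme in ALLOWED_SCHEMES

print(has_allowed_scheme("https://example.com/"))      # permitted scheme
print(has_allowed_scheme("file:///etc/passwd"))        # local file access, rejected
print(has_allowed_scheme("local-file:///etc/passwd"))  # unusual alias, rejected
```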
Fix some mistakes and misleading statements in the quote function docstring:
- reserved chars are never actually used by the quote code; unreserved chars are
- the listed reserved chars were wrong and incomplete
- mention that the use case is not minimal quoting with respect to the RFC, but cautious quoting
(cherry picked from commit 750d74fac5)
Co-authored-by: Jörn Hees <joernhees@users.noreply.github.com>
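The "cautious quoting" behaviour can be observed directly. In this small sketch the input string is an arbitrary assumption; it shows that everything outside the unreserved set is escaped unless it appears in ``safe``, which defaults to ``'/'``.

```python
from urllib.parse import quote

sample = "a b/c?d=e&f"

default_quoted = quote(sample)          # "/" is kept because safe defaults to "/"
strict_quoted = quote(sample, safe="")  # nothing beyond the unreserved set is kept
unreserved = quote("-._")               # unreserved punctuation is never escaped

print(default_quoted)
print(strict_quoted)
print(unreserved)
```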
Add `max_num_fields` to `cgi.FieldStorage` to make DoS attacks harder by
limiting the number of `MiniFieldStorage` objects created by `FieldStorage`.
(cherry picked from commit 209144831b)
Co-authored-by: matthewbelisle-wf <matthew.belisle@workiva.com>
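The ``cgi`` module is not available in every Python version, so this sketch illustrates the same limit with ``urllib.parse.parse_qsl``, which accepts a ``max_num_fields`` argument of the same name. ``parse_limited`` is a hypothetical wrapper, and the query strings are made up.

```python
from urllib.parse import parse_qsl

def parse_limited(query, limit):
    """Hypothetical wrapper: parse a query string, refusing more than `limit` fields."""
    try:
        return parse_qsl(query, max_num_fields=limit)
    except ValueError:
        # Raised when the query contains more fields than the limit allows.
        return None

print(parse_limited("a=1&b=2", 2))
print(parse_limited("a=1&b=2&c=3", 2))
```

Rejecting the input before any per-field objects are built is what keeps an oversized request cheap to refuse.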
The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields.
(cherry picked from commit bd08a0af2d)
Co-authored-by: Michael Lazar <lazar.michael22@gmail.com>
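A quick way to see the effect is to parse a small rule set and print the parser. The robots.txt lines below are invented for illustration, and the exact layout of the printed text may vary between Python versions.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Crawl-delay: 3",
    "Request-rate: 9/30",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse() takes an iterable of lines, no network access needed

text = str(parser)
print(text)
```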
The current regex based splitting produces a wrong result. For example::
http://abc#@def
Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``@def``.
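For comparison, here is how the public ``urllib.parse.urlsplit`` function treats the same URL. The entry above does not name the function that was fixed, so this only sketches the expected host/fragment split and does not reproduce the bug.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://abc#@def")

# The "#" ends the authority, so "@def" belongs to the fragment, not to userinfo.
print(parts.hostname)
print(parts.fragment)
```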
* bpo-16285: Update urllib quoting to RFC 3986
urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that are not escaped
by default.
Patch by Christian Theune and Ratnadeep Debnath.
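A short sketch of the visible difference, assuming a Python version that includes this change. The path is an arbitrary example.

```python
from urllib.parse import quote, unquote

quoted = quote("/~user/my file.txt")
print(quoted)  # the tilde is left alone, the space is escaped

# Older, escaped forms still decode to the same path.
decoded = unquote("/%7Euser/my%20file.txt")
print(decoded)
```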
In urllib.request, suffixes in the no_proxy environment variable with
leading dots can match related hostnames again (e.g. .b.c matches a.b.c).
Patch by Milan Oberkirch.
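The matching rule can be checked with ``urllib.request.proxy_bypass_environment``. Here the ``proxies`` mapping is passed explicitly (an assumption made so that the result does not depend on the real environment), with ``"no"`` playing the role of ``no_proxy``.

```python
from urllib.request import proxy_bypass_environment

# Same shape as the mapping urllib builds from the *_proxy environment variables.
proxies = {"no": ".b.c"}

matches_suffix = bool(proxy_bypass_environment("a.b.c", proxies))
unrelated_host = bool(proxy_bypass_environment("a.example.org", proxies))

print(matches_suffix)
print(unrelated_host)
```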
The deprecation includes manual creation of SSLSocket and certfile/keyfile
(or similar) in ftplib, httplib, imaplib, smtplib, poplib and urllib.
ssl.wrap_socket() is not marked as deprecated yet.
When the body object is a file, its size is no longer determined with
fstat(), since that can report the wrong result (e.g. reading from a pipe).
Instead, determine the size using seek(), or fall back to chunked encoding
for unseekable files.
Also, change the logic for detecting text files to check for TextIOBase
inheritance, rather than inspecting the “mode” attribute, which may not
exist (e.g. BytesIO and StringIO). The Content-Length for text files is no
longer determined ahead of time, because the original logic could have been
wrong depending on the codec and newline translation settings.
Patch by Demian Brecht and Rolf Krahl, with a few tweaks by me.
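The decision logic described above can be sketched as follows. ``guess_content_length`` and ``Unseekable`` are hypothetical names, and this is an illustration of the description, not the code in http.client: a return value of ``None`` stands for "length unknown, fall back to chunked encoding".

```python
import io

def guess_content_length(body):
    """Hypothetical helper: remaining size in bytes, or None if it cannot be known."""
    if isinstance(body, io.TextIOBase):
        # Text files: the encoded size depends on codec and newline translation.
        return None
    try:
        start = body.tell()
        end = body.seek(0, io.SEEK_END)
        body.seek(start)  # restore the original position
    except (AttributeError, OSError):
        # Pipes and other unseekable objects end up here.
        return None
    return end - start

class Unseekable(io.RawIOBase):
    """Stand-in for a pipe: readable, but tell()/seek() are unsupported."""
    def readable(self):
        return True

print(guess_content_length(io.BytesIO(b"hello")))
print(guess_content_length(io.StringIO("hello")))
print(guess_content_length(Unseekable()))
```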
Ignore the HTTP_PROXY variable when the REQUEST_METHOD environment variable
is set, which indicates that the script is in CGI mode.
Issue #27568: Reported and patch contributed by Rémi Rampin.
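The effect can be observed with ``urllib.request.getproxies_environment`` under a temporarily replaced environment. ``proxies_for`` is a hypothetical helper and the proxy URL is made up. The sketch assumes a platform with case-sensitive environment variable names.

```python
import os
from unittest import mock
from urllib.request import getproxies_environment

def proxies_for(env):
    """Hypothetical helper: proxies urllib sees with exactly `env` as the environment."""
    with mock.patch.dict(os.environ, env, clear=True):
        return getproxies_environment()

plain = proxies_for({"HTTP_PROXY": "http://proxy.example:3128"})
cgi_mode = proxies_for({
    "HTTP_PROXY": "http://proxy.example:3128",
    "REQUEST_METHOD": "GET",  # a CGI server sets this for every request
})

print(plain)     # the proxy is picked up
print(cgi_mode)  # the uppercase variable is ignored in CGI mode
```

In CGI mode HTTP_PROXY can be derived from a client-supplied ``Proxy:`` request header, which is why it is not trusted there.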
Some servers send Location header fields with non-ASCII bytes, but
"http.client" requires the request target to be ASCII-encodable; otherwise a
UnicodeEncodeError is raised. Based on patch by Christian Heimes.
Python 2 does not suffer any problem because it allows non-ASCII bytes in the
HTTP request target.
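One way to make such a redirect target ASCII-safe is to percent-encode the offending bytes before reusing the URL. The sketch below assumes the header bytes were decoded as ISO-8859-1. ``ascii_safe_location`` is a hypothetical helper and the URL is invented.

```python
import string
from urllib.parse import quote

def ascii_safe_location(location):
    """Hypothetical helper: percent-encode non-ASCII characters in a redirect URL."""
    # Keeping all ASCII punctuation safe leaves "%", "/", "?" etc. untouched,
    # so already-encoded parts of the URL are not encoded twice.
    return quote(location, encoding="iso-8859-1", safe=string.punctuation)

fixed = ascii_safe_location("http://example.com/caf\xe9?x=%20")
print(fixed)
fixed.encode("ascii")  # no UnicodeEncodeError any more
```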