Fixing a note on encoding declaration, its usage in urlopen based on review
comments from RDM and Ezio.
parent
5e73a819ca
commit
0c2d8b8e51
|
@ -1072,30 +1072,37 @@ HTTPErrorProcessor Objects
|
||||||
Examples
|
Examples
|
||||||
--------
|
--------
|
||||||
|
|
||||||
This example gets the python.org main page and displays the first 100 bytes of
|
This example gets the python.org main page and displays the first 300 bytes of
|
||||||
it. ::
|
it. ::
|
||||||
|
|
||||||
>>> import urllib.request
|
>>> import urllib.request
|
||||||
>>> f = urllib.request.urlopen('http://www.python.org/')
|
>>> f = urllib.request.urlopen('http://www.python.org/')
|
||||||
>>> print(f.read(100))
|
>>> print(f.read(300))
|
||||||
b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
||||||
<?xml-stylesheet href="./css/ht2html'
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
|
||||||
|
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
|
||||||
|
<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
|
||||||
|
<title>Python Programming '
|
||||||
|
|
||||||
Note that in Python 3, urlopen returns a bytes object by default. In many
|
Note that urlopen returns a bytes object. This is because there is no way
|
||||||
circumstances, you might expect the output of urlopen to be a string. This
|
for urlopen to automatically determine the encoding of the byte stream
|
||||||
might be a carried over expectation from Python 2, where urlopen returned
|
it receives from the HTTP server. In general, a program will decode
|
||||||
string or it might even the common usecase. In those cases, you should
|
the returned bytes object to a string once it determines or guesses
|
||||||
explicitly decode the bytes to string.
|
the appropriate encoding.
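One common place to look for the encoding is the ``charset`` parameter of the HTTP ``Content-Type`` header. The sketch below is only an illustration of that idea: the helper names ``charset_from_content_type`` and ``decode_body`` and the *utf-8* fallback are assumptions made for this example, not part of ``urllib.request``.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value.

    Falls back to `default` (an assumption of this sketch) when the
    header is missing or carries no charset parameter.
    """
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset() or default


def decode_body(raw, content_type, default="utf-8"):
    """Decode `raw` bytes using the charset found in `content_type`."""
    charset = charset_from_content_type(content_type, default)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or wrong declaration: decode leniently.
        return raw.decode(default, errors="replace")
```

With a real response object, the header value could be read with ``f.headers.get("Content-Type")`` and passed to ``decode_body`` together with the bytes returned by ``f.read()``. A server may omit or misdeclare the charset, so the result is still a guess.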
|
||||||
|
|
||||||
In the examples below, we have chosen *utf-8* encoding for demonstration, you
|
The following W3C document, http://www.w3.org/International/O-charset , lists
|
||||||
might choose the encoding which is suitable for the webpage you are
|
the various ways in which an (X)HTML or an XML document could have specified its
|
||||||
requesting::
|
encoding information.
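One of those ways is a ``meta`` tag near the top of the document. As a rough illustration, the sketch below searches the first bytes of a page for such a declaration. The function name ``sniff_meta_charset`` and the regular expression are assumptions of this example. It is a heuristic, not a full HTML parser, and it ignores the other mechanisms the W3C document describes.

```python
import re

# Matches charset=... inside a <meta ...> tag, with or without quotes.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(head, default=None):
    """Return the charset declared in a meta tag within `head` (bytes).

    Returns `default` when no declaration is found.
    """
    match = _META_CHARSET.search(head)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

Applied to the bytes shown in the first example, the ``meta`` tag declares *utf-8*, so the function would return ``'utf-8'``.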
|
||||||
|
|
||||||
|
As the python.org website uses *utf-8* encoding as specified in its meta tag, we
|
||||||
|
will use the same for decoding the bytes object. ::
|
||||||
|
|
||||||
>>> import urllib.request
|
>>> import urllib.request
|
||||||
>>> f = urllib.request.urlopen('http://www.python.org/')
|
>>> f = urllib.request.urlopen('http://www.python.org/')
|
||||||
>>> print(f.read(100).decode('utf-8')
|
>>> print(f.read(100).decode('utf-8'))
|
||||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
||||||
<?xml-stylesheet href="./css/ht2html
|
"http://www.w3.org/TR/xhtml1/DTD/xhtm
|
||||||
|
|
||||||
|
|
||||||
In the following example, we are sending a data-stream to the stdin of a CGI
|
In the following example, we are sending a data-stream to the stdin of a CGI
|
||||||
and reading the data it returns to us. Note that this example will only work
|
and reading the data it returns to us. Note that this example will only work
|
||||||
|
|