Fixing a note on encoding declaration, its usage in urlopen based on review
comments from RDM and Ezio.
Senthil Kumaran 2010-04-22 10:53:30 +00:00
parent 5e73a819ca
commit 0c2d8b8e51
1 changed file with 22 additions and 15 deletions

@@ -1072,30 +1072,37 @@ HTTPErrorProcessor Objects
 Examples
 --------

-This example gets the python.org main page and displays the first 100 bytes of
+This example gets the python.org main page and displays the first 300 bytes of
 it. ::

     >>> import urllib.request
     >>> f = urllib.request.urlopen('http://www.python.org/')
-    >>> print(f.read(100))
-    b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-    <?xml-stylesheet href="./css/ht2html'
+    >>> print(f.read(300))
+    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
+    xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
+    <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
+    <title>Python Programming '

-Note that in Python 3, urlopen returns a bytes object by default. In many
-circumstances, you might expect the output of urlopen to be a string. This
-might be a carried over expectation from Python 2, where urlopen returned
-string or it might even the common usecase. In those cases, you should
-explicitly decode the bytes to string.
+Note that urlopen returns a bytes object. This is because there is no way
+for urlopen to automatically determine the encoding of the byte stream
+it receives from the HTTP server. In general, a program will decode
+the returned bytes object to a string once it determines or guesses
+the appropriate encoding.

-In the examples below, we have chosen *utf-8* encoding for demonstration, you
-might choose the encoding which is suitable for the webpage you are
-requesting::
+The following W3C document, http://www.w3.org/International/O-charset , lists
+the various ways in which an (X)HTML or an XML document could have specified
+its encoding information.
+
+As the python.org website uses *utf-8* encoding, as specified in its meta tag,
+we will use the same encoding to decode the bytes object. ::

     >>> import urllib.request
     >>> f = urllib.request.urlopen('http://www.python.org/')
-    >>> print(f.read(100).decode('utf-8')
-    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
-    <?xml-stylesheet href="./css/ht2html
+    >>> print(f.read(100).decode('utf-8'))
+    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+    "http://www.w3.org/TR/xhtml1/DTD/xhtm

 In the following example, we are sending a data-stream to the stdin of a CGI
 and reading the data it returns to us. Note that this example will only work
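The note added in this commit says a program decodes the bytes from urlopen "once it determines or guesses the appropriate encoding". Below is a minimal sketch of that step. It assumes a three-step order: first the charset in the `Content-Type` header, then a `<meta>` charset in the first 1024 bytes of the body, then a *utf-8* default. The helper names `guess_charset` and `decode_body`, the 1024-byte window, and the regular expression are hypothetical choices for this illustration. They are not part of `urllib.request`. The sketch also ignores other declaration mechanisms, such as an XML declaration or a byte-order mark.

```python
import codecs
import re
from email.message import Message

# Hypothetical pattern (an assumption, not a full HTML parser): finds
# charset=... inside a <meta> tag, covering both <meta charset="utf-8"> and
# <meta http-equiv="content-type" content="text/html; charset=utf-8" />
_META_CHARSET = re.compile(rb'''<meta[^>]+charset=["']?([\w.:-]+)''', re.IGNORECASE)


def guess_charset(headers: Message, body: bytes, default: str = 'utf-8') -> str:
    """Best-effort guess of the encoding of an HTTP response body."""
    # 1. charset parameter of the Content-Type header, if the server sent one
    #    (get_content_charset() returns it lower-cased, or None if absent).
    charset = headers.get_content_charset()
    if charset:
        return charset
    # 2. charset declared in a <meta> tag near the start of the document.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode('ascii').lower()
    # 3. Nothing declared: fall back to an assumed default.
    return default


def decode_body(headers: Message, body: bytes, default: str = 'utf-8') -> str:
    """Decode body to str using guess_charset(), tolerating bad guesses."""
    charset = guess_charset(headers, body, default)
    try:
        codecs.lookup(charset)
    except LookupError:
        # The declared name is not a codec Python knows; use the default.
        charset = default
    # errors='replace' keeps decoding even if the guess is wrong for some bytes.
    return body.decode(charset, errors='replace')
```

With a live response, `f.headers` (an `http.client.HTTPMessage`, which subclasses `email.message.Message`) can be passed along with the bytes from `f.read()`, e.g. `decode_body(f.headers, f.read())`. Preferring the header over the `<meta>` tag is a design choice of this sketch.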