Note that Python 3.x isn't covered; add forward ref. for UTF-8; note error in 2.5 and up
parent 801923681c
commit 08982665b7
@@ -2,10 +2,12 @@
 Unicode HOWTO
 *****************
 
-:Release: 1.02
+:Release: 1.03
 
-This HOWTO discusses Python's support for Unicode, and explains various problems
-that people commonly encounter when trying to work with Unicode.
+This HOWTO discusses Python 2.x's support for Unicode, and explains
+various problems that people commonly encounter when trying to work
+with Unicode. (This HOWTO has not yet been updated to cover the 3.x
+versions of Python.)
 
 Introduction to Unicode
 =======================
@@ -144,8 +146,9 @@ problems.
 4. Many Internet standards are defined in terms of textual data, and can't
 handle content with embedded zero bytes.
 
-Generally people don't use this encoding, instead choosing other encodings that
-are more efficient and convenient.
+Generally people don't use this encoding, instead choosing other
+encodings that are more efficient and convenient. UTF-8 is probably
+the most commonly supported encoding; it will be discussed below.
 
 Encodings don't have to handle every possible Unicode character, and most
 encodings don't. For example, Python's default encoding is the 'ascii'
@@ -222,8 +225,8 @@ Wikipedia entries are often helpful; see the entries for "character encoding"
 <http://en.wikipedia.org/wiki/UTF-8>, for example.
 
 
-Python's Unicode Support
-========================
+Python 2.x's Unicode Support
+============================
 
 Now that you've learned the rudiments of Unicode, we can look at Python's
 Unicode features.
@@ -272,7 +275,7 @@ Unicode result). The following examples show the differences::
 >>> unicode('\x80abc', errors='ignore')
 u'abc'
 
-Encodings are specified as strings containing the encoding's name. Python 2.4
+Encodings are specified as strings containing the encoding's name. Python 2.7
 comes with roughly 100 different encodings; see the Python Library Reference at
 :ref:`standard-encodings` for a list. Some encodings
 have multiple names; for example, 'latin-1', 'iso_8859_1' and '8859' are all
@@ -427,11 +430,19 @@ encoding declaration::
 
 When you run it with Python 2.4, it will output the following warning::
 
-amk:~$ python p263.py
+amk:~$ python2.4 p263.py
 sys:1: DeprecationWarning: Non-ASCII character '\xe9'
 in file p263.py on line 2, but no encoding declared;
 see http://www.python.org/peps/pep-0263.html for details
 
+Python 2.5 and higher are stricter and will produce a syntax error::
+
+amk:~$ python2.5 p263.py
+File "/tmp/p263.py", line 2
+SyntaxError: Non-ASCII character '\xc3' in file /tmp/p263.py
+on line 2, but no encoding declared; see
+http://www.python.org/peps/pep-0263.html for details
+
 
 Unicode Properties
 ------------------
@@ -693,7 +704,11 @@ several links.
 
 Version 1.02: posted August 16 2005. Corrects factual errors.
 
+Version 1.03: posted June 20 2010. Notes that Python 3.x is not covered,
+and that the HOWTO only covers 2.x.
+
+
 .. comment Describe Python 3.x support (new section? new document?)
 .. comment Additional topic: building Python w/ UCS2 or UCS4 support
 .. comment Describe obscure -U switch somewhere?
 .. comment Describe use of codecs.StreamRecoder and StreamReaderWriter
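A side note on the hunk at line 272, whose context shows ``unicode('\x80abc', errors='ignore')`` returning ``u'abc'``: the sketch below reproduces that behaviour for the ``strict``, ``replace`` and ``ignore`` error handlers. It uses ``bytes.decode`` rather than the 2.x ``unicode()`` constructor, which is an assumption made here so the snippet runs on both Python 2.7 and 3.x. The names ``RAW``, ``decode_with`` and ``strict_fails`` are hypothetical and do not appear in the HOWTO.

```python
# Sketch: how the 'strict', 'replace' and 'ignore' error handlers treat
# a byte that is not valid ASCII. All names here are illustrative only.

RAW = b'\x80abc'  # 0x80 lies outside the 7-bit ASCII range


def decode_with(handler):
    """Decode RAW as ASCII using the named error handler."""
    return RAW.decode('ascii', handler)


def strict_fails():
    """Return True if the default 'strict' handler rejects RAW."""
    try:
        decode_with('strict')
    except UnicodeDecodeError:
        return True
    return False


ignored = decode_with('ignore')    # drops the undecodable byte
replaced = decode_with('replace')  # substitutes U+FFFD for it

print(repr(ignored))
print(repr(replaced))
print(strict_fails())
```

The ``repr`` output differs cosmetically between 2.x and 3.x (a ``u`` prefix on 2.x), so the values are best compared rather than matched as text.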