Fill out the Unicode section, somewhat uncertainly

Andrew M. Kuchling 2001-07-19 01:48:08 +00:00
parent 8cfa9055cf
commit f5fec3c88a
1 changed files with 23 additions and 6 deletions


@@ -340,11 +340,21 @@ and Tim Peters, with other fixes from the Python Labs crew.}
Python's Unicode support has been enhanced a bit in 2.2. Unicode
strings are usually stored as UCS-2, as 16-bit unsigned integers.
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers
by supplying \longprogramopt{enable-unicode=ucs4} to the configure script.
XXX explain surrogates? I have to figure out what the changes mean to users.
Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
integers, as its internal encoding by supplying
\longprogramopt{enable-unicode=ucs4} to the configure script. When
built to use UCS-4, Python could in theory handle Unicode characters
from U-00000000 to U-7FFFFFFF. Using UCS-4 internally is a necessary
step toward that, but not the only one, and in Python 2.2alpha1 the
work isn't complete yet. For example, the
\function{unichr()} function still only accepts values from 0 to
65535, and there's no \code{\e U} notation for embedding characters
greater than 65535 in a Unicode string literal. All this is the
province of the still-unimplemented PEP 261, ``Support for `wide'
Unicode characters''; consult it for further details, and please offer
comments and suggestions on the proposal it describes.
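
To make the difference between the two storage widths concrete, here is a
minimal sketch of how one character above 65535 might be laid out in each.
It is an illustration only, not Python's internal code: the helper name
to_code_units is hypothetical, the 16-bit case assumes UTF-16-style
surrogate pairs (which the text above does not specify), and the code is
written for a current Python interpreter rather than 2.2.

```python
def to_code_units(code_point, unit_bits):
    """Return the storage units for one code point (hypothetical helper).

    unit_bits=32 models a UCS-4 build: one 32-bit unit per character.
    unit_bits=16 models 16-bit storage, assuming UTF-16-style surrogate
    pairs for anything above 0xFFFF.
    """
    if unit_bits not in (16, 32):
        raise ValueError("unit_bits must be 16 or 32")
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if unit_bits == 32 or code_point <= 0xFFFF:
        # The value fits in a single unit.
        return [code_point]
    if code_point > 0x10FFFF:
        # Surrogate pairs cannot reach beyond U+10FFFF.
        raise ValueError("not representable as a surrogate pair")
    offset = code_point - 0x10000       # 20 significant bits
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return [high, low]


units_16 = to_code_units(0x1D11E, 16)   # two 16-bit units
units_32 = to_code_units(0x1D11E, 32)   # one 32-bit unit
```

Under these assumptions the same character occupies two units in 16-bit
storage and one unit in 32-bit storage, which is why indexing and length
can differ between the two kinds of build.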
Another change is much simpler to explain.
Since their introduction, Unicode strings have supported an
\method{encode()} method to convert the string to a selected encoding
such as UTF-8 or Latin-1. A symmetric
@@ -375,9 +385,16 @@ end
'furrfu'
\end{verbatim}
References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
and following thread.
\method{encode()} and \method{decode()} were implemented by
Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
were implemented by Fredrik Lundh and Martin von L\"owis.
\begin{seealso}
\seepep{261}{Support for `wide' Unicode characters}{PEP written by
Paul Prescod. Not yet accepted or fully implemented.}
\end{seealso}
%======================================================================
\section{PEP 227: Nested Scopes}