From f5fec3c88a2fbb025fba7f1625240058a875c8ea Mon Sep 17 00:00:00 2001
From: "Andrew M. Kuchling"
Date: Thu, 19 Jul 2001 01:48:08 +0000
Subject: [PATCH] Fill out the Unicode section, somewhat uncertainly

---
 Doc/whatsnew/whatsnew22.tex | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/Doc/whatsnew/whatsnew22.tex b/Doc/whatsnew/whatsnew22.tex
index f181dcfd1c4..96b0972ae13 100644
--- a/Doc/whatsnew/whatsnew22.tex
+++ b/Doc/whatsnew/whatsnew22.tex
@@ -340,11 +340,21 @@ and Tim Peters, with other fixes from the Python Labs crew.}
 
 Python's Unicode support has been enhanced a bit in 2.2. Unicode
 strings are usually stored as UCS-2, as 16-bit unsigned integers.
-Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers
-by supplying \longprogramopt{enable-unicode=ucs4} to the configure script.
-
-XXX explain surrogates? I have to figure out what the changes mean to users.
+Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
+integers, as its internal encoding by supplying
+\longprogramopt{enable-unicode=ucs4} to the configure script. When
+built to use UCS-4, in theory Python could handle Unicode characters
+from U-00000000 to U-7FFFFFFF. Being able to use UCS-4 internally is
+a necessary step to do that, but it's not the only step, and in Python
+2.2alpha1 the work isn't complete yet. For example, the
+\function{unichr()} function still only accepts values from 0 to
+65535, and there's no \code{\e U} notation for embedding characters
+greater than 65535 in a Unicode string literal. All this is the
+province of the still-unimplemented PEP 261, ``Support for `wide'
+Unicode characters''; consult it for further details, and please offer
+comments and suggestions on the proposal it describes.
 
+Another change is much simpler to explain.
 Since their introduction, Unicode strings have supported an
 \method{encode()} method to convert the string to a selected encoding
 such as UTF-8 or Latin-1. A symmetric
@@ -375,9 +385,16 @@ end
 'furrfu'
 \end{verbatim}
 
-References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
-and following thread.
+\method{encode()} and \method{decode()} were implemented by
+Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
+were implemented by Fredrik Lundh and Martin von L\"owis.
 
+\begin{seealso}
+
+\seepep{261}{Support for `wide' Unicode characters}{PEP written by
+Paul Prescod. Not yet accepted or fully implemented.}
+
+\end{seealso}
 
 %======================================================================
 \section{PEP 227: Nested Scopes}
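The removed "XXX explain surrogates?" note leaves open what a 16-bit (UCS-2) build means for characters above 65535. As a hedged illustration that is not part of the committed text, the sketch below is written for modern Python 3, whose \texttt{str} type and \texttt{codecs} module differ from the 2.2 interpreter this patch documents. It shows an \texttt{encode()}/\texttt{decode()} round trip, the rot-13 conversion from the patch's \texttt{'furrfu'} example, and why one character beyond U+FFFF needs two 16-bit units (a surrogate pair). The sample strings are arbitrary choices for illustration.

```python
# Sketch for Python 3 (assumption: not the Python 2.2 API the patch describes).
import codecs

# Round trip: encode() turns text into bytes, decode() reverses it.
text = "caf\u00e9"
data = text.encode("utf-8")
assert data == b"caf\xc3\xa9"
assert data.decode("utf-8") == text

# rot-13 maps text to text; in Python 3 it is reached via codecs.encode(),
# not via str.encode() as in the 2.2-era example.
assert codecs.encode("sheesh", "rot13") == "furrfu"

# A character beyond U+FFFF (here U+1D11E, MUSICAL SYMBOL G CLEF) does not
# fit in one 16-bit unit; UTF-16 stores it as a high/low surrogate pair.
wide = "\U0001D11E"
units = wide.encode("utf-16-be")
assert len(units) == 4  # two 16-bit code units
hi = int.from_bytes(units[:2], "big")
lo = int.from_bytes(units[2:], "big")

# The pair comes from splitting (code point - 0x10000) into two 10-bit halves.
offset = ord(wide) - 0x10000
assert hi == 0xD800 + (offset >> 10)
assert lo == 0xDC00 + (offset & 0x3FF)
print(hex(hi), hex(lo))
```

A UCS-4 build sidesteps this split by storing the whole code point in a single 32-bit unit, which is the trade-off the patch's \longprogramopt{enable-unicode=ucs4} paragraph refers to.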