discussed recently in python-dev:
In _locale module:
- bind_textdomain_codeset() binding
In gettext module:
- bind_textdomain_codeset() function
- lgettext(), lngettext(), ldgettext(), ldngettext(),
which return translated strings encoded in
preferred system encoding, if
bind_textdomain_codeset() was not used.
- Added equivalent functionality in translate()
function and catalog classes.
Every change was also documented.
__init__(): Removed since we no longer need the coerce flag.
Message ids and strings are now always coerced to Unicode, /if/
the catalog specified a charset parameter.
gettext(), ngettext(): Since the message strings are Unicodes in
the catalog, coerce back to encoded 8-bit strings on return.
ugettext(), ungettext(): Coerce the message ids to Unicode when
there's no entry for the id in the catalog.
Minor code cleanups; use booleans where appropriate.
to iso-8859-1.
GNUTranslations._parse(): Back out the addition of a test for
Project-ID-Version in the metadata. This was deliberately removed in
response to SF patch #700839.
Also, re-organize the code in _parse() so we parse the metadata header
containing the charset parameter before we try to decode any strings
using charset.
- Expose NullTranslations and GNUTranslations to __all__
- Set the default charset to iso-8859-1. It used to be None, which
would cause problems with .ugettext() if the file had no charset
parameter. Arguably, the po/mo file would be broken, but I still think
iso-8859-1 is a reasonable default.
- Add a "coerce" default argument to GNUTranslations's constructor. The
reason for this is that in Zope, we want all msgids and msgstrs to be
Unicode. For the latter, we could use .ugettext() but there isn't
currently a mechanism for Unicode-ifying msgids.
The plan then is that the charset parameter specifies the encoding for
both the msgids and msgstrs, and both are decoded to Unicode when read.
For example, we might encode po files with utf-8. I think the GNU
gettext tools don't care.
Since this could potentially break code [*] that wants to use the
encoded interface .gettext(), the constructor flag is added, defaulting
to False. Most code I suspect will want to set this to True and use
.ugettext().
- A few other minor changes from the Zope project, including asserting
that a zero-length msgid must have a Project-ID-Version header for it to
be counted as the metadata record.
Betlehem, verified by Peter Funk. Fixes preservation of language
search order lost due to use of dictionary keys instead of a list.
Closes SF bug #116964.