\section{Standard Module \sectcode{locale}}
\stmodindex{locale}

\label{module-locale}

The \code{locale} module provides access to the \POSIX{} locale database
and functionality. The \POSIX{} locale mechanism allows applications
to integrate certain cultural aspects into an application without
requiring the programmer to know all the specifics of each country
where the software is executed.

The \code{locale} module is implemented on top of the \code{_locale}
module, which in turn uses an ANSI \C{} locale implementation if
available.
\refbimodindex{_locale}

The \code{locale} module defines the following functions:

\setindexsubitem{(in module locale)}

\begin{funcdesc}{setlocale}{category\optional{\, value}}
If \var{value} is specified, modifies the locale setting for the
\var{category}. The available categories are listed in the data
description below. The value is the name of a locale. An empty string
specifies the user's default settings. If the modification of the
locale fails, the exception \code{locale.Error} is
raised. If successful, the new locale setting is returned.

If no \var{value} is specified, the current setting for the
\var{category} is returned.

\code{setlocale()} is not thread safe on most systems. Applications
typically start with the following call:
\begin{verbatim}
import locale
locale.setlocale(locale.LC_ALL, "")
\end{verbatim}
This sets the locale for all categories to the user's default setting
(typically specified in the \code{LANG} environment variable). If the
locale is not changed thereafter, using multithreading should not
cause problems.
\end{funcdesc}
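A minimal sketch of handling a failed \code{setlocale()} call, assuming
the caller is content to fall back to the ``C'' locale. The helper name
\code{set_locale_or_default()} is hypothetical and not part of the
\code{locale} module:

\begin{verbatim}
import locale

def set_locale_or_default(name):
    """Select locale `name` for all categories, or fall back to "C".

    Hypothetical helper for illustration only.
    """
    try:
        return locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        # The requested locale is not available on this system.
        return locale.setlocale(locale.LC_ALL, "C")

# Returns the name of the locale that is now in effect.
current = set_locale_or_default("C")
\end{verbatim}

Since the call affects the whole program, a helper like this belongs in
the main program rather than in a library routine.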

\begin{funcdesc}{localeconv}{}
Returns the database of the local conventions as a dictionary. This
dictionary has the following strings as keys:
\begin{itemize}
\item \code{decimal_point} specifies the decimal point used in
floating point number representations for the \code{LC_NUMERIC}
category.
\item \code{grouping} is a sequence of numbers specifying at which
relative positions the \code{thousands_sep} is expected. If the
sequence is terminated with \code{locale.CHAR_MAX}, no further
grouping is performed. If the sequence terminates with a \code{0}, the last
group size is repeatedly used.
\item \code{thousands_sep} is the character used between groups.
\item \code{int_curr_symbol} specifies the international currency
symbol from the \code{LC_MONETARY} category.
\item \code{currency_symbol} is the local currency symbol.
\item \code{mon_decimal_point} is the decimal point used in monetary
values.
\item \code{mon_thousands_sep} is the separator for grouping of
monetary values.
\item \code{mon_grouping} has the same format as the \code{grouping}
key; it is used for monetary values.
\item \code{positive_sign} and \code{negative_sign} give the signs
used for positive and negative monetary quantities.
\item \code{int_frac_digits} and \code{frac_digits} specify the number
of fractional digits used in the international and local formatting
of monetary values.
\item \code{p_cs_precedes} and \code{n_cs_precedes} specify whether
the currency symbol precedes the value for positive or negative
values.
\item \code{p_sep_by_space} and \code{n_sep_by_space} specify
whether there is a space between the positive or negative value and
the currency symbol.
\item \code{p_sign_posn} and \code{n_sign_posn} indicate how the
sign should be placed for positive and negative monetary values.
\end{itemize}
The possible values for \code{p_sign_posn} and \code{n_sign_posn}
are given below.
\begin{itemize}
\item 0 - Currency and value are surrounded by parentheses.
\item 1 - The sign should precede the value and currency symbol.
\item 2 - The sign should follow the value and currency symbol.
\item 3 - The sign should immediately precede the value.
\item 4 - The sign should immediately follow the value.
\item \code{CHAR_MAX} - Nothing is specified in this locale.
\end{itemize}
\end{funcdesc}
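The rules for the \code{grouping} key can be illustrated with a short
sketch. The helper \code{group_digits()} below is hypothetical (it is
not part of the \code{locale} module) and only shows one way to read
the sequence: group sizes are consumed from the right, \code{0} repeats
the previous size, and \code{locale.CHAR_MAX} stops grouping.

\begin{verbatim}
import locale

def group_digits(digits, grouping, sep):
    """Insert `sep` into a string of digits as described by `grouping`.

    Hypothetical helper for illustration only.
    """
    if not grouping or not sep:
        return digits
    pieces = []                 # groups collected from right to left
    rest = digits
    size = 0
    for entry in grouping:
        if entry == locale.CHAR_MAX:
            break               # no further grouping is performed
        if entry == 0:
            # repeat the last group size until the digits run out
            while size and len(rest) > size:
                pieces.append(rest[-size:])
                rest = rest[:-size]
            break
        size = entry
        if len(rest) <= size:
            break
        pieces.append(rest[-size:])
        rest = rest[:-size]
    pieces.append(rest)
    pieces.reverse()
    return sep.join(pieces)

print(group_digits("1234567", [3, 0], ","))                # 1,234,567
print(group_digits("1234567", [3, locale.CHAR_MAX], ","))  # 1234,567
print(group_digits("1234567", [3, 2, 0], ","))             # 12,34,567
\end{verbatim}

In a real program the last two arguments would typically come from the
\code{grouping} and \code{thousands_sep} entries of the dictionary
returned by \code{localeconv()}.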

\begin{funcdesc}{strcoll}{string1,string2}
Compares two strings according to the current \code{LC_COLLATE}
setting. Like any other comparison function, returns a negative value, a
positive value, or \code{0}, depending on whether \var{string1}
collates before or after \var{string2} or is equal to it.
\end{funcdesc}

\begin{funcdesc}{strxfrm}{string}
Transforms a string to one that can be used with the built-in function
\code{cmp()} and still yield locale-aware results. This function can be
used when the same string is compared repeatedly, e.g. when collating
a sequence of strings.
\end{funcdesc}
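A minimal sketch of the use case described for \code{strxfrm()},
assuming a Python version in which \code{sorted()} accepts a \code{key}
argument (in that case \code{cmp()} is not needed). Each string is
transformed once instead of once per comparison. The helper name
\code{locale_sorted()} is hypothetical:

\begin{verbatim}
import locale

def locale_sorted(strings):
    """Return `strings` sorted according to the current LC_COLLATE setting.

    Hypothetical helper for illustration only.
    """
    return sorted(strings, key=locale.strxfrm)

words = ["pear", "Apple", "fig"]
print(locale_sorted(words))
\end{verbatim}

The resulting order depends on the \code{LC_COLLATE} setting in effect
when the function is called.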

\begin{funcdesc}{format}{format, val\optional{, grouping=0}}
Formats a number \var{val} according to the current \code{LC_NUMERIC}
setting. The format follows the conventions of the \code{\%} operator. For
floating point values, the decimal point is modified if
appropriate. If \var{grouping} is true, also takes the grouping into
account.
\end{funcdesc}

\begin{funcdesc}{str}{float}
Formats a floating point number using the same format as the built-in
function \code{str(\var{float})}, but takes the locale's decimal point into
account.
\end{funcdesc}

\begin{funcdesc}{atof}{string}
Converts a string to a floating point number, following the \code{LC_NUMERIC}
settings.
\end{funcdesc}

\begin{funcdesc}{atoi}{string}
Converts a string to an integer, following the \code{LC_NUMERIC} conventions.
\end{funcdesc}
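As a brief illustration of the numeric helpers above, the following
sketch round-trips a number through \code{str()} and \code{atof()}. It
pins \code{LC_NUMERIC} to the ``C'' locale (an assumption made so that
the result is reproducible); under another locale the decimal point in
the string may differ.

\begin{verbatim}
import locale

# Pin LC_NUMERIC to the "C" locale so the example is reproducible.
locale.setlocale(locale.LC_NUMERIC, "C")

text = locale.str(3.5)       # locale-aware counterpart of str(3.5)
value = locale.atof(text)    # parse it back using the same conventions
count = locale.atoi("42")    # integers follow LC_NUMERIC as well

print(text, value, count)    # 3.5 3.5 42
\end{verbatim}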

\begin{datadesc}{LC_CTYPE}
\refstmodindex{string}
Locale category for the character type functions. Depending on the
settings of this category, the functions of module \code{string}
dealing with case change their behaviour.
\end{datadesc}

\begin{datadesc}{LC_COLLATE}
Locale category for sorting strings. The functions \code{strcoll()} and
\code{strxfrm()} of the \code{locale} module are affected.
\end{datadesc}

\begin{datadesc}{LC_TIME}
Locale category for the formatting of time. The function
\code{time.strftime()} follows these conventions.
\end{datadesc}

\begin{datadesc}{LC_MONETARY}
Locale category for formatting of monetary values. The available
options can be obtained from the \code{localeconv()} function.
\end{datadesc}

\begin{datadesc}{LC_MESSAGES}
Locale category for message display. Python currently does not support
application-specific locale-aware messages. Messages displayed by the
operating system, like those returned by \code{posix.strerror()}, might
be affected by this category.
\end{datadesc}

\begin{datadesc}{LC_NUMERIC}
Locale category for formatting numbers. The functions
\code{format()}, \code{atoi()}, \code{atof()} and \code{str()} of the
\code{locale} module are affected by this category. All other numeric
formatting operations are not affected.
\end{datadesc}

\begin{datadesc}{LC_ALL}
Combination of all locale settings. If this flag is used when the
locale is changed, setting the locale for all categories is
attempted. If that fails for any category, no category is changed at
all. When the locale is retrieved using this flag, a string indicating
the setting for all categories is returned. This string can later be
used to restore the settings.
\end{datadesc}

\begin{datadesc}{CHAR_MAX}
This is a symbolic constant used for different values returned by
\code{localeconv()}.
\end{datadesc}

\begin{excdesc}{Error}
Exception raised when \code{setlocale()} fails.
\end{excdesc}

Example:

\begin{verbatim}
>>> import locale
>>> loc = locale.setlocale(locale.LC_ALL) # get current locale
>>> locale.setlocale(locale.LC_ALL, "de") # use German locale
>>> locale.strcoll("f\344n", "foo") # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, "") # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, "C") # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale
\end{verbatim}

\subsection{Background, details, hints, tips and caveats}

The C standard defines the locale as a program-wide property that may
be relatively expensive to change. On top of that, some
implementations are broken in such a way that frequent locale changes
may cause core dumps. This makes the locale somewhat painful to use
correctly.

Initially, when a program is started, the locale is the ``C'' locale, no
matter what the user's preferred locale is. The program must
explicitly say that it wants the user's preferred locale settings by
calling \code{setlocale(LC_ALL, "")}.

It is generally a bad idea to call \code{setlocale()} in some library
routine, since as a side effect it affects the entire program. Saving
and restoring it is almost as bad: it is expensive and affects other
threads that happen to run before the settings have been restored.

If, when coding a module for general use, you need a
locale-independent version of an operation that is affected by the locale
(e.g. \code{string.lower()}, or certain formats used with
\code{time.strftime()}), you will have to find a way to do it without
using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort should
you document that your module is not compatible with non-C locale
settings.
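
One way to obtain a locale-independent operation is to write it in
terms of an explicit table. The sketch below is a hypothetical
ASCII-only replacement for a lower-casing routine; the name
\code{ascii_lower()} is an assumption made for this example, and the
code assumes a Python version that provides \code{str.maketrans()}:

\begin{verbatim}
# Explicit ASCII tables: these never change with the locale.
_UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
_LOWER = "abcdefghijklmnopqrstuvwxyz"
_TABLE = str.maketrans(_UPPER, _LOWER)

def ascii_lower(s):
    """Lower-case only the ASCII letters A-Z; leave everything else alone.

    Hypothetical helper for illustration only.
    """
    return s.translate(_TABLE)

print(ascii_lower("Content-Type"))   # content-type
\end{verbatim}

Such a helper suits protocol keywords and similar identifiers, where the
set of letters is fixed by a specification rather than by the user's
language.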

The case conversion functions in the \code{string} and \code{strop}
modules are affected by the locale settings. When a call to the
\code{setlocale()} function changes the \code{LC_CTYPE} settings, the
variables \code{string.lowercase}, \code{string.uppercase} and
\code{string.letters} (and their counterparts in \code{strop}) are
recalculated. Note that code that uses these variables through
\code{from ... import ...}, e.g. \code{from string import letters}, is
not affected by subsequent \code{setlocale()} calls.

The only way to perform numeric operations according to the locale
is to use the special functions defined by this module:
\code{atof()}, \code{atoi()}, \code{format()}, \code{str()}.

\strong{For extension writers and programs that embed Python}

Extension modules should never call \code{setlocale()}, except to find
out what the current locale is. But since the return value can only
be used portably to restore it, that is not very useful (except
perhaps to find out whether or not the locale is ``C'').

When Python is embedded in an application, if the application sets the
locale to something specific before initializing Python, that is
generally okay, and Python will use whatever locale is set,
\strong{except} that the \code{LC_NUMERIC} locale should always be
``C''.

The \code{setlocale()} function in the \code{locale} module
gives the Python programmer the impression that you can manipulate the
\code{LC_NUMERIC} locale setting, but this is not the case at the C
level: C code will always find that the \code{LC_NUMERIC} locale
setting is ``C''. This is because too much would break when the
decimal point character is set to something other than a period
(e.g. the Python parser would break). Caveat: threads that run
without holding Python's global interpreter lock may occasionally find
that the numeric locale setting differs; this is because the only
portable way to implement this feature is to set the numeric locale
settings to what the user requests, extract the relevant
characteristics, and then restore the ``C'' numeric locale.

When Python code uses the \code{locale} module to change the locale,
this also affects the embedding application. If the embedding
application doesn't want this to happen, it should remove the
\code{_locale} extension module (which does all the work) from the
table of built-in modules in the \code{config.c} file, and make sure
that the \code{_locale} module is not accessible as a shared library.