Commit Graph

20 Commits

Author SHA1 Message Date
Marc-André Lemburg c83dddf7fe Let the default encodings search function look up aliases before trying the codec import. This allows applications to install codecs which override (non-special-cased) builtin codecs. 2004-01-20 09:40:14 +00:00
Marc-André Lemburg 2820125935 Remove usage of re module from encodings package search function. 2003-05-16 17:07:51 +00:00
Tim Peters 0eadaac7dc Whitespace normalization. 2003-04-24 16:02:54 +00:00
Martin v. Löwis 7fb697b5d2 Revert Patch #670715: iconv support. 2003-04-03 04:49:12 +00:00
Neal Norwitz 6156a2d07c Handle iconv initialization errors 2003-02-28 20:00:42 +00:00
Martin v. Löwis 9789aefa61 Patch #670715: Universal Unicode Codec for POSIX iconv. 2003-01-26 11:30:36 +00:00
Tim Peters 6578dc925f Whitespace normalization. 2002-12-24 18:31:27 +00:00
Guido van Rossum c8c6065231 When looking for an alias, first look for the normalized name (which
still may contain dots), then if that doesn't exist look for the name
with dots replaced by underscores.  This is a little more forgiving.
2002-10-04 20:49:05 +00:00
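
The two-step alias lookup described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: `normalize`, `resolve_alias`, and the sample alias table are hypothetical names, and the normalization rule (lowercase, runs of non-alphanumeric characters other than dots collapsed to one underscore) is an assumption.

```python
def normalize(name):
    """Assumed normalization: lowercase the name and collapse runs of
    non-alphanumeric characters (dots excepted) into a single underscore."""
    out = []
    pending = False
    for ch in name.lower():
        if ch.isalnum() or ch == ".":
            # Emit one underscore for the separator run we just skipped.
            if pending and out:
                out.append("_")
            out.append(ch)
            pending = False
        else:
            pending = True
    return "".join(out)


def resolve_alias(name, aliases):
    """Look for the normalized name first (dots kept); if that is missing,
    retry with dots replaced by underscores. Returns None if neither hits."""
    norm = normalize(name)
    if norm in aliases:
        return aliases[norm]
    return aliases.get(norm.replace(".", "_"))


# Hypothetical alias table used only for this example.
sample_aliases = {"iso_8859_1": "latin_1", "foo.bar": "dotted_codec"}
```

With this table, `resolve_alias("foo.bar", sample_aliases)` hits on the first step, while `resolve_alias("iso.8859.1", sample_aliases)` only hits after the dots are replaced.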
Marc-André Lemburg 7012673d67 Extending the encoding name normalization to handle more non-alphanumeric
characters.
2002-10-04 11:45:38 +00:00
Tim Peters 469cdad822 Whitespace normalization. 2002-08-08 20:19:19 +00:00
Martin v. Löwis b9e0764d8b Revert #571603 since it is ok to import codecs that are not subdirectories
of encodings. Skip modules that don't have a getregentry function.
2002-07-29 14:05:24 +00:00
Martin v. Löwis fc4c24c142 Patch #571603: Refer to encodings package explicitly. 2002-07-28 11:31:33 +00:00
Marc-André Lemburg a0af63b242 Corrected import behaviour for codecs which live outside the encodings
package.
2002-02-11 17:43:46 +00:00
Marc-André Lemburg 462004e90a Add IANA character set aliases to the encodings alias dictionary
and make alias lookup lazy.

Note that only those IANA character set aliases were added for which
we actually have codecs in the encodings package.
2002-02-10 21:36:20 +00:00
Marc-André Lemburg 816a1b75b7 Fixed search function error reporting in the encodings package
__init__.py module to raise errors which can be caught as LookupErrors
as well as SystemErrors.

Modified the error messages to include more information about the
failing module.
2001-09-19 11:52:07 +00:00
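
One way to make an error catchable under two unrelated exception types is multiple inheritance. The sketch below shows that technique in general; `RegistryError`, `find_codec`, and `caught_as` are hypothetical names chosen for the example and are not taken from the commit.

```python
class RegistryError(LookupError, SystemError):
    """Hypothetical error type that matches both 'except LookupError'
    and 'except SystemError' handlers."""


def find_codec(module_name):
    # Stand-in for a search function that found a broken codec module;
    # the message names the failing module, as the commit message describes.
    raise RegistryError("module %r failed to export a codec entry" % module_name)


def caught_as(exc_type):
    """Return True if the error raised by find_codec is caught as exc_type."""
    try:
        find_codec("encodings.example")
    except exc_type:
        return True
    return False
```

Callers written against either base class keep working, which is the compatibility property the message is after.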
Marc-André Lemburg 988ad2bdff Changed .getaliases() support to register the new aliases in the
encodings package aliases mapping dictionary rather than in the
internal cache used by the search function.

This enables aliases to take advantage of the full normalization
process applied to encoding names which was previously not available.

The patch restricts alias registration to new aliases. Existing
aliases cannot be overridden anymore.
2000-12-12 14:45:35 +00:00
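
The "new aliases only" rule from this commit message could look something like the sketch below. It is an assumption-laden illustration: `register_aliases` and the dictionary contents are hypothetical, and the real code may differ in how it normalizes or reports skipped names.

```python
def register_aliases(aliases, codec_name, new_aliases):
    """Add each alias for codec_name unless the alias is already taken.
    Returns the list of aliases that were skipped (hypothetical helper)."""
    skipped = []
    for alias in new_aliases:
        if alias in aliases:
            # Existing aliases are left alone; they cannot be overridden.
            skipped.append(alias)
        else:
            aliases[alias] = codec_name
    return skipped


# Hypothetical starting table for the example.
table = {"latin1": "latin_1"}
skipped = register_aliases(table, "my_codec", ["latin1", "mycodec"])
```

Here `latin1` keeps pointing at `latin_1`, and only `mycodec` is added.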
Marc-André Lemburg 7ebb92ea66 Marc-Andre Lemburg <mal@lemburg.com>:
Removed import of string module -- use string methods directly.
Thanks to Finn Bock.
2000-06-13 12:04:05 +00:00
Guido van Rossum 9e896b37c7 Marc-Andre's third try at this bulk patch seems to work (except that
his copy of test_contains.py seems to be broken -- the lines he
deleted were already absent).  Checkin messages:


New Unicode support for int(), float(), complex() and long().

- new APIs PyInt_FromUnicode() and PyLong_FromUnicode()
- added support for Unicode to PyFloat_FromString()
- new encoding API PyUnicode_EncodeDecimal() which converts
  Unicode to a decimal char* string (used in the above new
  APIs)
- shortcuts for calls like int(<int object>) and float(<float obj>)
- tests for all of the above

Unicode compares and contains checks:
- comparing Unicode and non-string types now works; TypeErrors
  are masked, all other errors such as ValueError during
  Unicode coercion are passed through (note that PyUnicode_Compare
  does not implement the masking -- PyObject_Compare does this)
- contains now works for non-string types too; TypeErrors are
  masked and 0 returned; all other errors are passed through

Better testing support for the standard codecs.

Misc minor enhancements, such as an alias dbcs for the mbcs codec.

Changes:
- PyLong_FromString() now applies the same error checks as
  does PyInt_FromString(): trailing garbage is reported
  as an error and no longer silently ignored. The only characters
  which may be trailing the digits are 'L' and 'l' -- these
  are still silently ignored.
- string.ato?() now directly interface to int(), long() and
  float(). The error strings are now a little different, but
  the type still remains the same. These functions are now
  ready to get declared obsolete ;-)
- PyNumber_Int() now also does a check for embedded NULL chars
  in the input string; PyNumber_Long() already did this (and
  still does)

Followed by:

Looks like I've gone a step too far there... (and test_contains.py
seems to have a bug too).

I've changed back to reporting all errors in PyUnicode_Contains()
and added a few more test cases to test_contains.py (plus corrected
the join() NameError).
2000-04-05 20:11:21 +00:00
Barry Warsaw 51ac58039f On 17-Mar-2000, Marc-Andre Lemburg said:
Attached you find an update of the Unicode implementation.

    The patch is against the current CVS version. I would appreciate
    if someone with CVS checkin permissions could check the changes
    in.

    The patch contains all bugs and patches sent this week and also
    fixes a leak in the codecs code and a bug in the free list code
    for Unicode objects (which only shows up when compiling Python
    with Py_DEBUG; thanks to MarkH for spotting this one).
2000-03-20 16:36:48 +00:00
Guido van Rossum 0229bf6001 Marc-Andre Lemburg: Unicode encodings. 2000-03-10 23:17:24 +00:00