Commit Graph

61 Commits

Author SHA1 Message Date
Marc-André Lemburg 361d66de5d Fix wrong character mapping in koi8_u: SF bug #902501. 2004-02-23 09:00:43 +00:00
Marc-André Lemburg c83dddf7fe Let the default encodings search function lookup aliases before trying the codec import. This allows applications to install codecs which override (non-special-cased) builtin codecs. 2004-01-20 09:40:14 +00:00
Marc-André Lemburg 5c94d33077 Add some more code page aliases needed for completeness. 2004-01-20 09:38:52 +00:00
Hye-Shik Chang b619e4b36c Fix a typo: s/iso_3022/iso2022/ 2004-01-20 09:33:30 +00:00
Hye-Shik Chang 3e2a306920 Add CJK codecs support as discussed on python-dev. (SF #873597)
Several style fixes were suggested by Martin v. Loewis and
Marc-Andre Lemburg. Thanks!
2004-01-17 14:29:29 +00:00
Raymond Hettinger 0ad142aba0 Revert previous change. MAL preferred the old version. 2003-12-01 13:26:46 +00:00
Raymond Hettinger a45517065a Simplified the code. 2003-12-01 10:41:02 +00:00
Raymond Hettinger 9edae346dd Fix typo in the comments. 2003-09-24 03:57:36 +00:00
Raymond Hettinger 9a80c5dbc4 Added codec for bz2 compression. 2003-09-23 20:21:01 +00:00
Martin v. Löwis 0d8e16c7ad Support trailing dots in DNS names. Fixes #782510. Will backport to 2.3. 2003-08-05 06:19:47 +00:00
Skip Montanaro 5d6ceb4aae more generic reference to python interpreter 2003-07-22 14:37:42 +00:00
Marc-André Lemburg 2820125935 Remove usage of re module from encodings package search function. 2003-05-16 17:07:51 +00:00
Tim Peters 0eadaac7dc Whitespace normalization. 2003-04-24 16:02:54 +00:00
Martin v. Löwis 2548c730c1 Implement IDNA (Internationalized Domain Names in Applications). 2003-04-18 10:39:54 +00:00
Martin v. Löwis 7fb697b5d2 Revert Patch #670715: iconv support. 2003-04-03 04:49:12 +00:00
Neal Norwitz 6156a2d07c Handle iconv initialization errors 2003-02-28 20:00:42 +00:00
Martin v. Löwis 9789aefa61 Patch #670715: Universal Unicode Codec for POSIX iconv. 2003-01-26 11:30:36 +00:00
Tim Peters 6578dc925f Whitespace normalization. 2002-12-24 18:31:27 +00:00
Neal Norwitz d8407a7031 Add new encoding for Ukrainian Cyrillic 2002-10-17 22:15:33 +00:00
Guido van Rossum c8c6065231 When looking for an alias, first look for the normalized name (which
still may contain dots), then if that doesn't exist look for the name
with dots replaced by underscores.  This is a little more forgiving.
2002-10-04 20:49:05 +00:00
Marc-André Lemburg 8dc5ff2e5a Undo the removal. Guido mentioned that the encoding name is in active
use by some email headers.
2002-10-04 16:30:42 +00:00
Marc-André Lemburg 68fc27385d Remove unneeded alias. 2002-10-04 15:57:03 +00:00
Marc-André Lemburg a40ea75625 Fix doc-string. 2002-10-04 11:58:24 +00:00
Marc-André Lemburg 9d158bb66f Adapt lookup names to new more general encoding name normalization
scheme.
2002-10-04 11:51:39 +00:00
Marc-André Lemburg 7012673d67 Extending the encoding name normalization to handle more non-alphanumeric
characters.
2002-10-04 11:45:38 +00:00
Guido van Rossum 479f3d3d2a Oops, must convert hyphens to underscores in keys of aliases dict. 2002-09-26 20:08:23 +00:00
Guido van Rossum b7a88e533d Add yet another alias for ASCII found in the field. Will backport to
2.2.2.
2002-09-25 16:44:34 +00:00
Tim Peters 280488b9a3 Whitespace normalization. 2002-08-23 18:19:30 +00:00
Martin v. Löwis 8a8da798a5 Patch #505705: Remove eval in pickle and cPickle. 2002-08-14 07:46:28 +00:00
Tim Peters 469cdad822 Whitespace normalization. 2002-08-08 20:19:19 +00:00
Martin v. Löwis b9e0764d8b Revert #571603 since it is ok to import codecs that are not subdirectories
of encodings. Skip modules that don't have a getregentry function.
2002-07-29 14:05:24 +00:00
Martin v. Löwis fc4c24c142 Patch #571603: Refer to encodings package explicitly. 2002-07-28 11:31:33 +00:00
Marc-André Lemburg a83ffa89f2 Palm OS encoding from Sjoerd Mullender 2002-07-12 14:36:22 +00:00
Marc-André Lemburg 3ccb09cba3 Fix for bug #222395: UTF-16 et al. don't handle .readline().
They now raise a NotImplementedError to hint to the truth ;-)
2002-04-05 12:12:00 +00:00
Marc-André Lemburg a0af63b242 Corrected import behaviour for codecs which live outside the encodings
package.
2002-02-11 17:43:46 +00:00
Marc-André Lemburg 462004e90a Add IANA character set aliases to the encodings alias dictionary
and make alias lookup lazy.

Note that only those IANA character set aliases were added for which
we actually have codecs in the encodings package.
2002-02-10 21:36:20 +00:00
Martin v. Löwis 79d802d58c Patch #487275: Add windows-1251 charset alias. 2001-12-02 12:24:19 +00:00
Marc-André Lemburg 35b0cb09d7 Python part of the UTF-7 codec by Brian Quinlan. 2001-09-20 12:56:14 +00:00
Marc-André Lemburg c60e6f7771 Patch #435971: UTF-7 codec by Brian Quinlan. 2001-09-20 10:35:46 +00:00
Marc-André Lemburg 26e3b681b2 Patch #462635 by Andrew Kuchling correcting bugs in the new
codecs -- the self argument does matter for Python functions (it
does not for C functions which most other codecs use).
2001-09-20 10:33:38 +00:00
Marc-André Lemburg 816a1b75b7 Fixed search function error reporting in the encodings package
__init__.py module to raise errors which can be caught as LookupErrors
as well as SystemErrors.

Modified the error messages to include more information about the
failing module.
2001-09-19 11:52:07 +00:00
Andrew M. Kuchling fd6608bcea Fix typo (PyChecker) 2001-08-13 13:48:55 +00:00
Martin v. Löwis 9b75dca192 Expose nl_langinfo through locale where available. 2001-08-10 13:58:50 +00:00
Marc-André Lemburg 92b550cdd8 This patch by Martin v. Loewis changes the UTF-16 codec to only
write a BOM at the start of the stream and also to only read it as
BOM at the start of a stream.

Subsequent reading/writing of BOMs will read/write the BOM as ZWNBSP
character. This is in sync with the Unicode specifications.

Note that UTF-16 files will now *have* to start with a BOM mark
in order to be readable by the codec.
2001-06-19 20:07:51 +00:00
Martin v. Löwis 13b8bc5478 Patch #429957: Add support for cp1140, which is identical to cp037,
with the addition of the euro character.
Also added a few EBCDIC aliases.
2001-06-07 19:39:25 +00:00
Mark Hammond 194bfb2805 Add some useful Windows encodings - patch #423221. 2001-06-04 02:31:23 +00:00
Marc-André Lemburg 716cf91839 Moved the encoding map building logic from the individual mapping
codec files to codecs.py and added logic so that multi mappings
in the decoding maps now result in mappings to None (undefined mapping)
in the encoding maps.
2001-05-16 09:41:45 +00:00
Guido van Rossum acfdf156aa Add quoted-printable codec 2001-05-15 15:34:07 +00:00
Marc-André Lemburg 2d9204199f This patch changes the way the string .encode() method works slightly
and introduces a new method .decode().

The major change is that strg.encode() will no longer try to convert
Unicode returns from the codec into a string, but instead pass along
the Unicode object as-is. The same is now true for all other codec
return types. The underlying C APIs were changed accordingly.

Note that even though this does have the potential of breaking
existing code, the chances are low since conversion from Unicode
previously took place using the default encoding which is normally
set to ASCII, rendering this auto-conversion mechanism useless for
most Unicode encodings.

The good news is that you can now use .encode() and .decode() with
much greater ease and that the door was opened for better accessibility
of the builtin codecs.

As a demonstration of the new feature, the patch includes a few new
codecs which allow string-to-string encoding and decoding (rot13,
hex, zip, uu, base64).

Written by Marc-Andre Lemburg. Copyright assigned to the PSF.
2001-05-15 12:00:02 +00:00
Marc-André Lemburg a866df806d This patch changes the default behaviour of the builtin charmap
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.

The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.

The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.

This patch closes the bugs #116285 and #119960.
2001-01-03 21:29:14 +00:00
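Commits c8c6065231, 7012673d67 and 479f3d3d2a describe how the encodings search function normalizes a requested name and looks it up in the alias table: non-alphanumeric runs collapse to underscores, dots survive normalization, and a failed lookup is retried with dots replaced by underscores. The following is a minimal sketch of that scheme, not the code from encodings/__init__.py. The helper names `normalize_encoding` and `resolve_alias`, the contents of `ALIASES`, and the lowercasing step are assumptions made for illustration.

```python
import string

# Characters that survive normalization unchanged. Keeping "." follows the
# commit note that normalized names "still may contain dots"; the exact
# character set is an assumption of this sketch.
_KEEP = frozenset(string.ascii_letters + string.digits + ".")

# Hypothetical alias table. Keys use underscores instead of hyphens
# (as in commit 479f3d3d2a); the entries are illustrative only.
ALIASES = {
    "iso_8859_1": "latin_1",
    "us_ascii": "ascii",
    "ansi_x3_4_1968": "ascii",
}


def normalize_encoding(name: str) -> str:
    """Collapse every run of non-alphanumeric characters (other than ".")
    into one underscore, dropping leading and trailing runs."""
    out = []
    pending_sep = False
    for ch in name:
        if ch in _KEEP:
            if pending_sep and out:
                out.append("_")
            out.append(ch)
            pending_sep = False
        else:
            pending_sep = True
    return "".join(out)


def resolve_alias(name: str) -> str:
    """Map an encoding name to a codec module name (hypothetical helper)."""
    # Lowercasing here is an assumption of this sketch.
    norm = normalize_encoding(name.lower())
    # First try the normalized name, which may still contain dots ...
    target = ALIASES.get(norm)
    if target is None:
        # ... then retry with the dots replaced by underscores.
        target = ALIASES.get(norm.replace(".", "_"))
    # Without an alias, the normalized name itself is the module name to try.
    return target if target is not None else norm


print(resolve_alias("ISO-8859-1"))      # latin_1
print(resolve_alias("ANSI_X3.4-1968"))  # ascii, via the dot-to-underscore retry
print(resolve_alias("utf-8"))           # utf_8, no alias so used as module name
```

Looking the alias up before importing anything (commit c83dddf7fe) is what lets an application-installed codec win over a builtin one of the same name; this sketch stops at name resolution and does not import modules.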
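Commit 92b550cdd8 states that the UTF-16 codec writes a BOM only at the start of a stream and reads U+FEFF as a BOM only there, treating later occurrences as ZWNBSP characters. The sketch below checks that placement rule with Python 3's standard `codecs` module, assuming the current `utf-16` codec still behaves this way; `encode_stream` is a hypothetical helper, and the sketch does not test the 2001 requirement that input begin with a BOM.

```python
import codecs


def encode_stream(chunks):
    """Encode text chunks as one UTF-16 stream with an incremental encoder."""
    encoder = codecs.getincrementalencoder("utf-16")()
    parts = [encoder.encode(chunk) for chunk in chunks]
    parts.append(encoder.encode("", final=True))
    return b"".join(parts)


data = encode_stream(["ab", "cd"])
# Only the first chunk carries a BOM: 2 BOM bytes + 4 chars * 2 bytes.
print(len(data), data.count(codecs.BOM_UTF16))  # 10 1
print(data.decode("utf-16"))                    # abcd

# A U+FEFF after the start of the stream is data (ZWNBSP), not a BOM,
# so decoding strips only the leading one.
text = "\ufeffab"
print(text.encode("utf-16").decode("utf-16") == text)  # True
```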
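Commits 716cf91839 and a866df806d describe two rules for the charmap codecs: when the encoding map is built by inverting a decoding map, a character reached from several bytes maps to None, and characters missing from the map are undefined rather than falling back to Latin-1. A minimal sketch of both rules follows; `build_encoding_map`, `charmap_encode` and the sample `decoding_map` are hypothetical and are not the builtin charmap codec.

```python
def build_encoding_map(decoding_map):
    """Invert a byte -> code point decoding map.

    A code point reached from more than one byte is mapped to None,
    i.e. treated as an undefined mapping when encoding.
    """
    encoding_map = {}
    for byte, code_point in decoding_map.items():
        if code_point in encoding_map:
            encoding_map[code_point] = None  # ambiguous: no single byte to pick
        else:
            encoding_map[code_point] = byte
    return encoding_map


def charmap_encode(text, encoding_map):
    """Encode text strictly: missing keys and None values are both errors.

    There is deliberately no Latin-1 fallback for absent keys.
    """
    out = bytearray()
    for pos, ch in enumerate(text):
        byte = encoding_map.get(ord(ch))
        if byte is None:
            raise UnicodeEncodeError(
                "charmap", text, pos, pos + 1, "character maps to <undefined>"
            )
        out.append(byte)
    return bytes(out)


# Hypothetical decoding map: bytes 0x80 and 0x81 both decode to U+20AC.
decoding_map = {0x41: 0x41, 0x80: 0x20AC, 0x81: 0x20AC}
encoding_map = build_encoding_map(decoding_map)
print(encoding_map)                       # {65: 65, 8364: None}
print(charmap_encode("A", encoding_map))  # b'A'
for ch in ("\u20ac", "\xe9"):             # ambiguous, then absent
    try:
        charmap_encode(ch, encoding_map)
    except UnicodeEncodeError as exc:
        print(exc.reason)                 # character maps to <undefined>
```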
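Commit 2d9204199f introduced codecs that transform strings into strings (rot13, hex, zip, uu, base64), and commit 9a80c5dbc4 added a bz2 codec. Python 3 no longer exposes these through `str.encode()`/`.decode()`; to my knowledge they are reachable through `codecs.encode()` and `codecs.decode()` under these alias names since Python 3.4. The sketch assumes Python 3.4 or later; `round_trip` is a hypothetical helper.

```python
import codecs


def round_trip(data, codec):
    """Encode data with the named codec, then decode the result again."""
    encoded = codecs.encode(data, codec)
    return encoded, codecs.decode(encoded, codec)


# bytes-to-bytes codecs, including the bz2 codec
for codec in ("hex", "base64", "zip", "bz2"):
    encoded, decoded = round_trip(b"hello world", codec)
    assert decoded == b"hello world"
    print(codec, encoded[:16])

# rot13 is a str-to-str codec in Python 3
print(round_trip("hello", "rot13"))  # ('uryyb', 'hello')
```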