Commit Graph

98 Commits

Author SHA1 Message Date
Walter Dörwald 78a0be6ab3 Add a BufferedIncrementalEncoder class that can be used for implementing
an incremental encoder that must retain part of the data between calls
to the encode() method.

Fix the incremental encoder and decoder for the IDNA encoding.

This closes SF patch #1453235.
2006-04-14 18:25:39 +00:00
Walter Dörwald a40cf31de6 Make error message less misleading for u"a..b".encode("idna"). 2006-04-14 17:00:36 +00:00
Walter Dörwald 6493699c0d Make raise statements PEP 8 compatible. 2006-04-14 15:22:27 +00:00
Walter Dörwald a8da934069 Whitespace. 2006-03-27 09:02:04 +00:00
Hye-Shik Chang e2ac4abd01 Patch #1443155: Add the incremental codecs support for CJK codecs.
(reviewed by Walter Dörwald)
2006-03-26 02:34:59 +00:00
Guido van Rossum f8480a7856 Instead of relative imports, use (implicitly) absolute ones. 2006-03-15 23:08:13 +00:00
Tim Peters f99b8162a2 Whitespace normalization. 2006-03-15 18:08:37 +00:00
Walter Dörwald 13ed60b504 Fix typo. 2006-03-15 13:36:50 +00:00
Walter Dörwald abb02e5994 Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
Guido van Rossum 87de069e4e Use relative imports in a few places where I noticed the need.
(Ideally, all packages in Python 2.5 will use the relative import
syntax for all their relative import needs.)
2006-03-15 04:33:54 +00:00
Martin v. Löwis 5bd7c02298 Avoid forward-declaring the methods array.
Rename unicodedata.db* to unicodedata.ucd*
2006-03-10 11:20:04 +00:00
Martin v. Löwis 480f1bb67b Update Unicode database to Unicode 4.1. 2006-03-09 23:38:20 +00:00
Marc-André Lemburg fe4b34cc4b Fix the encodings package codec search function to only search
inside its own package. Fixes problem reported in patch #1433198.

Add codec search function for codec test codec.
2006-02-19 15:22:22 +00:00
Martin v. Löwis 412ed3b8a7 Patch #1177307: UTF-8-Sig codec. 2006-01-08 10:45:39 +00:00
Tim Peters 536cf99536 Whitespace normalization. 2005-12-25 23:18:31 +00:00
Marc-André Lemburg d9cf593b49 Cosmetic change: make all hex literals use upper case hex so that they
look more like the Unicode Consortium files.

Add ending new-line to all source files.
2005-10-24 12:14:59 +00:00
Marc-André Lemburg 3c72ded23d Removed the decoding_map from the codecs where this is possible.
Replaced the tis_620, cp1140 and koi8_u codecs with new ones
based on custom mapping files.
2005-10-24 12:07:49 +00:00
Marc-André Lemburg 0f00ba8bd8 Replace the old EBCDIC codecs with new ones using the decoding table. 2005-10-21 14:35:35 +00:00
Marc-André Lemburg 7797be7b3b Alias iso8859_1 to latin_1 which is the same encoding, but has
a much faster codec implementation.
2005-10-21 14:02:28 +00:00
Marc-André Lemburg 75c9e8392e Add a few more Mac OS encodings. The mapping tables for these are
available at ftp.unicode.org.
2005-10-21 13:58:32 +00:00
Marc-André Lemburg a1129f4b9b Replace the old charmap codecs with new ones generated from the current
mapping tables available at ftp.unicode.org.

These new codecs include and use character decoding tables which speeds
up decoding by a few factors.
2005-10-21 13:49:12 +00:00
Walter Dörwald 007f8dfde2 Bug #1245379: Add "unicode-1-1-utf-7" as an alias for "utf-7" as specified
by RFC 1642.
2005-10-09 19:42:27 +00:00
Neal Norwitz 4ce69a5b06 No need to import exceptions, they are builtins 2005-09-01 00:45:28 +00:00
Martin v. Löwis 8b59514e57 Make IDNA return an empty string when the input is empty. Fixes #1163178.
Will backport to 2.4.
2005-08-25 11:03:38 +00:00
Walter Dörwald 729c31f5c3 Reset internal buffers when seek() is called. This fixes SF bug #1156259. 2005-03-14 19:06:30 +00:00
Walter Dörwald e1a0391b49 Fix wrong variable name. 2004-12-29 13:11:10 +00:00
Marc-André Lemburg 9ab8818c87 Rearranged mappings to value sorting order. 2004-12-10 21:54:35 +00:00
Walter Dörwald 69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
Tim Peters d1b7827216 Whitespace normalization. 2004-08-07 06:03:09 +00:00
Marc-André Lemburg c759f070ef Added new codecs and aliases for ISO_8859-11, ISO_8859-16 and
TIS-620.

Closes SF bug #1001895: Adding missing ISO 8859 codecs, especially Thai.
2004-08-05 12:43:30 +00:00
Tim Peters c0cbc8611b Whitespace normalization. 2004-07-31 21:17:37 +00:00
Marc-André Lemburg 17b6d28c64 New codec: [ 996067 ] hp-roman8 codec 2004-07-28 15:37:54 +00:00
Marc-André Lemburg cd8a4cb3d3 Added new codec hp-roman8 submitted as patch [ 996067 ] hp-roman8 codec. 2004-07-28 15:35:29 +00:00
Hye-Shik Chang 2bb146f2f4 Bring CJKCodecs 1.1 into trunk. This completely reorganizes source
and installed layouts to make maintenance simple and easy.  And it
also adds four new codecs; big5hkscs, euc-jis-2004, shift-jis-2004
and iso2022-jp-2004.
2004-07-18 03:06:29 +00:00
Tim Peters 4e0e1b6a54 Whitespace normalization. 2004-07-07 20:54:48 +00:00
Martin v. Löwis 708b4dacf4 Convert input to a string object. Fixes #909230.
Backported 2.3.
2004-03-23 23:40:36 +00:00
Hye-Shik Chang 5c5316f111 Add a new unicode codec: ptcp154 (Kazakh) 2004-03-19 08:06:07 +00:00
Marc-André Lemburg 361d66de5d Fix wrong character mapping in koi8_u: SF bug #902501. 2004-02-23 09:00:43 +00:00
Marc-André Lemburg c83dddf7fe Let the default encodings search function lookup aliases before trying the codec import. This allows applications to install codecs which override (non-special-cased) builtin codecs. 2004-01-20 09:40:14 +00:00
Marc-André Lemburg 5c94d33077 Add some more code page aliases needed for completeness. 2004-01-20 09:38:52 +00:00
Hye-Shik Chang b619e4b36c Fix a typo: s/iso_3022/iso2022/ 2004-01-20 09:33:30 +00:00
Hye-Shik Chang 3e2a306920 Add CJK codecs support as discussed on python-dev. (SF #873597)
Several style fixes are suggested by Martin v. Loewis and
Marc-Andre Lemburg. Thanks!
2004-01-17 14:29:29 +00:00
Raymond Hettinger 0ad142aba0 Revert previous change. MAL preferred the old version. 2003-12-01 13:26:46 +00:00
Raymond Hettinger a45517065a Simplifed the code. 2003-12-01 10:41:02 +00:00
Raymond Hettinger 9edae346dd Fix typo in the comments. 2003-09-24 03:57:36 +00:00
Raymond Hettinger 9a80c5dbc4 Added codec for bz2 compression. 2003-09-23 20:21:01 +00:00
Martin v. Löwis 0d8e16c7ad Support trailing dots in DNS names. Fixes #782510. Will backport to 2.3. 2003-08-05 06:19:47 +00:00
Skip Montanaro 5d6ceb4aae more generic reference to python interpreter 2003-07-22 14:37:42 +00:00
Marc-André Lemburg 2820125935 Remove usage of re module from encodings package search function. 2003-05-16 17:07:51 +00:00
Tim Peters 0eadaac7dc Whitespace normalization. 2003-04-24 16:02:54 +00:00