Walter Dörwald
f481598cd0
Backport r59049:
Fix for #1444 : utf_8_sig.StreamReader was (indirectly through decode())
calling codecs.utf_8_decode() with final==True, which failed with incomplete
byte sequences. Fix and test by James G. Sack.
2007-11-19 12:43:39 +00:00
Walter Dörwald
93a3603c67
Backport r54786:
Fix utf-8-sig incremental decoder, which didn't recognise a BOM when the
first chunk fed to the decoder started with a BOM, but was longer than 3 bytes.
2007-04-21 10:31:43 +00:00
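The behaviour described in the entry above can be checked with a small sketch. It assumes a current Python 3 interpreter (the log itself concerns the 2.x line), and the variable names are illustrative only.

```python
import codecs

# Case 1: the first chunk starts with the BOM and is longer than 3 bytes.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
first = decoder.decode(codecs.BOM_UTF8 + b"hello")
first += decoder.decode(b"", final=True)

# Case 2: the BOM itself is split across two chunks, so the decoder
# has to hold back the first two bytes until it can decide.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
pieces = [decoder.decode(b"\xef\xbb"), decoder.decode(b"\xbfabc")]
pieces.append(decoder.decode(b"", final=True))
second = "".join(pieces)

print(repr(first), repr(second))
```

In both cases the BOM should be stripped from the decoded text.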
Walter Dörwald
9ff1d39402
Backport checkin:
Change decode() so that it works with a buffer (i.e. unicode(..., 'utf-8-sig'))
SF bug #1601501 .
2006-11-23 05:06:31 +00:00
Georg Brandl
c68d2cc3f2
Bug #1586613 : fix zlib and bz2 codecs' incremental en/decoders.
(backport from rev. 52529)
2006-10-29 14:39:13 +00:00
Georg Brandl
1206a933cc
Bug #1446043 : correctly raise a LookupError if an encoding name given
to encodings.search_function() contains a dot.
(backport from rev. 52075)
2006-09-30 11:22:35 +00:00
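A minimal sketch of the lookup behaviour named in the entry above, assuming a current Python 3 interpreter. The helper is_known_encoding() is hypothetical and exists only for this illustration; it goes through the public codecs.lookup() rather than calling encodings.search_function() directly.

```python
import codecs

def is_known_encoding(name):
    """Return True if codecs.lookup() can resolve the given encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        # An unknown name, including one containing a dot, ends up here
        # instead of triggering an import of an unrelated module.
        return False
    return True

print(is_known_encoding("utf-8"))
print(is_known_encoding("os.path"))
```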
Martin v. Löwis
961b91bd3c
Correction of patch #1455898 : In the mbcs decoder, set final=False
for stream decoder, but final=True for the decode function.
2006-08-02 13:53:55 +00:00
Martin v. Löwis
0eac11826a
Make import/lookup of mbcs fail on non-Windows systems.
2006-06-15 06:45:05 +00:00
Martin v. Löwis
d825143be1
Patch #1455898 : Incremental mode for "mbcs" codec.
2006-06-14 05:21:04 +00:00
Walter Dörwald
c6f5b3ad6c
errors is an attribute in the incremental decoder
not an argument.
2006-06-13 12:04:43 +00:00
Walter Dörwald
6b6e2bb8b1
Fix passing errors to the encoder and decoder functions.
2006-06-13 12:02:12 +00:00
Tim Peters
c7d14452a4
Whitespace normalization.
2006-06-04 23:43:53 +00:00
Martin v. Löwis
3f767795f6
Patch #1359618 : Speed-up charmap encoder.
2006-06-04 19:36:28 +00:00
Walter Dörwald
78a0be6ab3
Add a BufferedIncrementalEncoder class that can be used for implementing
an incremental encoder that must retain part of the data between calls
to the encode() method.
Fix the incremental encoder and decoder for the IDNA encoding.
This closes SF patch #1453235 .
2006-04-14 18:25:39 +00:00
Walter Dörwald
a40cf31de6
Make error message less misleading for u"a..b".encode("idna").
2006-04-14 17:00:36 +00:00
Walter Dörwald
6493699c0d
Make raise statements PEP 8 compatible.
2006-04-14 15:22:27 +00:00
Walter Dörwald
a8da934069
Whitespace.
2006-03-27 09:02:04 +00:00
Hye-Shik Chang
e2ac4abd01
Patch #1443155 : Add the incremental codecs support for CJK codecs.
(reviewed by Walter Dörwald)
2006-03-26 02:34:59 +00:00
Guido van Rossum
f8480a7856
Instead of relative imports, use (implicitly) absolute ones.
2006-03-15 23:08:13 +00:00
Tim Peters
f99b8162a2
Whitespace normalization.
2006-03-15 18:08:37 +00:00
Walter Dörwald
13ed60b504
Fix typo.
2006-03-15 13:36:50 +00:00
Walter Dörwald
abb02e5994
Patch #1436130 : codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
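The API introduced by the entry above can be sketched as follows. This assumes a current Python 3 interpreter, where the same functions are still available; the variable names are illustrative.

```python
import codecs

# codecs.lookup() returns a CodecInfo object, which is still a tuple.
info = codecs.lookup("utf-8")
is_tuple = isinstance(info, tuple)

# Incremental decoding: feed a multi-byte character in two pieces
# without going through the stream API.
euro_bytes = "\u20ac".encode("utf-8")        # three bytes
decoder = codecs.getincrementaldecoder("utf-8")()
part1 = decoder.decode(euro_bytes[:1])       # incomplete, nothing emitted yet
part2 = decoder.decode(euro_bytes[1:], final=True)

# Incremental encoding works the same way in the other direction.
encoder = codecs.getincrementalencoder("utf-8")()
encoded = encoder.encode("ab") + encoder.encode("c", final=True)

print(info.name, is_tuple, repr(part1), repr(part2), encoded)
```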
Guido van Rossum
87de069e4e
Use relative imports in a few places where I noticed the need.
(Ideally, all packages in Python 2.5 will use the relative import
syntax for all their relative import needs.)
2006-03-15 04:33:54 +00:00
Martin v. Löwis
5bd7c02298
Avoid forward-declaring the methods array.
Rename unicodedata.db* to unicodedata.ucd*
2006-03-10 11:20:04 +00:00
Martin v. Löwis
480f1bb67b
Update Unicode database to Unicode 4.1.
2006-03-09 23:38:20 +00:00
Marc-André Lemburg
fe4b34cc4b
Fix the encodings package codec search function to only search
inside its own package. Fixes problem reported in patch #1433198 .
Add codec search function for codec test codec.
2006-02-19 15:22:22 +00:00
Martin v. Löwis
412ed3b8a7
Patch #1177307 : UTF-8-Sig codec.
2006-01-08 10:45:39 +00:00
Tim Peters
536cf99536
Whitespace normalization.
2005-12-25 23:18:31 +00:00
Marc-André Lemburg
d9cf593b49
Cosmetic change: make all hex literals use upper case hex so that they
look more like the Unicode Consortium files.
Add ending new-line to all source files.
2005-10-24 12:14:59 +00:00
Marc-André Lemburg
3c72ded23d
Removed the decoding_map from the codecs where this is possible.
Replaced the tis_620, cp1140 and koi8_u codecs with new ones
based on custom mapping files.
2005-10-24 12:07:49 +00:00
Marc-André Lemburg
0f00ba8bd8
Replace the old EBCDIC codecs with new ones using the decoding table.
2005-10-21 14:35:35 +00:00
Marc-André Lemburg
7797be7b3b
Alias iso8859_1 to latin_1 which is the same encoding, but has
a much faster codec implementation.
2005-10-21 14:02:28 +00:00
Marc-André Lemburg
75c9e8392e
Add a few more Mac OS encodings. The mapping tables for these are
available at ftp.unicode.org.
2005-10-21 13:58:32 +00:00
Marc-André Lemburg
a1129f4b9b
Replace the old charmap codecs with new ones generated from the current
mapping tables available at ftp.unicode.org.
These new codecs include and use character decoding tables which speeds
up decoding by a few factors.
2005-10-21 13:49:12 +00:00
Walter Dörwald
007f8dfde2
Bug #1245379 : Add "unicode-1-1-utf-7" as an alias for "utf-7" as specified
by RFC 1642.
2005-10-09 19:42:27 +00:00
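A quick way to confirm the alias from the entry above, assuming a current Python 3 interpreter that still ships it: both names should resolve to the same codec.

```python
import codecs

# Both spellings should resolve to the same registered codec.
alias_info = codecs.lookup("unicode-1-1-utf-7")
canonical_info = codecs.lookup("utf-7")
same_codec = alias_info.name == canonical_info.name

# Round-trip a non-ASCII string through the alias.
sample = "caf\u00e9"
round_trip = sample.encode("unicode-1-1-utf-7").decode("utf-7")

print(same_codec, round_trip == sample)
```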
Neal Norwitz
4ce69a5b06
No need to import exceptions, they are builtins
2005-09-01 00:45:28 +00:00
Martin v. Löwis
8b59514e57
Make IDNA return an empty string when the input is empty. Fixes #1163178 .
Will backport to 2.4.
2005-08-25 11:03:38 +00:00
Walter Dörwald
729c31f5c3
Reset internal buffers when seek() is called. This fixes SF bug #1156259 .
2005-03-14 19:06:30 +00:00
Walter Dörwald
e1a0391b49
Fix wrong variable name.
2004-12-29 13:11:10 +00:00
Marc-André Lemburg
9ab8818c87
Rearranged mappings to value sorting order.
2004-12-10 21:54:35 +00:00
Walter Dörwald
69652035bc
SF patch #998993 : The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
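The StreamReader additions listed in the entry above can be illustrated with a short sketch. It assumes a current Python 3 interpreter and uses in-memory io.BytesIO streams in place of real files; the sample strings are made up for the example.

```python
import codecs
import io

# readline() on a UTF-16 stream, with and without the line ending.
raw = io.BytesIO("first\nsecond\n".encode("utf-16"))
reader = codecs.getreader("utf-16")(raw)
line1 = reader.readline()                 # line ending kept by default
line2 = reader.readline(keepends=False)   # line ending stripped

# read() with the chars argument limits the number of characters
# returned, independent of how many bytes they occupy.
raw = io.BytesIO("h\u00e9llo w\u00f6rld".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)
head = reader.read(chars=5)
rest = reader.read()

print(repr(line1), repr(line2), repr(head), repr(rest))
```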
Tim Peters
d1b7827216
Whitespace normalization.
2004-08-07 06:03:09 +00:00
Marc-André Lemburg
c759f070ef
Added new codecs and aliases for ISO_8859-11, ISO_8859-16 and
TIS-620.
Closes SF bug #1001895 : Adding missing ISO 8859 codecs, especially Thai.
2004-08-05 12:43:30 +00:00
Tim Peters
c0cbc8611b
Whitespace normalization.
2004-07-31 21:17:37 +00:00
Marc-André Lemburg
17b6d28c64
New codec: [ 996067 ] hp-roman8 codec
2004-07-28 15:37:54 +00:00
Marc-André Lemburg
cd8a4cb3d3
Added new codec hp-roman8 submitted as patch [ 996067 ] hp-roman8 codec.
2004-07-28 15:35:29 +00:00
Hye-Shik Chang
2bb146f2f4
Bring CJKCodecs 1.1 into trunk. This completely reorganizes source
and installed layouts to make maintenance simple and easy. And it
also adds four new codecs; big5hkscs, euc-jis-2004, shift-jis-2004
and iso2022-jp-2004.
2004-07-18 03:06:29 +00:00
Tim Peters
4e0e1b6a54
Whitespace normalization.
2004-07-07 20:54:48 +00:00
Martin v. Löwis
708b4dacf4
Convert input to a string object. Fixes #909230 .
Backported to 2.3.
2004-03-23 23:40:36 +00:00
Hye-Shik Chang
5c5316f111
Add a new unicode codec: ptcp154 (Kazakh)
2004-03-19 08:06:07 +00:00
Marc-André Lemburg
361d66de5d
Fix wrong character mapping in koi8_u: SF bug #902501 .
2004-02-23 09:00:43 +00:00