Serhiy Storchaka
74e449fe6a
Add tests for raw-unicode-escape codec.
2013-01-29 11:39:44 +02:00
Serhiy Storchaka
7277f9d099
Clean up escape-decode decoder tests.
2013-01-29 11:06:28 +02:00
Serhiy Storchaka
c8e58126a2
Issue #16979: Fix error handling bugs in the unicode-escape-decode decoder.
2013-01-29 10:20:34 +02:00
Serhiy Storchaka
01b3a08f5e
Issue #16975: Fix error handling bug in the escape-decode decoder.
2013-01-25 23:30:50 +02:00
Serhiy Storchaka
9599745e2c
Issue #14850: Now a charmap decoder treats U+FFFE as "undefined mapping"
in any mapping, not only in a unicode string.
2013-01-15 14:42:59 +02:00
Serhiy Storchaka
c4b82c037e
Issue #11461: Fix the incremental UTF-16 decoder. Original patch by
Amaury Forgeot d'Arc. Added tests for partial decoding of non-BMP
characters.
2013-01-08 23:12:00 +02:00
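A minimal sketch of the partial-decoding behaviour described in the entry above, assuming Python 3 and the standard codecs module. The variable names are illustrative and are not taken from the patch or its tests:

```python
import codecs

# Illustrative only: these names do not come from the original patch.
decoder = codecs.getincrementaldecoder("utf-16-le")()

# U+10000 lies outside the BMP, so UTF-16 encodes it as a surrogate pair (4 bytes).
data = "\U00010000".encode("utf-16-le")

# Feed the bytes one at a time; the decoder buffers input until the
# surrogate pair is complete instead of emitting a lone surrogate.
pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]

# Signal the end of input so that leftover bytes would be reported as an error.
result = "".join(pieces) + decoder.decode(b"", final=True)
```

Feeding single bytes is the harshest case for an incremental decoder, which is presumably why tests of this kind use it.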
Antoine Pitrou
e3ae321222
Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings).
Patch by Serhiy Storchaka.
2012-11-17 21:14:58 +01:00
Antoine Pitrou
715a63b783
Issue #14579: Fix error handling bug in the utf-16 decoder.
Patch by Serhiy Storchaka.
2012-07-21 00:52:06 +02:00
Antoine Pitrou
4cfae027b3
Issue #1813: Fix codec lookup and setting/getting locales under Turkish locales.
2011-07-24 02:51:01 +02:00
Victor Stinner
6c603c4593
test_codecs now removes the temporary file (created by the test)
2011-05-23 16:19:31 +02:00
Ezio Melotti
2623a37852
Merged revisions 86596 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k
........
r86596 | ezio.melotti | 2010-11-20 21:04:17 +0200 (Sat, 20 Nov 2010) | 1 line
#9424: Replace deprecated assert* methods in the Python test suite.
........
2010-11-21 13:34:58 +00:00
Antoine Pitrou
cca3a3f396
Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash
the interpreter with characters outside the Basic Multilingual Plane
(higher than 0x10000).
2010-06-11 21:42:26 +00:00
Georg Brandl
f0757a2937
#8016: add the CP858 codec (approved by Benjamin). (Also add CP720 to the tests, it was missing there.)
2010-05-24 21:29:07 +00:00
Victor Stinner
7df55dad3b
Issue #6268: More bugfixes about BOM, UTF-16 and UTF-32
* Fix seek() method of codecs.open(), don't write the BOM twice after seek(0)
* Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes
* test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by
Solaris or Windows, but does it really exist? I found it in the issue.
2010-05-22 13:37:56 +00:00
Victor Stinner
262be5e70b
Issue #6268: Fix seek() method of codecs.open(), don't read the BOM twice
after seek(0)
2010-05-22 02:11:07 +00:00
Philip Jenvey
034b0acdd3
fix escape_encode to return the correct consumed size
2010-04-05 02:51:51 +00:00
Florent Xicluna
f4b6186d9c
#691291: codecs.open() should not convert end of lines on reading and writing.
2010-02-26 10:40:58 +00:00
Ezio Melotti
b0f5adc3f4
use assert[Not]IsInstance where appropriate
2010-01-24 16:58:36 +00:00
Georg Brandl
e9741f3ed8
Issue #6922: Fix an infinite loop when trying to decode an invalid
UTF-32 stream with a non-raising error handler like "replace" or "ignore".
2009-09-17 11:28:09 +00:00
Benjamin Peterson
5c8da86f3a
convert usage of fail* to assert*
2009-06-30 22:57:08 +00:00
Walter Dörwald
a7fb408a02
Issue 3739: The unicode-internal encoder now reports the number of *characters*
consumed like any other encoder (instead of the number of bytes).
2009-05-06 14:28:24 +00:00
Amaury Forgeot d'Arc
5087980c1e
The incremental decoder for utf-7 must preserve its state between calls.
Solves issue1460.
Might not be a backport candidate: a new API function was added,
and some code may rely on details in utf-7.py.
2007-11-20 23:31:27 +00:00
Walter Dörwald
183744d6b9
Fix for #1444: utf_8_sig.StreamReader was (indirectly through decode())
calling codecs.utf_8_decode() with final==True, which failed with incomplete
byte sequences. Fix and test by James G. Sack.
2007-11-19 12:41:10 +00:00
Walter Dörwald
fc7e72d1c6
Fix typo in comment.
2007-11-19 12:14:05 +00:00
Walter Dörwald
6e39080649
Backport r57105 and r57145 from the py3k branch: UTF-32 codecs.
2007-08-17 16:41:28 +00:00
Walter Dörwald
4234827e99
Fix utf-8-sig incremental decoder, which didn't recognise a BOM when the
first chunk fed to the decoder started with a BOM, but was longer than 3 bytes.
2007-04-12 10:35:00 +00:00
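A minimal sketch of the case this fix covers, assuming Python 3 and the standard codecs module. The sample payload and variable names are illustrative, not taken from the commit:

```python
import codecs

# Illustrative only: the payload and names are not from the original fix.
decoder = codecs.getincrementaldecoder("utf-8-sig")()

# The first chunk starts with the 3-byte UTF-8 BOM and is longer than 3 bytes.
first_chunk = codecs.BOM_UTF8 + b"spam"

# The BOM should be recognised and stripped even though the chunk carries payload too.
text = decoder.decode(first_chunk)
text += decoder.decode(b" eggs", final=True)
```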
Walter Dörwald
39b8b6afb5
Change decode() so that it works with a buffer (i.e. unicode(..., 'utf-8-sig'))
SF bug #1601501.
2006-11-23 05:03:56 +00:00
Tim Peters
abd8a336a3
Whitespace normalization.
2006-11-03 02:32:46 +00:00
Neal Norwitz
1ead698494
I'm assuming this is correct, it fixes the tests so they pass again
2006-10-29 23:58:36 +00:00
Walter Dörwald
98c70acf47
Add tests for incremental codecs with an errors
argument.
2006-10-29 23:02:27 +00:00
Georg Brandl
2c9838e30f
Bug #1586613: fix zlib and bz2 codecs' incremental en/decoders.
2006-10-29 14:39:09 +00:00
Georg Brandl
5b4e1c2530
Fix the new EncodedFile test to work with big endian platforms.
2006-10-29 09:32:16 +00:00
Georg Brandl
8f99f81dfc
Fix codecs.EncodedFile which did not use file_encoding in 2.5.0, and
fix all codecs file wrappers to work correctly with the "with"
statement (bug #1586513).
2006-10-29 08:39:22 +00:00
Neal Norwitz
6d3d339d21
Verify the crash due to EncodingMap not initialized does not return
2006-06-13 08:41:06 +00:00
Walter Dörwald
78a0be6ab3
Add a BufferedIncrementalEncoder class that can be used for implementing
an incremental encoder that must retain part of the data between calls
to the encode() method.
Fix the incremental encoder and decoder for the IDNA encoding.
This closes SF patch #1453235.
2006-04-14 18:25:39 +00:00
Walter Dörwald
15be5ec100
Call encode()/decode() with final==True as the last call in the
incremental codec tests.
2006-04-14 14:03:55 +00:00
Walter Dörwald
9ae019bf5b
Add tests for the C APIs PyCodec_IncrementalEncoder() and
PyCodec_IncrementalDecoder().
2006-03-18 14:22:26 +00:00
Walter Dörwald
abb02e5994
Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
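A minimal sketch of the API described in the entry above, assuming Python 3 and the standard codecs module. The sample text and variable names are illustrative only:

```python
import codecs

# codecs.lookup() returns a CodecInfo object, which is a tuple subclass.
info = codecs.lookup("utf-8")

# The factory functions return classes; instantiate them to get stateful codecs.
encoder = codecs.getincrementalencoder("utf-8")()
decoder = codecs.getincrementaldecoder("utf-8")()

# Encode in two steps; final=True marks the last piece of input.
encoded = encoder.encode("caf") + encoder.encode("\u00e9", final=True)

# Split the bytes in the middle of the two-byte sequence for U+00E9.
# The decoder keeps the incomplete byte and finishes it on the next call.
head = decoder.decode(encoded[:-1])
tail = decoder.decode(encoded[-1:], final=True)
```

No stream object is involved here, which is the point of the incremental API: state lives in the encoder and decoder instances themselves.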
Walter Dörwald
ca199432c2
If size is specified, try to read at least size characters.
This is an alternative version of patch #1379332.
2006-03-06 22:39:12 +00:00
Martin v. Löwis
412ed3b8a7
Patch #1177307: UTF-8-Sig codec.
2006-01-08 10:45:39 +00:00
Walter Dörwald
690402ff17
Add tests to increase code coverage in Python/codecs.c and Python/exceptions.c.
2005-11-17 18:51:34 +00:00
Walter Dörwald
e22d339dc5
Add tests for various error cases and for readbuffer_encode() and
charbuffer_encode(). This increases code coverage in Modules/_codecsmodule.c
from 83% to 95%.
2005-11-17 08:52:34 +00:00
Walter Dörwald
d1c1e10f70
Part of SF patch #1313939: Speedup charmap decoding by extending
PyUnicode_DecodeCharmap() to accept a unicode string as the mapping
argument which is used as a mapping table.
This code isn't used by any of the codecs yet.
2005-10-06 20:29:57 +00:00
Walter Dörwald
a47d1c08d0
SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of Nik Haldimann's patch that detects truncated data)
2005-08-30 10:23:14 +00:00
Martin v. Löwis
8b59514e57
Make IDNA return an empty string when the input is empty. Fixes #1163178.
Will backport to 2.4.
2005-08-25 11:03:38 +00:00
Walter Dörwald
c9878e1b22
Make attributes and local variables in the StreamReader str objects instead
of unicode objects, so that codecs that do a str->str decoding won't promote
the result to unicode. This fixes SF bug #1241507.
2005-07-20 22:15:39 +00:00
Walter Dörwald
43148c8413
Update test to the current readline() behaviour.
2005-04-21 21:45:36 +00:00
Walter Dörwald
7a6dc139de
Fix for SF bug #1175396: readline() will now read one more character, if
the last character read is "\r" (and size is None, i.e. we're allowed to
call read() multiple times), so that we can return the correct line ending
(this additional character might be a "\n").
If the stream is temporarily exhausted, we might return the wrong line ending
(if the last character read is "\r" and the next one (after the byte stream
provides more data) is "\n"), but at least the atcr member ensures that we
get the correct number of lines (i.e. this "\n" will not be treated as
another line ending).
2005-04-04 21:38:47 +00:00
Walter Dörwald
729c31f5c3
Reset internal buffers when seek() is called. This fixes SF bug #1156259.
2005-03-14 19:06:30 +00:00
Walter Dörwald
a9620d1e2b
Fix stupid typo: Don't read from a writer.
2005-02-08 10:10:01 +00:00