Commit Graph

43 Commits

Author SHA1 Message Date
Neal Norwitz 44dab0ab2f Whitespace normalization 2007-04-25 06:42:41 +00:00
Walter Dörwald 93a3603c67 Backport r54786:
Fix utf-8-sig incremental decoder, which didn't recognise a BOM when the
first chunk fed to the decoder started with a BOM, but was longer than 3 bytes.
2007-04-21 10:31:43 +00:00
Walter Dörwald 9ff1d39402 Backport checkin:
Change decode() so that it works with a buffer (i.e. unicode(..., 'utf-8-sig'))
SF bug #1601501.
2006-11-23 05:06:31 +00:00
Georg Brandl f96b162b68 I thought I had already fixed this error in the test. 2006-10-29 15:22:43 +00:00
Georg Brandl c68d2cc3f2 Bug #1586613: fix zlib and bz2 codecs' incremental en/decoders.
(backport from rev. 52529)
2006-10-29 14:39:13 +00:00
Georg Brandl b8205a1188 Fix the new EncodedFile test to work with big endian platforms.
(backport from rev. 52527)
2006-10-29 09:32:19 +00:00
Georg Brandl 2a5a3027f2 Fix codecs.EncodedFile which did not use file_encoding in 2.5.0, and
fix all codecs file wrappers to work correctly with the "with"
statement (bug #1586513).
 (backport from rev. 52517)
2006-10-29 08:39:27 +00:00
Neal Norwitz 6d3d339d21 Verify the crash due to EncodingMap not initialized does not return 2006-06-13 08:41:06 +00:00
Walter Dörwald 78a0be6ab3 Add a BufferedIncrementalEncoder class that can be used for implementing
an incremental encoder that must retain part of the data between calls
to the encode() method.

Fix the incremental encoder and decoder for the IDNA encoding.

This closes SF patch #1453235.
2006-04-14 18:25:39 +00:00
Walter Dörwald 15be5ec100 Call encode()/decode() with final==True as the last call in the
incremental codec tests.
2006-04-14 14:03:55 +00:00
Walter Dörwald 9ae019bf5b Add tests for the C APIs PyCodec_IncrementalEncoder() and
PyCodec_IncrementalDecoder().
2006-03-18 14:22:26 +00:00
Walter Dörwald abb02e5994 Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
Walter Dörwald ca199432c2 If size is specified, try to read at least size characters.
This is a alternative version of patch #1379332.
2006-03-06 22:39:12 +00:00
Martin v. Löwis 412ed3b8a7 Patch #1177307: UTF-8-Sig codec. 2006-01-08 10:45:39 +00:00
Walter Dörwald 690402ff17 Add tests to increase code coverage in Python/codecs.c and Python/exceptions.c. 2005-11-17 18:51:34 +00:00
Walter Dörwald e22d339dc5 Add tests for various error cases and for readbuffer_encode() and
charbuffer_encode(). This increases code coverage in Modules/_codecsmodule.c
from 83% to 95%.
2005-11-17 08:52:34 +00:00
Walter Dörwald d1c1e10f70 Part of SF patch #1313939: Speedup charmap decoding by extending
PyUnicode_DecodeCharmap() the accept a unicode string as the mapping
argument which is used as a mapping table.

This code isn't used by any of the codecs yet.
2005-10-06 20:29:57 +00:00
Walter Dörwald a47d1c08d0 SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of the Nik Haldimann's patch that detects truncated data)
2005-08-30 10:23:14 +00:00
Martin v. Löwis 8b59514e57 Make IDNA return an empty string when the input is empty. Fixes #1163178.
Will backport to 2.4.
2005-08-25 11:03:38 +00:00
Walter Dörwald c9878e1b22 Make attributes and local variables in the StreamReader str objects instead
of unicode objects, so that codecs that do a str->str decoding won't promote
the result to unicode. This fixes SF bug #1241507.
2005-07-20 22:15:39 +00:00
Walter Dörwald 43148c8413 Update test to the current readline() behaviour. 2005-04-21 21:45:36 +00:00
Walter Dörwald 7a6dc139de Fix for SF bug #1175396: readline() will now read one more character, if
the last character read is "\r" (and size is None, i.e. we're allowed to
call read() multiple times), so that we can return the correct line ending
(this additional character might be a "\n").

If the stream is temporarily exhausted, we might return the wrong line ending
(if the last character read is "\r" and the next one (after the byte stream
provides more data) is "\n", but at least the atcr member ensure that we
get the correct number of lines (i.e. this "\n" will not be treated as
another line ending.)
2005-04-04 21:38:47 +00:00
Walter Dörwald 729c31f5c3 Reset internal buffers when seek() is called. This fixes SF bug #1156259. 2005-03-14 19:06:30 +00:00
Walter Dörwald a9620d1e2b Fix stupid typo: Don't read from a writer. 2005-02-08 10:10:01 +00:00
Walter Dörwald 1f1d252f51 Add a test for UTF-16 reading where the byte sequence doesn't start with
a BOM.
2005-02-04 14:15:34 +00:00
Walter Dörwald 9fa0946771 Fix and test for SF bug #1098990: codec readline() splits lines apart. 2005-01-10 12:01:39 +00:00
Walter Dörwald ee1d24703f Add a test that checks the basic functionality of every encoding. 2004-12-29 16:04:38 +00:00
Walter Dörwald e57d7b179a The changes to the stateful codecs in 2.4 resulted in StreamReader.readline()
trying to return a complete line even if a size parameter was given (see
http://www.python.org/sf/1076985). This leads to buffer overflows with long
source lines under Windows if e.g. cp1252 is used as the source encoding.
This patch reverts the behaviour of readline() to something that behaves more
like Python 2.3: If a size parameter is given, read() is called only once.

As a side effect of this, readline() now supports all types of linebreaks
supported by unicode.splitlines().

Note that the tokenizer is still broken and it's possible to provoke segfaults
(see http://www.python.org/sf/1089395).
2004-12-21 22:24:00 +00:00
Walter Dörwald 063e1e846d Trigger a few error cases in Modules/_codecsmodule.c. 2004-10-28 13:04:26 +00:00
Hye-Shik Chang af5c7cff56 SF #1048865: Fix a trivial typo that breaks StreamReader.readlines() 2004-10-17 23:51:21 +00:00
Walter Dörwald 69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
Marc-André Lemburg 3f41974525 Add generic codecs.encode() and .decode() APIs that don't impose
any restriction on the return type (like unicode.encode() et al. do).
2004-07-10 12:06:10 +00:00
Tim Peters 27f883687b Whitespace normalization. 2004-07-08 04:22:35 +00:00
Martin v. Löwis a1dde13389 Add test case for unicode(somestring, "idna"). 2004-03-24 16:48:24 +00:00
Walter Dörwald 21d3a32b99 Combine the functionality of test_support.run_unittest()
and test_support.run_classtests() into run_unittest()
and use it wherever possible.

Also don't use "from test.test_support import ...", but
"from test import test_support" in a few spots.

From SF patch #662807.
2003-05-01 17:45:56 +00:00
Tim Peters 0eadaac7dc Whitespace normalization. 2003-04-24 16:02:54 +00:00
Martin v. Löwis b5c4b7be3f Skip nameprep test 3.43, as we do allow unassigned characters. The test
fails only in UCS-2 mode, since it tests a non-BMP character.
2003-04-18 20:21:00 +00:00
Martin v. Löwis 2548c730c1 Implement IDNA (Internationalized Domain Names in Applications). 2003-04-18 10:39:54 +00:00
Marc-André Lemburg 29273c87da Fix for [ 543344 ] Interpreter crashes when recoding; suggested
by Michael Stone (mbrierst).

Python 2.1.4, 2.2.2 candidate.
2003-02-04 19:35:03 +00:00
Walter Dörwald 8709a420c4 Check whether a string resize is necessary at the end
of PyString_DecodeEscape(). This prevents a call to
_PyString_Resize() for the empty string, which would
result in a PyErr_BadInternalCall(), because the
empty string has more than one reference.

This closes SF bug http://www.python.org/sf/603937
2002-09-03 13:53:40 +00:00
Barry Warsaw 04f357cffe Get rid of relative imports in all unittests. Now anything that
imports e.g. test_support must do so using an absolute package name
such as "import test.test_support" or "from test import test_support".

This also updates the README in Lib/test, and gets rid of the
duplicate data dirctory in Lib/test/data (replaced by
Lib/email/test/data).

Now Tim and Jack can have at it. :)
2002-07-23 19:04:11 +00:00
Fred Drake 2e2be3760c Change the PyUnit-based tests to use the test_main() approach. This
allows using the tests with unittest.py as a script.  The tests will
still run when run as a script themselves.
2001-09-20 21:33:42 +00:00
Marc-André Lemburg a37171dd86 Test by Martin v. Loewis for the new UTF-16 codec handling of BOM
marks.
2001-06-19 20:09:28 +00:00