cpython

Commit Graph

Author	SHA1	Message	Date
Antoine Pitrou	4cfae027b3	Issue #1813: Fix codec lookup and setting/getting locales under Turkish locales.	2011-07-24 02:51:01 +02:00
Victor Stinner	6c603c4593	test_codecs now removes the temporary file (created by the test)	2011-05-23 16:19:31 +02:00
Ezio Melotti	2623a37852	Merged revisions 86596 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/py3k ........ r86596 | ezio.melotti | 2010-11-20 21:04:17 +0200 (Sat, 20 Nov 2010) | 1 line #9424: Replace deprecated assert* methods in the Python test suite. ........	2010-11-21 13:34:58 +00:00
Antoine Pitrou	cca3a3f396	Issue #8941: decoding big endian UTF-32 data in UCS-2 builds could crash the interpreter with characters outside the Basic Multilingual Plane (higher than 0x10000).	2010-06-11 21:42:26 +00:00
Georg Brandl	f0757a2937	#8016: add the CP858 codec (approved by Benjamin). (Also add CP720 to the tests, it was missing there.)	2010-05-24 21:29:07 +00:00
Victor Stinner	7df55dad3b	Issue #6268: More bugfixes about BOM, UTF-16 and UTF-32 * Fix seek() method of codecs.open(), don't write the BOM twice after seek(0) * Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes * test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by Solaris or Windows, but does it really exist? I found it in the issue.	2010-05-22 13:37:56 +00:00
Victor Stinner	262be5e70b	Issue #6268: Fix seek() method of codecs.open(), don't read the BOM twice after seek(0)	2010-05-22 02:11:07 +00:00
Philip Jenvey	034b0acdd3	fix escape_encode to return the correct consumed size	2010-04-05 02:51:51 +00:00
Florent Xicluna	f4b6186d9c	#691291: codecs.open() should not convert end of lines on reading and writing.	2010-02-26 10:40:58 +00:00
Ezio Melotti	b0f5adc3f4	use assert[Not]IsInstance where appropriate	2010-01-24 16:58:36 +00:00
Georg Brandl	e9741f3ed8	Issue #6922: Fix an infinite loop when trying to decode an invalid UTF-32 stream with a non-raising error handler like "replace" or "ignore".	2009-09-17 11:28:09 +00:00
Benjamin Peterson	5c8da86f3a	convert usage of fail* to assert*	2009-06-30 22:57:08 +00:00
Walter Dörwald	a7fb408a02	Issue 3739: The unicode-internal encoder now reports the number of characters consumed like any other encoder (instead of the number of bytes).	2009-05-06 14:28:24 +00:00
Amaury Forgeot d'Arc	5087980c1e	The incremental decoder for utf-7 must preserve its state between calls. Solves issue1460. Might not be a backport candidate: a new API function was added, and some code may rely on details in utf-7.py.	2007-11-20 23:31:27 +00:00
Walter Dörwald	183744d6b9	Fix for #1444: utf_8_sig.StreamReader was (indirectly through decode()) calling codecs.utf_8_decode() with final==True, which failed with incomplete byte sequences. Fix and test by James G. Sack.	2007-11-19 12:41:10 +00:00
Walter Dörwald	fc7e72d1c6	Fix typo in comment.	2007-11-19 12:14:05 +00:00
Walter Dörwald	6e39080649	Backport r57105 and r57145 from the py3k branch: UTF-32 codecs.	2007-08-17 16:41:28 +00:00
Walter Dörwald	4234827e99	Fix utf-8-sig incremental decoder, which didn't recognise a BOM when the first chunk fed to the decoder started with a BOM, but was longer than 3 bytes.	2007-04-12 10:35:00 +00:00
Walter Dörwald	39b8b6afb5	Change decode() so that it works with a buffer (i.e. unicode(..., 'utf-8-sig')) SF bug #1601501.	2006-11-23 05:03:56 +00:00
Tim Peters	abd8a336a3	Whitespace normalization.	2006-11-03 02:32:46 +00:00
Neal Norwitz	1ead698494	I'm assuming this is correct, it fixes the tests so they pass again	2006-10-29 23:58:36 +00:00
Walter Dörwald	98c70acf47	Add tests for incremental codecs with an errors argument.	2006-10-29 23:02:27 +00:00
Georg Brandl	2c9838e30f	Bug #1586613: fix zlib and bz2 codecs' incremental en/decoders.	2006-10-29 14:39:09 +00:00
Georg Brandl	5b4e1c2530	Fix the new EncodedFile test to work with big endian platforms.	2006-10-29 09:32:16 +00:00
Georg Brandl	8f99f81dfc	Fix codecs.EncodedFile which did not use file_encoding in 2.5.0, and fix all codecs file wrappers to work correctly with the "with" statement (bug #1586513).	2006-10-29 08:39:22 +00:00
Neal Norwitz	6d3d339d21	Verify the crash due to EncodingMap not initialized does not return	2006-06-13 08:41:06 +00:00
Walter Dörwald	78a0be6ab3	Add a BufferedIncrementalEncoder class that can be used for implementing an incremental encoder that must retain part of the data between calls to the encode() method. Fix the incremental encoder and decoder for the IDNA encoding. This closes SF patch #1453235.	2006-04-14 18:25:39 +00:00
Walter Dörwald	15be5ec100	Call encode()/decode() with final==True as the last call in the incremental codec tests.	2006-04-14 14:03:55 +00:00
Walter Dörwald	9ae019bf5b	Add tests for the C APIs PyCodec_IncrementalEncoder() and PyCodec_IncrementalDecoder().	2006-03-18 14:22:26 +00:00
Walter Dörwald	abb02e5994	Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass of tuple) that provides incremental decoders and encoders (a way to use stateful codecs without the stream API). Functions codecs.getincrementaldecoder() and codecs.getincrementalencoder() have been added.	2006-03-15 11:35:15 +00:00
Walter Dörwald	ca199432c2	If size is specified, try to read at least size characters. This is an alternative version of patch #1379332.	2006-03-06 22:39:12 +00:00
Martin v. Löwis	412ed3b8a7	Patch #1177307: UTF-8-Sig codec.	2006-01-08 10:45:39 +00:00
Walter Dörwald	690402ff17	Add tests to increase code coverage in Python/codecs.c and Python/exceptions.c.	2005-11-17 18:51:34 +00:00
Walter Dörwald	e22d339dc5	Add tests for various error cases and for readbuffer_encode() and charbuffer_encode(). This increases code coverage in Modules/_codecsmodule.c from 83% to 95%.	2005-11-17 08:52:34 +00:00
Walter Dörwald	d1c1e10f70	Part of SF patch #1313939: Speedup charmap decoding by extending PyUnicode_DecodeCharmap() to accept a unicode string as the mapping argument which is used as a mapping table. This code isn't used by any of the codecs yet.	2005-10-06 20:29:57 +00:00
Walter Dörwald	a47d1c08d0	SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain about illegal code points. The codec now supports PEP 293 style error handlers. (This is a variant of Nik Haldimann's patch that detects truncated data)	2005-08-30 10:23:14 +00:00
Martin v. Löwis	8b59514e57	Make IDNA return an empty string when the input is empty. Fixes #1163178. Will backport to 2.4.	2005-08-25 11:03:38 +00:00
Walter Dörwald	c9878e1b22	Make attributes and local variables in the StreamReader str objects instead of unicode objects, so that codecs that do a str->str decoding won't promote the result to unicode. This fixes SF bug #1241507.	2005-07-20 22:15:39 +00:00
Walter Dörwald	43148c8413	Update test to the current readline() behaviour.	2005-04-21 21:45:36 +00:00
Walter Dörwald	7a6dc139de	Fix for SF bug #1175396: readline() will now read one more character, if the last character read is "\r" (and size is None, i.e. we're allowed to call read() multiple times), so that we can return the correct line ending (this additional character might be a "\n"). If the stream is temporarily exhausted, we might return the wrong line ending (if the last character read is "\r" and the next one (after the byte stream provides more data) is "\n"), but at least the atcr member ensures that we get the correct number of lines (i.e. this "\n" will not be treated as another line ending.)	2005-04-04 21:38:47 +00:00
Walter Dörwald	729c31f5c3	Reset internal buffers when seek() is called. This fixes SF bug #1156259.	2005-03-14 19:06:30 +00:00
Walter Dörwald	a9620d1e2b	Fix stupid typo: Don't read from a writer.	2005-02-08 10:10:01 +00:00
Walter Dörwald	1f1d252f51	Add a test for UTF-16 reading where the byte sequence doesn't start with a BOM.	2005-02-04 14:15:34 +00:00
Walter Dörwald	9fa0946771	Fix and test for SF bug #1098990: codec readline() splits lines apart.	2005-01-10 12:01:39 +00:00
Walter Dörwald	ee1d24703f	Add a test that checks the basic functionality of every encoding.	2004-12-29 16:04:38 +00:00
Walter Dörwald	e57d7b179a	The changes to the stateful codecs in 2.4 resulted in StreamReader.readline() trying to return a complete line even if a size parameter was given (see http://www.python.org/sf/1076985). This leads to buffer overflows with long source lines under Windows if e.g. cp1252 is used as the source encoding. This patch reverts the behaviour of readline() to something that behaves more like Python 2.3: If a size parameter is given, read() is called only once. As a side effect of this, readline() now supports all types of linebreaks supported by unicode.splitlines(). Note that the tokenizer is still broken and it's possible to provoke segfaults (see http://www.python.org/sf/1089395).	2004-12-21 22:24:00 +00:00
Walter Dörwald	063e1e846d	Trigger a few error cases in Modules/_codecsmodule.c.	2004-10-28 13:04:26 +00:00
Hye-Shik Chang	af5c7cff56	SF #1048865: Fix a trivial typo that breaks StreamReader.readlines()	2004-10-17 23:51:21 +00:00
Walter Dörwald	69652035bc	SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support decoding incomplete input (when the input stream is temporarily exhausted). codecs.StreamReader now implements buffering, which enables proper readline support for the UTF-16 decoders. codecs.StreamReader.read() has a new argument chars which specifies the number of characters to return. codecs.StreamReader.readline() and codecs.StreamReader.readlines() have a new argument keepends. Trailing "\n"s will be stripped from the lines if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and PyUnicode_DecodeUTF16Stateful.	2004-09-07 20:24:22 +00:00
Marc-André Lemburg	3f41974525	Add generic codecs.encode() and .decode() APIs that don't impose any restriction on the return type (like unicode.encode() et al. do).	2004-07-10 12:06:10 +00:00


61 Commits
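
Several commits above (abb02e5994, 69652035bc, 4234827e99) concern incremental decoding, where input arrives in chunks that may split a multi-byte sequence or a BOM. The following is a minimal sketch of that usage, not code taken from any of these commits. It assumes a current Python 3 interpreter rather than the Python 2 versions the log refers to, and `decode_in_chunks` is a hypothetical helper name introduced only for illustration.

```python
import codecs


def decode_in_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks that may split multi-byte sequences.

    Hypothetical helper: it wraps the decoder class returned by
    codecs.getincrementaldecoder(), which keeps incomplete input
    buffered between calls to decode().
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for chunk in chunks:
        # Incomplete trailing bytes are held back until more data arrives.
        pieces.append(decoder.decode(chunk))
    # final=True tells the decoder no more input follows, so any
    # leftover incomplete sequence is reported as an error.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


# Feed UTF-8 one byte at a time, so multi-byte characters are split.
data = "héllo €".encode("utf-8")
text = decode_in_chunks(data[i:i + 1] for i in range(len(data)))

# utf-8-sig: the first chunk starts with a BOM and is longer than 3 bytes.
bom_data = codecs.BOM_UTF8 + b"abc"
bom_text = decode_in_chunks([bom_data[:5], bom_data[5:]], "utf-8-sig")
```

Calling `decode()` with `final=True` as the last step mirrors the pattern described in commit 15be5ec100 for the incremental codec tests.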