cpython

Author	SHA1	Message	Date
Victor Stinner	7df55dad3b	Issue #6268: More bugfixes about BOM, UTF-16 and UTF-32 * Fix seek() method of codecs.open(), don't write the BOM twice after seek(0) * Fix reset() method of codecs, UTF-16, UTF-32 and StreamWriter classes * test_codecs: use "w+" mode instead of "wt+". "t" mode is not supported by Solaris or Windows, but does it really exist? I found it the in the issue.	2010-05-22 13:37:56 +00:00
Victor Stinner	262be5e70b	Issue #6268: Fix seek() method of codecs.open(), don't read the BOM twice after seek(0)	2010-05-22 02:11:07 +00:00
Florent Xicluna	f4b6186d9c	#691291: codecs.open() should not convert end of lines on reading and writing.	2010-02-26 10:40:58 +00:00
Christian Heimes	1a6387e683	Merged revisions 61750,61752,61754,61756,61760,61763,61768,61772,61775,61805,61809,61812,61819,61917,61920,61930,61933-61934 via svnmerge from svn+ssh://pythondev@svn.python.org/python/branches/trunk-bytearray ........ r61750 | christian.heimes | 2008-03-22 20:47:44 +0100 (Sat, 22 Mar 2008) | 1 line Copied files from py3k w/o modifications ........ r61752 | christian.heimes | 2008-03-22 20:53:20 +0100 (Sat, 22 Mar 2008) | 7 lines Take One * Added initialization code, warnings, flags etc. to the appropriate places * Added new buffer interface to string type * Modified tests * Modified Makefile.pre.in to compile the new files * Added bytesobject.c to Python.h ........ r61754 | christian.heimes | 2008-03-22 21:22:19 +0100 (Sat, 22 Mar 2008) | 2 lines Disabled bytearray.extend for now since it causes an infinite recursion Fixed serveral unit tests ........ r61756 | christian.heimes | 2008-03-22 21:43:38 +0100 (Sat, 22 Mar 2008) | 5 lines Added PyBytes support to several places: str + bytearray ord(bytearray) bytearray(str, encoding) ........ r61760 | christian.heimes | 2008-03-22 21:56:32 +0100 (Sat, 22 Mar 2008) | 1 line Fixed more unit tests related to type('') is not unicode ........ r61763 | christian.heimes | 2008-03-22 22:20:28 +0100 (Sat, 22 Mar 2008) | 2 lines Fixed more unit tests Fixed bytearray.extend ........ r61768 | christian.heimes | 2008-03-22 22:40:50 +0100 (Sat, 22 Mar 2008) | 1 line Implemented old buffer interface for bytearray ........ r61772 | christian.heimes | 2008-03-22 23:24:52 +0100 (Sat, 22 Mar 2008) | 1 line Added backport of the io module ........ r61775 | christian.heimes | 2008-03-23 03:50:49 +0100 (Sun, 23 Mar 2008) | 1 line Fix str assignement to bytearray. Assignment of a str of size 1 is interpreted as a single byte ........ r61805 | christian.heimes | 2008-03-23 19:33:48 +0100 (Sun, 23 Mar 2008) | 3 lines Fixed more tests Fixed bytearray() comparsion with unicode() Fixed iterator assignment of bytearray ........ r61809 | christian.heimes | 2008-03-23 21:02:21 +0100 (Sun, 23 Mar 2008) | 2 lines str(bytesarray()) now returns the bytes and not the representation of the bytearray object Enabled and fixed more unit tests ........ r61812 | christian.heimes | 2008-03-23 21:53:08 +0100 (Sun, 23 Mar 2008) | 3 lines Clear error PyNumber_AsSsize_t() fails Use CHARMASK for ob_svall access disabled a test with memoryview again ........ r61819 | christian.heimes | 2008-03-23 23:05:57 +0100 (Sun, 23 Mar 2008) | 1 line Untested updates to the PCBuild directory ........ r61917 | christian.heimes | 2008-03-26 00:57:06 +0100 (Wed, 26 Mar 2008) | 1 line The type system of Python 2.6 has subtle differences to 3.0's. I've removed the Py_TPFLAGS_BASETYPE flags from bytearray for now. bytearray can't be subclasses until the issues with bytearray subclasses are fixed. ........ r61920 | christian.heimes | 2008-03-26 01:44:08 +0100 (Wed, 26 Mar 2008) | 2 lines Disabled last failing test I don't understand what the test is testing and how it suppose to work. Ka-Ping, please check it out. ........ r61930 | christian.heimes | 2008-03-26 12:46:18 +0100 (Wed, 26 Mar 2008) | 1 line Re-enabled bytes warning code ........ r61933 | christian.heimes | 2008-03-26 13:20:46 +0100 (Wed, 26 Mar 2008) | 1 line Fixed a bug in the new buffer protocol. The buffer slots weren't copied into a subclass. ........ r61934 | christian.heimes | 2008-03-26 13:25:09 +0100 (Wed, 26 Mar 2008) | 1 line Re-enabled bytearray subclassing - all tests are passing. ........	2008-03-26 12:49:49 +00:00
Georg Brandl	8f99f81dfc	Fix codecs.EncodedFile which did not use file_encoding in 2.5.0, and fix all codecs file wrappers to work correctly with the "with" statement (bug #1586513).	2006-10-29 08:39:22 +00:00
Walter Dörwald	78a0be6ab3	Add a BufferedIncrementalEncoder class that can be used for implementing an incremental encoder that must retain part of the data between calls to the encode() method. Fix the incremental encoder and decoder for the IDNA encoding. This closes SF patch #1453235.	2006-04-14 18:25:39 +00:00
Walter Dörwald	b17f12bbc6	Fix wrong attribute name.	2006-04-14 15:40:54 +00:00
Walter Dörwald	6a7ec7c3e2	Change raise statement to PEP 8 style.	2006-03-18 16:35:17 +00:00
Neal Norwitz	6bed1c1fab	Add some versionadded info to new incremental codec docs and fix doco nits.	2006-03-16 07:49:19 +00:00
Walter Dörwald	abb02e5994	Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass of tuple) that provides incremental decoders and encoders (a way to use stateful codecs without the stream API). Functions codecs.getincrementaldecoder() and codecs.getincrementalencoder() have been added.	2006-03-15 11:35:15 +00:00
Walter Dörwald	ca199432c2	If size is specified, try to read at least size characters. This is a alternative version of patch #1379332.	2006-03-06 22:39:12 +00:00
Tim Peters	536cf99536	Whitespace normalization.	2005-12-25 23:18:31 +00:00
Martin v. Löwis	4ed673877d	Patch #1268314: Cache lines in StreamReader.readlines for performance. Will backport to Python 2.4.	2005-09-18 08:34:39 +00:00
Walter Dörwald	c5238b8288	SF bug #1235646: codecs.StreamRecoder.next() now reencodes the data it reads from the input stream, so that the output is a byte string in the correct encoding instead of a unicode string.	2005-09-01 11:56:53 +00:00
Martin v. Löwis	56066d2e55	Return complete lines from codec stream readers even if there is an exception in later lines, resulting in correct line numbers for decoding errors in source code. Fixes #1178484. Will backport to 2.4.	2005-08-24 07:38:12 +00:00
Walter Dörwald	c9878e1b22	Make attributes and local variables in the StreamReader str objects instead of unicode objects, so that codecs that do a str->str decoding won't promote the result to unicode. This fixes SF bug #1241507.	2005-07-20 22:15:39 +00:00
Walter Dörwald	a4eb2d56a4	Fix comment.	2005-04-21 21:42:35 +00:00
Walter Dörwald	bc8e642c1b	If the data read from the bytestream in readline() ends in a '\r' read one more byte, even if the user has passed a size parameter. This extra byte shouldn't cause a buffer overflow in the tokenizer. The original plan was to return a line ending in '\r', which might be recognizable as a complete line and skip any '\n' that was read afterwards. Unfortunately this didn't work, as the tokenizer only recognizes '\n' as line ends, which in turn lead to joined lines and SyntaxErrors, so this special treatment of a split '\r\n' has been dropped. (It can only happen with a temporarily exhausted bytestream now anyway.) Fixes parts of SF bugs #1163244 and #1175396.	2005-04-21 21:32:03 +00:00
Walter Dörwald	714f87821f	Fix typos.	2005-04-04 21:42:22 +00:00
Walter Dörwald	7a6dc139de	Fix for SF bug #1175396 : readline() will now read one more character, if the last character read is "\r" (and size is None, i.e. we're allowed to call read() multiple times), so that we can return the correct line ending (this additional character might be a "\n"). If the stream is temporarily exhausted, we might return the wrong line ending (if the last character read is "\r" and the next one (after the byte stream provides more data) is "\n", but at least the atcr member ensure that we get the correct number of lines (i.e. this "\n" will not be treated as another line ending.)	2005-04-04 21:38:47 +00:00
Skip Montanaro	9f5f9d943d	typo	2005-03-16 03:51:56 +00:00
Walter Dörwald	71fd90da87	Add default value for "whence" argument.	2005-03-14 19:25:41 +00:00
Walter Dörwald	729c31f5c3	Reset internal buffers when seek() is called. This fixes SF bug #1156259.	2005-03-14 19:06:30 +00:00
Martin v. Löwis	e2713becd8	Build with --disable-unicode again. Fixes #1158607. Will backport to 2.4.	2005-03-08 15:03:08 +00:00
Walter Dörwald	9fa0946771	Fix and test for SF bug #1098990: codec readline() splits lines apart.	2005-01-10 12:01:39 +00:00
Walter Dörwald	e57d7b179a	The changes to the stateful codecs in 2.4 resulted in StreamReader.readline() trying to return a complete line even if a size parameter was given (see http://www.python.org/sf/1076985). This leads to buffer overflows with long source lines under Windows if e.g. cp1252 is used as the source encoding. This patch reverts the behaviour of readline() to something that behaves more like Python 2.3: If a size parameter is given, read() is called only once. As a side effect of this, readline() now supports all types of linebreaks supported by unicode.splitlines(). Note that the tokenizer is still broken and it's possible to provoke segfaults (see http://www.python.org/sf/1089395).	2004-12-21 22:24:00 +00:00
Hye-Shik Chang	af5c7cff56	SF #1048865: Fix a trivial typo that breaks StreamReader.readlines()	2004-10-17 23:51:21 +00:00
Walter Dörwald	69652035bc	SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support decoding incomplete input (when the input stream is temporarily exhausted). codecs.StreamReader now implements buffering, which enables proper readline support for the UTF-16 decoders. codecs.StreamReader.read() has a new argument chars which specifies the number of characters to return. codecs.StreamReader.readline() and codecs.StreamReader.readlines() have a new argument keepends. Trailing "\n"s will be stripped from the lines if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and PyUnicode_DecodeUTF16Stateful.	2004-09-07 20:24:22 +00:00
Marc-André Lemburg	d594849c42	Ignore sizehint argument. Fixes SF #844561.	2004-02-26 15:22:17 +00:00
Walter Dörwald	7f3ed74643	Fix typos.	2003-02-02 23:08:27 +00:00
Neal Norwitz	6ec0a8ab93	sys was already imported, remove second import	2002-12-30 23:36:02 +00:00
Marc-André Lemburg	b28de0d79f	Patch to make _codecs a builtin module. This is necessary since Python 2.3 will support source code encodings which rely on the builtin codecs being available to the parser. Remove struct dependency from codecs.py	2002-12-12 17:37:50 +00:00
Walter Dörwald	7f82f7955e	Add missing documentation for the PEP 293 functionality to the codecs docstrings.	2002-11-19 21:42:53 +00:00
Walter Dörwald	4dbf192f2b	Add next() and __iter__() methods to StreamReader, StreamReaderWriter and StreamRecoder. This closes SF bug #634246.	2002-11-06 16:53:44 +00:00
Walter Dörwald	3aeb632c31	PEP 293 implemention (from SF patch http://www.python.org/sf/432401)	2002-09-02 13:14:32 +00:00
Walter Dörwald	474458da48	Add constants BOM_UTF8, BOM_UTF16, BOM_UTF16_LE, BOM_UTF16_BE, BOM_UTF32, BOM_UTF32_LE and BOM_UTF32_BE that represent the Byte Order Mark in UTF-8, UTF-16 and UTF-32 encodings for little and big endian systems. The old names BOM32_* and BOM64_* were off by a factor of 2. This closes SF bug http://www.python.org/sf/555360	2002-06-04 15:16:29 +00:00
Raymond Hettinger	54f0222547	SF 563203. Replaced 'has_key()' with 'in'.	2002-06-01 14:18:47 +00:00
Martin v. Löwis	b786e61b1c	Set default value for readlines.sizehint to None. Change needed for 2.2.1 as well.	2002-03-05 15:46:38 +00:00
Marc-André Lemburg	aa32c5aa7c	Added new helpers for easy access to codecs. Docs will follow.	2001-09-19 11:24:48 +00:00
Andrew M. Kuchling	97c56357b1	Fix typo in comment	2001-09-18 20:29:48 +00:00
Martin v. Löwis	02d893cfae	Patch #444359: Remove unused imports.	2001-08-02 07:15:29 +00:00
Martin v. Löwis	6cd441d129	Add dead imports of modules that are "magically" imported.	2001-07-31 08:54:55 +00:00
Tim Peters	3a2ab1ab69	Whitespace normalization.	2001-05-29 06:06:54 +00:00
Marc-André Lemburg	716cf91839	Moved the encoding map building logic from the individual mapping codec files to codecs.py and added logic so that multi mappings in the decoding maps now result in mappings to None (undefined mapping) in the encoding maps.	2001-05-16 09:41:45 +00:00
Tim Peters	30324a7363	Just changed "x,y" to "x, y" everywhere (i.e., inserted horizontal space after commas that didn't have any).	2001-05-15 17:19:16 +00:00
Skip Montanaro	e99d5ea25b	added __all__ lists to a number of Python modules added test script and expected output file as well this closes patch 103297. __all__ attributes will be added to other modules without first submitting a patch, just adding the necessary line to the test script to verify more-or-less correct implementation.	2001-01-20 19:54:20 +00:00
Tim Peters	88869f9787	Whitespace normalization.	2001-01-14 23:36:06 +00:00
Marc-André Lemburg	a866df806d	This patch changes the default behaviour of the builtin charmap codec to not apply Latin-1 mappings for keys which are not found in the mapping dictionaries, but instead treat them as undefined mappings. The patch was originally written by Martin v. Loewis with some additional (cosmetic) changes and an updated test script by Marc-Andre Lemburg. The standard codecs were recreated from the most current files available at the Unicode.org site using the Tools/scripts/gencodec.py tool. This patch closes the bugs #116285 and #119960.	2001-01-03 21:29:14 +00:00
Andrew M. Kuchling	c6c2838403	(Patch #102698) Fix for a bug reported by Wade Leftwich: StreamReader ignores the 'errors' parameter passed to its constructor	2000-12-10 15:12:14 +00:00
Fred Drake	d254c0095c	Remove redundent information from a docstring.	2000-10-02 22:11:47 +00:00

61 Commits
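
Several entries in the log describe behaviour that is easy to observe from the public `codecs` API: the `CodecInfo` tuple subclass and incremental decoders (abb02e5994), decoding of incomplete input (69652035bc), and the `BOM_*` constants (474458da48). The following is a minimal sketch, assuming a current Python 3 interpreter rather than the Python 2 versions these commits targeted; the variable names are illustrative only and do not come from the commits themselves.

```python
import codecs

# codecs.lookup() returns a CodecInfo object, which is a tuple subclass.
info = codecs.lookup("utf-8")

# Feed a two-byte UTF-8 character to an incremental decoder in two chunks.
# The first call sees incomplete input and buffers it instead of failing.
data = "é".encode("utf-8")
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(data[:1])
second = decoder.decode(data[1:])

# Byte order mark constants exposed by the module.
bom_utf8 = codecs.BOM_UTF8
bom_utf16_le = codecs.BOM_UTF16_LE
bom_utf16_be = codecs.BOM_UTF16_BE
```

An incremental decoder keeps the partial byte sequence between calls, which is what lets a caller decode a stream that is only temporarily exhausted without going through the `StreamReader` API.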